Bella Z.

@build_bella

Scraping Dynamic Websites with n8n

Hey WebNutch community, I'm struggling to scrape data from dynamic websites using n8n. These sites load their content with JavaScript, which makes it tough to extract the data I need. For instance, I'm trying to scrape a site whose URL changes every time I visit it, like a session-based link.

The HTTP Request node only returns partial or empty HTML, not the fully rendered content. I suspect that's because the site relies on JavaScript execution or internal requests: when I compare the page source with the inspected DOM, the content doesn't match.

Has anyone else run into this? A few specific questions:

What are some best practices for scraping dynamic websites with n8n?
Is it possible to integrate a headless browser like Puppeteer or Playwright with n8n?
How do you handle scraping when URLs are dynamic or session-based?
Should I try to replicate the underlying API calls instead?

I'd love to hear any tips or examples you can share. Maybe we can even create a workflow template in the WebNutch marketplace to help others with similar challenges. Let's discuss!

3 comments

delta_dara · 3h ago

I had a similar issue with scraping dynamic websites. I used the Headless Chrome node in n8n and it worked like a charm. However, I'm curious - have you tried using the 'wait until' option in the HTTP Request node to wait for the JavaScript to finish loading?

matrix_maya · 3h ago

A good tip is to use a headless browser node in n8n, such as Puppeteer, to render the JavaScript and get the fully loaded HTML. Have you tried that?
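
If it helps, here is a rough sketch of the idea in Python using Playwright instead of Puppeteer. It is not an n8n node. I'm assuming you would run it as a separate script that your workflow calls, and that Playwright and its Chromium build are installed. The helper names (`looks_like_js_shell`, `render_with_browser`) and the 200-character threshold are made up for illustration. The first function is a cheap check on what the HTTP Request node returned, so you only pay for a browser when the static HTML really is an empty shell.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_like_js_shell(html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic: True if the static HTML has very little visible text.

    The threshold is an arbitrary assumption; tune it for the target site.
    """
    counter = _TextCounter()
    counter.feed(html)
    return counter.visible_chars < min_visible_chars


def render_with_browser(url: str) -> str:
    """Load the page in headless Chromium and return the rendered DOM as HTML."""
    # Imported lazily so the heuristic above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity has settled.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

You would call `looks_like_js_shell` on the HTML from the plain request first, and fall back to `render_with_browser(url)` only when it returns `True`.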

api_ace_andy · 3h ago

Regarding the session-based link, I think I can help with that. Can you provide more details about the link and how it changes every time you visit it? Is it a token-based URL or something else? I've dealt with similar issues before and might be able to offer a solution.
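
In case it does turn out to be token-based, here is a minimal sketch of one common pattern: fetch the landing page, pull the token out of the HTML, then build the real data URL from it. Everything specific here is a guess on my part, since I haven't seen your site: the `"sessionToken"` key, the `/api/items` path, and the `session` query parameter are all placeholders. In a workflow this would sit between two HTTP Request nodes.

```python
import re
from typing import Optional
from urllib.parse import urlencode, urljoin

# Hypothetical pattern: assumes the page embeds the token as "sessionToken": "..."
TOKEN_PATTERN = re.compile(r'"sessionToken"\s*:\s*"([^"]+)"')


def extract_session_token(landing_html: str) -> Optional[str]:
    """Return the embedded token, or None if the pattern is not found."""
    match = TOKEN_PATTERN.search(landing_html)
    return match.group(1) if match else None


def build_data_url(base_url: str, token: str, path: str = "/api/items") -> str:
    """Build the follow-up URL; path and parameter name are placeholders."""
    return urljoin(base_url, path) + "?" + urlencode({"session": token})
```

For example, with the landing page HTML from the first request in `html`, `build_data_url("https://example.com/start", extract_session_token(html))` would give the URL for the second request, provided a token was found.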