Scraping Dynamic Websites with n8n

Hey WebNutch community, I'm struggling to scrape dynamic websites with n8n. These sites load their content with JavaScript, so the data I need isn't in the initial HTML response. One site I'm targeting also has a URL that changes on every visit, like a session-based link.

The HTTP Request node returns only partial or empty HTML, never the fully rendered content. When I compare the page source with the inspected DOM, the two don't match, so I suspect the site relies on JavaScript execution or internal requests made after the page loads.

Has anyone else run into this? A few specific questions: What are the best practices for scraping dynamic websites with n8n? Is it possible to integrate a headless browser like Puppeteer or Playwright with n8n? How do you handle scraping when URLs are dynamic or session-based? Should I try to replicate the underlying API calls instead?

I'd love to hear any tips or examples you can share. Maybe we can even create a workflow template in the WebNutch marketplace to help others with similar challenges. Let's discuss!
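For anyone who wants to reproduce the page-source versus DOM comparison mentioned above, here is a minimal sketch in Python using only the standard library. It is a rough heuristic, not an n8n feature: the function name `needs_browser_rendering` and the sample pages are made up for illustration. It assumes you already have the raw HTML that a plain HTTP request returned, plus a piece of text you can see in the browser.

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Collects visible text and counts <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that a reader would see, not script or style bodies.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def needs_browser_rendering(raw_html, expected_text):
    """Hypothetical helper: True when the text seen in the browser is missing
    from the raw HTML's visible text and the page ships scripts, which
    suggests the content is injected by JavaScript after load."""
    parser = _TextAndScriptCounter()
    parser.feed(raw_html)
    visible_text = " ".join(parser.text_parts)
    return expected_text not in visible_text and parser.script_count > 0


# Made-up sample pages: one server-rendered, one filled in by a script.
static_page = "<html><body><h1>Price: 42</h1></body></html>"
dynamic_page = (
    "<html><body><div id='app'></div>"
    "<script>render('Price: 42')</script></body></html>"
)
```

If the check comes back true for a page, the raw response alone probably won't be enough, and the usual options are a headless browser or calling the underlying API directly. If it comes back false, the data is already in the HTML and a plain request plus parsing should do.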
