Build an Advanced Web Scraping Tool Using ToolJet and Scraper API (2024)

Introduction

Scraping data can be tedious, especially when dealing with dynamic content. It becomes much more convenient when you can instantly preview the scraped data in a UI. ToolJet, combined with Scraper API, enables exactly that. This tutorial shows how to set up a script that scrapes data from within ToolJet and displays the results in real time.

If you've worked with web scraping using Google Colab or tools like Selenium, you know the challenges. Here, we'll take a different approach: the scraping logic is written in JavaScript, and ToolJet’s visual app builder manages the data flow.

Prerequisites:

  • ToolJet (https://github.com/ToolJet/ToolJet): An open-source, low-code business application builder. Sign up for a free ToolJet cloud account or run ToolJet on your local machine using Docker.
  • Basic knowledge of JavaScript.

Begin by creating an application named TJ Web Scraper.

Step 1: Designing the UI to Display the Scraped Data

Let's use ToolJet's visual app builder to quickly design the UI.

  • Drag and drop a Container component onto the canvas.
  • Inside the container, place an Icon component on the left for the logo.
  • Place a Text component next to it and enter "TJ Web Scraper" under its Data property.
  • Place another Text component on the right to display the total count of scraped products.

We are using blue (hex code: #075ab9) as the primary color for this application; change the color scheme of all the components accordingly.

  • Add a Table component below the header and a Button component on the bottom-right of the canvas.

With that, our UI is ready in just a couple of minutes.

Step 2: Writing the JavaScript Code to Scrape Data

In this step, we will use Scraper API to scrape data from a sample eCommerce website.

The website has a few products with their images, titles, and prices. Additionally, it has a load more button to dynamically load more content.

  • To begin, expand the Query Panel at the bottom, and click on the Add button to create a new Run JavaScript code query. Rename the query to scrapeData.
  • Set up the main scraping function: Begin by creating the runMainScript() function that will coordinate all the logic needed for scraping products.
    function runMainScript() {
      const API_KEY = 'SCRAPER_API_KEY';

      // Main logic goes here
      // .....
    }
  • Create a request helper: Build a helper function makeRequest() using axios to handle API requests, return the response body, and deal with errors. A 404 response is rethrown as an error; any other failure is logged and returns null.
    async function makeRequest(url) {
      try {
        // Scraper API fetches the target URL on our behalf
        const response = await axios.get('https://api.scraperapi.com/', {
          params: { api_key: API_KEY, url: url }
        });
        return response.data;
      } catch (error) {
        if (error.response && error.response.status === 404) {
          throw new Error('404NotFound');
        }
        console.error(`Error making request to ${url}: ${error}`);
        return null;
      }
    }
  • Extract product details: Define the parseProducts() function to gather the relevant information (title, price, image, and URL) from the HTML content, filtering out incomplete data. This function uses HTML selectors tailored to the target website.
    function parseProducts(html) {
      const parser = new DOMParser();
      const doc = parser.parseFromString(html, 'text/html');
      const items = doc.querySelectorAll('.product-item');

      return Array.from(items)
        .map(item => ({
          title: item.querySelector('.product-name')?.textContent.trim() || '',
          price: item.querySelector('.product-price')?.textContent.trim() || '',
          image: item.querySelector('img')?.src || 'N/A',
          url: item.querySelector('a')?.href || null
        }))
        // Drop items that are missing a title or a price
        .filter(item => item.title && item.price);
    }
  • Handle dynamic loading: The fetchProducts() function manages the initial page load and any additional AJAX requests, collecting all available products. It saves the total count in a ToolJet variable.
    async function fetchProducts(pageUrl, ajaxUrl) {
      let products = [];
      let offset = 0;

      const initialPageHtml = await makeRequest(pageUrl);
      if (!initialPageHtml) return products;
      products = products.concat(parseProducts(initialPageHtml));

      while (true) {
        const ajaxHtml = await makeRequest(`${ajaxUrl}?offset=${offset}`);
        if (!ajaxHtml) break;

        const newProducts = parseProducts(ajaxHtml);
        if (newProducts.length === 0) break;

        products = products.concat(newProducts);
        offset += 12;
        console.log(`Scraped ${products.length} products so far...`);

        // Save the length of the total products fetched in a ToolJet variable
        actions.setVariable('totalProductsScraped', products.length);
      }

      return products;
    }
  • Launch the scraping process: Implement the scrapeProducts() function to trigger the scraping and output the final count of products collected.
    async function scrapeProducts() {
      const pageUrl = "https://www.scrapingcourse.com/button-click";
      const ajaxUrl = "https://www.scrapingcourse.com/ajax/products";

      let products = await fetchProducts(pageUrl, ajaxUrl);
      console.log(`\nTotal products scraped: ${products.length}`);
      return products;
    }
  • Run the script and handle results: Execute the scraping process, save the data, and log a sample of the products for review.
    scrapeProducts().then(products => {
      // Save all the products fetched from the eCommerce website
      actions.setVariable('scrapedProducts', products);
      console.log("\nScraped products stored in 'scrapedProducts' variable.");
      console.log(`Total products: ${products.length}`);
      console.log("\nSample of scraped products:");
      products.slice(0, 5).forEach(product => {
        console.log(`Title: ${product.title}`);
        console.log(`Price: ${product.price}`);
        console.log(`Image: ${product.image}`);
        console.log(`URL: ${product.url}`);
        console.log("---");
      });
    }).catch(error => {
      actions.setVariable('scrapingError', error.message);
      console.error("An error occurred:", error);
    });
  • Finally, invoke the runMainScript() function to start the entire process.
    function runMainScript() {
      // Main logic
      // .....
    }

    runMainScript();
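
To make the request sent by makeRequest() more concrete, here is a minimal sketch of the kind of URL the axios call produces. It assumes only what the tutorial code shows (the https://api.scraperapi.com/ endpoint with api_key and url parameters). buildScraperUrl is a hypothetical helper name, 'SCRAPER_API_KEY' is a placeholder, and the exact encoding axios emits may differ slightly from this output.

```javascript
// Hypothetical helper, not part of ToolJet or Scraper API: builds a query
// string equivalent to the one axios derives from `params` in makeRequest().
function buildScraperUrl(apiKey, targetUrl) {
  const endpoint = new URL('https://api.scraperapi.com/');
  // searchParams percent-encodes the target URL, so its own query string
  // (for example ?offset=12) is not mixed up with Scraper API's parameters.
  endpoint.searchParams.set('api_key', apiKey);
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

const requestUrl = buildScraperUrl(
  'SCRAPER_API_KEY', // placeholder, as in the tutorial code
  'https://www.scrapingcourse.com/ajax/products?offset=12'
);
console.log(requestUrl);
```

The point of the sketch is that the target URL travels as a single encoded parameter, which is why the AJAX URL with its offset can be passed to makeRequest() unchanged.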

The scraping code is now ready. The selectors, URLs, and offset step in it are specific to this sample website, so update them if you target a different site.
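
When adapting the loop in fetchProducts() to another site, the page size (12 here) and the stop condition are the parts most likely to change. The following is a minimal sketch of that loop with those parts made explicit, not the tutorial's own implementation. collectPages, fetchPage, pageSize, and maxPages are hypothetical names, and the fake catalog stands in for makeRequest() plus parseProducts() so the logic can be tried without network access.

```javascript
// Hypothetical, site-agnostic version of the loop in fetchProducts().
// fetchPage(offset) is assumed to resolve to an array of parsed items,
// or null when the request failed.
async function collectPages(fetchPage, pageSize = 12, maxPages = 50) {
  const items = [];
  for (let page = 0; page < maxPages; page++) {
    const batch = await fetchPage(page * pageSize);
    // Stop on a failed request or an empty page, as the tutorial code does
    if (!batch || batch.length === 0) break;
    items.push(...batch);
  }
  return items;
}

// Stub standing in for makeRequest() + parseProducts(): 30 fake products
const fakeCatalog = Array.from({ length: 30 }, (_, i) => ({ title: `Item ${i + 1}` }));
const fakeFetchPage = async (offset) => fakeCatalog.slice(offset, offset + 12);

collectPages(fakeFetchPage).then(items => {
  console.log(`Collected ${items.length} items`); // Collected 30 items
});
```

The maxPages cap is a design choice that is not present in the original while (true) loop; it bounds the number of requests if a site never returns an empty page.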

Click on the Run button on the Query Panel and check all the logs that will appear in the browser console.

Step 3: Displaying the Scraped Data

With the UI and code set up, we can now display the data in the Table component and run the query when the Button is clicked.

  • Select the Button component, navigate to its properties and create a new event handler.
  • Select On click as the event, Run Query as the action, and scrapeData as the query.
  • Select the Table component, and under its Data property, enter {{variables.scrapedProducts}}.
  • Select the Text component in the header that displays the total count of scraped products, and enter {{"Total Products Scraped: " + (variables.totalProductsScraped || 0)}} under its Data property. The parentheses make the fallback to 0 apply to the variable rather than to the whole string.
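
One detail worth checking in the header expression is operator precedence. The sketch below, in plain JavaScript, shows how the fallback to 0 behaves with and without parentheses. The variables object here is a stand-in (an assumption for illustration); in ToolJet it is provided by the app and is empty until the query has run.

```javascript
// Stand-in for ToolJet's variables object before scrapeData has run
const variables = {};

// + binds tighter than ||, so the string is built first and is always truthy
const ungrouped = "Total Products Scraped: " + variables.totalProductsScraped || 0;

// Grouping the fallback applies it to the missing value instead
const grouped = "Total Products Scraped: " + (variables.totalProductsScraped || 0);

console.log(ungrouped); // Total Products Scraped: undefined
console.log(grouped);   // Total Products Scraped: 0
```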

We've successfully linked the components with the query. Now, just click the Button component and watch as the data is scraped and displayed in the Table component.

Conclusion

Scraping data effectively requires overcoming challenges like dynamic content and pagination, especially when dealing with AJAX-loaded pages. Using ToolJet combined with Scraper API, you can simplify this process and gain the ability to instantly preview and manage your scraped data through a clean UI.

Unlike traditional approaches like Selenium web scraping or using Google Colab, this method integrates JavaScript web scraping seamlessly into your workflow with real-time visibility of your data. Building on this foundation, you can scale the tool to handle more complex scraping needs while maintaining an intuitive interface.

To learn more, check out ToolJet's official documentation or connect on Slack with any questions.

Author: Horacio Brakus JD
