In this post, we will learn to scrape Google organic search results using Node JS.

Requirements

Before we start, let's install the packages we will use throughout this tutorial:

  1. Node JS
  2. Unirest JS — to fetch the raw HTML data.
  3. Cheerio JS — to parse the HTML data.

Before starting, set up your Node JS project directory, as we will be working in Node JS throughout this tutorial. Then install both packages, Unirest JS and Cheerio JS, with npm, as shown below.
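Here is one way to do that from the terminal (this assumes Node JS and npm are already installed; the folder name is just an example):

mkdir google-scraper && cd google-scraper
npm init -y
npm install unirest cheerio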

Goal

We will scrape Google's organic search results for the query "javascript".

Procedure

Everything we need for our scraper is now set up. To extract the raw HTML data, we will make a GET request to our target URL with the NPM library Unirest JS. Then we will parse the extracted HTML with another NPM library, Cheerio JS.

Then we will search for the HTML tags of the respective titles, links, snippets, and displayed links.

By inspecting the search page in your browser's developer tools, you can find the tag for the title as .yuRUbf > a > h3, for the link as .yuRUbf > a, for the snippet as .g .VwiC3b, and for the displayed link as .g .yuRUbf .NJjxre .tjvcx.
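If you have not used Cheerio before, here is a minimal sketch of how it loads an HTML string and queries it with CSS selectors (the HTML below is made up to mimic a single organic result):

const cheerio = require("cheerio");

// Made-up HTML mimicking the structure of one organic result
const html =
  '<div class="yuRUbf"><a href="https://example.com"><h3>Example Title</h3></a></div>';

const $ = cheerio.load(html);

console.log($(".yuRUbf > a > h3").text()); // "Example Title"
console.log($(".yuRUbf > a").attr("href")); // "https://example.com"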

Here is our code:

const unirest = require("unirest");
const cheerio = require("cheerio");

const getOrganicData = () => {
  return unirest
    .get("https://www.google.com/search?q=javascript&gl=us&hl=en")
    .headers({
      // A browser-like User-Agent makes Google serve the full HTML page
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
    })
    .then((response) => {
      // Load the raw HTML into Cheerio so we can query it with CSS selectors
      const $ = cheerio.load(response.body);

      let titles = [];
      let links = [];
      let snippets = [];
      let displayedLinks = [];

      // Collect the text of each result title
      $(".yuRUbf > a > h3").each((i, el) => {
        titles[i] = $(el).text();
      });
      // Collect the href of each result link
      $(".yuRUbf > a").each((i, el) => {
        links[i] = $(el).attr("href");
      });
      // Collect each result snippet
      $(".g .VwiC3b").each((i, el) => {
        snippets[i] = $(el).text();
      });
      // Collect each displayed link
      $(".g .yuRUbf .NJjxre .tjvcx").each((i, el) => {
        displayedLinks[i] = $(el).text();
      });

      // Merge the parallel arrays into one array of result objects
      const organicResults = [];

      for (let i = 0; i < titles.length; i++) {
        organicResults[i] = {
          title: titles[i],
          link: links[i],
          snippet: snippets[i],
          displayedLink: displayedLinks[i],
        };
      }
      console.log(organicResults);
    })
    .catch((error) => {
      console.error(error);
    });
};

getOrganicData();
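Save the code in a file (for example, index.js; the file name is your choice) and run it with node index.js. Each entry in organicResults is an object of the following shape (the values below are invented purely to illustrate the structure; actual results depend on what Google returns):

{
  title: "Some page title",
  link: "https://example.com/some-page",
  snippet: "A short description of the page taken from the search result...",
  displayedLink: "https://example.com › some-page"
}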

Note: If you want to use a proxy server while making the GET request, you can do so with Unirest JS like this:

return unirest
    .get("https://www.google.com/search?q=javascript&gl=us&hl=en")
    .headers({
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
    })
    .proxy("your proxy")

Here, “your proxy” is the URL of the proxy server through which the request to the target URL is routed. The advantage of using a proxy server is that it hides your IP address, so the website you are scraping cannot identify you while you make requests. This helps keep your IP from being blocked by Google.
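In practice, the proxy URL usually includes the scheme, any credentials, the host, and the port. A minimal sketch, assuming the proxy URL is stored in an environment variable (the variable name PROXY_URL and the URL format shown are illustrative):

// Hypothetical: set PROXY_URL="http://user:pass@123.45.67.89:8080" in your shell
return unirest
  .get("https://www.google.com/search?q=javascript&gl=us&hl=en")
  .headers({
    "User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
  })
  .proxy(process.env.PROXY_URL);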

 

In this tutorial, we learned to scrape Google organic search results using Node JS. If you have questions about the scraping process, feel free to ask me anything. Thanks for reading!


