In this post, we will learn how to scrape Google organic search results using Node.js.
Before we start, make sure the following are installed, as we will use them throughout the tutorial:
- Node.js — the runtime our scraper runs on.
- Unirest JS — to fetch the raw HTML data.
- Cheerio JS — to parse the HTML data.
Before starting, set up your Node.js project directory, then install both packages, Unirest JS and Cheerio JS, from npm with `npm i unirest cheerio`.
Everything we need for our scraper is now set up. To extract the raw HTML data, we will make a GET request to our target URL using the NPM library Unirest JS, and then parse the extracted HTML using another NPM library, Cheerio JS.
Then we will search for the HTML tags that hold each result's title, link, snippet, and displayed_link.
So, from the above image, we get the following selectors:
- title: `.yuRUbf > a > h3`
- link: `.yuRUbf > a`
- snippet: `.g .VwiC3b`
- displayed_link: `.g .yuRUbf .NJjxre .tjvczx`
Here is our code:
Note: If you want to use a proxy server while making the GET request, Unirest JS supports this as well:
Here, “your proxy” refers to the URL of the proxy server through which the request to the target URL is routed. The advantage of using a proxy is that it hides your IP address, so the website you are scraping cannot identify it while you make requests. This helps keep your IP from being blocked by Google.
In this tutorial, we learned how to scrape Google organic search results using Node.js. If you have any questions about the scraping process, feel free to ask. Thanks for reading!