How did I scrape news articles using Python?

How can I extract all of them?

Why am I writing this article?
I really believe in the ‘Give and Take’ ideology! In the past 4 months, I have learned a great deal about Data Science from online sources like LinkedIn, Medium, and Quora. Now I want to give my little bit of knowledge back to the Data Science community, and Medium is the best place to do so!

Let’s dive deep into it!
For this program, I used:
1. A JSON file - to read the news website links
2. A CSV file - to store the news articles

First Step: Get all the required URLs
The first step was quite easy. I just had to get the URL of each result page and tell my code to fetch the articles. I stored them in a JSON file.
Here is the snapshot of the JSON file:
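The snapshot image did not survive extraction. Based on the attributes the article mentions later (a link per site and an optional rss feed), such a file might look like this; the top-level key, site names, and URLs are illustrative assumptions:

```json
{
  "newspapers": {
    "example_news": {
      "link": "https://example.com/",
      "rss": "https://example.com/feed.xml"
    },
    "another_site": {
      "link": "https://another.example.org/",
      "rss": ""
    }
  }
}
```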

So simple, isn’t it? Once I did that, I had a JSON file with the input information. For each site, I had its name, its homepage link, and an optional RSS feed URL.

Second Step: Reading a JSON file!

Packages/libraries used in the code below:

#!pip install feedparser
#!pip install newspaper3k

import feedparser as fp        # parse RSS feeds
import json                    # read the list of news websites
import newspaper               # build a source object for a whole site
from newspaper import Article  # download and parse individual articles
from time import mktime        # convert feed timestamps
from datetime import datetime  # store publish dates
import csv                     # save the results

Reading the JSON file

The limit variable controls how many articles are fetched from each link.
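This step can be sketched as follows. The file name NewsPapers.json, its structure, and the limit value are assumptions, since the original snippet is not reproduced here; the sketch writes a small stand-in file first so it runs on its own.

```python
import json

# Create a small stand-in for the article's JSON file so this sketch runs
# on its own; the file name and structure are assumptions.
sample = {
    "newspapers": {
        "example_news": {
            "link": "https://example.com/",
            "rss": "https://example.com/feed.xml"
        }
    }
}
with open("NewsPapers.json", "w") as f:
    json.dump(sample, f, indent=2)

# Read the list of news websites back, as the article does
with open("NewsPapers.json") as data_file:
    companies = json.load(data_file)

# LIMIT caps how many articles are fetched from each link
LIMIT = 4

for name, site in companies["newspapers"].items():
    print(name, "->", site["link"])
```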

Third Step: Extracting the news articles!

RSS is a type of web feed that lets users access updates to online content in a standardized, computer-readable format.
If an RSS link is provided in the JSON file, it will be the first choice, because RSS feeds usually give more consistent and correct data. If you do not want to scrape from the RSS feed, just leave the rss attribute empty in the JSON file.
Once I wrote the above code, I had all of each article’s information, including its title, link, and publish date.

Fourth Step: Patience, Baby…!


Once all the hard work is done, it is time to relax. How long the run takes depends on how many articles you want to fetch and the limit you set per site.

Fifth Step: And the most important one!! Save all your hard work to a CSV file 😄

Do you want to know the best part? I am pretty sure this logic is easily reusable for all the newspaper websites in the world!
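Saving the collected records can be sketched with the standard csv module. The field names, file name, and sample record here are illustrative, not the article’s exact ones:

```python
import csv

# Illustrative records shaped like the ones gathered in the earlier steps
articles = [
    {"title": "Example headline",
     "link": "https://example.com/article-1",
     "published": "2021-09-06",
     "text": "Body of the article..."},
]

fieldnames = ["title", "link", "published", "text"]

# Write every record to a CSV file (the file name is an assumption)
with open("news_articles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(articles)

# Read it back as a quick sanity check
with open("news_articles.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["title"])  # Example headline
```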

Thanks for reading, this is my first story on Medium and I would be thrilled to know your opinion about it!

