Ask NLP: What’s Going On in the News? (October 2021)

If you’ve been following along, you already know that we’ve been doing a LOT of exploration of the news, especially the New York Times. There are three reasons for that: the New York Times provides easy access to its news headlines, I’m curious about what’s going on in the news since I don’t actually read it, and I’m a Natural Language Processing FANATIC. The nice thing about news articles is that they’re all written text, so I can use a comprehensive NLP API like The Text API to extract tons of information through text processing and transformation. So far we’ve looked at whether COVID has made news headlines more negative, and at the shockingly low proportion of headlines mentioning climate over the last 13 years. Click here for the results.

Now that we’ve seen how text polarity and general computational techniques can be used to extract more information from the news, let’s check out what we can do with some text transformation. Specifically, I’m going to be using the summarization endpoint from The Text API to get a short summary of the news headlines for last month, October 2021. Click here to skip to the results.

Summarizing the News Headlines

Building your own summarizer isn’t a simple process. Luckily for us, there’s already an online API that can summarize our text. Sign up for your free API key at The Text API: scroll all the way down the page until you see the “Get Your Free API Key” button and click it to sign up.

Once you log in, your API key will be front and center at the top of the page. Simply copy it and keep it somewhere safe. I keep mine in a `config.py` file. The only library we’ll need to install is `requests`, which you can install from the command line like so:

pip install requests
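
For reference, here’s a minimal sketch of what that `config.py` could look like. The `thetextapikey` name matches the import used later in this post. Reading it from an environment variable is just one way to keep the key out of version control, and the `THETEXTAPI_KEY` variable name is a made-up placeholder, not something The Text API defines.

```python
# config.py (sketch)
import os

# THETEXTAPI_KEY is a hypothetical environment variable name chosen for this
# example; set it in your shell before running the script.
thetextapikey = os.environ.get("THETEXTAPI_KEY", "")

if not thetextapikey:
    print("Warning: THETEXTAPI_KEY is not set, so API requests will likely fail")
```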

Now that we’re fully set up for this project, let’s get into the code. First we’ll import our libraries and our API key. I also imported a `month_dict` which is just a dictionary that maps month numbers to their names.

import json
import requests
 
from archive import month_dict
from config import thetextapikey

The `month_dict` object looks like the code below. You can also find it in the article about downloading archived news headlines.

month_dict = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December"
}

In a moment we’re going to set up our API request, but first let’s set up a method to open up any of the JSON files we downloaded earlier. All we need to do is take a year and a month, look for the expected filename, open it and return it. We’ll enclose looking for and loading in the file in a try/except just in case the filename doesn’t exist.

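def get_doc(year, month):
    # Files are expected at "<year>/<MonthName>.json", e.g. "2021/October.json"
    filename = f"{year}/{month_dict[month]}.json"
    try:
        with open(filename, "r") as f:
            entries = json.load(f)
        return entries
    except FileNotFoundError as err:
        # Re-raise with a clearer message that includes the missing path
        raise FileNotFoundError(f"No such file: {filename}") from err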
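
To make the expected layout concrete, here’s a small self-contained sketch that writes a fake `2021/October.json` into a temporary folder and loads it back the same way `get_doc` does. The two sample headlines are borrowed from the results further down, and the `entry["headline"]["main"]` nesting matches how the entries are read later in this post. The temp-folder setup and the `base_dir` parameter are assumptions added only so the example doesn’t touch your real files.

```python
import json
import os
import tempfile

month_dict = {10: "October"}  # trimmed copy of the dictionary above


def get_doc(year, month, base_dir="."):
    # base_dir is an addition for this sketch so it can read from a temp folder
    filename = os.path.join(base_dir, str(year), f"{month_dict[month]}.json")
    with open(filename, "r") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Build "<tmp>/2021/October.json" with two stripped-down archive entries
    os.makedirs(os.path.join(tmp, "2021"))
    sample = [
        {"headline": {"main": "Facebook Renames Itself Meta"}},
        {"headline": {"main": "How to Play Drunk"}},
    ]
    with open(os.path.join(tmp, "2021", "October.json"), "w") as f:
        json.dump(sample, f)

    entries = get_doc(2021, 10, base_dir=tmp)
    titles = [entry["headline"]["main"] for entry in entries]
    print(titles)
```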

Setting up the API Request

Now let’s set up our API request. We’ll need a headers object with an `apikey` keyword set to the value of the API key we copied and stored in our config file earlier. The headers will also need to declare our content type. We also need to set up a URL endpoint to hit. I have split the URL endpoint setup into two lines because the `text_url` object is the prefix for all the text processing URLs and, for me, this file includes hitting a bunch of other endpoints too. I’m reusing it from the other NY Times analysis I’ve already posted about.

headers = {
    "Content-Type": "application/json",
    "apikey": thetextapikey
}
text_url = "https://app.thetextapi.com/text/"
summarizer_url = text_url+"summarize"

Once we’re set up to hit the endpoint, we’ll make a function that takes a year and a month as parameters and sends requests to the summarization endpoint to summarize our headlines. The first thing we’ll need to do is call the `get_doc` function we made earlier to open our JSON document. You’ll notice that I made 4 strings for headlines here. That’s because I checked the number of headlines (not shown) and saw that there were over 3000 of them, almost 1 MB of data. That’s a pretty big request, so I opted to split it into smaller chunks for faster processing. You can try to send in the whole thing, but you may run into a network error, because most terminals won’t keep the connection open that long and The Text API uses asynchronous calls.

Summarizing the Headlines

Once we’ve split our headlines into four sets, we simply call the API four times, once per set, and get back four summaries. I’ve included a `proportion` parameter in the request body, which tells the API what proportion of the text we want to keep. Since there are over 3000 headlines, I only want to see a very small percentage of them; in this case, we’ll keep 0.025, or 2.5%. You can opt to print out the summaries here, but I’ve decided to save them to a JSON file for further processing.

def summarize_headlines(year, month):
    entries = get_doc(year, month)
    # Four buckets of up to 800 headlines each (the last takes the remainder)
    headlines1 = ""
    headlines2 = ""
    headlines3 = ""
    headlines4 = ""
    for index, entry in enumerate(entries):
        headline = entry['headline']['main']
        # Strip periods inside a headline so each headline reads as one sentence
        headline = headline.replace('.', '')
        if index < 800:
            headlines1 += headline + ". "
        elif index < 1600:
            headlines2 += headline + ". "
        elif index < 2400:
            headlines3 += headline + ". "
        else:
            headlines4 += headline + ". "
    summaries = []
    for headlines in [headlines1, headlines2, headlines3, headlines4]:
        body = {
            "text": headlines,
            "proportion": 0.025
        }
        res = requests.post(summarizer_url, headers=headers, json=body)
        # Fail loudly on an HTTP error instead of a confusing KeyError below
        res.raise_for_status()
        summaries.append(res.json()["summary"])
    # Save the four summaries as a JSON list next to the month's headlines
    with open(f"{year}/{month_dict[month]}_Summary.json", "w") as f:
        json.dump(summaries, f)
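
If you’d rather not hard-code four strings, the same split can be written as a small helper. This is only a sketch of an alternative, not the code used above: `chunk_headlines` and its `chunk_size` parameter are names invented for this example, and the default of 800 mirrors the cutoffs in the function above. One difference to be aware of: it makes as many chunks as needed, while the function above puts everything past index 2400 into the fourth string.

```python
def chunk_headlines(headlines, chunk_size=800):
    """Join headlines into ". "-terminated strings of at most chunk_size headlines each."""
    chunks = []
    for start in range(0, len(headlines), chunk_size):
        group = headlines[start:start + chunk_size]
        # Same cleanup as above: drop periods inside a headline, then end it with ". "
        chunks.append("".join(h.replace(".", "") + ". " for h in group))
    return chunks


# Tiny demo with a chunk size of 2 instead of 800
demo = ["Going Down", "So It Goes.", "You Bet", "Not Much", "October"]
chunks = chunk_headlines(demo, chunk_size=2)
for chunk in chunks:
    print(repr(chunk))
```

Each string in `chunks` could then be sent as the `text` field of a request body, just like `headlines1` through `headlines4`.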

That’s all there is to getting a summary of our headlines. Now let’s take a look at them.

Exploring Summarized News Headlines

If you simply printed out your summaries earlier, you can skip this next section about code. If not, let’s write some code that will load our JSON and print out the summarized headlines. As with the `get_doc` function we made earlier, we’ll start by opening the document. For an initial exploration, we’ll simply print out our summarized headlines. I did a small replacement on the text because I noticed some headlines had an extra space and period after them.

# now let's explore our summarized content
def explore_summarized(year, month):
    filename = f"{year}/{month_dict[month]}_Summary.json"
    try:
        with open(filename, "r") as f:
            entries = json.load(f)
    except FileNotFoundError as err:
        raise FileNotFoundError(f"No such file: {filename}") from err

    for entry in entries:
        # drop the stray " ." left behind after some headlines
        new = entry.replace(" .", "")
        print(new)


explore_summarized(2021, 10)

We should see a result that looks something like:

What They Saw in Ozy. Acknowledging the Missing and Those Who Try to Find Them. Facebook Is Weaker Than We Knew. ‘Profit Over Safety’. Show of Love. Have You Had a Job Recently? Sure, if You’re Careful. Going Down. ‘Is it Fair? No, It’s Not Fair Is It Fun? Absolutely’. You Got Lost and Had to Be Rescued Should You Pay? The Facebook Whistle-Blower Testifies. Watch This Next. How to Fix Facebook. ‘We All Know Where We Came From’. The Road Back: ‘How Am I Ever Going to Dance Again?’. Feeling Anxious? We’re Smarter About Facebook Now. Trump May Run in 2024 So Might They It’s Getting Awkward.

More Than Discouraged. Variety: Acrostic. The Hot New Back-to-School Accessory? Review: ‘What Have We Done With Democracy?’ Review: Put Off Until Later. Fight or Flight. Here’s What You’re Not Missing. How to Play Drunk. It’s Never Too Late to Fall in Love. ‘Are We Human?’ Blue Origin isn’t saying. Altruist or Schemer? Why Do You Tattoo? He Was Suddenly Sick and Shaking Violently What Was Going On? Traveling Alone, in Groups. Everything Is Getting More Expensive. Salt It Like You Mean It.

Walk It Off, Again: Atlanta Widens Lead Over Dodgers. Aack! Looking for a Star? Hello? : Don’t Wait Too Long. What are you leaving behind after the pandemic? So It Goes. October. 1971: Help! ‘We Did It Before’. ‘I Only Wish I Had Met Her Sooner’. It Just Got a Lot More Difficult. To Get Ahead at Work, Lawyers Find It Helps to Actually Be at Work. Why Is Everyone Else Quitting? Biden Said the US Would Protect Taiwan But 

It’s Not That Clear-Cut. Not Much. Do You Like Horror? It’s Time for COP26 Here’s Where We Stand. How Low Can You Go (Sofa-wise)? Haven’t I Seen You Somewhere? I’ve, Uh, Been Exposed. Facebook Renames Itself Meta. Maybe Not So Fast. Keep or replace? You Bet. Biden: What is the G20? It Has Two. Homes That Sold for Around $650,000. Saying Yes to Baseball Meant Leaving Football Behind

What do the results tell us?

Okay, let’s start with a visual exploration of our results. It looks like there were quite a few articles on Facebook at the beginning and end of the month. This makes sense: a Facebook whistle-blower testified before Congress in early October (there’s a headline about it in the first summary), and Facebook announced at the end of the month that it’s renaming itself “Meta”, which also got its own headline. It also looks like there’s quite a bit of sports news on football and baseball. Unfortunately, I’m not well versed in sports, so I don’t really know what’s going on, but I do know that the NFL season has started and there’s some sort of series going on in the MLB, so it doesn’t seem that surprising.
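
Eyeballing works for a first pass, but a quick count can back it up. The sketch below counts how many headlines in each summary mention a keyword. `count_mentions` is a helper made up for this example, and the two sample strings are short excerpts from the summaries printed above rather than the full output. Splitting on ". " is only a rough way to separate headlines, since headlines ending in “?” stay attached to the next one.

```python
def count_mentions(summaries, keyword):
    """Count, per summary, the headlines that contain keyword (case-insensitive)."""
    counts = []
    for summary in summaries:
        # Rough split: the summarizer input joined headlines with ". "
        headlines = [h for h in summary.split(". ") if h.strip()]
        counts.append(sum(keyword.lower() in h.lower() for h in headlines))
    return counts


# Short excerpts from the first and fourth summaries shown above
sample_summaries = [
    "Facebook Is Weaker Than We Knew. Show of Love. How to Fix Facebook.",
    "Not Much. Facebook Renames Itself Meta. You Bet.",
]

facebook_counts = count_mentions(sample_summaries, "facebook")
climate_counts = count_mentions(sample_summaries, "climate")
print(facebook_counts, climate_counts)
```

To run it on the real output, you could pass in the list loaded from the `_Summary.json` file instead of `sample_summaries`.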

There are some pretty funny headlines in here too:

  • “To Get Ahead at Work, Lawyers Find It Helps to Actually Be at Work”
    • Duh, thank you very much lawyers
  • “How to Play Drunk”
    • What are we playing? I don’t know, but we’ll be drunk for it
  • “Here’s What You’re Not Missing”
    • Why? Why do you want to tell people what they’re not missing? What’s that even mean???

As a final comment, I can’t believe there are literally NO MENTIONS of the climate that made it into the summary (which suggests there weren’t very many mentions of climate, period), despite the fact that climate change is literally making the planet unlivable for us. Sure, there’s a headline about COP26, but it’s just an article about how someone feels about it … Pretty disappointing, to be honest. To learn more, feel free to reach out to me @yujian_tang on Twitter, follow the blog, or join our Discord.

That’s all folks!
