Summarized by AI: November 2021 in Headlines

In early November, we covered October in Headlines. Now that it’s December, let’s see what November looked like in the NY Times headlines. To learn how to build a summarizer yourself, check out How to Build a Text Summarizer in Python. First, I’ll show you the actual headlines the AI summarizer picked; the tutorial follows, and the source code is on GitHub.

Summarized Headlines

“Where and how to vote in New York City. Is it too late to register to vote? Eric Adams Is Elected Mayor of New York City. Did Covid Change How We Dream?. ‘Everything About You Must Say Power’. How We Hug Now. Do Generations Matter? Well, Maybe. Reading Around New York. When Art and Music Collide. 262 at 50. Saucy and Cheesy. Huma Abedin Talks About ‘Both/And’. What to Expect at Work When You’re Expecting. ‘Much Obliged!’ Are You Anxious, Avoidant or Secure? Can It Be Both? This and That. Everyone.”,

“Co – Work and Home. ‘How Did We Let People Die This Way?’. To testify or not? It Could Be Yours. Here’s What You Need to Know. Veterans Day. Post Less, Chat More. Will Real Estate Ever Be Normal Again?. Mocking, Maybe. US Beats Mexico and Then Rubs It In. What does COP stand for?. It Could Be Yours’. Planning for Surgery? Can’t Say Who. . .”,

“‘I Quit’. Just How Bad Is It Out There for Democrats?. He Cared About Me, So I Broke Up With Him. 1946: Is iBuying Here to Stay? Here’s What It Looked Like. It Really Would Help if People Learned to Email. ‘When Are You Getting Married?’ To Breed or Not to Breed?. How to Get Past ‘Did Not Finish’. That’s Just How They Roll. It’s All Around You. How to Stand Correctly. ‘Once We’re Gone, We’re Not Coming Back’. Open or closed on Thanksgiving? ‘I Can’t Just Quit’. More Than Right.”,

“When No 1 Is a Lot Better Than No 2. How Do You Solve Spelling Bee?. What Does It Mean to ‘Yassify’ Anything? Thanksgiving, This Year vs Last. Why Shouldn’t Housing for the Homeless Be Beautiful?. Getting It Right the Second Time Around. Giving Thanks in Australia. What Can I Do? | Homes That Sold for Around $750,000. Scientists Are Racing to Find Out. It’s Not Just Old and Complex. Omicron: What Is Known — and Still Unknown. How do you say ‘Omicron’?”

NY Times November 2021

This post was created using the NY Times Archive API and The Text API. Sign up for both to get free API keys so you can follow along with the tutorial. To get the NY Times headlines from the Archive API, you’ll need your API key and the requests library (pip install requests). We’ve already covered how to pull archived headlines, so I’ll skip that part here and go straight to using The Text API to create a summary.
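We won’t repeat the archive download here, but for reference, this is a minimal sketch of what that step might look like. It assumes the Archive API URL pattern https://api.nytimes.com/svc/archive/v1/{year}/{month}.json with an api-key query parameter, and it assumes the earlier post saved only the list of articles under response.docs, since that is the shape the functions below read. It uses the standard library’s urllib instead of requests so it runs without extra installs; archive_url, extract_docs, and download_month are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

ARCHIVE_BASE = "https://api.nytimes.com/svc/archive/v1"


def archive_url(year, month, api_key):
    """Build the Archive API URL for one month (month is 1-12, not zero-padded)."""
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{ARCHIVE_BASE}/{year}/{month}.json?{query}"


def extract_docs(payload):
    """Pull the list of articles out of a parsed Archive API response."""
    return payload["response"]["docs"]


def download_month(year, month, api_key, filename):
    """Fetch one month of articles and save only the article list as JSON.

    Assumption: the saved file is a plain list of article objects, which is
    what get_doc and split_headlines expect later in this post.
    """
    with urllib.request.urlopen(archive_url(year, month, api_key)) as resp:
        payload = json.load(resp)  # json.load accepts the bytes returned by read()
    with open(filename, "w") as f:
        json.dump(extract_docs(payload), f)
```

For example, download_month(2021, 11, your_nyt_key, "2021/November.json") would save November 2021, assuming the 2021 folder already exists and that November is the name your month_dict uses.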

Set Up the HTTP Requests

We’re going to send HTTP requests, so we’ll set that up first. As always, we start by importing the libraries we’ll be using: json to parse the JSON responses, requests to send the requests, and sys to add the directory two levels up (the one containing the nyt package with our config file and API key) to Python’s import path. I also imported the month_dict object from the archive file; you can see that object and file in the post about how to pull archived news headlines mentioned above. Note that before we can import from the config, we have to append ../.. to sys.path so that Python can find the file we’re importing from. Relative entries like ../.. are resolved against the directory you run the script from, so run it from the script’s own folder.
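The month_dict object lives in the archive script from the earlier post and isn’t shown here. Judging from how get_doc uses it below, it maps a month number to the name used in the archive filenames. This is a hypothetical stand-in, assuming the files are named after the full English month names (for example 2021/November.json):

```python
# Hypothetical stand-in for month_dict from archive.py: maps month numbers
# (1-12) to the names assumed to be used in the archive filenames.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
month_dict = {number: name for number, name in enumerate(MONTH_NAMES, start=1)}

print(f"2021/{month_dict[11]}.json")  # 2021/November.json
```

A plain list is used instead of calendar.month_name so the names don’t depend on the system locale.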

After importing our libraries and other objects, we build the headers and set up the URL. The headers tell the server that we’re sending JSON and pass along our API key. The text_url variable is the base URL for all of The Text API’s text-processing endpoints; we’ll add the specific summarizer endpoint later. Keeping the base URL separate makes it easy to reuse if we also want to run other analyses on our headlines.

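import json      # parse JSON responses
import requests  # send HTTP requests
import sys       # edit the import path so we can reach the config file

from archive import month_dict  # month number -> name used in archive filenames
sys.path.append("../..")  # make the directory containing the nyt package importable
from nyt.config import thetextapikey

# every request sends JSON and authenticates with the API key
headers = {
    "Content-Type": "application/json",
    "apikey": thetextapikey
}
# base URL for The Text API's text endpoints; specific endpoints are appended later
text_url = "https://app.thetextapi.com/text/"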
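Because text_url ends with a slash, other endpoint URLs can be built from it. This is a small sketch using the standard library’s urllib.parse.urljoin; endpoint is a hypothetical helper, and summarize is the only endpoint name this post actually uses, so check any other name against The Text API’s documentation.

```python
from urllib.parse import urljoin

text_url = "https://app.thetextapi.com/text/"


def endpoint(name):
    """Return the full URL for a Text API text endpoint, e.g. 'summarize'."""
    # urljoin keeps the /text/ path segment because the base ends with a slash
    return urljoin(text_url, name)


summarizer_url = endpoint("summarize")
print(summarizer_url)  # https://app.thetextapi.com/text/summarize
```

The trailing slash matters: urljoin("https://app.thetextapi.com/text", "summarize") replaces the last path segment and gives https://app.thetextapi.com/summarize instead.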

Load and Split Archived Headlines

Next we’ll build the function that loads an archived month. It takes two parameters, the year and the month, and opens the corresponding JSON file we saved when we downloaded the archived data earlier. It returns the loaded list of articles, and if the file doesn’t exist, it raises an error.

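# load one month's archived articles from the JSON file saved earlier
def get_doc(year, month):
    filename = f"{year}/{month_dict[month]}.json"
    try:
        with open(filename, "r") as f:
            return json.load(f)
    except FileNotFoundError as err:
        # re-raise with a clearer message instead of swallowing every exception
        raise FileNotFoundError(f"No such file: {filename}") from err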
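The loader expects each archive file to be a JSON list of article objects with the headline text under headline.main, since that is the shape split_headlines reads. If you want to exercise the pipeline without downloading anything, this sketch writes a tiny fake archive file in that assumed shape and loads it back. write_fake_archive and load_archive are hypothetical helpers, and the sample headlines are made up.

```python
import json
import os
import tempfile


def write_fake_archive(directory, month_name, headlines):
    """Write a minimal archive file: a list of {'headline': {'main': ...}} objects."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{month_name}.json")
    entries = [{"headline": {"main": text}} for text in headlines]
    with open(path, "w") as f:
        json.dump(entries, f)
    return path


def load_archive(path):
    """Load an archive file; open() raises FileNotFoundError if it is missing."""
    with open(path, "r") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = write_fake_archive(os.path.join(tmp, "2021"), "November",
                              ["Headline one", "Headline two"])
    entries = load_archive(path)
    print([e["headline"]["main"] for e in entries])  # ['Headline one', 'Headline two']
```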

Next we split the headlines, because sending them all at once causes a server timeout. AI summarization isn’t a trivial task, so each request takes a while; to keep the connection from timing out while the server works, we’ll split the headlines into four sets. Because the headlines are in chronological order, the sets also line up roughly with the four-ish weeks of the month.

# split headlines because ~4000 headlines is way too many to handle at once;
# one giant request causes a timeout and the socket connection closes
def split_headlines(entries, num_sets=4):
    # drop periods inside each headline, then end it with ". " so each
    # headline reads as a single sentence
    headlines = [entry['headline']['main'].replace('.', '') + ". " for entry in entries]
    # ceiling division spreads the headlines evenly across num_sets sets
    # (the old fixed cutoff of 800 dumped everything past 2400 into the last set);
    # the sets stay in chronological order, roughly one per week
    set_size = max(1, -(-len(headlines) // num_sets))
    return ["".join(headlines[i:i + set_size])
            for i in range(0, len(headlines), set_size)]
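The count-based split above assumes every set takes about the same time to summarize. If timeouts turn out to track the amount of text instead, one alternative (not what this post does) is to pack headlines into sets under a character budget. pack_headlines is a hypothetical helper that takes plain headline strings, for example [entry['headline']['main'] for entry in entries], and its 60,000-character default is an arbitrary placeholder, not a documented Text API limit.

```python
def pack_headlines(headlines, max_chars=60_000):
    """Greedily pack headlines, in order, into strings of at most max_chars characters.

    A single headline longer than max_chars still gets a set of its own.
    """
    sets, current = [], ""
    for headline in headlines:
        sentence = headline.replace(".", "") + ". "
        # start a new set if adding this headline would go over the budget
        if current and len(current) + len(sentence) > max_chars:
            sets.append(current)
            current = ""
        current += sentence
    if current:
        sets.append(current)
    return sets


print(pack_headlines(["A b", "C.d", "E"], max_chars=10))  # ['A b. Cd. ', 'E. ']
```

Packing greedily keeps the chronological order, so each set still covers a contiguous stretch of the month.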

Use AI to Summarize the Headlines

Now that everything is set up, all that’s left is to call The Text API to summarize the headlines. This is where we take the base text_url we created earlier and append summarize to reach the summarize endpoint. This function also takes two parameters, a year and a month. It uses the get_doc and split_headlines functions we created earlier to get the right headlines to pass to the summarizer, and it initializes an empty summaries list to hold the returned summaries.

For each set of headlines, we create a request body for the summarizer endpoint. All we have to pass in is the text and the proportion of it we want back. The default proportion is 30%, but I’m not trying to read 1200 headlines, so I chose 2.5%; adjust this to your liking. After building the body, we send the request and parse the response. Finally, we write the list of summaries to a JSON file.
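To get a feel for the proportion setting, this is the back-of-the-envelope arithmetic, assuming the summarizer keeps roughly proportion × (number of sentences) and that each headline counts as one sentence. How The Text API actually rounds isn’t covered in this post, and approx_summary_size is a hypothetical helper.

```python
def approx_summary_size(num_headlines, proportion):
    """Rough number of headlines (sentences) kept for a given proportion."""
    return round(num_headlines * proportion)


print(approx_summary_size(4000, 0.30))   # 1200 headlines at the default 30%
print(approx_summary_size(4000, 0.025))  # 100 headlines at 2.5%
print(approx_summary_size(1000, 0.025))  # 25 per set if the month splits into four sets of 1000
```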

# the summarizer endpoint, built from the base URL
summarizer_url = text_url + "summarize"

def summarize_headlines(year, month):
    entries = get_doc(year, month)
    headlines_list = split_headlines(entries)
    summaries = []
    for headlines in headlines_list:
        body = {
            "text": headlines,
            "proportion": 0.025  # keep about 2.5% of the sentences
        }
        res = requests.post(summarizer_url, headers=headers, json=body)
        res.raise_for_status()  # fail with a clear error on a 4xx/5xx response
        summaries.append(res.json()["summary"])
    # save one summary per headline set next to the archive file
    with open(f"{year}/{month_dict[month]}_Summary.json", "w") as f:
        json.dump(summaries, f)

summarize_headlines(2021, 11)
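Since each summarization request can be slow, you may still hit the occasional timeout. One option is to retry a failed call with exponential backoff. with_retries is a hypothetical helper written with the standard library only; the commented usage shows how it could wrap the requests.post call, where requests.exceptions.Timeout and requests.exceptions.ConnectionError are real requests exceptions but the 300-second timeout is an arbitrary example value.

```python
import time


def with_retries(func, attempts=3, base_delay=2.0, retry_on=(Exception,)):
    """Call func(); on a listed exception, wait and retry with exponential backoff.

    attempts should be at least 1. Re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # wait base_delay, then 2x, 4x, ... before the next try
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical usage inside summarize_headlines:
# res = with_retries(
#     lambda: requests.post(summarizer_url, headers=headers, json=body, timeout=300),
#     retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError),
# )
```

Limiting retry_on to timeout and connection errors avoids retrying requests that failed for a reason a retry won’t fix, such as a bad API key.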

When we call summarize_headlines, the summaries are written to a JSON file, and those are the summarized headlines you saw at the beginning of this post!
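To read the results back, you can load the summary file and print one summary per headline set. print_summaries is a hypothetical helper; the path in the comment assumes month_dict[11] is "November".

```python
import json


def print_summaries(path):
    """Print each saved summary (one per headline set) and return the list."""
    with open(path, "r") as f:
        summaries = json.load(f)
    for number, summary in enumerate(summaries, start=1):
        print(f"Set {number}:\n{summary}\n")
    return summaries


# e.g. print_summaries("2021/November_Summary.json")
```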

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful and you enjoy the ad-free site, please consider donating to help fund it. If you can’t donate right now, please think of us next time.

Yujian Tang

I started my professional software career interning for IBM in high school after winning ACSL two years in a row. I got into AI/ML in college, where I published a first-author paper at IEEE Big Data. After college I worked on the AutoML infrastructure at Amazon before leaving to work in startups. I believe I create the highest-quality software content, so that’s what I’m doing now. Drop a comment to let me know!

