Climate Mentions in the News? Shockingly Low

I don’t know about you, but as someone who lives on planet Earth, I believe that the climate crisis is an issue that needs to be solved. Personally, I would like to solve it by building more renewables, as I outline on my climate blog. Since the climate crisis has been a pressing issue for the last, oh I don’t know, 50 years? 60 years? I’d expect there to be a good amount of news about climate-related topics. I didn’t know whether that was true, so I decided to find out.

To do this, I used the New York Times Archive API and pulled the headlines of articles from 2008 to 2021. For a detailed explanation of how to use the New York Times Archive API to download archived news headlines, please see How to Download Archived News Headlines. For those of you uninterested in the code, skip directly to the findings.

Checking Titles for Mentions of Climate

At this point, I’m going to assume that you’ve already downloaded all the data from the NY Times archive API into JSON format as I outlined in the article above. The first thing we’ll do for our project to track mentions of climate in the news over time is create a function that will extract the headlines for each month. To get started we’ll have to import the json library and the month_dict we created in the link on how to download archived titles above. It’s a relatively simple dictionary with the month number as the key and the month name as the value.

import json
import matplotlib.pyplot as plt  # needed later for the plots in graph_cc
from archive import month_dict
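
If you don’t have the archive module handy, here is a minimal sketch of what month_dict might look like. The capitalized English month names are an assumption on my part; whatever values you use have to match the file names you saved when downloading (for example 2008/January.json).

```python
# Assumed shape of month_dict: month number -> month name.
# The spelling must match the JSON file names on disk.
month_dict = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December",
}

# Build a file name the same way cc_finder does
year, month = 2008, 1
print(f"{year}/{month_dict[month]}.json")  # 2008/January.json
```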

Now let’s make a function that checks each headline for mentions of the climate. This function will take two parameters, the year and the month that we’re interested in. The first thing we’ll do is open up our file and load the JSON into a variable titled entries, which holds one record per article. We’ll enclose this in a try/except block just in case the file doesn’t exist. Next, we’ll save the length of entries as total_headlines, which represents the total number of news headlines in that month. From here we’ll start off our count, which I’m storing in a variable called cc for “climate count”, at 0. As we loop through each entry, if its headline contains the word “climate” (ignoring case) we’ll increment cc by one. At the end, we’ll return a tuple of the total number of headlines and the number of headlines that contain the word “climate”.

# counts the headlines that mention "climate" for a given year and month
def cc_finder(year, month):
    filename = f"{year}/{month_dict[month]}.json"
    try:
        with open(filename, "r") as f:
            entries = json.load(f)
    except FileNotFoundError:
        # no data was downloaded for this month
        print(f"No such file: {filename}")
        return None
    # total number of headlines in the month
    total_headlines = len(entries)
    cc = 0
    for entry in entries:
        headline = entry['headline']['main']
        # case-insensitive check for the word "climate"
        if "climate" in headline.lower():
            cc += 1
    return (total_headlines, cc)
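
To see the matching logic without touching any files, here is a small sketch that runs the same check on a few in-memory entries. The helper name count_climate_headlines and the sample headlines are made up for illustration; only the entry['headline']['main'] shape comes from the function above.

```python
def count_climate_headlines(entries):
    """Return (total_headlines, climate_count) for a list of archive entries."""
    climate_count = 0
    for entry in entries:
        headline = entry["headline"]["main"]
        # same case-insensitive substring check as cc_finder
        if "climate" in headline.lower():
            climate_count += 1
    return (len(entries), climate_count)

# Made-up sample entries, shaped like the JSON that cc_finder reads
sample_entries = [
    {"headline": {"main": "Climate Talks Open This Week"}},
    {"headline": {"main": "Local Team Wins Championship"}},
    {"headline": {"main": "A Changing Political CLIMATE"}},
]

total, cc = count_climate_headlines(sample_entries)
print(total, cc)  # 3 2
```

Notice that the third headline counts too. A plain substring check can’t tell “climate change” apart from “political climate”, so if anything the counts lean a little high.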

Alright, now that we’ve created a function that returns the number of articles in a month and the number of those headlines containing the word “climate”, let’s graph our findings. By the way, it’s not necessary to return a tuple. We could just return the proportion, but I’ve chosen the tuple because I’d like to see the raw counts as well.

Create the Function to Graph How Often Climate is Mentioned in the News

We’ll define a function, graph_cc, which stands for “graph climate count”. I graph three figures here, but if you only care about the first one, simply replace the ratio.show() line with plt.show() and feel free to delete the rest of the code. I’ve already downloaded all the data for the years 2008 through November (so far) of 2021. Since we’re only a few days into November, let’s keep in mind that the article counts for November will be low. Also, since COP26 is happening right now, we should expect a higher ratio and count for climate articles, and we’ll come back to this in a few months to investigate whether the numbers are artificially inflated.

In our function we’ll start by creating a list of years that will contain the years 2008 through 2021 (range(2008, 2022) stops before 2022). I’ve defined four lists: xs is the x values that represent months since January 2008 (starting at 0), ys is the y values in ratio form, ys_total is the total number of articles in a month, and ys_cc is the number of articles containing the word “climate” in that month. We also have a variable called months_since_2008 that we increment every time we loop through a month. Then we set up a nested for loop to go through each month for all the years in our list and collect the x and y values we need. Finally, we plot each of our findings. Notice that I end with an input() statement. That’s to keep the program running long enough to actually see the plots.

def graph_cc():
    years = list(range(2008, 2022))
    xs = []
    ys = []
    ys_total = []
    ys_cc = []
    months_since_2008 = 0
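    for year in years:
        for i in range(1,13):
            if year == 2021 and i > 11:
                continue
            result = cc_finder(year, i)
            month_index = months_since_2008
            months_since_2008 += 1
            # skip months with no file or no headlines, but keep the x axis aligned
            if result is None or result[0] == 0:
                continue
            total, cc = result
            xs.append(month_index)
            ys.append(cc/total)
            ys_total.append(total)
            ys_cc.append(cc)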
       
    ratio = plt.figure(1)
    plt.plot(xs, ys)
    plt.xlabel("Months since January 2008")
    plt.ylabel("Proportion of News Headlines about Climate")
    plt.title("Climate in the News Over Time")
    ratio.show()
 
    total = plt.figure(2)
    plt.plot(xs, ys_total)
    plt.xlabel("Months since January 2008")
    plt.ylabel("Total Number of Articles per Month")
    plt.title("Number of NY Times Articles per Month over Time")
    total.show()
 
    cc = plt.figure(3)
    plt.plot(xs, ys_cc)
    plt.xlabel("Months since January 2008")
    plt.ylabel("Total Number of Articles Mentioning Climate")
    plt.title("Number of NY Times Articles Mentioning Climate per Month over Time")
    cc.show()
 
    input()
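
Since the x axis is in months since January 2008, it can be hard to tell which calendar month a spike or dip belongs to. Here is a small helper for translating an x value back into a year and month. The function month_index_to_label is hypothetical and not part of the script above.

```python
def month_index_to_label(months_since_2008, start_year=2008):
    """Convert a 0-based month offset from January of start_year to 'YYYY-MM'."""
    years_elapsed, month_offset = divmod(months_since_2008, 12)
    return f"{start_year + years_elapsed}-{month_offset + 1:02d}"

print(month_index_to_label(0))    # 2008-01
print(month_index_to_label(11))   # 2008-12
print(month_index_to_label(165))  # 2021-10
```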

Climate In the News Over Time: Graphed Findings

Once we run this we should see the following plot: (I’ll leave the other two to the appendix)

This is both disheartening and quite interesting. We can see that climate got almost no mentions in the news from 2008 right up until literally October of 2021. The average ratio is around 0.004 over this time when we include the last couple of months. That’s INSANE! That means, on average, less than half a percent of news articles (NY Times articles, anyway) in the last 13 years have mentioned climate in their headlines. THIS IS THE MOST IMPORTANT ISSUE OF OUR GENERATION! I’ve included the last two graphs after this, but like WHAT IN THE WORLD?? HOW? WHY? We need to focus more on climate change and how to fight it, and for that, we’ll need the media’s help.
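
If you want to compute an average like this yourself, here is a rough sketch. It assumes you have the ys_total and ys_cc lists that graph_cc builds. The numbers below are placeholders I made up so the example runs on its own, NOT the real NY Times counts.

```python
# Placeholder values for illustration only; NOT real results
ys_total = [1000, 3000]
ys_cc = [10, 6]

# Average of the monthly ratios (every month weighted equally)
monthly_ratios = [cc / total for cc, total in zip(ys_cc, ys_total)]
mean_of_ratios = sum(monthly_ratios) / len(monthly_ratios)

# Pooled ratio (every article weighted equally)
pooled_ratio = sum(ys_cc) / sum(ys_total)

print(f"mean of monthly ratios: {mean_of_ratios:.2%}")
print(f"pooled ratio: {pooled_ratio:.2%}")
```

The two averages can differ when the number of articles per month changes a lot, so it’s worth being clear about which one you’re quoting. A ratio of 0.004 is 0.4%, which is where “less than half a percent” comes from.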

Appendix (the other two images)

These other two images are kind of interesting too, but I wanted to wrap up after showing the ratio of climate news to total news because that’s CRAZY to me. The number of NY Times news articles per month has been trending down over time, who knew? Also, there was a weird dip between 2010 and 2012. I wonder why? I’ll have to do some snooping to find out. One slightly positive note is that even though the total number of articles per month has been trending down, the number of climate articles has remained relatively consistent and is even currently trending up!