Build Your Own AI Text Summarizer in Python

For this example, we’re going to build a naive extractive summarizer in about 25 lines of Python. An extractive summary is a summary composed of sentences pulled directly from the source text. For more background on AI summaries, check out this article on What is AI Summarization and How Can I Use It? In this post, we’ll go over how to build a simple extractive summarizer two ways: first with spaCy, then with The Text API. spaCy is a popular open source Python library for Natural Language Processing. The Text API is a comprehensive web API for text processing.

Build an AI Text Summarizer in Under 30 Lines of Python

Before we can get started with the code, we’ll need to install spaCy and download a model. We can do both from the terminal with the following two commands. The “en_core_web_sm” model is the smallest and the fastest to get started with. You can also download “en_core_web_md”, “en_core_web_lg”, or “en_core_web_trf” for larger English language models.

pip install spacy
python -m spacy download en_core_web_sm

Let’s get started with the code! First, we’ll import spacy and load up the language model we downloaded earlier.

import spacy
 
nlp = spacy.load("en_core_web_sm")

For this tutorial, we’ll be building a simple extractive summarizer based purely on the words in the text and how often they’re mentioned. We’re going to break down this summarizer into a few simple steps. First we’re going to create a word dictionary to keep track of word count. Then we’re going to score each sentence based on how often each word in that sentence appears. After that, we’re going to sort the sentences based on their score. Finally, we’ll take the top three scoring sentences and return them in the same order they originally appeared in the text.

Before we get into all that let’s load up our text and turn it into a spaCy Document. You can use whatever text you want. The text provided is just an example that talks about me and this blog.

# extractive summary by word count
text = """This is an example text. We will use seven sentences and we will return 3. This blog is written by Yujian Tang. Yujian is the best software content creator. This is a software content blog focused on Python, your software career, and Machine Learning. Yujian's favorite ML subcategory is Natural Language Processing. This is the end of our example."""
# tokenize
doc = nlp(text)

Getting All the Word Counts

Now that we have our text in Doc form, we can get all our word counts. You could also do this earlier by splitting the raw string on whitespace, but the Doc is easier to work with and we’ll need it again later anyway. First let’s create a word dictionary. Then we’ll loop through the text and check if each word is in the dictionary. If the word is in the dictionary we’ll increment its counter; if not, we’ll set its counter to one. We’ll save every word in lowercase form.

# create dictionary
word_dict = {}
# loop through every token and count word occurrences
for word in doc:
    word = word.text.lower()
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1
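As a quick sanity check, the same tally can be sketched without spaCy using `collections.Counter` over whitespace-split tokens. This is only a rough approximation of the loop above, since spaCy also splits punctuation into separate tokens:

```python
from collections import Counter

# Naive whitespace tokenization; unlike spaCy, punctuation stays
# attached to words ("test." and "test" would count separately).
text = "This is a test. This is only a test."
word_counts = Counter(tok.lower() for tok in text.split())

print(word_counts["this"])  # 2
```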

Scoring the Sentences

Once we’ve gathered all the word counts, we can use them to score our sentences. We’ll create a list of tuples. Each tuple holds the information we need about a sentence – the sentence’s text, the sentence’s score, and its original index. We’ll loop through each index and sentence in the enumerated Doc sentences. The enumerate function returns an index and the element at that index for any iterable. For each word in a sentence, we’ll add that word’s count to the sentence’s running score, which we reset to zero at the start of each sentence. After looping through all the words in a sentence, we’ll append the sentence text, the score normalized by sentence length (so long sentences don’t win by default), and the original index.

# create a list of tuples: (sentence text, score, index)
sents = []
# score sentences
for index, sent in enumerate(doc.sents):
    # reset the score for each new sentence
    sent_score = 0
    for word in sent:
        word = word.text.lower()
        sent_score += word_dict[word]
    sents.append((sent.text.replace("\n", " "), sent_score/len(sent), index))
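To see why we normalize by length, here is a toy example with made-up word counts. Without dividing by the sentence length, a long sentence full of common words outscores a short, dense one:

```python
# Hypothetical word counts for a toy document (not from the example text)
word_dict = {"the": 5, "cat": 1, "sat": 1, "on": 5, "mat": 1,
             "python": 4, "rocks": 4}

long_sent = ["the", "cat", "sat", "on", "the", "mat"]
short_sent = ["python", "rocks"]

raw_long = sum(word_dict[w] for w in long_sent)    # 5+1+1+5+5+1 = 18
raw_short = sum(word_dict[w] for w in short_sent)  # 4+4 = 8

# Raw scores favor the long sentence...
assert raw_long > raw_short
# ...but per-word scores favor the denser one.
assert raw_long / len(long_sent) < raw_short / len(short_sent)
```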

Sorting the Sentences

Now that our list of sentences is built, we need to sort it so the highest scoring sentences end up in our summary. First we’ll use a lambda function to sort by the negative of the score. Why negative? Because Python’s sorted function sorts in ascending order by default (passing reverse=True would work just as well). After sorting by score, we take the top 3 and re-sort them by index so the summary reads in the original order. You can take however many sentences you’d like, and even vary the number of sentences based on the length of the text.

# sort sentences by score, highest first
sents = sorted(sents, key=lambda x: -x[1])
# take the top 3 and restore document order
sents = sorted(sents[:3], key=lambda x: x[2])
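Here is the same two-pass sort on a toy list of (text, score, index) tuples, written with reverse=True instead of negating the score; the two forms are equivalent:

```python
# (sentence text, score, original index) — toy values
sents = [("A", 1.0, 0), ("B", 3.0, 1), ("C", 2.0, 2), ("D", 2.5, 3)]

# first pass: highest score first
top = sorted(sents, key=lambda x: x[1], reverse=True)[:3]
# second pass: restore original document order
top = sorted(top, key=lambda x: x[2])

print([s[0] for s in top])  # ['B', 'C', 'D']
```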

Returning the Summary

All we have to do to get our resulting summary is take the list of sorted sentences and put them together, separated by a space. Finally, we’ll print it out to take a look.

# compile them into text
summary_text = ""
for sent in sents:
    summary_text += sent[0] + " "
 
print(summary_text)

Once we run our program, we should see output like the example below.

[Image: example AI text summarizer output]
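Putting all the steps together, here is the whole pipeline sketched as one function. To keep it dependency-free, this sketch swaps spaCy’s tokenizer and sentence splitter for naive string splitting, which is noticeably cruder; for real text you’d keep the spaCy version above.

```python
def summarize(text, num_sentences=3):
    """Naive extractive summary: score sentences by average word frequency."""
    # Crude sentence split on ". " — spaCy's doc.sents is far more robust.
    sentences = [s.strip() for s in text.split(". ") if s.strip()]

    # Count every lowercase whitespace-delimited word in the document.
    word_dict = {}
    for word in text.lower().split():
        word_dict[word] = word_dict.get(word, 0) + 1

    # Score each sentence by its average word count.
    scored = []
    for index, sent in enumerate(sentences):
        words = sent.lower().split()
        score = sum(word_dict.get(w, 0) for w in words) / len(words)
        scored.append((sent, score, index))

    # Keep the top sentences, restored to document order.
    top = sorted(scored, key=lambda x: -x[1])[:num_sentences]
    top = sorted(top, key=lambda x: x[2])
    return " ".join(s[0] for s in top)
```

Calling summarize(text, 3) on the example text gives a three-sentence summary, though the sentence boundaries won’t be as clean as spaCy’s.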

Build an AI Text Summarizer in 15 Lines of Python

Now that we’ve covered how to build an AI text summarizer in under 30 lines of code, let’s also do it in 15. For this part of the tutorial, we only need to send an HTTP request. Before we get started, head over to The Text API and register for a free API key. Once you’ve registered for a key, install the requests library.

pip install requests

We’ll import the libraries we need to get started: requests to send our HTTP request and json to parse the response. We’ll also import our API key from a local config module so it stays out of the source code.

import requests
import json
 
from config import apikey
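The config import assumes a small config.py file sitting next to the script, which keeps the key out of the code you share. A minimal sketch (the value below is a placeholder, not a real key):

```python
# config.py — keep this file out of version control
apikey = "your_api_key_here"
```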

Setting Up the Request

Let’s set up the request. The text we’ll summarize is a description of The Text API and what it can do. We’ll also need to set up some headers, the body, and the URL endpoint. The headers will tell the server that the content we’re sending is in JSON format and also pass the API key we got earlier. The body will simply pass in the text we have as the “text” attribute. The URL will be the summarize endpoint from The Text API.

text = "The Text API is easy to use and useful for anyone who needs to do text processing. It's the best Text Processing web API. The Text API allows you to do amazing NLP without having to download or manage any models. The Text API provides many NLP capabilities. These capabilities range from custom Named Entity Recognition (NER) to Summarization to extracting the Most Common Phrases. NER and Summarizations are both commonly used endpoints with business use cases. Use cases include identifying entities in articles, summarizing news articles, and more. The Text API is built on a transformer model."
 
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/summarize"

Sending the Request and Parsing the Response

After setting up the request, all we have to do is send it and parse the response as JSON. The response contains both a user item and a summary item; we only need the value of the summary item.

response = requests.post(url, headers=headers, json=body)
summary = json.loads(response.text)["summary"]
print(summary)
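As a side note, in practice it’s worth checking the HTTP status before indexing into the body: a failed request (expired key, rate limit) won’t contain a “summary” field. A small helper along these lines, testable without touching the network:

```python
import json

def extract_summary(response):
    """Pull the summary out of a summarize response, failing loudly on errors."""
    if response.status_code != 200:
        raise RuntimeError(
            f"Summarize request failed ({response.status_code}): {response.text}"
        )
    return json.loads(response.text)["summary"]
```

With the requests response from above, you’d call extract_summary(response) instead of indexing into the JSON directly.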

Let’s run this and take a look. The output should look something like the example below.

[Image: example text summarizer output]

You can read more about other NLP concepts such as Named Entity Recognition (NER), Part of Speech (POS) Tagging, and more on this blog.

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!


