
Build Your Own AI Text Summarizer in Python

For this example, we're going to build a naive extractive text summarizer in about 25 lines of Python. An extractive summary is a summary built from sentences pulled directly from the source text. For more information on AI summaries, check out this article on What is AI Text Summarization and How Can I Use It?

We will build an AI text summarizer in two ways: first with spaCy, then with The Text API. spaCy is one of the most popular open-source Python libraries for Natural Language Processing. The Text API is a comprehensive web API for text processing tasks such as summarization and sentiment analysis.

In this post on how to build an AI Text Summarizer in Python, we will cover two approaches: building one from scratch with spaCy, and building one with a single call to The Text API.

Build an AI Text Summarizer in Under 30 Lines of Python

Before we can get started with the code we need to install spaCy and download a model. We can do this in the terminal with the following two commands. The en_core_web_sm model is the smallest model and the fastest to get started with. You can also download en_core_web_md, en_core_web_lg, and en_core_web_trf for other, larger English language models.

pip install spacy
python -m spacy download en_core_web_sm

Let’s get started with the code for our text summarizer! First, we’ll import spacy and load up the language model we downloaded earlier.

import spacy
 
nlp = spacy.load("en_core_web_sm")

For this tutorial, we’ll be building a simple extractive text summarizer based purely on the words in the text and how often they’re mentioned. We’re going to break down this text summarizer into a few simple steps.

First we’re going to create a word dictionary to keep track of word count. Then we’re going to score each sentence based on how often each word in that sentence appears. After that, we’re going to sort the sentences based on their score. Finally, we’ll take the top three scoring sentences and return them in the same order they originally appeared in the text.

Before we get into all that, let's load up our text and turn it into a spaCy Doc. You can use whatever text you want; the text provided is just an example that talks about me and this blog.

# extractive summary by word count
text = """This is an example text. We will use seven sentences and we will return 3. This blog is written by Yujian Tang. Yujian is the best software content creator. This is a software content blog focused on Python, your software career, and Machine Learning. Yujian's favorite ML subcategory is Natural Language Processing. This is the end of our example."""
# tokenize
doc = nlp(text)

Getting All the Word Counts

Now that we have our text in Doc form, we can get all our word counts. You could do this earlier by splitting the raw string on spaces, but spaCy's tokenization is more robust and we'll need the Doc again later anyway.

First, let's create a word dictionary. Next, we'll loop through the text and check whether each word is in the dictionary. If the word is in the dictionary, we'll increment its counter; if not, we'll set its counter to one. We'll save every word in lowercase format.

# create dictionary
word_dict = {}
# loop through every sentence and give it a weight
for word in doc:
    word = word.text.lower()
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1
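The loop above builds the same mapping as Python's `collections.Counter`. A minimal sketch, using a hypothetical token list in place of the spaCy Doc:

```python
from collections import Counter

# hypothetical stand-in for the lowercased tokens of our spaCy Doc
tokens = ["this", "is", "an", "example", ".", "this", "is", "the", "end", "."]

# Counter builds the same word -> count mapping as the manual loop
word_dict = Counter(tokens)

print(word_dict["this"])  # -> 2
```

With a real spaCy Doc, you could also filter tokens using `token.is_punct` and `token.is_stop` before counting, so punctuation and filler words like "the" don't inflate the sentence scores later.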

Scoring the Sentences for Our AI Text Summarizer

Once we've gathered all the word counts, we can use them to score our sentences. We'll create a list of tuples, where each tuple holds the information we need about a sentence: the sentence's text, its score, and its original index. We'll loop through each index and sentence in the enumerated Doc sentences.

The enumerate function returns an index and the element at that index for any iterable. For each word in the sentence, we'll add that word's count to a running sentence score, which starts at zero for each new sentence. After looping through all the words in the sentence, we append the sentence text, the score normalized by sentence length, and the original index.

# create a list of tuples (sentence text, score, index)
sents = []
# score each sentence
for index, sent in enumerate(doc.sents):
    # reset the running score for each new sentence
    sent_score = 0
    for word in sent:
        word = word.text.lower()
        sent_score += word_dict[word]
    # normalize by sentence length so longer sentences aren't favored
    sents.append((sent.text.replace("\n", " "), sent_score/len(sent), index))

Sorting the Sentences for the Text Summarizer

Now that our list of sentences is created, we'll sort it so that the highest-scoring sentences end up in our summary. First, we'll use a lambda function to sort by the negative of the score.

Why negative? Because Python's built-in sort orders from smallest to largest. After we've sorted by score, we take the top 3 and re-sort those by index so that our summary reads in the original order. You can take however many sentences you'd like, or even vary the number of sentences based on the length of the text.

# sort sentence by word occurrences
sents = sorted(sents, key=lambda x: -x[1])
# return top 3
sents = sorted(sents[:3], key=lambda x: x[2])
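If the negative-score trick feels opaque, the same two-step sort can be written with `reverse=True` instead. A small sketch on sample tuples of the same shape our summarizer builds:

```python
# sample (text, score, index) tuples, like the ones our summarizer builds
sents = [("a", 1.0, 0), ("b", 3.0, 1), ("c", 2.0, 2), ("d", 2.5, 3)]

# sort by score, highest first
top = sorted(sents, key=lambda x: x[1], reverse=True)
# keep the top 3, then restore original document order
top = sorted(top[:3], key=lambda x: x[2])

print([s[0] for s in top])  # -> ['b', 'c', 'd']
```

Both versions are equivalent; `reverse=True` just states the intent directly.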

Returning the Summary

All we have to do to get our resulting summary is take the list of sorted sentences and put them together, separated by a space. Finally, we’ll print it out to take a look.

# compile them into text
summary_text = ""
for sent in sents:
    summary_text += sent[0] + " "
 
print(summary_text)
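The concatenation loop above leaves a trailing space; `str.join` is the more idiomatic way to do the same thing. A quick sketch with placeholder tuples:

```python
# placeholder (text, score, index) tuples standing in for our sorted sentences
sents = [("First sentence.", 2.0, 0), ("Second sentence.", 1.5, 1)]

# equivalent to the concatenation loop, without the trailing space
summary_text = " ".join(sent[0] for sent in sents)
print(summary_text)  # -> First sentence. Second sentence.
```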

Once we run our program, we should see an example like the one below. That’s all there is to building a simple text summarizer in Python with spaCy!

[Image: example text summarizer output]

Build an AI Text Summarizer in 15 Lines of Python

Now that we've covered how to build an AI text summarizer in under 30 lines of code, let's also do it in 15. For this part of the tutorial, we only need to send an HTTP request. Before we get started, we'll have to go to The Text API and register for a free API key. Once you've registered for a key, you'll need to install the requests library.

pip install requests

We'll import the libraries we need to get started. We'll use requests to send our HTTP request and json to parse the response. We'll also import our API key from a local config module.

import requests
import json
 
from config import apikey
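The `config` import assumes a small `config.py` file next to the script. Something like the sketch below, where the key shown is a placeholder for the one you got from The Text API:

```python
# config.py -- keep this file out of version control
apikey = "your-api-key-here"
```

Keeping the key in its own module (or an environment variable) means you can share the main script without leaking credentials.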

Setting Up the API Request

Let’s set up the request. The text we’ll summarize is a description of The Text API and what it can do. We’ll also need to set up some headers, the body, and the URL endpoint. The headers will tell the server that the content we’re sending is in JSON format and also pass the API key we got earlier. The body will simply pass in the text we have as the “text” attribute. The URL will be the summarize endpoint from The Text API.

text = "The Text API is easy to use and useful for anyone who needs to do text processing. It's the best Text Processing web API. The Text API allows you to do amazing NLP without having to download or manage any models. The Text API provides many NLP capabilities. These capabilities range from custom Named Entity Recognition (NER) to Summarization to extracting the Most Common Phrases. NER and Summarizations are both commonly used endpoints with business use cases. Use cases include identifying entities in articles, summarizing news articles, and more. The Text API is built on a transformer model."
 
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/summarize"

Parsing the AI Text Summarizer Response

After setting up the request, all we have to do is send it and parse the response as JSON. The response will contain both a user item and a summary item; we only need the value of the summary item.

response = requests.post(url, headers=headers, json=body)
summary = json.loads(response.text)["summary"]
print(summary)
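Before trusting `response.text`, it's worth checking that the request actually succeeded. A minimal sketch; the `extract_summary` helper below is our own illustration, not part of the requests library:

```python
import json

def extract_summary(status_code: int, body: str) -> str:
    """Return the summary field, or raise if the request failed."""
    if status_code != 200:
        raise RuntimeError(f"request failed with status {status_code}: {body}")
    return json.loads(body)["summary"]

# with a live request you would call:
# summary = extract_summary(response.status_code, response.text)
print(extract_summary(200, '{"summary": "A short summary."}'))  # -> A short summary.
```

Separating the parsing into a small function also makes it easy to test without hitting the network.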

Let’s run this and see our response. It should look something like the response below.

[Image: example text summarizer output]

You can read more about other NLP concepts such as Named Entity Recognition (NER), Part of Speech (POS) Tagging, and more on this blog.

Summary of Building an AI Text Summarizer

In this post we looked at two ways to build an AI text summarizer. First, we used the popular NLP library spaCy to build a simple extractive summarizer with basic word-count logic. Second, we built a text summarizer using The Text API, a comprehensive and easy-to-use web API.

Unlike tasks such as speech-to-text transcription, summarization is subjective. The true value of an AI text summarizer lies in how effective it is for the end user's requirements, so keep the end user's use case in mind when designing your AI text summarizer.

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!


Yujian Tang

I started my professional software career interning for IBM in high school after winning ACSL two years in a row. I got into AI/ML in college where I published a first author paper to IEEE Big Data. After college I worked on the AutoML infrastructure at Amazon before leaving to work in startups. I believe I create the highest quality software content so that’s what I’m doing now. Drop a comment to let me know!
