For this example, we’re going to build a naive extractive text summarizer in 25 lines of Python. An extractive summary is a summary of a document that is pulled directly from sentences in the original text. For more information on AI summaries, check out our article “What is AI Text Summarization and How Can I Use It?”
We will build an AI text summarizer in two ways. First with spaCy, then with The Text API. spaCy is one of the most popular open source Python libraries for Natural Language Processing. The Text API is a comprehensive, easy-to-use web API for text processing.
In this post on how to build an AI Text Summarizer in Python, we will cover:
- Building an AI Text Summarizer in Under 30 Lines of Python
- Getting the Count of each Word in the Text
- Scoring the Sentences for the Text Summarizer
- Sorting the Sentences for Our AI Text Summarizer
- Returning the Text Summarization
- Building an AI Text Summarizer in Under 15 Lines of Python
- Setting Up the API Request to the AI Text Summarizer
- Parsing the AI Text Summarizer Response
- Summary of Building an AI Text Summarizer
Build an AI Text Summarizer in Under 30 Lines of Python
Before we can get started with the code, we need to install spaCy and download a model. We can do this in the terminal with the following two commands. The en_core_web_sm model is the smallest English model and the fastest to get started with. You can also download en_core_web_md, en_core_web_lg, or en_core_web_trf for other, larger English language models.
pip install spacy
python -m spacy download en_core_web_sm
Let’s get started with the code for our text summarizer! First, we’ll import spacy and load up the language model we downloaded earlier.
import spacy
nlp = spacy.load("en_core_web_sm")
For this tutorial, we’ll be building a simple extractive text summarizer based purely on the words in the text and how often they’re mentioned. We’re going to break down this text summarizer into a few simple steps.
First we’re going to create a word dictionary to keep track of word count. Then we’re going to score each sentence based on how often each word in that sentence appears. After that, we’re going to sort the sentences based on their score. Finally, we’ll take the top three scoring sentences and return them in the same order they originally appeared in the text.
Before we get into all that let’s load up our text and turn it into a spaCy Document. You can use whatever text you want. The text provided is just an example that talks about me and this blog.
# extractive summary by word count
text = """This is an example text. We will use seven sentences and we will return 3. This blog is written by Yujian Tang. Yujian is the best software content creator. This is a software content blog focused on Python, your software career, and Machine Learning. Yujian's favorite ML subcategory is Natural Language Processing. This is the end of our example."""
# tokenize
doc = nlp(text)
Getting All the Word Counts
Now that we have our text in Doc form, we can get all our word counts. You could also do this earlier by splitting the raw string on spaces, but using the tokenized Doc is easier, and we’ll need the Doc again later anyway.
First let’s create a word dictionary. Next, we’ll loop through the text and check if each word is in the dictionary. If the word is in the dictionary we’ll increment its counter, if not we’ll set its counter to one. We’ll save every word in lowercase format.
# create dictionary
word_dict = {}
# loop through every token and count word occurrences
for word in doc:
    word = word.text.lower()
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1
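As a side note, the if/else pattern above can be collapsed with dict.get, or replaced entirely by collections.Counter from the standard library. A minimal sketch, using a hypothetical token list as a stand-in for the lowercased spaCy tokens:

```python
from collections import Counter

# hypothetical token list standing in for the lowercased spaCy tokens
tokens = ["this", "is", "an", "example", "this", "is"]

# the if/else collapses to one line with dict.get
word_dict = {}
for word in tokens:
    word_dict[word] = word_dict.get(word, 0) + 1

# collections.Counter does the same counting in a single call
assert word_dict == dict(Counter(tokens))
```

Either version produces the same dictionary; the explicit if/else in the tutorial just makes the logic easiest to follow.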
Scoring the Sentences for Our AI Text Summarizer
Once we’ve gathered all the word counts, we can use them to score our sentences. We’ll create a list of tuples; each tuple holds the information we need about a sentence: the sentence’s text, the sentence’s score, and the sentence’s original index. We’ll loop through each index and sentence in the enumerated Doc sentences. The enumerate function returns an index and the element at that index for any iterable. For each sentence, we start its score at zero, then add each word’s count to it. Once we’ve looped through all the words in a sentence, we append the sentence text, the score normalized by sentence length, and the original index.
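If enumerate is new to you, here’s a quick illustration on a plain list:

```python
sentences = ["First sentence.", "Second sentence.", "Third sentence."]
# enumerate pairs each element with its index
for index, sentence in enumerate(sentences):
    print(index, sentence)
# 0 First sentence.
# 1 Second sentence.
# 2 Third sentence.
```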
# create a list of tuples (sentence text, score, index)
sents = []
# score sentences
for index, sent in enumerate(doc.sents):
    # reset the score for each new sentence; otherwise scores accumulate
    # and the summary always returns the last sentences in the text
    sent_score = 0
    for word in sent:
        word = word.text.lower()
        sent_score += word_dict[word]
    sents.append((sent.text.replace("\n", " "), sent_score/len(sent), index))
Sorting the Sentences for the Text Summarizer
Now that our list of sentences is created, we’ll have to sort it so that the highest-scored sentences end up in our summary. First we’ll use a lambda function to sort by the negative of the score.
Why negative? Because Python’s sorted function sorts from smallest to largest by default. After we’ve sorted by score, we take the top three and re-sort those by index so that our summary reads in the original order. You can take however many sentences you’d like, and even vary the number based on the length of the text.
# sort sentence by word occurrences
sents = sorted(sents, key=lambda x: -x[1])
# return top 3
sents = sorted(sents[:3], key=lambda x: x[2])
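Negating the score is equivalent to passing reverse=True to sorted. Both sketches below produce the same ordering on some hypothetical (text, score, index) tuples:

```python
# hypothetical (text, score, index) tuples
sents = [("a", 1.0, 0), ("b", 3.0, 1), ("c", 2.0, 2)]

# sort descending by score, two equivalent ways
by_negative = sorted(sents, key=lambda x: -x[1])
by_reverse = sorted(sents, key=lambda x: x[1], reverse=True)
assert by_negative == by_reverse

# take the top 2 by score, then restore document order by index
top = sorted(by_negative[:2], key=lambda x: x[2])
assert top == [("b", 3.0, 1), ("c", 2.0, 2)]
```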
Returning the Summary
All we have to do to get our resulting summary is take the list of sorted sentences and put them together, separated by a space. Finally, we’ll print it out to take a look.
# compile them into text
summary_text = ""
for sent in sents:
    summary_text += sent[0] + " "
print(summary_text)
Once we run our program, we should see a three-sentence summary printed out. That’s all there is to building a simple text summarizer in Python with spaCy!
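Put together, the whole pipeline fits in one function. The sketch below swaps spaCy for a naive regex tokenizer so it runs without any model downloads; with spaCy installed you would tokenize with nlp(text) instead, as shown above:

```python
import re

def summarize(text, num_sentences=3):
    """Naive extractive summary: keep the highest-scoring sentences."""
    # naive sentence and word splitting stands in for spaCy's tokenizer
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # count every word across the document
    word_counts = {}
    for sentence in sentences:
        for word in re.findall(r"\w+", sentence.lower()):
            word_counts[word] = word_counts.get(word, 0) + 1
    # score each sentence by its words' counts, normalized by length
    scored = []
    for index, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        score = sum(word_counts[w] for w in words) / max(len(words), 1)
        scored.append((sentence, score, index))
    # take the top sentences by score, then restore document order
    top = sorted(sorted(scored, key=lambda x: -x[1])[:num_sentences],
                 key=lambda x: x[2])
    return " ".join(s[0] for s in top)
```

Calling summarize on the example text from earlier returns its three highest-scoring sentences in their original order.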
Build an AI Text Summarizer in Under 15 Lines of Python
Now that we’ve covered how to build an AI text summarizer in under 30 lines of code, let’s also do it in under 15. For this part of the tutorial, we only need to send an HTTP request. Before we get started, head over to The Text API and register for a free API key. Once you’ve registered for a key, you’ll need to install the requests library.
pip install requests
Next, we’ll import the libraries we need. We’ll use requests to send our HTTP request and json to parse the response. We’ll also keep our API key in a config.py file and import it from there.
import requests
import json
from config import apikey
Setting Up the API Request
Let’s set up the request. The text we’ll summarize is a description of The Text API and what it can do. We’ll also need to set up some headers, the body, and the URL endpoint. The headers tell the server that the content we’re sending is in JSON format and also pass along the API key we got earlier. The body simply passes in our text as the “text” attribute. The URL is the summarize endpoint from The Text API.
text = "The Text API is easy to use and useful for anyone who needs to do text processing. It's the best Text Processing web API. The Text API allows you to do amazing NLP without having to download or manage any models. The Text API provides many NLP capabilities. These capabilities range from custom Named Entity Recognition (NER) to Summarization to extracting the Most Common Phrases. NER and Summarizations are both commonly used endpoints with business use cases. Use cases include identifying entities in articles, summarizing news articles, and more. The Text API is built on a transformer model."
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/summarize"
Parsing the AI Text Summarizer Response
After setting up the request, all we have to do is send it and parse the response as JSON. The response contains both a user item and a summary item; we only need the value of the summary item.
response = requests.post(url, headers=headers, json=body)
summary = json.loads(response.text)["summary"]
print(summary)
Let’s run this and see our response. We should get back a short summary of the text we sent.
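We can’t show a live response here, but assuming the endpoint returns a JSON body shaped like the hypothetical payload below (the article tells us it contains a user item and a summary item), the parsing step works like this:

```python
import json

# hypothetical response body; a real summary and user value will differ
response_text = '{"user": "demo@example.com", "summary": "The Text API is an easy to use web API."}'

# json.loads turns the JSON string into a dict we can index into
payload = json.loads(response_text)
summary = payload["summary"]
print(summary)
```

In the real script it’s also worth checking that response.status_code is 200 before parsing, so a failed request doesn’t raise a confusing JSON error.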
You can read more about other NLP concepts such as Named Entity Recognition (NER), Part of Speech (POS) Tagging, and more on this blog.
Summary of Building an AI Text Summarizer
In this post we looked at two ways you can build an AI text summarizer. First, we used the popular NLP library spaCy: we built a simple extractive AI text summarizer on top of a spaCy model using basic logic. Second, we built a text summarizer using The Text API, a comprehensive and easy-to-use web API.
Unlike tasks such as speech recognition or voice-to-text transcription, summarization is subjective: the true value of an AI text summarizer lies in how effective it is for the end user’s requirements. So keep the end user’s use case in mind when designing your AI text summarizer.
Further Reading
- Floyd Warshall in Python
- Why Programming is Easy but Software Engineering is Hard
- Nested Lists in Python
- Dijkstra’s Algorithm in Python
- How to Run Functions in Parallel in Python