I’m interested in analyzing the Tweets of a bunch of famous people so I can learn from them. I’ve built a program that will do this by pulling a list of recent tweets and doing some NLP on them. In this post we’re going to go over:
- Get all the Text for a Search Term on Twitter
- NLP Techniques to Run on Tweets
  - Summarization
  - Most Common Phrases
  - Named Entity Recognition
  - Sentiment Analysis
- Running all the NLP Techniques Concurrently
- Further Text Processing
  - Finding the Most Commonly Named Entities
- Orchestration
- A Summary
To follow along you’ll need a free API key from The Text API and to install the requests and aiohttp libraries with the following line in your terminal:
pip install requests aiohttp
Overview of Project Structure
In this project we’re going to create multiple files and folders. We’re going to create a file for getting all the text called pull_tweets.py. We’ll create a totally separate folder for the text processing, and we’ll have three files in there. Those three files are async_pool.py for sending the text processing requests, ner_processing.py for further text processing after doing NER, and text_orchestrator.py for putting the text analysis together.
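Here’s one possible layout for these files. The project root name and the text_processing folder name are my own placeholders, and the two config files simply hold the keys we import later:
twitter_nlp/
    pull_tweets.py
    twitter_config.py          # holds bearertoken
    text_processing/
        __init__.py            # makes the folder a package so the relative imports work
        text_config.py         # holds apikey for The Text API
        async_pool.py
        ner_processing.py
        text_orchestrator.py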
Get all the Text for a Search Term on Twitter
We went over how to Scrape the Text from All Tweets for a Search Term in a recent post. For the purposes of this program, we’ll do almost the exact same thing with a twist. I’ll give a succinct description of what we’re doing in the code here. You’ll have to go read that post for a play-by-play of the code. This is the pull_tweets.py file.
First we’ll import our libraries and bearer token. Then we’ll set up the request and headers and create a function to search Twitter. Our function will check if our search term is a user or not by checking to see if the first character is the “@” symbol. Then we’ll create our search body and send off the request. When we get the response back, we’ll parse it into JSON and compile all the Tweets into one string. Finally, we’ll return that string.
import requests
import json
from twitter_config import bearertoken
search_recent_endpoint = "https://api.twitter.com/2/tweets/search/recent"
headers = {
    "Authorization": f"Bearer {bearertoken}"
}
# automatically builds a search query from the requested term
# looks for english tweets with no links that are not retweets
# returns the tweets
def search(term: str):
    if term[0] == '@':
        params = {
            "query": f'from:{term[1:]} lang:en -has:links -is:retweet',
            'max_results': 25
        }
    else:
        params = {
            "query": f'{term} lang:en -has:links -is:retweet',
            'max_results': 25
        }
    response = requests.get(url=search_recent_endpoint, headers=headers, params=params)
    res = json.loads(response.text)
    tweets = res["data"]
    text = ". ".join([tweet["text"] for tweet in tweets])
    return text
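If you want to try pull_tweets.py on its own, a minimal sketch of a quick check could look like this (the handle below is just a placeholder, any search term works):
if __name__ == "__main__":
    tweets_text = search("@jack")  # hypothetical handle for illustration
    print(tweets_text[:280])  # peek at the first few tweets' worth of text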
NLP Techniques to Run on Tweets
There are a ton of different NLP techniques we can run: Named Entity Recognition, text polarity analysis, summarization, and much more. Remember what we’re trying to do here. We’re trying to get some insight from these Tweets. With this in mind, for this project we’ll summarize the tweets, find the most common phrases, do named entity recognition, and run sentiment analysis.
We’re going to run all of these concurrently with asynchronous API requests. In the following sections we’re just going to set up the API requests. The first thing we’ll do is set up the values that are constant across all the requests. This is the start of the async_pool.py file.
Setup Constants
Before we can set up our requests, we have to set up the constants for them. We’ll also do the imports for the rest of the async_pool.py file. First, we’ll import the asyncio, aiohttp, and json libraries. We’ll use the asyncio and aiohttp libraries for the async API calls later. We’ll also import our API key that we got earlier from The Text API.
We need to set up the headers for our requests. The headers will tell the server that we’re sending JSON data and also pass the API key. Then we’ll set up the API endpoints. The API endpoints that we’re hitting are the summarize, ner, most_common_phrases, and text_polarity endpoints.
import asyncio
import aiohttp
import json
from .text_config import apikey
# configure request constants
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
text_url = "https://app.thetextapi.com/text/"
summarize_url = text_url+"summarize"
ner_url = text_url+"ner"
mcp_url = text_url+"most_common_phrases"
polarity_url = text_url+"text_polarity"
Summarize the Tweets
We’ll set up a function to return these bodies so we can use them later. We only need one parameter for this function, the text that we’re going to send. The first thing we’ll do in this function is set up an empty dictionary. Next we’ll set up the body to send to the summarize endpoint. The summarize body will send the text and tell the server that we want a summary proportion of 0.1 of the Tweets.
def configure_bodies(text: str):
    _dict = {}
    _dict[summarize_url] = {
        "text": text,
        "proportion": 0.1
    }
Find Most Common Phrases
After setting up the summarization body, we will set up the most_common_phrases body. This request will send the text and set the number of phrases to 5.
    _dict[mcp_url] = {
        "text": text,
        "num_phrases": 5
    }
Named Entity Recognition
Now we’ve set up the summarization and most common phrases request bodies. After those, we’ll set up the NER request body. The NER request body will pass the text and tell the server that we’re sending an “ARTICLE” type. The “ARTICLE” type returns people, places, organizations, locations, and times.
    _dict[ner_url] = {
        "text": text,
        "labels": "ARTICLE"
    }
Sentiment Analysis
We’ve now set up the summarization, most common phrases, and named entity recognition request bodies. Next is the sentiment analysis, or text polarity, body. Those terms are basically interchangeable. This request will just send the text in the body. We don’t need to specify any other optional parameters here. We’ll return the dictionary we created after setting this body.
    _dict[polarity_url] = {
        "text": text
    }
    return _dict
Full Code for Configuring Requests
Here’s the full code for configuring the request bodies.
# configure request bodies
# return a dict of url: body
def configure_bodies(text: str):
    _dict = {}
    _dict[summarize_url] = {
        "text": text,
        "proportion": 0.1
    }
    _dict[ner_url] = {
        "text": text,
        "labels": "ARTICLE"
    }
    _dict[mcp_url] = {
        "text": text,
        "num_phrases": 5
    }
    _dict[polarity_url] = {
        "text": text
    }
    return _dict
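As a quick sanity check, you can print which endpoints the returned dictionary maps bodies onto (the sample text here is just a placeholder):
bodies = configure_bodies("sample tweet text")
print(list(bodies.keys()))
# ['https://app.thetextapi.com/text/summarize', 'https://app.thetextapi.com/text/ner', ...]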
Run All NLP Techniques Concurrently
For a full play-by-play of this code check out how to send API requests asynchronously. I’ll go over an outline here. This is almost the exact same code with a few twists. This consists of three functions: gather_with_concurrency, post_async, and pool.
First, we’ll look at the gather_with_concurrency function. This function takes two parameters: the number of concurrent tasks and the list of tasks. All we’ll do in this function is set up a semaphore to asynchronously execute these tasks. At the end of the function, we’ll return the gathered tasks.
Next we’ll create the post_async function. This function will take four parameters: the url, session, headers, and body for the request. We’ll asynchronously use the session passed in to execute a request. We’ll return the text after getting the response back.
Finally, we’ll create a pool function to execute all of the requests concurrently. This function will take one parameter, the text we want to process. We’ll create a connection and a session and then use the configure_bodies function to get the request bodies. Next, we’ll use the gather_with_concurrency and post_async functions to execute all the requests asynchronously. Finally, we’ll close the session and return the summary, most common phrases, recognized named entities, and polarity.
# configure async requests
# configure gathering of requests
async def gather_with_concurrency(n, *tasks):
    semaphore = asyncio.Semaphore(n)
    async def sem_task(task):
        async with semaphore:
            return await task
    return await asyncio.gather(*(sem_task(task) for task in tasks))

# create async post function
async def post_async(url, session, headers, body):
    async with session.post(url, headers=headers, json=body) as response:
        text = await response.text()
        return json.loads(text)

async def pool(text):
    conn = aiohttp.TCPConnector(limit=None, ttl_dns_cache=300)
    session = aiohttp.ClientSession(connector=conn)
    urls_bodies = configure_bodies(text)
    conc_req = 4
    summary, ner, mcp, polarity = await gather_with_concurrency(conc_req, *[post_async(url, session, headers, body) for url, body in urls_bodies.items()])
    await session.close()
    return summary["summary"], ner["ner"], mcp["most common phrases"], polarity["text polarity"]
Full Code for Asynchronously Executing all NLP techniques
Here’s the full code for async_pool.py.
import asyncio
import aiohttp
import json
from .text_config import apikey
# configure request constants
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
text_url = "https://app.thetextapi.com/text/"
summarize_url = text_url+"summarize"
ner_url = text_url+"ner"
mcp_url = text_url+"most_common_phrases"
polarity_url = text_url+"text_polarity"
# configure request bodies
# return a dict of url: body
def configure_bodies(text: str):
    _dict = {}
    _dict[summarize_url] = {
        "text": text,
        "proportion": 0.1
    }
    _dict[ner_url] = {
        "text": text,
        "labels": "ARTICLE"
    }
    _dict[mcp_url] = {
        "text": text,
        "num_phrases": 5
    }
    _dict[polarity_url] = {
        "text": text
    }
    return _dict
# configure async requests
# configure gathering of requests
async def gather_with_concurrency(n, *tasks):
    semaphore = asyncio.Semaphore(n)
    async def sem_task(task):
        async with semaphore:
            return await task
    return await asyncio.gather(*(sem_task(task) for task in tasks))

# create async post function
async def post_async(url, session, headers, body):
    async with session.post(url, headers=headers, json=body) as response:
        text = await response.text()
        return json.loads(text)

async def pool(text):
    conn = aiohttp.TCPConnector(limit=None, ttl_dns_cache=300)
    session = aiohttp.ClientSession(connector=conn)
    urls_bodies = configure_bodies(text)
    conc_req = 4
    summary, ner, mcp, polarity = await gather_with_concurrency(conc_req, *[post_async(url, session, headers, body) for url, body in urls_bodies.items()])
    await session.close()
    return summary["summary"], ner["ner"], mcp["most common phrases"], polarity["text polarity"]
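If you want to sanity-check async_pool.py by itself, here’s a minimal sketch, assuming a valid API key in text_config.py and that you run the file as part of the package (for example with python -m). The sample text is a placeholder:
if __name__ == "__main__":
    sample = "Some tweets joined into a single string."  # placeholder text
    summary, ners, mcp, polarity = asyncio.run(pool(sample))
    print(summary, ners, mcp, polarity)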
Further Text Processing
After doing the initial NLP we’ll still get some text back. We can continue doing some NLP on the summarization, most common phrases, and the named entities. Let’s go back to what we’re trying to do – get insights. The summary will help us get a general idea, the most common phrases will tell us what the most commonly said things are, but the NER is a little too broad still. Let’s further process the NER by finding the most commonly named entities.
Most Commonly Named Entities
For a play-by-play of this code, read the post on how to Find the Most Common Named Entities of Each Type. I’m going to give a high-level overview here. We’re going to build two functions: build_dict to split the named entities into each type, and most_common to sort that dictionary.
The build_dict function will take one parameter, ners, a list of lists. We’ll start off this function by creating an empty dictionary. Then we’ll loop through the list of ners and add those to the dictionary based on whether or not we’ve seen the type and name of the ner.
The most_common function will take one parameter as well, ners, a list of lists. The first thing we’ll do with this function is call build_dict to create the dictionary. Then, we’ll initialize an empty dictionary. Next, we’ll loop through the dictionary and sort each list of NER types. Finally, we’ll add the most common names in each type to the initialized dictionary and return that.
# build dictionary of NERs
# extract most common NERs
# expects list of lists
def build_dict(ners: list):
    outer_dict = {}
    for ner in ners:
        entity_type = ner[0]
        entity_name = ner[1]
        if entity_type in outer_dict:
            if entity_name in outer_dict[entity_type]:
                outer_dict[entity_type][entity_name] += 1
            else:
                outer_dict[entity_type][entity_name] = 1
        else:
            outer_dict[entity_type] = {
                entity_name: 1
            }
    return outer_dict
# return most common entities after building the NERS out
def most_common(ners: list):
    _dict = build_dict(ners)
    mosts = {}
    for ner_type in _dict:
        sorted_types = sorted(_dict[ner_type], key=lambda x: _dict[ner_type][x], reverse=True)
        mosts[ner_type] = sorted_types[0]
    return mosts
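As a quick illustration of what these two functions produce, here’s a tiny made-up list of entities (the labels and names are placeholders, not real API output):
sample_ners = [
    ["PERSON", "Ada Lovelace"],
    ["PERSON", "Ada Lovelace"],
    ["ORG", "Acme Corp"]
]
print(build_dict(sample_ners))  # {'PERSON': {'Ada Lovelace': 2}, 'ORG': {'Acme Corp': 1}}
print(most_common(sample_ners))  # {'PERSON': 'Ada Lovelace', 'ORG': 'Acme Corp'}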
Orchestration
Finally, we’ll orchestrate our functions. First, we’ll start by importing the asyncio library and the two functions we’ll need to orchestrate: pool and most_common. We’ll create one function, orchestrate_text_analysis, which will take one parameter, text.
The first thing we’ll do in our orchestrator is get the summary, NERs, most common phrases, and text polarity using asyncio to execute the four NLP techniques concurrently. Then, we’ll do more text processing on the NERs. We’ll also replace the newlines in the summary to make it more readable. Finally, we’ll return the summary, most common entities, most common phrases, and sentiment.
import asyncio
from .async_pool import pool
from .ner_processing import most_common
def orchestrate_text_analysis(text: str):
    """Step 1"""
    # task to execute all requests
    summary, ner, mcp, polarity = asyncio.get_event_loop().run_until_complete(pool(text))
    """Step 2"""
    # do NER analysis
    most_common_ners = most_common(ner)
    summary = summary.replace("\n", "")
    return summary, most_common_ners, mcp, polarity
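To tie the whole program together, here’s a minimal sketch of what a top-level script could look like. The file name main.py, the folder name text_processing, and the handle are all assumptions; adjust them to match your own layout:
# main.py -- hypothetical entry point that wires the pieces together
from pull_tweets import search
from text_processing.text_orchestrator import orchestrate_text_analysis

if __name__ == "__main__":
    term = "@jack"  # placeholder handle; any search term works
    tweets_text = search(term)
    summary, most_common_ners, mcp, polarity = orchestrate_text_analysis(tweets_text)
    print("Summary:", summary)
    print("Most common named entities:", most_common_ners)
    print("Most common phrases:", mcp)
    print("Text polarity:", polarity)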
Summary
In this post we went over how to pull Tweets for a search term and transform that into a text. Then, we went over how to asynchronously call four APIs to run NLP on the Tweets. Next, we went over how to do some further text processing. Finally, we went over how to orchestrate the NLP on the text. I’ll be using this program to get insights from some people I want to be like on Twitter.