The Best Way to do Named Entity Recognition (NER)

Named Entity Recognition (NER) is a common Natural Language Processing technique. It's used so often that it ships as part of spaCy's standard trained pipelines. NER lets us quickly pull out all the named entities in a document, across many different entity types. For example, if we're reading an article, we can use named entity recognition to immediately get an idea of the who/what/when/where of the article.

In this post we're going to cover three different ways to implement NER in Python: with spaCy, with NLTK, and with The Text API.

What is Named Entity Recognition?

Named Entity Recognition, or NER for short, is the Natural Language Processing (NLP) task of recognizing entities in a text document or speech file. Of course, this is quite a circular definition. To understand what NER really is, we have to define what an entity is. For the purposes of NLP, an entity is essentially a noun that refers to an individual, a group of individuals, or a recognizable object. While there is no total consensus on what kinds of entities there are, I've compiled a fairly complete list of the entity types that popular NLP libraries such as spaCy and the Natural Language Toolkit (NLTK) can recognize. You can find the GitHub repo here.

List of Common Named Entities

Entity Type     Description
PERSON          A person, usually recognized by a first and last name
NORP            Nationalities or religious/political groups
FAC             The name of a facility
ORG             The name of an organization
GPE             The name of a geopolitical entity (countries, cities, states)
LOC             A location
PRODUCT         The name of a product
EVENT           The name of an event
WORK_OF_ART     The name of a work of art
LAW             A law that has been published (US only as far as I know)
LANGUAGE        The name of a language
DATE            A date; it doesn't have to be exact and can be relative, like “a day ago”
TIME            A time; like DATE, it doesn't have to be exact, e.g. “middle of the day”
PERCENT         A percentage
MONEY           An amount of money, like “$100”
QUANTITY        A measurement, such as a weight or distance
CARDINAL        A number that is not a measurement (compare QUANTITY)
ORDINAL         A number signifying a relative position, such as “first” or “second”
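To connect this table back to the who/what/when/where idea from the introduction, here is a minimal sketch that sorts (text, label) pairs into those four buckets. The bucket assignment in BUCKETS is my own assumption, not something spaCy or NLTK define, and the helper name group_entities is hypothetical. The example input reuses the entities expected for this post's sample sentence, plus one made-up DATE entity.

```python
from collections import defaultdict

# Hypothetical mapping from entity labels to who/what/when/where.
# This grouping is an assumption for illustration; adjust it to your needs.
BUCKETS = {
    "who": {"PERSON", "NORP", "ORG"},
    "what": {"PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
             "PERCENT", "MONEY", "QUANTITY", "CARDINAL", "ORDINAL"},
    "when": {"DATE", "TIME"},
    "where": {"GPE", "LOC", "FAC"},
}


def group_entities(entities):
    """Group (text, label) pairs into who/what/when/where buckets.

    Labels that are not in any bucket end up under "other".
    """
    groups = defaultdict(list)
    for text, label in entities:
        # Pick the first bucket whose label set contains this label
        bucket = next(
            (name for name, labels in BUCKETS.items() if label in labels),
            "other",
        )
        groups[bucket].append(text)
    return dict(groups)


# Example input in (text, label) form; "yesterday" is a made-up DATE.
example = [("Molly Moon", "PERSON"), ("United Nations", "ORG"), ("yesterday", "DATE")]
print(group_entities(example))
# → {'who': ['Molly Moon', 'United Nations'], 'when': ['yesterday']}
```

Labels outside the table (for example NLTK's ORGANIZATION) fall into "other" unless you add them to the mapping.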

How Can I Implement NER in Python?

Earlier, I mentioned that you can implement NER with both spaCy and NLTK. The main difference between these libraries is that NLTK is built for academic/research purposes, while spaCy is built for production use. Both are free, open-source libraries, and both make NER easy to implement. In this article I'll show you how to get started writing your own Named Entity Recognition programs.

spaCy Named Entity Recognition (NER)

We'll start with spaCy. To get started, run the commands below in your terminal to install the library and download a starter model.

pip install spacy
python -m spacy download en_core_web_sm

We can implement NER in spaCy in just a few lines of code. All we need to do is import the spacy library, load a model, give it some text to process, and then read the named entities from the processed document's ents attribute. For this example we'll use the “en_core_web_sm” model we downloaded earlier, which is spaCy's small English pipeline trained on web text. The text is just a random sentence I made up. We should expect NER to identify Molly Moon as a person (NER isn't advanced enough to detect that she is a cow), the United Nations as an organization, and the Climate Action Committee as a second organization.

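import spacy

# Load the small English pipeline downloaded earlier
nlp = spacy.load("en_core_web_sm")

text = "Molly Moon is a cow. She is part of the United Nations Climate Action Committee."

# Run the pipeline (tokenization, tagging, parsing, NER, ...) on the text
doc = nlp(text)

# Each entity is a Span with its text and a string label
for ent in doc.ents:
    print(ent.text, ent.label_)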

After running this, we should see a result like the one below. This spaCy model doesn't split the United Nations and its Climate Action Committee into two separate organizations.

[Image: named entity recognition spaCy results]
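Printing each entity is fine for a quick check, but you'll usually want the results as data. Each item in doc.ents is a spaCy Span with .text, .label_, .start_char and .end_char attributes, so a small helper can collect them into a dictionary keyed by label. The helper name entities_by_label is hypothetical; it only reads those four attributes, so the helper itself needs nothing beyond the standard library.

```python
from collections import defaultdict


def entities_by_label(ents):
    """Collect entity spans into {label: [(text, start_char, end_char), ...]}.

    `ents` is expected to be spaCy's `doc.ents`, or any iterable of objects
    exposing `.text`, `.label_`, `.start_char` and `.end_char`.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append((ent.text, ent.start_char, ent.end_char))
    return dict(grouped)


# Usage with the spaCy example above (requires spaCy and en_core_web_sm):
#   doc = nlp(text)
#   print(entities_by_label(doc.ents))
```

Keeping the character offsets lets you locate each entity in the original string with text[start_char:end_char].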

Named Entity Recognition with NLTK

Let's take a look at how to implement NER with NLTK. As with spaCy, we'll start by installing the NLTK library and downloading the data packages we need.

pip install nltk

After the initial pip install, we need to download four NLTK data packages to get our Named Entity Recognition program running. Because they only need to be downloaded once, I recommend running these commands in an interactive Python session in your terminal rather than including them in your NER program, where they would only slow it down.

python
>>> import nltk
>>> nltk.download("punkt")
>>> nltk.download("averaged_perceptron_tagger")
>>> nltk.download("maxent_ne_chunker")
>>> nltk.download("words")

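Punkt is a pre-trained sentence tokenizer, which word_tokenize uses to split text into sentences before splitting it into words. Averaged Perceptron Tagger is NLTK's default part-of-speech tagger. Maxent NE Chunker is NLTK's maximum entropy named entity chunker, used by ne_chunk. Words is an NLTK corpus containing a list of English words, which the chunker relies on. If one of these resources is missing, or a newer NLTK release has renamed it (recent versions use punkt_tab, for example), NLTK raises a LookupError whose message names the exact resource to download. We can already see that NLTK is far more customizable, and consequently also more complex to set up. Let's dive into the program to see how we can extract our named entities.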

Once again we simply start by importing our library and declaring our text. Then we’ll tokenize the text, tag the parts of speech, and chunk it using the named entity chunker. Finally, we’ll loop through our chunks and display the ones that are labeled.

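import nltk

text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."

# Split the text into word tokens (Punkt handles sentence boundaries first)
tokenized = nltk.word_tokenize(text)
# Tag each token with its part of speech, e.g. ("Molly", "NNP")
pos_tagged = nltk.pos_tag(tokenized)
# Build a tree whose labeled subtrees are the named entities
chunks = nltk.ne_chunk(pos_tagged)
# Labeled chunks are nltk.Tree objects; ordinary tokens are plain tuples
for chunk in chunks:
    if hasattr(chunk, 'label'):
        print(chunk)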

When you run this program in your terminal you should see an output like the one below.

[Image: named entity recognition results – NLTK]

Notice that NLTK has identified “Climate Action Committee” as a PERSON and Moon as a PERSON. That's clearly incorrect, but keep in mind that we're relying entirely on an off-the-shelf pre-trained model. Also, this time I printed the entire chunk, which includes each word's part-of-speech tag. NLTK has tagged all of these words as “NNP”, the Penn Treebank tag for a singular proper noun.
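If you'd rather get plain (entity text, label) pairs without the part-of-speech tags, each labeled chunk is an nltk.Tree whose label() method returns the entity type and whose leaves() method returns the (word, tag) tuples inside it. Here is a minimal sketch; the helper name labeled_entities is hypothetical, and joining words with a single space is a simplification that won't reproduce the original spacing around punctuation.

```python
def labeled_entities(chunks):
    """Turn the output of nltk.ne_chunk into (entity text, label) pairs.

    Labeled chunks are nltk.Tree objects: label() gives the entity type and
    leaves() gives the (word, POS tag) tuples. Unlabeled items are plain
    (word, tag) tuples and are skipped.
    """
    entities = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            # Drop the POS tags and rejoin the words of the entity
            words = [word for word, tag in chunk.leaves()]
            entities.append((" ".join(words), chunk.label()))
    return entities


# Usage with the NLTK example above (requires nltk and the downloaded data):
#   chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
#   print(labeled_entities(chunks))
```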

A Simpler and More Accurate NER Implementation

Alright, now that we've covered how to implement NER with open-source libraries, let's look at how we can do it without downloading any NLP packages or machine learning models! We can simply ping a web API that already has a pre-trained model and pipeline for many text processing needs. We'll be using the open beta of The Text API; scroll down to the bottom of its page to get your API key.

The only library we need to install is requests, and all we need to know is how to send a web API request, as outlined in How to Send a Web API Request. Let's take a look at the code.

All we need to do is construct a request for the endpoint, send it, and parse the response. The API key is passed in the headers as “apikey”, and we also specify that the content type is JSON. The body just needs to contain the text. The endpoint we'll hit is “https://app.thetextapi.com/text/ner”. Once we get the response back, we'll use Python's built-in json library to parse it.

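import requests
import json
# config.py is a local file that holds your key, e.g. apikey = "..."
from config import apikey

text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."
# The API expects a JSON body and the key in an "apikey" header
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/ner"

# Send the text to the NER endpoint and pull the "ner" field out of the JSON response
response = requests.post(url, headers=headers, json=body)
ner = json.loads(response.text)["ner"]
print(ner)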
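If you'd prefer not to install requests at all, the same POST request can be sketched with Python's built-in urllib.request. This sketch assumes only what the requests version above already relies on: the endpoint accepts a JSON body with a "text" field plus an "apikey" header, and it returns JSON with an "ner" key. The function names build_ner_request and get_entities, and the 30-second timeout, are my own hypothetical choices.

```python
import json
import urllib.request

NER_URL = "https://app.thetextapi.com/text/ner"


def build_ner_request(text, apikey, url=NER_URL):
    """Build (but don't send) a POST request carrying the text as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "apikey": apikey},
        method="POST",
    )


def get_entities(text, apikey, timeout=30):
    """Send the request and return the "ner" field of the JSON response."""
    request = build_ner_request(text, apikey)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["ner"]


# Usage (sends a real request and needs a valid key, e.g. from config.py):
#   from config import apikey
#   print(get_entities("Molly Moon is a cow.", apikey))
```

Splitting request construction from sending makes it easy to inspect the headers and body before anything goes over the network.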

Once we send this request, we should see an output like the one below.

[Image: named entity recognition with The Text API]

Woah! Our API recognizes all three named entities correctly! Not only is using The Text API simpler than downloading multiple models and libraries, but on this example sentence it's also more accurate.
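With a single sentence, “more accurate” is an eyeball judgment. If you want to compare the three approaches on your own text, one minimal approach is to write down the entities you expect and score each tool's (text, label) output with precision and recall. The expected set below is the one described earlier in this post; it uses spaCy-style labels, so NLTK's labels (such as ORGANIZATION) would need mapping first, and exact string matching is a simplifying assumption. The function name precision_recall is hypothetical.

```python
def precision_recall(expected, predicted):
    """Score predicted (text, label) pairs against expected ones.

    Uses exact matches on both the entity text and the label.
    """
    expected, predicted = set(expected), set(predicted)
    true_positives = len(expected & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# The entities this post expects for the Molly Moon sentence.
EXPECTED = {
    ("Molly Moon", "PERSON"),
    ("United Nations", "ORG"),
    ("Climate Action Committee", "ORG"),
}

# Usage: collect a tool's output as (text, label) pairs, then
#   print(precision_recall(EXPECTED, predicted_pairs))
```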


Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad-free site, please help fund this site by donating! If you can't donate right now, please think of us next time.

Yujian Tang

I started my professional software career interning for IBM in high school after winning ACSL two years in a row. I got into AI/ML in college where I published a first author paper to IEEE Big Data. After college I worked on the AutoML infrastructure at Amazon before leaving to work in startups. I believe I create the highest quality software content so that’s what I’m doing now. Drop a comment to let me know!
