Named Entity Recognition (NER) is a common Natural Language Processing technique. It’s used so often that it comes in the basic pipeline for spaCy. NER can help us quickly parse a document for named entities of many different types. For example, if we’re reading an article, we can use named entity recognition to immediately get an idea of the who/what/when/where of the article.
In this post we’re going to cover three different ways you can implement NER in Python. We’ll be going over:
- What is Named Entity Recognition (NER)
- A list of recognizable named entities
- How Can I Implement NER?
- spaCy Named Entity Recognition
- Named Entity Recognition with NLTK
- The Best Way to Implement NER
What is Named Entity Recognition?
Named Entity Recognition, or NER for short, is the Natural Language Processing (NLP) topic about recognizing entities in a text document or speech file. Of course, this is quite a circular definition. To understand what NER really is, we’ll have to define what an entity is. For the purposes of NLP, an entity is essentially a noun that names an individual, a group of individuals, or a recognizable object. While there is no total consensus on what kinds of entities there are, I’ve compiled a fairly complete list of the entity types that popular NLP libraries such as spaCy or Natural Language Toolkit (NLTK) can recognize. You can find the GitHub repo here.
List of Common Named Entities
Entity Type | Description of the NER object |
PERSON | A person – usually recognized as a first and last name |
NORP | Nationalities or Religious/Political Groups |
FAC | The name of a Facility |
ORG | The name of an Organization |
GPE | The name of a Geopolitical Entity |
LOC | A location |
PRODUCT | The name of a product |
EVENT | The name of an event |
WORK_OF_ART | The name of a work of art |
LAW | A law that has been published (US only as far as I know) |
LANGUAGE | The name of a language |
DATE | A date; it doesn’t have to be exact and can be relative, like “a day ago” |
TIME | A time; like a date, it doesn’t have to be exact and can be something like “middle of the day” |
PERCENT | A percentage |
MONEY | An amount of money, like “$100” |
QUANTITY | Measurements of weight or distance |
CARDINAL | A number, similar to quantity but not a measurement |
ORDINAL | A number, but signifying a relative position such as “first” or “second” |
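If you want to check which of these labels a particular spaCy model can actually produce, you can ask the model itself. Below is a minimal sketch, assuming spaCy and the “en_core_web_sm” model are installed as described in the spaCy section of this post. The helper names label_glossary and print_model_labels are my own for illustration and are not part of spaCy; the spaCy calls used are spacy.load, nlp.get_pipe("ner").labels, and spacy.explain.

```python
def label_glossary(labels, explain=None):
    """Map each entity label to a short description.

    `explain` is any callable that takes a label and returns a description
    (or None). It defaults to spaCy's own `spacy.explain`.
    """
    if explain is None:
        import spacy  # imported lazily so the helper has no hard dependency
        explain = spacy.explain
    return {label: explain(label) or "(no description)" for label in sorted(labels)}


def print_model_labels(model_name="en_core_web_sm"):
    """Print every entity label the model's NER component can emit."""
    import spacy  # requires `pip install spacy` plus the model download
    nlp = spacy.load(model_name)
    labels = nlp.get_pipe("ner").labels
    for label, description in label_glossary(labels).items():
        print(f"{label:<12} {description}")
```

Calling print_model_labels() prints one line per label. Other models may expose a different label set, so treat the table above as a guide and not a guarantee.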
How Can I Implement NER in Python?
Earlier, I mentioned that you can implement NER with both spaCy and NLTK. The difference between these libraries is that NLTK is built for academic/research purposes and spaCy is built for production purposes. Both are free, open source libraries, and both make NER easy to implement. In this article I will show you how to get started implementing your own Named Entity Recognition programs.
spaCy Named Entity Recognition (NER)
We’ll start with spaCy. To get started, run the commands below in your terminal to install the library and download a starter model.
pip install spacy
python -m spacy download en_core_web_sm
We can implement NER in spaCy in just a few lines of code. All we need to do is import the spacy library, load a model, give it some text to process, and then read the named entities from the processed document. For this example we’ll be using the “en_core_web_sm” model we downloaded earlier, which is the “small” model trained on web text. The text we’ll use is just a sentence I made up. We should expect the NER to identify Molly Moon as a person (NER isn’t advanced enough to detect that she is a cow), the United Nations as an organization, and the Climate Action Committee as a second organization.
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Molly Moon is a cow. She is part of the United Nations Climate Action Committee."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
After we run this, each entity prints alongside its label. We see that this spaCy model is unable to split the United Nations and its Climate Action Committee into separate orgs.
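Printing entities one by one is fine for a single sentence, but for a longer document it is often handier to group them by type. Here is a small sketch of one way to do that. The function names group_entities and entities_in are hypothetical helpers of mine; the only spaCy features assumed are doc.ents and the text and label_ attributes used in the example above.

```python
from collections import defaultdict


def group_entities(ents):
    """Group entity texts by label, keeping first-seen order and dropping repeats.

    `ents` is any iterable of objects with `.text` and `.label_`,
    which is what spaCy's `doc.ents` provides.
    """
    grouped = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def entities_in(text, model_name="en_core_web_sm"):
    """Run the spaCy pipeline on `text` and return its entities grouped by label."""
    import spacy  # lazy import; requires `pip install spacy` and the model download
    nlp = spacy.load(model_name)
    return group_entities(nlp(text).ents)
```

Calling entities_in(text) on the sentence above returns a dictionary keyed by label, which gives a quick who/what/where summary of the text.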
Named Entity Recognition with NLTK
Let’s take a look at how to implement NER with NLTK. As with spaCy, we’ll start by installing the NLTK library and also downloading the extensions we need.
pip install nltk
After we run our initial pip install, we’ll need to download four extensions to get our Named Entity Recognition program running. I recommend simply firing up Python in your terminal and running these commands there. The packages only need to be downloaded once, so including the download calls in your NER program will only slow it down.
python
>>> import nltk
>>> nltk.download("punkt")
>>> nltk.download("averaged_perceptron_tagger")
>>> nltk.download("maxent_ne_chunker")
>>> nltk.download("words")
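If you would rather keep this setup next to your code without calling the downloader on every run, one option is to check what is already on disk first. The sketch below is only an illustration: missing_packages and ensure_nltk_data are names I made up, and the resource paths are my assumption of where these four packages are installed. Newer NLTK releases may use different package names, so adjust the mapping if a lookup fails.

```python
# Package name -> resource path. The paths are assumptions and may differ
# between NLTK versions.
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}


def missing_packages(resources, find):
    """Return the package names whose resource path `find` cannot locate.

    `find` should raise LookupError for a missing resource,
    as `nltk.data.find` does.
    """
    missing = []
    for package, path in resources.items():
        try:
            find(path)
        except LookupError:
            missing.append(package)
    return missing


def ensure_nltk_data(resources=RESOURCES):
    """Download only the NLTK packages that are not already on disk."""
    import nltk  # lazy import; requires `pip install nltk`
    for package in missing_packages(resources, nltk.data.find):
        nltk.download(package)
```

Calling ensure_nltk_data() once at startup downloads only what is missing, so later runs skip the downloader entirely.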
Punkt is a pre-trained tokenizer package that NLTK uses to split text into sentences. Averaged Perceptron Tagger is the default part of speech tagger for NLTK. Maxent NE Chunker is the Named Entity Chunker for NLTK. The Words library is an NLTK corpus of words. We can already see here that NLTK is far more customizable, and consequently also more complex to set up. Let’s dive into the program to see how we can extract our named entities.
Once again we simply start by importing our library and declaring our text. Then we’ll tokenize the text, tag the parts of speech, and chunk it using the named entity chunker. Finally, we’ll loop through our chunks and display the ones that are labeled.
import nltk
text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."
tokenized = nltk.word_tokenize(text)
pos_tagged = nltk.pos_tag(tokenized)
chunks = nltk.ne_chunk(pos_tagged)
for chunk in chunks:
    if hasattr(chunk, 'label'):
        print(chunk)
When you run this program in your terminal you should see an output like the one below.
Notice that NLTK has identified “Climate Action Committee” as a Person and Moon as a Person. That’s clearly incorrect, but this is all on pre-trained data. Also, this time I let it print out the entire chunk, so the output shows the parts of speech. NLTK has tagged all of these as “NNP”, which signals a proper noun.
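If you only want the entity text and its label, without the part of speech tags, you can pull them out of each labeled chunk. This is a rough sketch and not the only way to do it. The helpers named_entities and nltk_entities are hypothetical names of mine, and they rely on labeled chunks exposing label() and leaves(), as NLTK tree objects do.

```python
def named_entities(chunks):
    """Return (entity text, label) pairs from the output of `nltk.ne_chunk`.

    Labeled chunks are subtrees exposing `label()` and `leaves()`;
    everything else is a plain (token, tag) tuple and is skipped.
    """
    entities = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            words = [token for token, _tag in chunk.leaves()]
            entities.append((" ".join(words), chunk.label()))
    return entities


def nltk_entities(text):
    """Tokenize, tag, and chunk `text`, then return its named entities."""
    import nltk  # lazy import; needs the four downloads described earlier
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return named_entities(nltk.ne_chunk(tagged))
```

Calling nltk_entities(text) returns a list of pairs that is easier to compare against the spaCy results than the raw printed chunks.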
A Simpler and More Accurate NER Implementation
Alright, now that we’ve discussed how to implement NER with open source libraries, let’s take a look at how we can do it without ever having to download extra packages and machine learning models! We can simply ping a web API that already has a pre-trained model and pipeline for tons of text processing needs. We’ll be using the open beta of The Text API. Scroll down to the bottom of the page to get your API key.
The only library we need to install is the requests library, and we only need to be able to send an API request as outlined in How to Send a Web API Request. So, let’s take a look at the code.
All we need is to construct a request to send to the endpoint, send the request, and parse the response. The API key should be passed in the headers as “apikey”, and we should also specify that the content type is JSON. The body simply needs to pass the text in. The endpoint that we’ll hit is “https://app.thetextapi.com/text/ner”. Once we get our response back, we’ll use the json library (native to Python) to parse it.
import requests
import json
from config import apikey
text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/ner"
response = requests.post(url, headers=headers, json=body)
ner = json.loads(response.text)["ner"]
print(ner)
Once we send this request, we should see an output like the one below.
Woah! Our API actually recognizes all three of the named entities successfully! Not only is using The Text API simpler than downloading multiple models and libraries, but in this use case, we can see that it’s also more accurate.
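Since this approach depends on a network call, it is worth handling failures before you rely on it. Below is a hedged sketch of the same request wrapped in a function. The name fetch_ner, the post parameter, and the 10 second timeout are my own choices for illustration and are not part of The Text API; the endpoint, headers, and the “ner” field come from the example above. I use response.json() from requests here, which does the same job as json.loads(response.text).

```python
NER_URL = "https://app.thetextapi.com/text/ner"


def fetch_ner(text, apikey, post=None, timeout=10):
    """POST `text` to the NER endpoint and return the "ner" field of the reply.

    `post` defaults to `requests.post`; passing another callable with the same
    signature makes the function easy to exercise without network access.
    """
    if post is None:
        import requests  # lazy import; requires `pip install requests`
        post = requests.post
    headers = {"Content-Type": "application/json", "apikey": apikey}
    response = post(NER_URL, headers=headers, json={"text": text}, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx replies into exceptions
    payload = response.json()
    if "ner" not in payload:
        raise ValueError(f"response has no 'ner' field: {payload!r}")
    return payload["ner"]
```

With this in place, fetch_ner(text, apikey) either returns the entities or raises an error you can catch, so a bad key or an outage does not fail silently.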