POS & NER Part 4: Assessment

Chitra's Playground
3 min read · Sep 18, 2024


Now that you’ve learned about part-of-speech (POS) tagging and named entity recognition (NER) in NLP, it’s time to review everything we have covered in this chapter.

This time I will use a txt file containing The Tale of Peter Rabbit by Beatrix Potter, which you can find by clicking on the title. Copy and paste the story into Notepad, save it as a .txt file (mine is called peterrabbit.txt), and upload it to your Google Colab environment. Now let’s begin the assessment.

As usual, import spacy and displacy, then load the English language model.

import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')

Next, create a Doc object for the Peter Rabbit story.

with open('/content/peterrabbit.txt') as f:
    doc = nlp(f.read())
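
If the story was saved from Notepad, the file may start with an invisible byte order mark (BOM), which can end up as a stray character in the first token. One way to guard against this, sketched below, is to read the file with the utf-8-sig encoding, which also reads plain UTF-8 files. The read_story helper and the temporary demo file are hypothetical names of mine, only for illustration; in Colab you would pass '/content/peterrabbit.txt' instead.

```python
import tempfile
from pathlib import Path


def read_story(path):
    """Read a text file as UTF-8, dropping a leading BOM if there is one."""
    return Path(path).read_text(encoding="utf-8-sig")


# Demo with a throwaway file that starts with a BOM (not the real story file)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"
    demo_path.write_bytes(b"\xef\xbb\xbfOnce upon a time")
    story = read_story(demo_path)

print(repr(story))
```

The returned string can be passed to nlp() exactly like f.read() above.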

Now, let’s list every token in the third sentence of the document, along with its POS, its tag, and the tag description.

for token in list(doc.sents)[2]:
    print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {str(spacy.explain(token.tag_))}")

After you run the code, you will see each token along with its coarse-grained POS, its fine-grained tag, and the tag description.
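
The {token.text:{10}} part of the f-string is what lines the columns up: the inner {10} is a nested replacement field that sets the minimum width. Here is a small sketch of the same idea without spaCy. The rows are made-up values for illustration, not output from the story.

```python
# Made-up (text, POS, tag) rows, only to illustrate the padding
rows = [("They", "PRON", "PRP"), ("lived", "VERB", "VBD"), ("sand", "NOUN", "NN")]

width = 10  # minimum column width, same role as the {10} in the loop above

lines = []
for text, pos, tag in rows:
    # {text:{width}} pads the string with spaces on the right up to `width` characters
    lines.append(f"{text:{width}} {pos:{width}} {tag:{width}}")

print("\n".join(lines))
```

Strings longer than the width are not cut off, so a very long token simply pushes its row out of alignment.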

Next, let’s count how many times each POS appears in the entire document.

pos_counts = doc.count_by(spacy.attrs.POS)

for k, v in sorted(pos_counts.items()):
    print(f"{k}. {doc.vocab[k].text:{8}}: {v}")

The output lists every POS found in the document, along with its numeric key and the number of times it appears.

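Next, we’re going to work out what percentage of the tokens are nouns. In the output of the previous step, ‘NOUN’ has the key 92. Keep in mind that len(doc) counts every token, including punctuation and whitespace tokens, so the result is a percentage of tokens rather than of words only.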

100*pos_counts[92]/len(doc)
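
Hard-coding 92 works, but the number is easy to mistype. As a sketch of an alternative, spaCy exposes the same ID as spacy.symbols.NOUN. To keep the example runnable without the story file or the model, I build a tiny hand-labeled Doc here; the words and POS labels are my own toy data, not output from the pipeline.

```python
import spacy
from spacy.attrs import POS
from spacy.symbols import NOUN
from spacy.tokens import Doc

# Blank English pipeline: tokenizer and vocab only, no trained model needed
nlp_blank = spacy.blank("en")

# Toy sentence with hand-assigned POS labels (an assumption for the demo)
words = ["Peter", "ate", "some", "lettuces", "and", "beans", "."]
pos = ["PROPN", "VERB", "DET", "NOUN", "CCONJ", "NOUN", "PUNCT"]
toy_doc = Doc(nlp_blank.vocab, words=words, pos=pos)

toy_counts = toy_doc.count_by(POS)

# Look up nouns by symbol instead of by the literal 92
noun_pct = 100 * toy_counts.get(NOUN, 0) / len(toy_doc)

# Same idea, but ignoring punctuation and whitespace tokens
word_tokens = [t for t in toy_doc if not t.is_punct and not t.is_space]
noun_pct_words = 100 * sum(t.pos == NOUN for t in word_tokens) / len(word_tokens)

print(f"{noun_pct:.1f}% of tokens, {noun_pct_words:.1f}% of words")
```

With the real document you would keep pos_counts as computed earlier and only swap the 92 for NOUN.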

Let’s create a dependency diagram that shows how the words in the third sentence relate to each other.

displacy.render(list(doc.sents)[2], style='dep', jupyter=True)

Let’s print the first two named entities in the text, such as specific people, organizations, or locations.

for ent in doc.ents[:2]:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

The output shows not only the first two named entities, but also each entity’s label and a short description of what that label means.

Now I’d like to find out how many sentences contain named entities.

list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_ners)
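
Running nlp again on each sentence works, but it processes the whole story a second time, and a sentence handled on its own can get different entities than it did in context. A lighter variant, sketched here, filters the existing sentence spans with sent.ents. So that the sketch runs without en_core_web_sm, I stand in a rule-based sentencizer and an entity_ruler with two hand-written patterns; these and the demo text are my own assumptions. With the real model you would apply the same list comprehension to the doc created earlier.

```python
import spacy

# Stand-in pipeline (assumption for the demo): rule-based sentence splitting
# plus two hand-written entity patterns instead of the trained model
demo_nlp = spacy.blank("en")
demo_nlp.add_pipe("sentencizer")
ruler = demo_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Peter"},
    {"label": "PERSON", "pattern": "Flopsy"},
])

demo_doc = demo_nlp("Peter ran to the garden. It was a sunny day. Flopsy stayed at home.")

# Each sentence is a Span; Span.ents holds the entities inside that span
sents_with_ents = [sent for sent in demo_doc.sents if sent.ents]

print(len(sents_with_ents), "of", len(list(demo_doc.sents)), "sentences contain entities")
```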

And lastly, visualize the named entities in the first sentence from the previous task.

displacy.render(list_of_sents[0], style='ent', jupyter=True)
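
jupyter=True is meant for notebooks such as Colab. Outside a notebook, displacy.render can return the markup as a string when you pass jupyter=False. Here is a minimal sketch, using a hand-labeled toy Doc (my own example data) so it runs without the model:

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Toy Doc with one hand-assigned entity in IOB format (assumption for the demo)
words = ["Peter", "squeezed", "under", "the", "gate", "."]
ents = ["B-PERSON", "O", "O", "O", "O", "O"]
ent_doc = Doc(vocab, words=words, ents=ents)

# jupyter=False returns the HTML instead of displaying it;
# page=True wraps it in a full HTML page
html = displacy.render(ent_doc, style="ent", jupyter=False, page=True)

print(html[:80])
```

From there you can write the string to an .html file with a regular open(..., 'w') call and view it in a browser.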

Congratulations! You’ve mastered POS & NER! You learned how to import necessary libraries, process text data, and extract valuable information from documents. We practiced these techniques on the story of “The Tale of Peter Rabbit” by Beatrix Potter. You successfully identified individual words (tokens), their grammatical functions (POS), and named entities (people, organizations, locations) within the text. You even visualized the relationships between words and identified the proportion of nouns within the story.
