POS & NER Part 1: Part-of-Speech Tagging
POS tagging assigns each word in a sentence its part of speech, such as noun, verb, or adjective. The tags help reveal the structure and meaning of a sentence by showing how its words relate to one another. For example, the word “read” can be present tense or past tense, depending on the context of the sentence.
Now we’ll demonstrate part-of-speech tagging in a Google Colab environment. First, import the spaCy library and load the small English language model.
import spacy
nlp = spacy.load('en_core_web_sm')
Next, let’s create a variable called “doc” to store the sentences we'll use for this experiment.
doc = nlp(u'Yesterday, I visited my friend’s house, where we discussed our plans for the upcoming weekend. We had been planning this get-together for weeks, and now everything is finally coming together. As we talked, we realized how much we have changed since we started working together five years ago. By this time next year, we will have achieved many of our goals. Right now, I am feeling excited about what the future holds, and I know we will continue to grow personally and professionally.')
Print every token in the text along with its coarse part of speech, its fine-grained tag, and a short explanation of that tag. The list of fine-grained POS tags is long, so spacy.explain() is used here to describe each one.
for token in doc:
    # Pad each column to a fixed width so the output lines up.
    print(f"{token.text:{15}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}")
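The nested braces in the f-string above set a minimum column width. The following is a minimal sketch of that formatting on its own, using only the standard library; the helper name format_row and the hand-tagged sample rows are assumptions made for illustration, not spaCy output.

```python
# Hypothetical helper: pads the first two columns the same way as the loop above.
def format_row(text, pos, tag, text_width=15, pos_width=10):
    # {text:{text_width}} left-aligns a string and pads it with spaces
    # up to text_width characters; longer strings are not truncated.
    return f"{text:{text_width}} {pos:{pos_width}} {tag}"


# Hand-written (word, coarse POS, fine-grained tag) rows, for illustration only.
sample_rows = [
    ("visited", "VERB", "VBD"),
    ("friend", "NOUN", "NN"),
]

for text, pos, tag in sample_rows:
    print(format_row(text, pos, tag))
```

If a token is longer than the chosen width, the columns after it shift to the right, so it can help to pick a width slightly larger than the longest token you expect.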
Next, we’ll count how often each part-of-speech tag occurs in the doc variable, for instance how many nouns or verbs the text contains. doc.count_by() returns a dictionary that maps each tag’s integer ID to its frequency, so we look the ID up in doc.vocab to get the readable tag name.
pos_counts = doc.count_by(spacy.attrs.POS)
# Keys are integer attribute IDs; doc.vocab maps each ID back to its tag name.
for pos_id, count in pos_counts.items():
    print(f'{pos_id:{5}}. {doc.vocab[pos_id].text:{6}}: {count}')
Based on the analysis of the “doc” variable, nouns and punctuation marks occurred 13 times each, while pronouns appeared 16 times. You can see a complete breakdown of all tags in the results below.
Aside from counting part-of-speech tags, you can use the code above to count different fine-grained tags or dependencies by changing the attribute from (spacy.attrs.POS) to (spacy.attrs.TAG) or (spacy.attrs.DEP).
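As an alternative to integer attribute IDs, the same counts can be keyed by readable label strings. The sketch below uses collections.Counter and reads the string attributes pos_, tag_, or dep_ from each token. The function name count_labels and the FakeToken stand-in are assumptions introduced so the example runs without a language model; the sample labels are written by hand.

```python
from collections import Counter, namedtuple

# Stand-in for spaCy tokens so the sketch runs without a model (hypothetical).
FakeToken = namedtuple("FakeToken", ["text", "pos_", "tag_", "dep_"])


def count_labels(tokens, attr="pos_"):
    """Count the values of a string attribute (pos_, tag_ or dep_) over tokens."""
    return Counter(getattr(token, attr) for token in tokens)


# Hand-labelled sample, for illustration only.
sample = [
    FakeToken("I", "PRON", "PRP", "nsubj"),
    FakeToken("visited", "VERB", "VBD", "ROOT"),
    FakeToken("my", "PRON", "PRP$", "poss"),
    FakeToken("friend", "NOUN", "NN", "dobj"),
]

for label, count in count_labels(sample).most_common():
    print(f"{label:{6}}: {count}")
```

Because spaCy tokens expose the same pos_, tag_, and dep_ attributes, calling count_labels(doc, "tag_") on a parsed document should work the same way.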
As a last step, we’ll visualize the POS tags with displaCy, spaCy’s built-in visualizer. The 'dep' style draws the dependency parse and shows each word’s POS tag beneath it.
from spacy import displacy
displacy.render(doc,style='dep',jupyter=True)
Below is a partial image of the POS visualization results. The full image is too wide to fit on the screen.
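When the rendering is too wide, one option is to draw each sentence separately and tighten the layout with displaCy’s compact and distance options. This is a sketch rather than part of the original walkthrough: the helper name render_sentences and the distance value of 90 are assumptions, and the function expects a doc produced by a pipeline that sets sentence boundaries.

```python
# Layout options for the 'dep' visualizer; the values are illustrative.
DISPLACY_OPTIONS = {"compact": True, "distance": 90}


def render_sentences(doc, options=DISPLACY_OPTIONS):
    """Render one dependency diagram per sentence instead of one wide image."""
    # Imported here so the options above can be inspected without spaCy loaded.
    from spacy import displacy

    for sent in doc.sents:
        displacy.render(sent, style="dep", jupyter=True, options=options)
```

In the notebook, calling render_sentences(doc) after the parsing step should produce one narrower diagram per sentence.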
In summary, POS tagging is a key task in NLP that helps machines understand text by labeling each word with its part of speech. We have learned how to use spaCy in a Google Colab environment to extract POS tags, count their frequencies, and visualize the results.