NLP Part 4: Stop Words
Stop words are very common words, such as “the,” “and,” and “in,” that carry little meaning on their own. They are often removed from text before analysis so that the remaining terms reflect what the text is actually about.
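To make the idea concrete, here is a minimal sketch in plain Python. It does not use spaCy: the tiny `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for this illustration, and splitting on whitespace stands in for a real tokenizer.

```python
# Hypothetical mini stop list for illustration; a real list is much larger.
STOP_WORDS = {"the", "and", "in", "a", "of", "is"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased, whitespace-split tokens of text that are not stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]

filtered = remove_stop_words("The cat is in the garden and the dog is asleep")
print(filtered)  # ['cat', 'garden', 'dog', 'asleep']
```

Only the content-bearing words survive, which is the effect stop word removal is meant to have.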
# nlp is assumed to be a loaded spaCy pipeline, e.g. nlp = spacy.load('en_core_web_sm')
print(nlp.Defaults.stop_words)
This prints spaCy’s default stop word list, the common words that are usually ignored in text analysis. You can add words to this list or remove them from it. Before adding “btw,” let’s check whether it is already a stop word.
nlp.vocab['btw'].is_stop
The check should return False, meaning “btw” is not currently a stop word, so we can go ahead and add it.
#add stop words
nlp.Defaults.stop_words.add('btw')
nlp.vocab['btw'].is_stop = True
nlp.vocab['btw'].is_stop
This code makes “btw” a stop word. After the comment, the first statement adds “btw” to the default stop word set. The second sets the is_stop flag on the vocabulary entry for “btw,” so that the entry we already looked up reflects the change. The last statement checks the flag again, and it should now return True.
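Why update both the set and the flag? A likely explanation is that the vocabulary stores an is_stop flag per word when the word is first looked up, so changing the set afterwards does not touch an entry that already exists. The toy model below illustrates that idea in plain Python. It is a sketch under that assumption, not spaCy’s actual implementation, and `ToyVocab`, `is_stop`, and `set_stop` are hypothetical names.

```python
class ToyVocab:
    """Toy stand-in for a vocabulary that caches an is_stop flag per word."""

    def __init__(self, stop_words):
        self.stop_words = stop_words  # shared, mutable set of stop words
        self._cache = {}              # word -> cached is_stop flag

    def is_stop(self, word):
        # The flag is computed from the set only on first lookup, then cached.
        if word not in self._cache:
            self._cache[word] = word in self.stop_words
        return self._cache[word]

    def set_stop(self, word, flag):
        # Overwrite the cached flag directly.
        self._cache[word] = flag

stops = {"the", "and"}
vocab = ToyVocab(stops)

before = vocab.is_stop("btw")  # looked up (and cached) before the word is added
stops.add("btw")               # update the set only
stale = vocab.is_stop("btw")   # the cached flag is unchanged
vocab.set_stop("btw", True)    # update the cached flag as well
after = vocab.is_stop("btw")

print(before, stale, after)    # False False True
```

In this model, a word that has never been looked up would pick up the change from the set alone, which is why updating both keeps the two in sync.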
Now, let’s try removing “ca” from the list.
#remove stop words
nlp.Defaults.stop_words.remove('ca')
nlp.vocab['ca'].is_stop = False
nlp.vocab['ca'].is_stop
After the comment, the first statement removes “ca” from the default stop word set. The second clears the is_stop flag on the vocabulary entry for “ca,” so it is no longer treated as a stop word. The third checks the flag, which should now return False.
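One detail to watch when removing words: assuming the stop word list behaves like a built-in Python set (it is used that way above, with .add and .remove), calling .remove on a word that is not in the set raises a KeyError, while .discard silently does nothing. The sketch below uses a small hypothetical set rather than spaCy’s list to show the difference.

```python
# Hypothetical mini stop list; with spaCy the set would be nlp.Defaults.stop_words.
stop_words = {"the", "and", "ca"}

stop_words.remove("ca")      # works: "ca" is present
try:
    stop_words.remove("ca")  # fails: "ca" is already gone
    second_remove = "ok"
except KeyError:
    second_remove = "KeyError"

stop_words.discard("ca")     # never raises, even when the word is absent
print(second_remove, sorted(stop_words))  # KeyError ['and', 'the']
```

Using .discard can be handy in notebooks, where the same cell is often run more than once.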
As you can see, adding and removing stop words takes only a couple of lines. Customizing the list lets you tailor the analysis to your own text data, which is particularly useful in specialized domains or languages where certain words are more or less informative than in general-purpose text.