NLP Part 2: Stemming

Chitra's Playground
Sep 6, 2024


Stemming is like trimming words down to their core. For example, one rule in the Porter algorithm rewrites words ending in “-ATIONAL” so that they end in “-ATE”, which turns “relational” into “relate”. Later rules can trim the result further, so the final stem isn’t always a dictionary word.
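To make the idea concrete, here is a toy sketch of that single suffix rule. The function name `apply_ational_rule` is made up for illustration, and it skips the extra conditions and later steps a real stemmer applies, so treat it as a picture of one rule rather than a working stemmer.

```python
def apply_ational_rule(word: str) -> str:
    """Toy version of one suffix rule: '-ational' becomes '-ate'."""
    suffix = "ational"
    if word.endswith(suffix):
        # Drop the old suffix and attach the replacement.
        return word[: -len(suffix)] + "ate"
    # Words without the suffix are returned unchanged.
    return word


print(apply_ational_rule("relational"))  # relate
print(apply_ational_rule("running"))     # running
```

A real stemmer chains many rules like this and checks how much of the word is left before applying each one, which is why we use a library instead of writing our own.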

Unfortunately, spaCy doesn’t have a built-in stemmer, so we’ll use NLTK instead. First, make sure the NLTK library is installed in your environment (Google Colab, for example).

!pip install nltk
import nltk

NLTK offers several popular stemmers. I’ll use Snowball, Porter, and Lancaster in this example, but you can experiment with others if you prefer.

from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.lancaster import LancasterStemmer

Let’s choose a few words to test our stemming techniques.

sample = ['eat', 'ate', 'eaten', 'run', 'running', 'runner', 'runs', 'basically', 'normally', 'easily', 'fairly', 'generous', 'generation', 'generously', 'generate']

Let’s start with the Porter stemmer. It’s a popular algorithm that applies a fixed sequence of rules to strip or replace suffixes until only the word’s stem is left. This is helpful for search and information retrieval, because different forms of a word collapse to the same stem. However, it sometimes creates stems that aren’t actual words. Here’s how to use it:

porter = PorterStemmer()

for word in sample:
    print(word + ' ------> ' + porter.stem(word))
The Result of Stemming Using Porter Stemmer
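Why does this help with finding relevant information? One way to see it is to group words by their stem, so that different forms of a word end up in the same bucket. This is a small sketch using a shortened word list of my own (the `words` and `groups` names are just for this example), not a full search setup.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer

porter = PorterStemmer()
words = ['run', 'running', 'runs', 'generous', 'generation', 'generate']

# Map each stem to the original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[porter.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, '<-', members)
```

With NLTK’s Porter stemmer, “run”, “running”, and “runs” should all land in the same group, so a search for one form can match the others.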

The Snowball stemmer is like an upgraded version of Porter (its English stemmer is also known as Porter2). It can work with languages beyond English, including French, Spanish, Dutch, and even non-Roman scripts like Russian. Here’s how to use Snowball with English:

snowball = SnowballStemmer(language='english')

for word in sample:
    print(word + ' -----> ' + snowball.stem(word))
The Result of Stemming Using Snowball Stemmer
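If you’re curious which languages are available, the `SnowballStemmer` class keeps them in its `languages` attribute. Here’s a quick sketch that prints the list and tries a French stemmer. The French word is just one I picked for illustration, and I’m not showing the output since it depends on your NLTK version.

```python
from nltk.stem.snowball import SnowballStemmer

# Tuple of language names accepted by SnowballStemmer.
print(SnowballStemmer.languages)

# Build a stemmer for another language the same way as for English.
french = SnowballStemmer(language='french')
print(french.stem('mangeaient'))
```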

The Lancaster stemmer is the last one. It works by repeatedly applying shortening rules to a word until no rule applies, which makes it the most aggressive of the three. This can sometimes lead to over-stemming, creating roots that don’t make sense. Here’s an example:

lancaster = LancasterStemmer()

for word in sample:
    print(word + ' -----> ' + lancaster.stem(word))
The Result of Stemming Using Lancaster Stemmer
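To compare the three stemmers more easily, you can print their results side by side. This is a rough sketch with a shorter word list. The `stemmers` and `table` names and the column width are my own choices for this example.

```python
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.lancaster import LancasterStemmer

stemmers = {
    'porter': PorterStemmer(),
    'snowball': SnowballStemmer(language='english'),
    'lancaster': LancasterStemmer(),
}
words = ['running', 'fairly', 'generously', 'generation']

# Header row: one column per stemmer.
print(f"{'word':<12}" + ''.join(f"{name:<12}" for name in stemmers))

# One row per word, keeping the stems in a dict for later use.
table = {}
for word in words:
    table[word] = {name: s.stem(word) for name, s in stemmers.items()}
    print(f"{word:<12}" + ''.join(f"{table[word][name]:<12}" for name in stemmers))
```

Reading across a row shows where the stemmers agree and where one of them cuts deeper than the others.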

Want to learn more about different stemming techniques? Check out this link.

Happy exploring!
