Text Classification Basics Part 3: Confusion Matrix
When evaluating a classification model, there are two kinds of labels to consider: actual and predicted.
- Actual refers to the true label of a test sample, matching real-world conditions (e.g., a text message is SPAM).
- Predicted refers to the output generated by the machine learning model (e.g., the model predicts SPAM).
After testing, each prediction falls into one of four cases (illustrated in the sketch after this list):
1. Correctly classified as HAM: True HAM
2. Correctly classified as SPAM: True SPAM
3. Incorrectly classified as HAM: False HAM
4. Incorrectly classified as SPAM: False SPAM
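To make these four cases concrete, here is a minimal Python sketch (the labels are made up purely for illustration) that compares a list of actual labels against a list of predicted labels and tallies each case:

```python
# Hypothetical test results: actual labels vs. the model's predictions.
actual    = ["HAM", "HAM", "SPAM", "SPAM", "HAM", "SPAM", "HAM", "SPAM"]
predicted = ["HAM", "SPAM", "SPAM", "HAM", "HAM", "SPAM", "HAM", "SPAM"]

# Tally the four possible outcomes of each prediction.
true_ham   = sum(a == "HAM"  and p == "HAM"  for a, p in zip(actual, predicted))
true_spam  = sum(a == "SPAM" and p == "SPAM" for a, p in zip(actual, predicted))
false_ham  = sum(a == "SPAM" and p == "HAM"  for a, p in zip(actual, predicted))  # predicted HAM, actually SPAM
false_spam = sum(a == "HAM"  and p == "SPAM" for a, p in zip(actual, predicted))  # predicted SPAM, actually HAM

print("True HAM:", true_ham, "| True SPAM:", true_spam,
      "| False HAM:", false_ham, "| False SPAM:", false_spam)
```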
Terminology:
- True Positive (TP): When an actual HAM message is correctly predicted as HAM by the model.
- False Negative (FN) (Type 2 Error): When an actual HAM message is incorrectly predicted as SPAM.
- False Positive (FP) (Type 1 Error): When an actual SPAM message is incorrectly predicted as HAM.
- True Negative (TN): When an actual SPAM message is correctly predicted as SPAM.
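If scikit-learn is available (an assumption on my part, not a requirement of this series), the same four counts can be read directly out of its confusion matrix. A minimal sketch, keeping HAM as the positive class as in the terminology above:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test results (same illustrative labels as before).
actual    = ["HAM", "HAM", "SPAM", "SPAM", "HAM", "SPAM", "HAM", "SPAM"]
predicted = ["HAM", "SPAM", "SPAM", "HAM", "HAM", "SPAM", "HAM", "SPAM"]

# Rows are actual labels and columns are predicted labels,
# in the order given by `labels`; HAM is listed first as the positive class.
cm = confusion_matrix(actual, predicted, labels=["HAM", "SPAM"])
tp, fn, fp, tn = cm.ravel()

print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```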
Example: Confusion Matrix
Using the confusion matrix above, let’s calculate the accuracy rate and error rate.
- Accuracy Rate: This measures how often the model makes correct predictions. It is calculated as:
  Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Error Rate: This measures how frequently the model makes incorrect predictions. It is calculated as:
  Error Rate = (FP + FN) / (TP + TN + FP + FN) = 1 - Accuracy
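As a quick sketch of both formulas, the counts below reuse the illustrative values from the earlier examples; substitute the numbers from your own confusion matrix:

```python
# Illustrative counts; replace with the values from your confusion matrix.
tp, fn, fp, tn = 3, 1, 1, 3

total = tp + tn + fp + fn
accuracy   = (tp + tn) / total   # fraction of correct predictions
error_rate = (fp + fn) / total   # fraction of incorrect predictions, equal to 1 - accuracy

print(f"Accuracy:   {accuracy:.2f}")    # 0.75
print(f"Error rate: {error_rate:.2f}")  # 0.25
```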
In summary, the accuracy rate tells us how well the model performs overall, while the error rate highlights how often the model makes incorrect predictions.