Text Classification Basics Part 3: Confusion Matrix

Chitra's Playground
Sep 25, 2024

When evaluating a classification model, there are two key categories of labels to consider: actual and predicted.

  • Actual refers to the true label of a test sample, i.e., its real-world condition (e.g., a text message really is SPAM).
  • Predicted refers to the label output by the machine learning model (e.g., the model predicts SPAM).

After testing, each result falls into one of four outcomes (here, class 1 is HAM and class 2 is SPAM; a small code sketch follows the list):
1. Correctly classified as class 1: True HAM
2. Correctly classified as class 2: True SPAM
3. Incorrectly classified as class 1: False HAM
4. Incorrectly classified as class 2: False SPAM
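
To make these four outcomes concrete, here is a minimal sketch using scikit-learn's confusion_matrix; the label lists are hypothetical, and the class order is pinned down with the labels argument.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for six test messages (for illustration only).
actual    = ["HAM", "HAM", "SPAM", "SPAM", "HAM", "SPAM"]
predicted = ["HAM", "SPAM", "SPAM", "HAM", "HAM", "SPAM"]

# Rows are actual labels, columns are predicted labels:
#                  predicted HAM  predicted SPAM
#   actual HAM   [ True HAM       False SPAM    ]
#   actual SPAM  [ False HAM      True SPAM     ]
cm = confusion_matrix(actual, predicted, labels=["HAM", "SPAM"])
print(cm)
# [[2 1]   -> 2 True HAM,  1 False SPAM
#  [1 2]]  -> 1 False HAM, 2 True SPAM
```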

Confusion Matrix for SPAM Message Detection

Terminology (this article treats HAM as the positive class and SPAM as the negative class):

  • True Positive (TP): When actual HAM is correctly predicted as HAM by the model.
  • False Negative (FN) (Type 2 Error): When actual HAM is incorrectly predicted as SPAM.
  • False Positive (FP) (Type 1 Error): When actual SPAM is incorrectly predicted as HAM.
  • True Negative (TN): When actual SPAM is correctly predicted as SPAM.

The sketch after this list shows how to read these four counts off the confusion matrix.
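
Continuing the hypothetical sketch above, the four counts can be read directly off the matrix. Because this article treats HAM (listed first in labels) as the positive class, flattening the matrix row by row yields TP, FN, FP, TN; under the more common convention of putting the positive class last, the unpacking order would instead be TN, FP, FN, TP.

```python
# Flatten the 2x2 matrix row by row: with HAM first, the order is
# TP (true HAM), FN (false SPAM), FP (false HAM), TN (true SPAM).
tp, fn, fp, tn = cm.ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=2, FN=1, FP=1, TN=2
```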

Example: Confusion Matrix

Using the confusion matrix above, let’s calculate the accuracy rate and error rate (a short code sketch computing both follows the list).

  • Accuracy Rate: This measures how often the model makes correct predictions. It is calculated as:

    Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)

  • Error Rate: This measures the frequency of incorrect predictions. It is calculated as:

    Error Rate = (FP + FN) / (TP + TN + FP + FN) = 1 - Accuracy Rate
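
Carrying the same hypothetical counts forward, both rates follow directly from the formulas above:

```python
# Accuracy: share of correct predictions; error rate: share of incorrect ones.
total = tp + tn + fp + fn
accuracy = (tp + tn) / total
error_rate = (fp + fn) / total  # equivalently, 1 - accuracy
print(f"accuracy={accuracy:.2f}, error rate={error_rate:.2f}")  # 0.67, 0.33
```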

In summary, the accuracy rate tells us how well the model performs overall, while the error rate highlights how often the model makes incorrect predictions.

Written by Chitra's Playground

Tech enthusiast with a passion for machine learning & eating chicken. Sharing insights on my learning journey.