Our first real classifier — and the tools to know whether it's actually working. Both are critical foundations for the rest of the course.
Bayes' Rule tells us how to compute the probability of a hypothesis (like "this email is spam") given some observed evidence (the words in the email).
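Concretely, for a single word this is P(spam | word) = P(word | spam) · P(spam) / P(word). A tiny numeric sketch (all probabilities below are made up for illustration):

```python
# Toy Bayes' Rule for one word, "free" (numbers are invented)
p_spam = 0.2                 # prior: 20% of all email is spam
p_word_given_spam = 0.6      # "free" appears in 60% of spam
p_word_given_ham = 0.05      # ...and in 5% of non-spam

# Total probability of seeing the word at all
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: probability of spam given that "free" appeared
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.75
```

Even though most email is not spam (prior 0.2), seeing a word that is twelve times more common in spam pushes the posterior up to 75%.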
The "naive" part means we assume all features (words) are conditionally independent given the class. In reality, word co-occurrences aren't independent ("machine" often appears with "learning"), but this assumption makes the math tractable.
Without this assumption, computing P(X|Y) = P(word1, word2, ..., wordN | Y) requires estimating a joint probability over thousands of words — computationally and statistically infeasible, since no training set contains enough examples of every word combination. With independence, it becomes a simple product: P(X|Y) = P(word1|Y) · P(word2|Y) · ... · P(wordN|Y).
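A minimal sketch of that product, done in log space to avoid underflow (the per-word likelihoods below are invented, not estimated from data; a real implementation would count word frequencies and smooth zero counts):

```python
import math

# Hypothetical priors and per-word likelihoods (made-up values)
log_prior = {"spam": math.log(0.2), "ham": math.log(0.8)}
log_likelihood = {
    "spam": {"free": math.log(0.6), "meeting": math.log(0.01)},
    "ham":  {"free": math.log(0.05), "meeting": math.log(0.3)},
}

def classify(words):
    # Naive independence: log P(X|Y) = sum of per-word log-likelihoods
    scores = {
        y: log_prior[y] + sum(log_likelihood[y][w] for w in words)
        for y in log_prior
    }
    return max(scores, key=scores.get)

print(classify(["free"]))     # -> spam
print(classify(["meeting"]))  # -> ham
```

Summing logs instead of multiplying raw probabilities is the standard trick: a product of thousands of small numbers underflows to zero, but their log-sum stays well-behaved.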
A table that compares what the model predicted against what the labels actually are. Each cell shows how many times the model got it right or wrong — and how.
Accuracy alone can be misleading. If 99% of emails are not-spam, a model that always predicts "not-spam" gets 99% accuracy but is completely useless! The confusion matrix exposes this.
| | Predicted: Positive | Predicted: Negative |
|---|---|---|
| **Actual: Positive** | TP ✓ True Positive | FN ✗ False Negative |
| **Actual: Negative** | FP ✗ False Positive | TN ✓ True Negative |
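Computing the four cells by hand makes the 99%-accuracy trap concrete (toy data: 1 spam email out of 100, and a model that always predicts "not-spam"):

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count TP/FN/FP/TN for a binary problem (minimal sketch)."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

actual    = [1] * 1 + [0] * 99   # 1% spam
predicted = [0] * 100            # model always says "not-spam"

tp, fn, fp, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
print(tp, fn, fp, tn)   # 0 1 0 99
print(accuracy)         # 0.99 -- yet the model never catches a single spam
```

The matrix exposes what accuracy hides: TP = 0. Every positive example is a false negative.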
The ROC (Receiver Operating Characteristic) curve plots True Positive Rate (Recall) vs. False Positive Rate as you vary the classification threshold. AUC = Area Under the Curve.
Classifiers like Logistic Regression output a probability (e.g., 0.73 = 73% spam). We need a threshold (e.g., 0.5) to convert that to a label. The ROC curve traces the TPR/FPR trade-off at every possible threshold, letting you pick the one that best suits your application.
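Sweeping the threshold over a handful of (made-up) scores shows how the curve's points arise, and AUC follows from the trapezoid rule:

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per distinct threshold (minimal sketch)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        preds = [s >= t for s in scores]
        tpr = sum(p and y for p, y in zip(preds, labels)) / pos
        fpr = sum(p and not y for p, y in zip(preds, labels)) / neg
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.73, 0.4, 0.2]   # hypothetical model probabilities
labels = [1, 1, 0, 0]            # true classes (perfectly ranked here)

pts = [(0.0, 0.0)] + roc_points(scores, labels)
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(pts)
print(auc)  # 1.0 -- a perfect ranking yields the maximum AUC
```

Because the positives here all score above the negatives, TPR reaches 1.0 before FPR leaves 0, and the area under the curve is 1. A random classifier would hug the diagonal with AUC ≈ 0.5.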
Q1. In text classification with Naive Bayes, what is P(class | document) called?
Q2. Which statement does NOT hold true for Naive Bayes?
Q3. Accuracy may not be informative when evaluating highly imbalanced data — True or False?
Q4. Which of these statements about the confusion matrix is CORRECT?