Three powerful linear classifiers, each with a different philosophy. Master these and you understand the DNA of modern ML.
| Algorithm | Core Idea | Output | Type |
|---|---|---|---|
| Perceptron | Find ANY line that separates classes | Hard label (±1) | Hard classification |
| Logistic Regression | Find a line + convert to probability | Probability [0,1] | Soft classification |
| SVM | Find the line with MAXIMUM margin | Hard label (±1) | Hard classification (max-margin) |
The sigmoid (logistic) function squashes any real number into the open interval (0, 1), so its output can be read as a probability.
A linear combination of features (θᵀX) can be any number from -∞ to +∞, but classification needs a probability between 0 and 1. The sigmoid provides exactly this transformation, and the choice is not arbitrary: the sigmoid is the inverse of the log-odds (logit) function, so modeling the log-odds as θᵀX yields a probability of σ(θᵀX).
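To make the relationship between the sigmoid and the log-odds concrete, here is a minimal sketch in plain Python. The weights `theta` and the feature vector `x` are made-up values for illustration only; they do not come from the notes.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability-like value between 0 and 1."""
    # Two branches keep math.exp from overflowing when |z| is large.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_odds(p: float) -> float:
    """Logit: the inverse of the sigmoid, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

theta = [0.4, -0.2]   # hypothetical weights
x = [1.0, 3.0]        # hypothetical feature vector

z = sum(t * xi for t, xi in zip(theta, x))   # linear score, can be any real number
p = sigmoid(z)                               # squashed into (0, 1)
recovered = log_odds(p)                      # applying the logit gives back the score
```

Applying `log_odds` to `sigmoid(z)` recovers `z` up to floating-point error, which is the sense in which the sigmoid "comes from" the log-odds.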
Unlike Naive Bayes, whose parameters come from counting, Logistic Regression has no closed-form solution and must be optimized iteratively. We find the θ that maximizes the log-likelihood of the training data.
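The iterative optimization can be sketched as batch gradient ascent on the log-likelihood. This is one possible optimizer, not necessarily the one the course uses; the toy data, the learning rate `lr`, and the epoch count are assumptions chosen so the example runs quickly. Labels here are 0/1, and the first feature is a constant 1.0 that acts as the bias term.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(theta, X, y):
    """Sum of log P(y_i | x_i; theta) over the data set, with labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient ascent: the gradient of the log-likelihood is sum((y - p) * x)."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j, v in enumerate(xi):
                grad[j] += (yi - p) * v
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical, deliberately overlapping (non-separable) toy data.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 1.0]]
y = [0, 0, 1, 1, 1, 1]

theta_start = [0.0, 0.0]
theta_fit = fit_logistic(X, y)
improved = log_likelihood(theta_fit, X, y) > log_likelihood(theta_start, X, y)
```

Because the log-likelihood is concave in θ, a sufficiently small step size moves toward the global maximum; in practice a library optimizer would replace this loop.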
The margin is the distance between the decision boundary and the nearest data points of each class (the "support vectors"). SVM finds the hyperplane that maximizes this margin.
The Perceptron finds any separating line — but there are infinitely many. Which is most reliable for new data? The one with the largest gap between classes, because it's most robust to noise and new examples near the boundary.
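To see why "any separating line" is not enough, the sketch below measures the geometric margin of two lines that both separate the same toy data. The points and both lines are invented for illustration; the margin formula, min over points of y(w·x + b)/‖w‖, is the standard definition.

```python
import math

def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        for xi, yi in zip(X, y)
    )

# Hypothetical linearly separable data with labels in {-1, +1}.
X = [[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]]
y = [-1, -1, +1, +1]

# Two hypothetical separating lines, each given as (w, b).
line_a = ([1.0, 1.0], -5.0)   # x1 + x2 = 5
line_b = ([1.0, 0.0], -2.5)   # x1 = 2.5

margin_a = geometric_margin(*line_a, X, y)
margin_b = geometric_margin(*line_b, X, y)
```

Both lines classify every point correctly, but `margin_a` is larger than `margin_b`, so a small perturbation of a point near the boundary is less likely to flip its label under `line_a`. SVM searches for the line that makes this quantity as large as possible.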
Sometimes data is not linearly separable. The kernel trick implicitly maps the data to a higher-dimensional space where it may become linearly separable, without ever computing the high-dimensional vectors.
Computing dot products in a huge feature space is expensive. The kernel function computes the inner product in the transformed space directly from the original space — efficient and powerful.
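A small check can make the kernel trick less abstract. The sketch below uses the degree-2 homogeneous polynomial kernel K(x, z) = (x·z)² on 2-D inputs as an illustrative choice (the notes do not name a specific kernel) and compares it with the dot product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). The input vectors are arbitrary.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (3-D output)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    """Same inner product, computed directly in the original 2-D space."""
    return dot(x, z) ** 2

x = [1.0, 2.0]    # arbitrary example vectors
z = [3.0, -1.0]

explicit = dot(phi(x), phi(z))   # map first, then take the dot product
implicit = poly2_kernel(x, z)    # never builds the mapped vectors
```

The two values agree up to floating-point error. For higher degrees or for the RBF kernel the explicit map grows very large (or is infinite-dimensional), while the kernel evaluation stays a cheap operation on the original vectors.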
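The Perceptron is a binary classifier that makes hard predictions (±1) and updates its weights whenever it makes a mistake. If the data is linearly separable, training stops once every point is classified correctly; if it is not, the updates never settle and training must be cut off manually.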
The Perceptron is literally the building block of neural networks. Each "neuron" in a neural network IS a perceptron with a different activation function. Understanding it deeply means understanding deep learning.
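The mistake-driven update can be sketched in a few lines. This is a minimal version of the classic rule (on a mistake, add y·x to the weights and y to the bias); the toy data and the `max_epochs` cap are assumptions for illustration, and the cap exists because training would loop forever on non-separable data.

```python
def train_perceptron(X, y, max_epochs=100):
    """Classic perceptron: visit points one at a time, update only on mistakes."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # wrong side of the line, or exactly on it
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:        # a full pass with no errors: converged
            break
    return w, b

def predict(w, b, x):
    """Hard prediction: the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Hypothetical linearly separable data with labels in {-1, +1}.
X = [[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]]
y = [-1, -1, +1, +1]

w, b = train_perceptron(X, y)
predictions = [predict(w, b, xi) for xi in X]
```

The line it returns depends on the order in which points are visited, which is one way to see that the Perceptron solution is not unique.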
Test point: x = [2, 1], true label y = +1
| Property | Perceptron | Logistic Regression | SVM |
|---|---|---|---|
| Output type | Hard (sign) | Soft (probability) | Hard (sign) |
| Unique solution? | No — any separating line | Yes (global optimum) | Yes — max-margin line |
| Handles non-separable? | No (won't converge) | Yes | Yes (soft-margin SVM) |
| Objective | Classify all points correctly | Maximize likelihood | Maximize margin |
| Optimization | Simple update rule | Gradient descent (iterative) | Quadratic programming (convex) |
| Type | Discriminative | Discriminative | Discriminative |
| Probabilistic? | No | Yes | No |
Q1. Which of the following is NOT a discriminative model?
Q2. What is the best way to select a threshold for Logistic Regression's sigmoid output?
Q3. True or False: the separating hyperplanes produced by SVM and the Perceptron are both unique.
Q4. True or False: for the Perceptron, it is typical to iterate through the data one point at a time.
Classification is the entry point. But NLP research spans a rich ecosystem of structured tasks — NLI, SRL, coreference, MT, QA, relation extraction — that define the benchmarks used to evaluate every frontier model. Understanding them is essential for reading papers and doing research in 2026.
Task: Given a premise and a hypothesis, classify the relationship as ENTAILMENT, NEUTRAL, or CONTRADICTION.
Premise: "A woman is walking her dog in the park."
H1: "Someone is outside with an animal." → ENTAILMENT
H2: "The woman is a professional dog trainer." → NEUTRAL
H3: "There are no animals in the park." → CONTRADICTION
Semantic Role Labeling (SRL) answers the question: "Who did what to whom, where, when, and how?" It identifies the predicate and labels its semantic arguments using the PropBank frame schema.
Sentence: "The scientist published a groundbreaking paper in Nature."
ARG0 (Agent/Publisher): The scientist
V (Predicate): published
ARG1 (Theme/Published): a groundbreaking paper
ARGM-LOC (Locative): in Nature
Datasets: OntoNotes 5.0, CoNLL-2009. Models: BERT with span-based SRL (Shi & Lin 2019) achieves >85 F1. Uses in 2026: knowledge extraction pipelines, question answering, event detection, LLM evaluation for factual consistency.
Coreference resolution determines which mentions in a text refer to the same real-world entity. It is critical for document understanding, summarization, and dialogue.
"Dr. Chen won the award. She had worked on the project for 10 years. It was finally recognized."
Clusters: {Dr. Chen, She} · {the project, It}
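One common way to get from pairwise decisions to clusters like these is to merge linked mentions with a union-find structure. The sketch below assumes the pairwise links are already given (here they are hand-written, not model output), and it identifies mentions by their surface string, which is a simplification: a real system would use token spans so that repeated strings stay distinct.

```python
def cluster_mentions(mentions, links):
    """Group mentions into coreference clusters given pairwise links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(b)] = find(a)   # merge b's cluster into a's

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    # Singletons (mentions linked to nothing) are dropped from the output.
    return [c for c in clusters.values() if len(c) > 1]

mentions = ["Dr. Chen", "the award", "She", "the project", "It"]
links = [("Dr. Chen", "She"), ("the project", "It")]   # hand-written links

clusters = cluster_mentions(mentions, links)
```

Because merging is transitive, a link A–B plus a link B–C places all three mentions in one cluster even if the model never scored A–C directly.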
| Task | What It Does | Gold Dataset | SOTA Model (2024) | Metric |
|---|---|---|---|---|
| NLI | Premise → hypothesis relationship | MultiNLI, SNLI | DeBERTa-v3, GPT-4 | Accuracy |
| SRL | Who did what to whom | OntoNotes, CoNLL-09 | BERT + span predictor | F1 |
| Coreference | Which mentions = same entity | OntoNotes 5.0 | SpanBERT, LingMess | Avg F1 (MUC/B³/CEAF) |
| RE | Extract (entity, relation, entity) | TACRED, DocRED | REBEL (seq2seq) | F1 |
| MT | Translate between languages | WMT, FLORES | NLLB-200, GPT-4 | BLEU, chrF, COMET |
| QA (Extractive) | Find answer span in passage | SQuAD 2.0 | BERT, DeBERTa | EM / F1 |
| QA (Abstractive) | Generate answer from knowledge | Natural Questions, TriviaQA | RAG + LLM | EM / ROUGE |
| Summarization | Compress document to summary | CNN/DM, XSum | PEGASUS, GPT-4 | ROUGE-1/2/L |
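As an illustration of the EM / F1 metrics listed for extractive QA, here is a simplified sketch of exact match and token-overlap F1. It normalizes only by lowercasing and whitespace splitting; the official SQuAD evaluation script also strips punctuation and articles, so scores from this version can differ slightly. The example strings are invented.

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    """1 if the strings match after lowercasing and trimming, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

pred = "in the park"   # hypothetical model answer
gold = "the park"      # hypothetical reference answer

em = exact_match(pred, gold)
f1 = token_f1(pred, gold)
```

Here the prediction contains the full reference plus one extra token, so exact match is 0 while F1 still gives partial credit. That gap is the reason both numbers are usually reported together.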
Given: "Elon Musk founded SpaceX in 2002."
Extract: (Elon Musk, founder_of, SpaceX) and (SpaceX, founded_in, 2002)
Datasets: TACRED (42 relation types), DocRED (96 types, document-level), Re-TACRED (revised, cleaner labels)