Logistic Regression

Introduction

Logistic Regression is a classification algorithm used to predict a discrete set of classes.

Logistic Regression is a statistical method used for binary classification tasks, which can also be extended to multiclass classification. It models the probability that a given input belongs to a particular category. Despite its name suggesting regression analysis, logistic regression is a classification algorithm and a fundamental tool in the machine learning toolbox.

How Logistic Regression Works

Logistic Regression uses the logistic function (also known as the sigmoid function) to convert linear combinations of features into values between 0 and 1, which are interpreted as probabilities. The logistic function is defined as:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

where \(z\) is the linear combination of the input features (\(X\)) and weights (\(w\)), plus a bias term (\(b\)), i.e., \(z = w^T X + b\). The output of the sigmoid function is then used to determine the probability of the input belonging to the positive class (usually denoted as "1").
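
To make this concrete, here is a minimal sketch of turning a linear combination into a probability; the weights, bias, and input vector below are made up for illustration:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical weights, bias, and a single input with three features
w = np.array([0.8, -0.4, 1.2])
b = -0.5
x = np.array([1.0, 2.0, 0.5])

z = np.dot(w, x) + b   # linear combination z = w^T x + b
p = sigmoid(z)         # probability of the positive class
print(f"z = {z:.2f}, P(y=1 | x) = {p:.3f}")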

Model Training

Training a logistic regression model involves finding the set of weights (\(w\)) that minimize a cost function, which is typically the binary cross-entropy loss, also known as the log loss, for binary classification tasks. The cost function is given by:

\[ J(w) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \]

where \(N\) is the number of training examples, \(y_i\) is the actual label of the \(i\)th example, and \(\hat{y}_i\) is the predicted probability that the \(i\)th example belongs to the positive class. The optimization of \(J(w)\) is typically performed using gradient descent or other optimization algorithms.
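
As an illustration, here is a minimal sketch of computing the log loss and taking one gradient-descent step; the toy data, learning rate, and variable names are assumptions for the example:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical toy data: 4 examples, 2 features
X = np.array([[0.5, 1.0], [1.5, 0.2], [3.0, 1.1], [2.2, 2.8]])
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1  # learning rate

y_hat = sigmoid(X @ w + b)  # predicted probabilities
# Binary cross-entropy (log loss)
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradients of the loss with respect to w and b
dw = X.T @ (y_hat - y) / len(y)
db = np.mean(y_hat - y)

# One gradient-descent update
w -= lr * dw
b -= lr * db
print(f"loss = {loss:.4f}")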

Decision Boundary

The decision boundary is one of the properties that makes logistic regression very intuitive. It is the set of points where the probability of belonging to the positive class is 50%, which corresponds to the linear equation \(w^T X + b = 0\). For higher-dimensional data, this boundary can be a plane or a hyperplane.
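
For example, after fitting a model with scikit-learn, the boundary can be read off the learned coefficients (coef_ and intercept_ are standard scikit-learn attributes; the two-dimensional data here is synthetic):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 2-D data: two well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
w = clf.coef_[0]        # learned weights
b = clf.intercept_[0]   # learned bias

# Points on the decision boundary satisfy w[0]*x1 + w[1]*x2 + b = 0,
# i.e. x2 = -(w[0]*x1 + b) / w[1]
x1 = 0.0
x2 = -(w[0] * x1 + b) / w[1]
print(f"The boundary passes through ({x1:.2f}, {x2:.2f})")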

Multiclass Classification

For multiclass classification tasks, logistic regression can be extended using techniques such as "One-vs-Rest" (OvR) or "One-vs-One" (OvO), or by using the softmax function in place of the sigmoid for the direct multiclass classification (Multinomial Logistic Regression).
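
As a sketch, scikit-learn supports both routes: OneVsRestClassifier fits one binary model per class, while plain LogisticRegression fits a multinomial (softmax) model for multiclass targets in recent versions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes

# One-vs-Rest: one binary logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Multinomial (softmax) logistic regression; recent scikit-learn versions
# use this formulation for multiclass targets by default
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

print(ovr.predict(X[:3]))
print(multinomial.predict(X[:3]))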

Python Implementation Example

Implementing logistic regression for a binary classification task using scikit-learn is straightforward:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = (data.target != 0) * 1  # Convert to binary classification problem

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
print(f"Model accuracy: {accuracy * 100:.2f}%")

Pros and Cons of Logistic Regression

Pros:

  - Simple to implement, fast to train, and computationally efficient.
  - Highly interpretable: each weight indicates the direction and strength of a feature's influence on the prediction.
  - Outputs class probabilities, not just hard labels.

Cons:

  - Assumes a linear decision boundary, so it struggles with complex, non-linear relationships unless features are engineered.
  - Can be sensitive to outliers and strongly correlated features.
  - Usually underperforms more flexible models (e.g., tree ensembles or neural networks) on large, complex datasets.

Logistic regression remains a popular choice for binary classification problems, especially as a baseline model, due to its simplicity, interpretability, and efficiency.

Comparison between linear regression and logistic regression

Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.

The log loss can be derived from maximum likelihood estimation: maximizing the likelihood of the observed labels under the model is equivalent to minimizing the negative log-likelihood, which is exactly the binary cross-entropy \(J(w)\) above.
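
A short sketch of the derivation: each label \(y_i \in \{0, 1\}\) is modeled as a Bernoulli variable with success probability \(\hat{y}_i\), so the likelihood of the training data is

\[ L(w) = \prod_{i=1}^{N} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i} \]

and the negative log-likelihood (scaled by \(1/N\)) is exactly the cost function above:

\[ -\frac{1}{N} \log L(w) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] = J(w) \]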

Python code example for implementing Logistic Regression from scratch:

import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        # Initialize weights and bias to zeros
        self.weights = np.zeros(X.shape[1])
        self.bias = 0

        for i in range(self.num_iterations):
            # Linear combination
            model = np.dot(X, self.weights) + self.bias
            # Sigmoid activation
            predictions = self.sigmoid(model)
            # Compute gradient
            dw = (1 / X.shape[0]) * np.dot(X.T, (predictions - y))
            db = (1 / X.shape[0]) * np.sum(predictions - y)
            # Update weights and bias
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        model = np.dot(X, self.weights) + self.bias
        predictions = self.sigmoid(model)
        return [1 if i > 0.5 else 0 for i in predictions]

# Assuming X_train, X_test, y_train, y_test are already defined
model = LogisticRegression(learning_rate=0.01, num_iterations=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Evaluate accuracy
accuracy = np.mean(predictions == y_test)
print(f"Model accuracy: {accuracy}")

This class initializes with a learning rate and a number of iterations. The fit method trains the model using gradient descent, and the predict method outputs predictions for given input features. Note: This is a basic example and lacks features like regularization and convergence checking for simplicity.

Multiclass Logistic Regression

Procedure:

  1. Divide the problem into n+1 binary classification problems (one for each class).
  2. For each class, fit a binary logistic regression model.
  3. Predict the probability that the observations are in that single class.
  4. Prediction = max(probability over the classes).

For each sub-problem, we select one class (Yes) and lump all the others into a second class (No). Then we take the class with the highest predicted probability (see the sketch below).
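
A minimal sketch of this one-vs-rest procedure, reusing the from-scratch LogisticRegression class defined earlier (the wrapper class name and the multiclass usage are assumptions for illustration):

import numpy as np

class OneVsRestLogistic:
    def __init__(self, n_classes, learning_rate=0.01, num_iterations=1000):
        # One binary logistic regression model per class
        self.models = [LogisticRegression(learning_rate, num_iterations)
                       for _ in range(n_classes)]

    def fit(self, X, y):
        # Train each model on "this class vs. the rest"
        for c, model in enumerate(self.models):
            model.fit(X, (y == c).astype(int))
        return self

    def predict(self, X):
        # Score every class and take the one with the highest probability
        probs = np.column_stack(
            [m.sigmoid(np.dot(X, m.weights) + m.bias) for m in self.models])
        return np.argmax(probs, axis=1)

# Usage (assuming X_train, X_test and a 3-class label vector y_train_multi exist):
# ovr = OneVsRestLogistic(n_classes=3).fit(X_train, y_train_multi)
# print(ovr.predict(X_test[:5]))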

Softmax activation:
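
In multinomial logistic regression, the sigmoid is replaced by the softmax function, which maps a vector of class scores \(z_1, \dots, z_K\) to a probability distribution over the \(K\) classes:

\[ \text{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \]

The predicted class is the one with the highest softmax probability.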