Grid search and random search
Tags: Metrics
Grid search builds a model for every combination of hyperparameters specified and evaluates each model.
Random search is similar to grid search but only uses a random subset of all parameter combinations.
Grid search and random search are two standard techniques for hyperparameter tuning in machine learning, i.e., for finding the combination of hyperparameters that gives the best model performance.
Grid Search:
- Definition: Grid search is an exhaustive search technique where a predefined set of hyperparameters is specified, and the model is trained and evaluated for all possible combinations of hyperparameters within the predefined grid.
- Implementation:
- Grid search iterates over every combination of hyperparameters in the grid, so the number of candidates grows multiplicatively with each hyperparameter added.
- For each combination, the model is trained and evaluated using cross-validation to estimate its performance.
- The combination that yields the best performance metric (e.g., accuracy, F1 score) is selected as the optimal hyperparameter configuration (a hand-written version of this loop is sketched after this list).
- Pros:
- Exhaustive search guarantees that the best hyperparameters within the specified grid are found.
- It is straightforward to implement and understand.
- Cons:
- Grid search can be computationally expensive, especially for large hyperparameter spaces.
- It wastes computation when some hyperparameters have little effect on model performance, because every value of every hyperparameter is explored equally.
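To make the exhaustive loop concrete, here is a minimal hand-written sketch of grid search using scikit-learn's cross_val_score; the dataset and grid values are illustrative assumptions, and GridSearchCV (shown later) does the same thing with extra conveniences.
from itertools import product
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Small synthetic dataset and a small illustrative grid
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
param_grid = {'n_estimators': [50, 100], 'max_depth': [None, 10]}
best_score, best_params = -float('inf'), None
# Enumerate every combination in the grid -- the exhaustive part of grid search
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    # Estimate this combination's performance with 5-fold cross-validation
    score = cross_val_score(RandomForestClassifier(**params, random_state=0), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params
print('Best parameters:', best_params, 'Best CV score:', best_score)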
Random Search:
- Definition: Random search is a technique where hyperparameters are randomly sampled from predefined distributions, and the model is trained and evaluated for a fixed number of random combinations of hyperparameters.
- Implementation:
- Random search samples hyperparameters at random from predefined distributions (e.g., uniform, normal) or from discrete lists, for a fixed number of iterations (a hand-rolled version of this loop is sketched after this list).
- For each random combination, the model is trained and evaluated using cross-validation to estimate its performance.
- The combination of hyperparameters that yields the best performance metric (e.g., accuracy, F1 score) is selected as the optimal hyperparameter configuration.
- Pros:
- Random search is more efficient than grid search when exploring large hyperparameter spaces because it samples hyperparameters randomly.
- Its cost is controlled by the number of iterations rather than the size of the search space, which makes it attractive when only a few hyperparameters strongly affect model performance.
- Cons:
- Random search does not guarantee finding the optimal hyperparameters, although in practice it often finds good configurations with far fewer evaluations than grid search.
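To illustrate the sampling loop, here is a minimal hand-rolled sketch of random search; the parameter ranges, the budget of 10 iterations, and the use of NumPy's random generator are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
best_score, best_params = -float('inf'), None
for _ in range(10):  # fixed budget of random combinations
    # Sample each hyperparameter from a discrete uniform distribution
    params = {
        'n_estimators': int(rng.integers(50, 300)),
        'max_depth': int(rng.integers(3, 20)),
        'min_samples_split': int(rng.integers(2, 11)),
    }
    # Evaluate the sampled combination with 5-fold cross-validation
    score = cross_val_score(RandomForestClassifier(**params, random_state=0), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params
print('Best parameters:', best_params, 'Best CV score:', best_score)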
Comparison:
- Grid Search vs. Random Search:
- Grid search exhaustively explores all combinations of hyperparameters within a predefined grid, while random search randomly samples hyperparameters.
- Grid search guarantees finding the best hyperparameters within the specified grid, while random search may find good solutions with fewer iterations.
- Random search is more efficient when exploring large hyperparameter spaces, while grid search is more suitable for smaller hyperparameter spaces or when the impact of hyperparameters is known.
Python Implementation (using scikit-learn):
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
# Generate a synthetic classification dataset and split off a training set
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a random forest classifier
rf_classifier = RandomForestClassifier(random_state=42)
# Define the hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# Grid search: evaluates all 3 * 3 * 3 * 3 = 81 combinations with 5-fold cross-validation
grid_search = GridSearchCV(rf_classifier, param_grid, cv=5)
grid_search.fit(X_train, y_train)
# Random search: evaluates only 10 randomly sampled combinations with 5-fold cross-validation
random_search = RandomizedSearchCV(rf_classifier, param_distributions=param_grid, n_iter=10, cv=5, random_state=42)
random_search.fit(X_train, y_train)
# Best parameters and cross-validated score from grid search
print("Grid Search - Best Parameters:", grid_search.best_params_)
print("Grid Search - Best Score:", grid_search.best_score_)
# Best parameters and cross-validated score from random search
print("Random Search - Best Parameters:", random_search.best_params_)
print("Random Search - Best Score:", random_search.best_score_)
In this example, we perform grid search and random search for hyperparameter tuning of a random forest classifier using scikit-learn's GridSearchCV and RandomizedSearchCV classes, respectively. We define a hyperparameter grid, specify the number of iterations for random search, and fit both search objects to the training data. Finally, we print the best parameters and the best cross-validated scores obtained from grid search and random search.
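Note that RandomizedSearchCV can also sample from continuous or unbounded distributions rather than a fixed grid. The following sketch is an assumed extension of the example above (reusing rf_classifier, X_train, and y_train, and assuming scipy is installed) that passes scipy.stats distributions as param_distributions:
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
# Distributions instead of fixed lists: randint(low, high) samples integers in [low, high),
# uniform(loc, scale) samples floats in [loc, loc + scale)
param_distributions = {
    'n_estimators': randint(50, 300),
    'max_depth': randint(3, 30),
    'min_samples_split': randint(2, 11),
    'max_features': uniform(0.1, 0.8),  # fraction of features considered at each split
}
random_search_dist = RandomizedSearchCV(rf_classifier, param_distributions=param_distributions,
                                        n_iter=20, cv=5, random_state=42)
random_search_dist.fit(X_train, y_train)
print("Random Search (distributions) - Best Parameters:", random_search_dist.best_params_)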