Feature Crosses
Tags: Basic Concepts
Cross Features (Feature Crosses)
Cross features, also known as feature crosses, are a powerful machine learning technique used to capture interactions between different features in a dataset. By combining two or more features, a new synthetic feature is created that can provide a model with additional information it wouldn't have from the individual features alone. This is particularly useful in linear models, where non-linear relationships cannot be captured directly.
How It Works
Feature crosses involve taking the Cartesian product of two or more feature sets. For example, if you have two features, A and B, a feature cross creates a new feature A×B that represents the combined effect of A and B. This is especially useful for categorical features but can be applied to numerical features as well.
Example: Consider a dataset with two features: country and device_type. A feature cross of country and device_type could create a new combined feature country_device_type with values like USA_mobile, USA_desktop, Canada_mobile, etc.
Cross feature. Source: developers.google.com
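
For a quick, concrete view of this example, here is a minimal pandas sketch (toy rows, not from the original text); the column names follow the example above.

import pandas as pd

# Toy rows following the country / device_type example above.
df = pd.DataFrame({
    "country": ["USA", "USA", "Canada"],
    "device_type": ["mobile", "desktop", "mobile"],
})
# Row-wise cross: one combined categorical value per example.
df["country_device_type"] = df["country"] + "_" + df["device_type"]
print(df["country_device_type"].tolist())
# ['USA_mobile', 'USA_desktop', 'Canada_mobile']

Each row gets exactly one crossed value, so the new column can then be treated like any other categorical feature downstream.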

Cross features are also very common in recommendation systems. In practice, a wide-and-deep architecture can be used to combine many dense and sparse features; you can see one concrete example in section Wide and Deep [sec-wide-and-deep].
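
As a rough illustration (not the example from the Wide and Deep section), the sketch below wires a hashed feature cross into the wide part of a small Keras model while dense features go through the deep part. The bucket count, layer sizes, and input names are arbitrary assumptions.

import tensorflow as tf

# Assumed, illustrative setup: the crossed feature has already been hashed
# into NUM_CROSS_BUCKETS integer ids (e.g. a stable hash of "USA_mobile"
# modulo the bucket count), and there are two dense numerical features.
NUM_CROSS_BUCKETS = 10_000

cross_in = tf.keras.Input(shape=(1,), dtype="int32", name="country_x_device")
dense_in = tf.keras.Input(shape=(2,), dtype="float32", name="dense_features")

# Wide part: a linear view of the one-hot encoded feature cross.
wide = tf.keras.layers.CategoryEncoding(
    num_tokens=NUM_CROSS_BUCKETS, output_mode="multi_hot")(cross_in)

# Deep part: a small MLP over the dense features.
deep = tf.keras.layers.Dense(32, activation="relu")(dense_in)
deep = tf.keras.layers.Dense(16, activation="relu")(deep)

# Joint output: the final layer sees both the memorized crossed combinations
# and the generalizing deep representation.
output = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.concatenate([wide, deep]))

model = tf.keras.Model(inputs=[cross_in, dense_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")

The wide path lets the model memorize specific crossed combinations, while the deep path generalizes from the dense features.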
Pros and Cons
Pros:
- Model Interpretability: Feature crosses can make models more interpretable by explicitly representing interactions between features.
- Improved Accuracy: They can significantly improve the model's accuracy, especially for linear models, by introducing non-linearity and capturing interactions between features.
- Simplicity: Easily implemented and integrated into most machine learning pipelines without requiring complex modifications to the model architecture.
Cons:
- Dimensionality Increase: The main downside of feature crosses is the potential explosion of the feature space, which increases model complexity and the risk of overfitting (hashing the crossed values into a fixed number of buckets is a common mitigation; see the sketch after this list).
- Computation Cost: With a large number of features, the computational cost to create and manage feature crosses can be significant.
- Selection of Features: Determining which features to cross can be non-trivial and may require domain knowledge or empirical testing.
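
To keep the crossed feature space bounded, one widely used option is the hashing trick: map each crossed value into a fixed number of buckets. Below is a minimal sketch in plain Python; the bucket count is an arbitrary choice, and hash collisions are accepted as a trade-off.

import hashlib

def hashed_feature_cross(value1, value2, num_buckets=1000):
    """Map a crossed value into one of num_buckets hash buckets."""
    crossed = f"{value1}_{value2}".encode("utf-8")
    # hashlib is stable across processes, unlike the built-in hash(),
    # which is randomized for strings between runs.
    return int(hashlib.md5(crossed).hexdigest(), 16) % num_buckets

bucket = hashed_feature_cross("USA", "mobile")  # an integer in [0, 1000)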
Applications
Feature crosses are widely used in linear regression, logistic regression, and other linear models to introduce non-linearity. They are particularly popular in recommender systems, where capturing the interaction between user and item properties is crucial, and more generally in any domain where the interaction between features significantly impacts the outcome.
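
For instance, a crossed categorical column can simply be one-hot encoded and fed to a plain logistic regression. The sketch below uses scikit-learn with made-up toy data; the column names and the tiny XOR-style label pattern are only for illustration.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up toy data: the label follows an XOR-like pattern that the individual
# one-hot features cannot separate linearly, but the cross can.
df = pd.DataFrame({
    "country": ["USA", "USA", "Canada", "Canada"],
    "device_type": ["mobile", "desktop", "mobile", "desktop"],
    "clicked": [1, 0, 0, 1],
})
# Build the crossed column explicitly, then one-hot encode everything.
df["country_x_device"] = df["country"] + "_" + df["device_type"]
feature_cols = ["country", "device_type", "country_x_device"]

pipeline = Pipeline([
    ("onehot", ColumnTransformer(
        [("categorical", OneHotEncoder(handle_unknown="ignore"), feature_cols)])),
    ("model", LogisticRegression()),
])
pipeline.fit(df[feature_cols], df["clicked"])

Without the crossed column, the linear model cannot fit this XOR-like pattern from the individual one-hot features alone; with it, each interaction becomes a single learnable weight.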
Python Implementation
Implementing a basic feature cross between two categorical features can be straightforward. Here's a simple example in Python:
def create_feature_cross(features1, features2):
    """
    Creates a feature cross between two lists of categorical features.

    Args:
    - features1: List of values for the first feature.
    - features2: List of values for the second feature.

    Returns:
    - A list containing the feature crosses.
    """
    return [f"{f1}_{f2}" for f1 in features1 for f2 in features2]
# Example usage
countries = ['USA', 'Canada']
device_types = ['mobile', 'desktop']
crossed_features = create_feature_cross(countries, device_types)
print(crossed_features)
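# ['USA_mobile', 'USA_desktop', 'Canada_mobile', 'Canada_desktop']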
This function iterates over every pair of values from features1 and features2 and combines them, creating a new list of crossed feature values. In practice, for numerical features or more complex scenarios (such as high-cardinality categorical features), you might use more sophisticated techniques or tools such as TensorFlow's feature_column API, which supports feature crosses directly.
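
As a brief sketch of that route (note that tf.feature_column is deprecated in recent TensorFlow releases in favor of Keras preprocessing layers, so treat this as illustrative), the column names below mirror the earlier example and are hypothetical:

import tensorflow as tf

# Cross the two raw categorical columns and hash the result into 100 buckets.
country_x_device = tf.feature_column.crossed_column(
    keys=["country", "device_type"], hash_bucket_size=100)
# Wrap the crossed column so a model can consume it as a multi-hot vector.
country_x_device_indicator = tf.feature_column.indicator_column(country_x_device)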
