Feature Crosses

Tags: Basic Concepts

Cross Feature (Feature Crosses)

Cross features, also known as feature crosses, are a feature engineering technique used to capture interactions between different features in a dataset. Combining two or more features creates a new synthetic feature that gives a model information it could not get from the individual features alone. This is particularly useful for linear models, which cannot capture interactions or other non-linear relationships between raw features on their own.
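To make the point about linear models concrete, here is a minimal sketch using XOR-style labels (a toy dataset assumed purely for illustration). A linear model on x1 and x2 alone would need bias = 0, w1 = 1, w2 = 1 to fit the first three rows, which forces a prediction of 2 on the last row instead of 0. Adding the crossed term x1*x2 removes the conflict. The weights below are picked by hand, not learned.

```python
# XOR-style labels: positive only when exactly one of the two binary features is on.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def predict_with_cross(x1, x2):
    """Linear model over [x1, x2, x1*x2]; weights chosen by hand for illustration."""
    w1, w2, w_cross, bias = 1.0, 1.0, -2.0, 0.0
    return w1 * x1 + w2 * x2 + w_cross * (x1 * x2) + bias

predictions = [predict_with_cross(x1, x2) for x1, x2 in rows]
print(predictions)  # [0.0, 1.0, 1.0, 0.0]
```

The model is still linear in its inputs; the non-linearity lives entirely in the crossed feature.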

How It Works

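A feature cross is built from the Cartesian product of the values of two or more features. For example, if you have two features, A and B, a feature cross creates a new feature AB whose value for each example is the combination of that example's A and B values, and whose set of possible values is every pairing of an A value with a B value. This is especially useful for categorical features but can be applied to numerical features as well, typically by multiplying them or by bucketizing them first.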
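For numerical features, one common approach is to bucketize each feature and then cross the bucket indices. The sketch below assumes two hypothetical features, age and income, with boundaries chosen only for illustration; none of these names come from the text above.

```python
from bisect import bisect_right

def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    With boundaries [b0, b1, ...], bucket 0 holds value < b0,
    bucket 1 holds b0 <= value < b1, and so on.
    """
    return bisect_right(boundaries, value)

# Hypothetical boundaries, chosen only for illustration.
AGE_BOUNDARIES = [18, 35, 60]
INCOME_BOUNDARIES = [30_000, 80_000]

def cross_numeric(age, income):
    """Cross two numeric features by pairing their bucket indices."""
    age_bucket = bucketize(age, AGE_BOUNDARIES)
    income_bucket = bucketize(income, INCOME_BOUNDARIES)
    return f"age{age_bucket}_income{income_bucket}"

print(cross_numeric(25, 50_000))  # age1_income1
print(cross_numeric(70, 20_000))  # age3_income0
```

Bucketizing first keeps the crossed feature categorical, so it can be encoded the same way as a cross of two categorical features.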

Example: Consider a dataset with two features: country and device_type. A feature cross of country and device_type could create new combined features like country_device_type with values like USA_mobile, USA_desktop, Canada_mobile, etc.
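The country and device_type example can be sketched per row as follows. The rows are hypothetical data, and one-hot encoding is assumed here as the way the crossed value is fed to a model; the text above does not prescribe an encoding.

```python
from itertools import product

# Hypothetical per-row data: one (country, device_type) pair per example.
rows = [("USA", "mobile"), ("Canada", "desktop"), ("USA", "desktop")]

countries = ["USA", "Canada"]
device_types = ["mobile", "desktop"]

# Vocabulary of the crossed feature: every possible pairing of values.
vocabulary = [f"{c}_{d}" for c, d in product(countries, device_types)]
index = {value: i for i, value in enumerate(vocabulary)}

def one_hot_cross(country, device_type):
    """One-hot encode the crossed value for a single row."""
    vector = [0] * len(vocabulary)
    vector[index[f"{country}_{device_type}"]] = 1
    return vector

encoded = [one_hot_cross(c, d) for c, d in rows]
print(vocabulary)  # ['USA_mobile', 'USA_desktop', 'Canada_mobile', 'Canada_desktop']
print(encoded)     # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
```

Each row activates exactly one position, so a linear model can learn a separate weight for every country and device combination.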

Figure: Cross feature. Source: developers.google.com

Cross features are also very common in recommendation systems. In practice, a Wide and Deep architecture can be used to combine many dense and sparse features, including crossed ones. You can see one concrete example in the section Wide and Deep [sec-wide-and-deep].

Pros and Cons

Pros:

Cons:

Applications

Feature crosses are widely used in linear regression, logistic regression, and other linear models to introduce non-linearity. They are particularly popular in applications like recommender systems, where capturing the interaction between items and user properties is crucial, and in any domain where the interaction between features significantly impacts the outcome.

Python Implementation

Implementing a basic feature cross between two categorical features can be straightforward. Here's a simple example in Python:

def create_feature_cross(features1, features2):
    """
    Creates the feature cross of two categorical features.

    Args:
    - features1: List of distinct values for the first feature.
    - features2: List of distinct values for the second feature.

    Returns:
    - A list with every pairing of a value from features1 and a value
      from features2 (the vocabulary of the crossed feature).
    """
    return [f"{f1}_{f2}" for f1 in features1 for f2 in features2]

# Example usage
countries = ['USA', 'Canada']
device_types = ['mobile', 'desktop']
crossed_features = create_feature_cross(countries, device_types)
print(crossed_features)
# ['USA_mobile', 'USA_desktop', 'Canada_mobile', 'Canada_desktop']

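This function pairs every value in features1 with every value in features2, so its output is the full vocabulary of the crossed feature rather than one crossed value per example. To cross the features of individual examples, combine the two values found in each row instead. For numerical features or more complex scenarios (such as high-cardinality categorical features, where the number of combinations grows as the product of the individual cardinalities), you might use more sophisticated techniques or tools like TensorFlow's feature_column API, which supports feature crosses directly. Note that recent TensorFlow 2.x releases mark feature_column as deprecated in favor of Keras preprocessing layers such as HashedCrossing.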
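For the high-cardinality case, one widely used technique is to hash each crossed pair into a fixed number of buckets instead of enumerating every combination. The following is a minimal sketch of that idea, not TensorFlow's implementation; the function name hashed_cross, the "_x_" separator, and the bucket count are all assumptions made for illustration.

```python
import hashlib

def hashed_cross(value_a, value_b, num_buckets):
    """Map a crossed pair of values to a bucket index in [0, num_buckets)."""
    key = f"{value_a}_x_{value_b}".encode("utf-8")
    # md5 serves only as a stable, process-independent hash here, not for security.
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % num_buckets

NUM_BUCKETS = 1000  # hypothetical size; smaller values save memory but collide more
bucket = hashed_cross("USA", "mobile", NUM_BUCKETS)
print(bucket)
```

The trade-off is that distinct pairs can collide in the same bucket, so the bucket count is usually tuned against the expected number of combinations.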