Feature engineering is the process of selecting, transforming, and creating features from raw data to improve the performance of machine learning models. Common techniques include handling missing data, scaling numeric features, encoding categorical variables, and creating interaction terms. Example code:
```python
# Feature engineering example using pandas and scikit-learn
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the dataset
data = pd.read_csv('data.csv')

# Drop irrelevant columns
data = data.drop(['id', 'date', 'zipcode'], axis=1)

# Handle missing data: fill missing values with the column mean.
# Assign the result back rather than calling fillna(inplace=True) on a single
# column, which is chained assignment and may not modify `data` in recent pandas.
data['column_name'] = data['column_name'].fillna(data['column_name'].mean())

# Scale a numeric feature to zero mean and unit variance
# (double brackets pass a 2D DataFrame, which StandardScaler expects)
scaler = StandardScaler()
data[['numeric_feature']] = scaler.fit_transform(data[['numeric_feature']])

# One-hot encode categorical variables (one indicator column per category)
data = pd.get_dummies(data, columns=['categorical_variable'])

# Create an interaction term as the product of two features
data['interaction_term'] = data['feature1'] * data['feature2']
```
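The pandas example computes the imputation mean and the scaling statistics on the whole dataset, so any rows later used for evaluation influence the preprocessing. A common way to avoid this is to fit the same steps on a training split only, for example with scikit-learn's `ColumnTransformer`. The sketch below is one possible arrangement, not a prescribed method: the toy data and the column names `feature1`, `feature2`, and `color` are hypothetical, `pd.get_dummies` is swapped for `OneHotEncoder`, and the interaction term is built with `PolynomialFeatures(interaction_only=True)` before scaling rather than from the raw columns afterwards.

```python
# Leakage-safe sketch: imputation, interaction, scaling, and one-hot encoding
# are fitted on the training split only, then reused on the test split.
# The toy data and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Hypothetical dataset with one missing numeric value
df = pd.DataFrame({
    "feature1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0],
    "feature2": [2.0, 1.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0],
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "blue"],
})
numeric_cols = ["feature1", "feature2"]
categorical_cols = ["color"]

# Numeric branch: mean imputation -> [f1, f2, f1*f2] -> standardization
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("interact", PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
    ("scale", StandardScaler()),
])

# Numeric columns come first in the output, followed by the one-hot columns
preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)

# Means, scaling statistics, and categories are learned from train_df only
X_train = preprocess.fit_transform(train_df)
# transform() reuses those training statistics; unseen categories become all-zero rows
X_test = preprocess.transform(test_df)
print(X_train.shape, X_test.shape)
```

Because every fitted statistic lives inside `preprocess`, the same object can be placed in front of a model in a larger `Pipeline` and passed to cross-validation without leaking information between folds.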