The Complete Guide to Time Series ML

Feature engineering is the foundation of a successful machine learning pipeline. For time series data, two of the most powerful techniques are lag features and rolling features. Using them well can improve the performance of models for sales forecasting, stock price prediction, and demand planning.

This guide explains lag and rolling features: why they matter, how to implement them in Python, and which pitfalls to watch for, with working code examples.

What is Feature Engineering in Time Series?

Time series feature engineering transforms raw temporal data into new input variables that help machine learning models detect temporal patterns. Unlike static datasets, time series data has a sequential structure: earlier observations influence later ones.

Common machine learning models such as XGBoost, LightGBM, and Random Forests have no built-in notion of time. They treat each row independently, so they need explicit features that describe what happened in the past. Lag features and rolling features serve this purpose.

What are Lag Features?

A lag feature is simply a past value of a variable, shifted forward in time so that it lines up with the current observation. For example, a forecast of today’s sales can use yesterday’s sales (lag 1), the sales from seven days ago (lag 7), and the sales from thirty days ago (lag 30).

Why Lag Features Matter

They capture autocorrelation, the relationship between a variable and its own past values.
They encode seasonal and cyclical patterns without complex transformations.
They are simple to compute and easy to interpret.
They work with any machine learning model, including tree-based and linear models.

Using Lag Features in Python

import pandas as pd
import numpy as np
# Create a sample time series dataset
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)
# Create lag features
df['lag_1'] = df['sales'].shift(1)
df['lag_3'] = df['sales'].shift(3)
df['lag_7'] = df['sales'].shift(7)
print(df.head(12))

Output:

The leading NaN values show the data loss caused by shifting: a lag of k leaves the first k rows without a value. Keep this in mind when deciding how many lags to create.
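To make the cost of each lag concrete, the sketch below counts the NaN values per column on the same 15-day sample and checks how many rows remain once incomplete rows are dropped. It is only an illustration of the bookkeeping, not a rule for how many lags to use.

```python
import pandas as pd

# Same 15-day sample series used above
dates = pd.date_range(start="2024-01-01", periods=15, freq="D")
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({"sales": sales}, index=dates)

for lag in (1, 3, 7):
    df[f"lag_{lag}"] = df["sales"].shift(lag)

# Each lag_k column has exactly k leading NaN values
nan_counts = df.isna().sum()
print(nan_counts)

# The number of complete rows is limited by the largest lag
usable_rows = len(df.dropna())
print(f"Usable rows: {usable_rows} of {len(df)}")
```

With lags of 1, 3, and 7, the largest lag decides how many rows are lost, so adding lag 7 costs seven rows no matter how many shorter lags are present.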

Choosing the Right Lag Values

Lags should be chosen systematically rather than at random. The following approaches work well in practice:

Domain knowledge: Weekly sales data? Add lags at 7, 14, and 28 days. Hourly power data? Try lags of 24 to 48 hours.
Autocorrelation Function (ACF): Use the ACF to identify which lags are significantly correlated with the target variable.
Feature importance: After training, the model’s feature importances show which lags contribute the most.
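The autocorrelation approach can be sketched with pandas alone. The example below uses `Series.autocorr` as a lightweight stand-in for a full ACF plot and ranks candidate lags on a synthetic series with a weekly cycle. The data, the noise level, and the range of candidate lags are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly cycle (illustrative data, not real sales)
rng = np.random.default_rng(0)
days = np.arange(140)
weekly_pattern = 20 * np.sin(2 * np.pi * days / 7)
series = pd.Series(200 + weekly_pattern + rng.normal(0, 5, size=days.size))

# Correlation of the series with itself at each candidate lag
candidate_lags = range(1, 15)
acf = pd.Series({lag: series.autocorr(lag=lag) for lag in candidate_lags})

# Keep the three lags with the strongest positive correlation
best_lags = acf.sort_values(ascending=False).head(3)
print(acf.round(2))
print(best_lags.index.tolist())
```

On a series like this, the multiples of the seasonal period (7 and 14) stand out, which matches the domain-knowledge advice for weekly data.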

What are Rolling (Window) Features?

Rolling features, also called window features, slide a window through time and compute a statistic over it. Instead of a single past value, they provide aggregate statistics such as the mean, median, standard deviation, minimum, and maximum over the last N periods.

Why Are Rolling Features Important?

Rolling features are well suited to the following tasks:

They smooth out noise and reveal underlying trends.
They capture short-term volatility within a given period.
They flag unusual behavior when the current value deviates from the rolling average.

The following statistics are commonly computed over rolling windows:

Rolling mean: the most common choice for smoothing trends.
Rolling standard deviation: measures variability within the window.
Rolling minimum and maximum: track the lowest and highest values within the window.
Rolling median: robust for noisy data and data with outliers.
Rolling sum: tracks total volume or total count over the window.
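As a quick comparison of these statistics, the sketch below computes each of them over a 3-period window on a small made-up series containing one outlier. No shift is applied here because the goal is only to compare how the statistics react, not to build model features.

```python
import pandas as pd

# Small illustrative series with one outlier (500); values are made up
s = pd.Series([10, 12, 11, 500, 13, 12, 14])

window = 3
stats = pd.DataFrame({
    "value": s,
    "roll_mean": s.rolling(window).mean(),
    "roll_median": s.rolling(window).median(),
    "roll_min": s.rolling(window).min(),
    "roll_max": s.rolling(window).max(),
    "roll_sum": s.rolling(window).sum(),
})
print(stats.round(2))
```

In the window that contains the outlier, the rolling mean jumps sharply while the rolling median stays close to the typical values, which is why the median is the more robust choice for noisy data.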

Using Rolling Features in Python

import pandas as pd
import numpy as np
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)
# Rolling features with window size of 3 and 7
df['roll_mean_3'] = df['sales'].shift(1).rolling(window=3).mean()
df['roll_std_3'] = df['sales'].shift(1).rolling(window=3).std()
df['roll_max_3'] = df['sales'].shift(1).rolling(window=3).max()
df['roll_mean_7'] = df['sales'].shift(1).rolling(window=7).mean()
print(df.round(2))

Output:

Calling .shift(1) before .rolling() is essential. It ensures that each rolling statistic is computed only from past observations and never includes the current value.
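The difference is easiest to see side by side. The following sketch computes a 3-day rolling mean with and without the shift on the first five sample values; the column names `leaky` and `safe` are just labels chosen for this illustration.

```python
import pandas as pd

sales = pd.Series([200, 215, 198, 230, 245])

# Without shift: the window ending at row t includes sales[t] itself
leaky = sales.rolling(window=3).mean()

# With shift(1): the window for row t covers sales[t-3] .. sales[t-1]
safe = sales.shift(1).rolling(window=3).mean()

comparison = pd.DataFrame({"sales": sales, "leaky": leaky, "safe": safe})
print(comparison.round(2))
```

At row 3 the leaky version averages 215, 198, and 230, so the value being predicted is already inside its own feature. The safe version averages 200, 215, and 198, all of which were known before row 3 was observed.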

Combining Lag and Rolling Features: A Production-Ready Example

In a real time series ML workflow, you will usually combine lag features and rolling features into one feature set. The function below is a reusable feature engineering helper that you can adapt to your own projects.

import pandas as pd
import numpy as np

def create_time_features(df, target_col, lags=(1, 3, 7), windows=(3, 7)):
    """
    Create lag and rolling features for time series ML.

    Parameters:
        df         : DataFrame with datetime index
        target_col : Name of the target column
        lags       : Iterable of lag periods
        windows    : Iterable of rolling window sizes
    Returns:
        DataFrame with new features
    """
    df = df.copy()
    # Lag features
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)
    # Rolling features (shift by 1 so the current row is never in its own window)
    shifted = df[target_col].shift(1)
    for window in windows:
        df[f'roll_mean_{window}'] = shifted.rolling(window).mean()
        df[f'roll_std_{window}'] = shifted.rolling(window).std()
        df[f'roll_max_{window}'] = shifted.rolling(window).max()
        df[f'roll_min_{window}'] = shifted.rolling(window).min()
    return df.dropna()  # Drop rows with NaN from lag/rolling

# Sample usage: a random-walk series standing in for daily sales
np.random.seed(0)
dates = pd.date_range('2024-01-01', periods=60, freq='D')
sales = 200 + np.cumsum(np.random.randn(60) * 5)
df = pd.DataFrame({'sales': sales}, index=dates)
df_features = create_time_features(df, 'sales', lags=[1, 3, 7], windows=[3, 7])
print(f"Original shape: {df.shape}")
print(f"Engineered shape: {df_features.shape}")
print(f"\nFeature columns:\n{list(df_features.columns)}")
print(f"\nFirst few rows:\n{df_features.head(3).round(2)}")

Output:

Common Mistakes and How to Avoid Them

The most serious mistake in time series feature engineering is data leakage: letting information from the future enter the features, which produces misleadingly good model performance.

Important mistakes to watch out for:

Forgetting .shift(1) before .rolling(): without the shift, the current observation becomes part of the rolling window.
Ignoring data loss from lags: every lag creates NaN rows at the start of the series. With lags up to 30 on a 100-row dataset, the first 30 rows are incomplete, which is 30% of the data.
Using a single window size: different patterns show up at different window sizes. Test short windows (3 to 5) as well as long windows (14 to 30).
Computing features from training data in production: at prediction time, calculate lag and rolling features from the actual historical data available at that moment.
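For the production case, one possible approach is to build a single feature row from whatever history is available when the prediction is made. The helper below, `features_for_next_step`, is a hypothetical function written for this illustration; its column names mirror those produced by create_time_features, and it assumes the history is a Series of observed values in time order.

```python
import pandas as pd

def features_for_next_step(history, lags=(1, 3, 7), windows=(3, 7)):
    """Build one feature row for the period right after `history` ends.

    Hypothetical helper for illustration only. `history` must hold the
    observed target values in time order.
    """
    needed = max(max(lags), max(windows))
    if len(history) < needed:
        raise ValueError(f"Need at least {needed} observations, got {len(history)}")

    row = {}
    for lag in lags:
        # lag_k for the next step is the k-th most recent observation
        row[f"lag_{lag}"] = history.iloc[-lag]
    for window in windows:
        # The trailing window ends at the last observed value
        recent = history.iloc[-window:]
        row[f"roll_mean_{window}"] = recent.mean()
        row[f"roll_std_{window}"] = recent.std()
        row[f"roll_max_{window}"] = recent.max()
        row[f"roll_min_{window}"] = recent.min()
    return pd.DataFrame([row])

history = pd.Series(
    [200, 215, 198, 230, 245, 210, 225, 260, 275, 240],
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
next_row = features_for_next_step(history)
print(next_row.round(2))
```

Because the row is built only from observations that already exist, it matches what the shifted training features would contain for that date, and nothing from the future can leak in.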

When to Use Lag vs. Rolling Features

Use Case	Recommended Features
Strong autocorrelation in the data	Lag features (lag-1, lag-7)
Noisy signal that needs smoothing	Rolling mean
Seasonal patterns (weekly)	Lag-7, lag-14, lag-28
Trend detection	Rolling mean over long windows
Anomaly detection	Deviation from the rolling mean
Capturing variability / risk	Rolling standard deviation, rolling range
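As an illustration of the anomaly detection row, the sketch below scores each value by its deviation from the trailing rolling mean, scaled by the trailing rolling standard deviation. The series, the window of 5, and the threshold of 3 are assumptions picked for the example and would need tuning on real data.

```python
import pandas as pd

# Illustrative series with one spike (320); values are made up
sales = pd.Series([200, 204, 198, 202, 201, 199, 203, 320, 202, 200])

window = 5
past = sales.shift(1)  # use only earlier observations
roll_mean = past.rolling(window).mean()
roll_std = past.rolling(window).std()

# z-score of each value relative to its trailing window
z_score = (sales - roll_mean) / roll_std

threshold = 3.0  # assumed cutoff; tune for your data
anomalies = sales[z_score.abs() > threshold]
print(anomalies)
```

Note that the spike inflates the rolling standard deviation for the windows that follow it, so later values are scored against a wider band until the spike leaves the window.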

Conclusion

Lag features and rolling features are key building blocks of time series machine learning. They bridge the gap between raw sequential data and the structured format that machine learning models need for training. They deliver the most predictive value when combined with careful data handling, sensible window selection, and an understanding of the domain.

The best part? They are easy to interpret, cheap to compute, and work with any machine learning model. These features will help whether you’re using XGBoost for forecasting, an LSTM for anomaly detection, or linear regression as a baseline.

Gen AI Intern at Analytics Vidhya
Department of Computer Science, Vellore Institute of Technology, Vellore, India

I currently work as a Gen AI Intern at Analytics Vidhya, where I contribute to innovative AI-driven solutions that empower businesses to use data effectively. As a final year Computer Science student at Vellore Institute of Technology, I bring a strong foundation in software development, data analysis, and machine learning to my role.
