Welcome to this tutorial on using classification models for time series momentum analysis. In the last article, we focused on predicting the value of future returns using regression. Now, we will shift our focus to predicting the direction of the market. Will it go up or down? This is a classic classification problem and a fundamental skill for any quant. This tutorial will guide you through building a logistic regression model, properly scaling your data, and evaluating the model's performance.
Learning Objectives
By the end of this tutorial, you will be able to:
- Transform a regression problem into a binary classification problem.
- Understand and apply feature scaling as a critical preprocessing step.
- Build and train a logistic regression model to predict market direction.
- Evaluate a classifier's performance using accuracy, a confusion matrix, and the ROC curve.
- Interpret classification metrics to understand the true predictive power of a model.
Ready to move from theory to practice? Build your Quantitative Finance portfolio with my hands-on, end-to-end projects. Discover both free and paid options on Gumroad.
Prerequisites
- A foundational understanding of Python programming.
- Familiarity with the concepts from the regression part of this tutorial series.
Core Concepts (The Theory)
Let's review the key concepts for this classification task.
- Classification: A type of supervised learning where the goal is to predict a categorical label rather than a continuous value. Our labels will be "market up" (+1) and "market down" (-1).
- Logistic Regression: A fundamental algorithm used for binary classification. Unlike linear regression, which outputs a continuous value, logistic regression outputs a probability (between 0 and 1). A threshold (typically 0.5) is then used to convert this probability into a class prediction.
- Feature Scaling: The process of standardizing the range of input features. Many machine learning algorithms, including logistic regression, perform better when numerical input features are on a similar scale. We'll use Min-Max scaling, which scales the data to a fixed range, usually 0 to 1 (or -1 to 1).
- Confusion Matrix: A powerful tool for evaluating a classifier. It's a table that breaks down the predictions into four categories:
- True Positives (TP): Correctly predicted positive cases.
- True Negatives (TN): Correctly predicted negative cases.
- False Positives (FP): Incorrectly predicted positive cases (a "false alarm").
- False Negatives (FN): Incorrectly predicted negative cases (a "miss").
- ROC Curve (Receiver Operating Characteristic): A graph that shows the performance of a classification model at all classification thresholds. It plots the True Positive Rate against the False Positive Rate. The Area Under the Curve (AUC) is a single number summary of the ROC curve's performance. A model with an AUC of 0.5 is no better than random guessing, while a model with an AUC of 1.0 is a perfect classifier.
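To make the scaling idea concrete, here is a minimal sketch (with made-up values, purely for illustration) showing how Min-Max scaling maps a feature into a fixed range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature values (hypothetical, for illustration only)
demo = np.array([[-5.0], [0.0], [5.0], [15.0]])

# Scale into the [-1, 1] range, as we will do later in the tutorial
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(demo)

print(scaled.ravel())  # -5 -> -1, 0 -> -0.5, 5 -> 0, 15 -> 1
```

Under the hood this applies x' = a + (x - min)(b - a) / (max - min), where [a, b] is the target range.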
Step-by-Step Walkthrough (The Hands-On Practice)
Let's begin building our classification model.
Step 1: Import Libraries and Fetch Data
We'll start by importing all the necessary libraries and fetching our data, just as we did for the regression task.
```python
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Get historical market data for SPY (ETF)
df = yf.download("SPY", start="2000-01-01", end="2025-01-01")
```
Step 2: Create Features
The feature engineering process remains the same. We'll create our lagged return features.
df["Ret"] = df["Close"].pct_change() name = "Ret" df["Ret10_i"] = (df[name].rolling(10).apply(lambda x: 100 * ((np.prod(1 + x)) ** (1 / 10) - 1))) df["Ret25_i"] = (df[name].rolling(25).apply(lambda x: 100 * ((np.prod(1 + x)) ** (1 / 25) - 1))) df["Ret60_i"] = (df[name].rolling(60).apply(lambda x: 100 * ((np.prod(1 + x)) ** (1 / 60) - 1))) df["Ret120_i"] = (df[name].rolling(120).apply(lambda x: 100 * ((np.prod(1 + x)) ** (1 / 120) - 1))) df["Ret240_i"] = (df[name].rolling(240).apply(lambda x: 100 * ((np.prod(1 + x)) ** (1 / 240) - 1))) # Clean up the dataframe del df["Open"] del df["Close"] del df["High"] del df["Low"] del df["Volume"] df = df.dropna()Step 3: Define the Classification Target Variable
This is a key step. Instead of predicting the future return value, we will now predict its sign. We'll convert the Ret25 column into a binary Output column, where 1 represents a positive or zero return and -1 represents a negative return.
```python
# First, create the shifted return column as before
df["Ret25"] = df["Ret25_i"].shift(-25)
df = df.dropna()

# Now, create the binary Output column: +1 for a positive or zero
# future return, -1 for a negative one. (np.sign would map an
# exactly-zero return to 0, so we use np.where instead.)
df["Output"] = np.where(df["Ret25"] >= 0, 1, -1)

# We no longer need the continuous return column
del df["Ret25"]
df.tail(10)
```
This code transforms our target variable into a categorical one, suitable for a classification model.
Step 4: Split and Scale the Data
Before training, we'll split our data into training and testing sets. Then, we'll apply Min-Max scaling to our features. It's crucial to fit the scaler only on the training data to avoid data leakage from the test set.
```python
X, y = df.iloc[:, 0:-1], df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=int(len(y) * 0.5), shuffle=False
)

# Initialize and fit the scaler on the training data only
scaler_input = MinMaxScaler(feature_range=(-1, 1))
scaler_input.fit(X_train)

# Transform both the training and testing data
X_train = scaler_input.transform(X_train)
X_test = scaler_input.transform(X_test)
```
This ensures our model learns the scaling parameters only from the training data and then applies that same transformation to the unseen test data.
Step 5: Build and Train the Logistic Regression Model
Now we are ready to train our classifier.
```python
from sklearn.linear_model import LogisticRegression

# All parameters not specified are set to their defaults
logisticRegr = LogisticRegression()
logisticRegr.fit(X_train, y_train)

# Make predictions on the test set
predictions = logisticRegr.predict(X_test)
```
Step 6: Evaluate the Classification Model
Simply looking at accuracy can be misleading. We need to use proper evaluation metrics for classification.
- Accuracy Score
First, let's get the overall accuracy.
```python
score = logisticRegr.score(X_test, y_test)
print("Accuracy Score: {0}".format(score))
```
Accuracy Score: 0.7071713147410359

While our model achieves ~70% accuracy, this number can be deceptive if the classes are imbalanced (i.e., the market goes up more often than it goes down).
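To see why accuracy alone can mislead, here is a self-contained sketch (using synthetic labels, not our SPY data) showing that a no-skill classifier which always predicts "up" scores well on imbalanced classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labels: the market is "up" (+1) about 70% of the time
y_demo = rng.choice([1, -1], size=1000, p=[0.7, 0.3])

# A "model" with no skill that always predicts up
naive_pred = np.ones_like(y_demo)

accuracy = (naive_pred == y_demo).mean()
print(f"Naive accuracy: {accuracy:.3f}")  # close to 0.70 without any skill
```

An accuracy near the majority-class frequency is therefore the real baseline to beat, not 50%.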
- Confusion Matrix
A confusion matrix gives us a much clearer picture of the model's performance.
```python
import seaborn as sns
from sklearn import metrics

cm = metrics.confusion_matrix(y_test, predictions)

plt.figure(figsize=(9, 9))
sns.heatmap(cm, annot=True, fmt="d", linewidths=0.5, square=True, cmap="Blues_r")
plt.ylabel("Actual label")
plt.xlabel("Predicted label")
all_sample_title = "Accuracy Score: {0}".format(score)
plt.title(all_sample_title, size=15)
```
This visualization shows us exactly where the model is making correct and incorrect predictions for each class. We can see that the model overwhelmingly predicts the "up" class, which explains the high accuracy score.
- ROC Curve and AUC
Finally, the ROC curve helps us assess the model's ability to distinguish between classes across all possible thresholds.
```python
from sklearn import metrics

# Use predicted probabilities (not hard class labels) so the curve
# actually sweeps over all classification thresholds
probs = logisticRegr.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = metrics.roc_curve(y_test, probs)
roc_auc = metrics.auc(fpr, tpr)
display = metrics.RocCurveDisplay(
    fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name="Logistic Regression"
)
display.plot()
plt.show()
```
The Area Under the Curve (AUC) for our model is close to 0.5, indicating that its predictive ability is not much better than random chance.
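As a sanity check on that interpretation, the sketch below (using random scores unrelated to the labels, not our model) confirms that uninformative predictions land near an AUC of 0.5:

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(42)

# Random +1/-1 labels and random "probabilities" with no relationship
y_rand = rng.choice([1, -1], size=2000)
scores_rand = rng.random(2000)

fpr_r, tpr_r, _ = metrics.roc_curve(y_rand, scores_rand)
print(f"AUC of random scores: {metrics.auc(fpr_r, tpr_r):.3f}")  # near 0.5
```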
This isn't a failure; it's a successful test that has invalidated a hypothesis. As a quant, this is a valuable finding that directs your next steps. Here's what to do next:
1. Feature Engineering
The most likely culprit is that your features (past returns) don't contain enough "signal" for a simple model to use. Before trying more complex models, you should try creating better features.
- Volatility Measures: Calculate rolling standard deviation of returns. High or low volatility periods might be more predictable.
- Interaction Terms: Create features that are products of existing features (e.g., Ret10_i * Ret60_i).
- Technical Indicators: Introduce classic indicators like Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), or Bollinger Bands.
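The first two suggestions can be sketched as follows. This is a minimal illustration on a stand-in returns series (the DataFrame and column names here are hypothetical; in practice you would add these columns to the df built in Step 2):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
demo = pd.DataFrame({"Ret": rng.normal(0, 0.01, 500)})  # stand-in daily returns

# Rolling volatility: 20-day standard deviation of daily returns
demo["Vol20"] = demo["Ret"].rolling(20).std()

# Simple momentum proxies over two horizons, then an interaction term
demo["Ret10_i"] = demo["Ret"].rolling(10).mean()
demo["Ret60_i"] = demo["Ret"].rolling(60).mean()
demo["Mom_x"] = demo["Ret10_i"] * demo["Ret60_i"]

print(demo.dropna().shape)  # (441, 5): the 59 warm-up rows are dropped
```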
2. Try More Complex Models
Since our model is underfitting, the next logical step is to use models that can capture complex relationships.
- Tree-Based Models: These are often very effective in finance.
- Random Forest: An ensemble of decision trees that is robust and less prone to overfitting.
- Gradient Boosting (like XGBoost or LightGBM): Often state-of-the-art for tabular data. They build trees sequentially, with each new tree correcting the errors of the previous one.
- Support Vector Machines (SVM): A powerful classifier that works by finding the optimal hyperplane to separate the classes.
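As a sketch of the model swap, a RandomForestClassifier is essentially a drop-in replacement for the logistic regression (shown here on synthetic data; with the real pipeline you would reuse the X_train and y_train from Step 4):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Synthetic stand-ins for the feature matrix and +1/-1 labels
X_demo = rng.normal(size=(300, 6))
y_demo = np.where(X_demo[:, 0] + rng.normal(scale=0.5, size=300) > 0, 1, -1)

# Tree ensembles need no feature scaling, unlike logistic regression
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_demo, y_demo)

print(f"In-sample accuracy: {forest.score(X_demo, y_demo):.3f}")
```

Note that in-sample accuracy will flatter any tree ensemble; the honest comparison is still on a held-out, chronologically later test set.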
3. Address Class Imbalance
The confusion matrix showed that the model heavily favored the majority class (market up). This imbalance can prevent the model from learning to predict the minority class (market down).
- Adjust Class Weights: In our model, you can give a higher weight to the minority class to force the model to pay more attention to it. Most scikit-learn classifiers accept a class_weight='balanced' parameter.
- Resampling Techniques:
- Oversampling (e.g., SMOTE): Synthetically create more examples of the minority class.
- Undersampling: Remove examples from the majority class.
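The class-weight adjustment can be sketched as below, using synthetic imbalanced data (in practice you would refit on the X_train and y_train from Step 4):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Imbalanced synthetic data: roughly 80% "up" (+1), 20% "down" (-1)
X_syn = rng.normal(size=(1000, 4))
y_syn = np.where(rng.random(1000) < 0.8, 1, -1)

# 'balanced' reweights each class inversely to its frequency, so the
# minority "down" class is no longer drowned out during fitting
clf = LogisticRegression(class_weight="balanced")
clf.fit(X_syn, y_syn)

preds = clf.predict(X_syn)
print(np.unique(preds, return_counts=True))  # both classes now predicted
```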
In this tutorial, we reframed our momentum prediction problem from regression to classification. You learned how to preprocess data using feature scaling and how to build a logistic regression model. Most importantly, you learned how to critically evaluate a classifier's performance using tools like the confusion matrix and ROC curve, which revealed that a high accuracy score doesn't always mean a good model.
Originally published at http://simplified-zone.com on August 7, 2025.