Ariel-Gal/predictStockMachineLearning

Stock Price Predictor

A full-stack educational machine-learning project for predicting the next trading day's stock closing price. The project trains regression models on historical market data, exposes predictions through a Flask API, and provides a simple browser-based client for entering ticker symbols and viewing predicted price movement.

Disclaimer
This project is intended for learning, experimentation, and portfolio demonstration only. It is not financial advice and should not be used as the sole basis for investment decisions.

Overview

The system predicts the next-day closing price of a stock using historical OHLCV data, S&P 500 market movement, and technical indicators. The default training flow creates one pooled global model across selected tickers and saves it as GLOBAL.pkl. The web client then calls the API with global=true and uses that global model for predictions.

The project also supports training separate per-ticker models such as AAPL.pkl, MSFT.pkl, and TSLA.pkl.

Key Features

  • Next trading day stock closing-price prediction.
  • Historical market-data download with yfinance.
  • S&P 500 daily return as a market-context feature.
  • Technical indicators including moving averages, RSI, Bollinger Bands, volume ratios, spreads, and short-term returns.
  • Default Quantile Gradient Boosting model with prediction ranges.
  • Optional MLP neural-network regressor.
  • Global pooled model across many tickers, with compact ticker identity hash features.
  • Optional per-ticker model training.
  • Flask API with CORS support.
  • Lightweight static HTML/CSS/JavaScript client.
  • Model evaluation using MAE, RMSE, and Pinball Loss for quantile models.

Tech Stack

| Layer | Technologies |
|---|---|
| Machine Learning | Python, scikit-learn, NumPy, pandas |
| Market Data | yfinance |
| Backend API | Flask, Flask-CORS |
| Frontend | HTML, CSS, JavaScript |
| Model Storage | Pickle files (.pkl) |
| Optional Tuning | Optuna |

Project Structure

predictStockMachineLearning-main/
├── Client/
│   ├── CSS/
│   │   └── style.css
│   ├── JS/
│   │   └── script.js
│   └── index.html
├── ModelTraining/
│   ├── features.py
│   ├── model.py
│   ├── predict.py
│   └── train.py
├── Server/
│   ├── requirements.txt
│   └── server.py
├── requirements.txt
├── .gitignore
└── README.md

Main Components

| Path | Purpose |
|---|---|
| ModelTraining/features.py | Builds the feature set used during both training and prediction. |
| ModelTraining/model.py | Contains model wrappers and metric functions. |
| ModelTraining/train.py | Trains global or per-ticker models and saves them as .pkl files. |
| ModelTraining/predict.py | Loads trained models and generates next-day predictions. |
| Server/server.py | Exposes the prediction API on localhost:8080. |
| Client/index.html | Browser UI for submitting ticker symbols. |
| Client/JS/script.js | Calls the Flask API and renders prediction results. |

How It Works

  1. Historical stock data is downloaded from Yahoo Finance through yfinance.
  2. S&P 500 historical data is downloaded and converted into daily returns.
  3. Technical indicators are calculated from each ticker's historical price and volume data.
  4. The training script creates supervised examples where today's features are mapped to tomorrow's closing price or tomorrow's return.
  5. A model is trained and saved under ModelTraining/models/.
  6. The Flask server loads the trained model and exposes prediction endpoints.
  7. The web client sends ticker requests to the API and displays current price, predicted price, expected change, and model metrics.
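Step 4 can be sketched with pandas as follows. This is a minimal illustration, not the code in ModelTraining/train.py; the helper name and the Target_Close and Target_Return column names are assumptions made for the example.

```python
import pandas as pd


def add_next_day_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each row's features with the following trading day's close."""
    out = df.copy()
    # shift(-1) moves tomorrow's close onto today's row.
    out["Target_Close"] = out["Close"].shift(-1)
    # Simple (not log) return from today's close to tomorrow's close.
    out["Target_Return"] = out["Target_Close"] / out["Close"] - 1.0
    # The last row has no "tomorrow" yet, so it cannot be a training example.
    return out.dropna(subset=["Target_Close"])


prices = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 104.0]})
examples = add_next_day_targets(prices)
print(examples)
```

Dropping the final row matters: that row is the one used at prediction time, when tomorrow's close is still unknown.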

Getting Started

Prerequisites

  • Python 3.10 or newer (recommended).
  • Internet connection for downloading market data.
  • A modern browser for the frontend client.

1. Clone the Repository

git clone <your-repository-url>
cd predictStockMachineLearning-main

2. Create and Activate a Virtual Environment

On macOS/Linux:

python -m venv .venv
source .venv/bin/activate

On Windows PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1

3. Install Dependencies

pip install -r requirements.txt

4. Train a Demo Global Model

This trains one global model on a small demo set: AAPL, MSFT, GOOGL, and TSLA.

python ModelTraining/train.py --demo --target return

The trained model is saved to:

ModelTraining/models/GLOBAL.pkl

5. Start the API Server

python Server/server.py

The API will run at:

http://localhost:8080

6. Open the Web Client

Open this file directly in your browser:

Client/index.html

Enter a ticker symbol such as AAPL, MSFT, GOOGL, or TSLA and click Predict.

Training Models

Train the Default Global Model

By default, if no --tickers or --demo flag is provided, the script attempts to train on the full S&P 500 list.

python ModelTraining/train.py

For a faster demo run:

python ModelTraining/train.py --demo

Train a Global Model on Specific Tickers

python ModelTraining/train.py --tickers AAPL MSFT NVDA AMZN --global-model

Train Separate Per-Ticker Models

python ModelTraining/train.py --tickers AAPL MSFT TSLA --per-ticker

This creates files such as:

ModelTraining/models/AAPL.pkl
ModelTraining/models/MSFT.pkl
ModelTraining/models/TSLA.pkl

Train on Tomorrow's Return Instead of Tomorrow's Price

python ModelTraining/train.py --demo --target return

Training on returns can be more stable than predicting absolute prices directly, because returns sit on a comparable scale across tickers and price levels. During prediction, the predicted return is converted back into an estimated price.
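The conversion back to a price is simple arithmetic. The sketch below assumes the model predicts a simple (not logarithmic) return; the function name is hypothetical and the project's own conversion in ModelTraining/predict.py may differ.

```python
def return_to_price(last_close: float, predicted_return: float) -> float:
    """Convert a predicted simple return into an estimated next-day close."""
    return last_close * (1.0 + predicted_return)


# A predicted return of +0.8% on a last close of 195.64:
estimated = return_to_price(195.64, 0.008)
print(round(estimated, 2))
```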

Use the MLP Neural Network Model

python ModelTraining/train.py --demo --model mlp

Custom hidden layers can be passed as a comma-separated list:

python ModelTraining/train.py --demo --model mlp --mlp-hidden 64,32 --mlp-max-iter 1200

Use Walk-Forward Validation

python ModelTraining/train.py --tickers AAPL MSFT --per-ticker --walk-forward
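Walk-forward validation evaluates a model only on data that comes after its training window, which mirrors how the model is used in practice. The README does not describe how train.py builds its folds, so the following is a generic expanding-window sketch; the function name and its parameters are assumptions.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        # Training always uses rows strictly before the test block.
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```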

Tune Gradient Boosting Hyperparameters with Optuna

python ModelTraining/train.py --demo --optuna-trials 25

Optuna is included in the root requirements.txt. Hyperparameter tuning is currently supported for the Quantile Gradient Boosting model.

Running the API Server

Start the server from the project root:

python Server/server.py

The server exposes:

GET http://localhost:8080/health
GET http://localhost:8080/stock?ticker=AAPL&global=true

The server loads models from:

ModelTraining/models/

Using the Web Client

The frontend is a static client located in Client/index.html. It sends requests to:

http://localhost:8080/stock?ticker=<TICKER>&global=true

Because the client uses global=true, make sure ModelTraining/models/GLOBAL.pkl exists before using the UI.

API Reference

Health Check

GET /health

Example response:

{
  "status": "ok"
}

Predict Stock Price

GET /stock?ticker=AAPL&global=true

Query parameters:

| Parameter | Required | Description |
|---|---|---|
| ticker | Yes | Stock ticker symbol, for example AAPL. |
| global | No | Use the global model when set to true, 1, yes, or y. If omitted, the server attempts to load a per-ticker model. |
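A flag like this is typically parsed with a small helper. The sketch below is hypothetical and is not taken from Server/server.py; in particular, treating the value case-insensitively is an assumption.

```python
from typing import Optional

# Values the README lists as enabling the global model.
TRUTHY = {"true", "1", "yes", "y"}


def parse_global_flag(raw: Optional[str]) -> bool:
    """Return True when the query value asks for the global model."""
    if raw is None:
        # Parameter omitted: fall back to a per-ticker model.
        return False
    return raw.strip().lower() in TRUTHY


print(parse_global_flag("true"), parse_global_flag(None))
```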

Example response:

{
  "ticker": "AAPL",
  "last_close": 195.64,
  "last_date": "2026-06-08",
  "prediction": 197.21,
  "range_low": 192.10,
  "range_high": 201.45,
  "change": 1.57,
  "change_pct": 0.80,
  "mae": 3.42,
  "rmse": 4.91
}

Response fields:

| Field | Description |
|---|---|
| ticker | Normalized ticker symbol. |
| last_close | Latest available closing price. |
| last_date | Date of the latest available market data. |
| prediction | Predicted next-day closing price. |
| range_low | Lower quantile prediction, when available. |
| range_high | Upper quantile prediction, when available. |
| change | Difference between prediction and latest close. |
| change_pct | Percentage change between prediction and latest close. |
| mae | Mean Absolute Error measured during validation. |
| rmse | Root Mean Squared Error measured during validation. |
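For scripted use, the endpoint can be called from Python with the standard library alone. This is a minimal client sketch: the helper names are hypothetical, uppercasing the ticker on the client side is an assumption, and fetch_prediction needs the server to be running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"


def build_stock_url(ticker: str, use_global: bool = True) -> str:
    """Build the /stock URL with the documented query parameters."""
    params = {"ticker": ticker.upper()}
    if use_global:
        params["global"] = "true"
    return f"{BASE_URL}/stock?{urlencode(params)}"


def fetch_prediction(ticker: str, use_global: bool = True, timeout: float = 10.0) -> dict:
    """Call the running API and return the decoded JSON body."""
    with urlopen(build_stock_url(ticker, use_global), timeout=timeout) as resp:
        return json.load(resp)


def summarize(pred: dict) -> str:
    """Format a response; the range fields are optional."""
    line = (
        f"{pred['ticker']}: {pred['last_close']:.2f} -> "
        f"{pred['prediction']:.2f} ({pred['change_pct']:+.2f}%)"
    )
    if pred.get("range_low") is not None and pred.get("range_high") is not None:
        line += f", range {pred['range_low']:.2f} to {pred['range_high']:.2f}"
    return line


if __name__ == "__main__":
    print(summarize(fetch_prediction("AAPL")))
```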

Modeling Details

Default Model

The default model is a Quantile Gradient Boosting regressor. It trains separate models for multiple quantiles, usually:

0.1, 0.5, 0.9

The median quantile (0.5) is used as the main prediction. The lower and upper quantiles provide an estimated prediction range.
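One common way to build such a model with scikit-learn is to fit one GradientBoostingRegressor per quantile using loss="quantile". The sketch below runs on synthetic data and is not the wrapper in ModelTraining/model.py; the function name, the hyperparameters, and the data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

QUANTILES = (0.1, 0.5, 0.9)


def fit_quantile_models(X, y, quantiles=QUANTILES):
    """Fit one gradient boosting model per requested quantile."""
    models = {}
    for q in quantiles:
        model = GradientBoostingRegressor(
            loss="quantile", alpha=q, n_estimators=50, random_state=0
        )
        models[q] = model.fit(X, y)
    return models


# Synthetic data: y is x plus unit-variance noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = X[:, 0] + rng.normal(0, 1.0, size=300)

models = fit_quantile_models(X, y)
x_new = np.array([[5.0]])
low, median, high = (float(models[q].predict(x_new)[0]) for q in QUANTILES)
print(low, median, high)
```

Because each quantile is fitted independently, the three predictions are not guaranteed to be ordered for every input, so a consumer may want to sort them before displaying a range.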

Optional Model

The project also includes a simple MLP regressor based on scikit-learn's MLPRegressor. It uses feature scaling and supports configurable hidden layers.
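A scaled MLP of this kind is usually expressed as a scikit-learn pipeline, so the scaler fitted on training data is reused at prediction time. The following sketch is an assumption about the general shape, not the project's wrapper; the helper names are hypothetical, and the defaults mirror the --mlp-hidden 64,32 and --mlp-max-iter 1200 example.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def parse_hidden_layers(spec: str) -> tuple:
    """Turn a comma-separated string such as "64,32" into (64, 32)."""
    return tuple(int(part) for part in spec.split(",") if part.strip())


def build_mlp(hidden: str = "64,32", max_iter: int = 1200):
    """Scale features, then fit an MLP with the requested hidden layers."""
    return make_pipeline(
        StandardScaler(),
        MLPRegressor(
            hidden_layer_sizes=parse_hidden_layers(hidden),
            max_iter=max_iter,
            random_state=0,
        ),
    )


model = build_mlp()
print(model)
```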

Features

The default feature set includes:

  • Close
  • SP500_Return
  • SMA_5, SMA_20, SMA_50
  • EMA_5, EMA_20, EMA_50
  • RSI_14
  • BB_Upper_20, BB_Lower_20
  • Volume, Volume_MA_20, Volume_Ratio
  • High_Low_Spread
  • Return_1d, Return_3d, Return_5d
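Several of these features can be derived from Close and Volume with pandas rolling windows. The sketch below is illustrative only and may not match ModelTraining/features.py: the function name is hypothetical, and the Bollinger Band width of two sample standard deviations is the conventional choice, assumed here.

```python
import pandas as pd


def add_basic_indicators(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Add moving averages, Bollinger Bands, volume ratio, and 1-day return."""
    out = df.copy()
    close = out["Close"]
    out[f"SMA_{window}"] = close.rolling(window).mean()
    out[f"EMA_{window}"] = close.ewm(span=window, adjust=False).mean()
    # Bands: moving average plus/minus two rolling standard deviations.
    std = close.rolling(window).std()
    out[f"BB_Upper_{window}"] = out[f"SMA_{window}"] + 2 * std
    out[f"BB_Lower_{window}"] = out[f"SMA_{window}"] - 2 * std
    out[f"Volume_MA_{window}"] = out["Volume"].rolling(window).mean()
    out["Volume_Ratio"] = out["Volume"] / out[f"Volume_MA_{window}"]
    out["Return_1d"] = close.pct_change(1)
    return out


frame = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0], "Volume": [100, 100, 200, 400]}
)
feats = add_basic_indicators(frame, window=3)
print(feats)
```

Rolling features are undefined for the first window - 1 rows, so those rows are normally dropped before training.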

For global models, additional ticker hash features are added by default so that one pooled model can learn ticker-specific patterns without creating one model file per stock.
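The README does not specify how the hash features are computed. One possible scheme, shown purely as an assumption, maps a ticker to a few stable numbers derived from a cryptographic digest, so the same ticker always produces the same vector across training and prediction runs.

```python
import hashlib
from typing import List


def ticker_hash_features(ticker: str, n_features: int = 4) -> List[float]:
    """Map a ticker to n_features deterministic values in [0, 1]."""
    digest = hashlib.sha256(ticker.upper().encode("utf-8")).digest()
    # Use the leading digest bytes, each scaled from 0..255 to 0..1.
    return [digest[i] / 255.0 for i in range(n_features)]


print(ticker_hash_features("AAPL"))
```

Python's built-in hash() is deliberately avoided here because string hashes are randomized per process, which would make saved models unusable after a restart.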

Metrics

The project reports:

| Metric | Meaning |
|---|---|
| MAE | Average absolute prediction error in price units. |
| RMSE | Square root of the mean squared error; penalizes larger errors more heavily. |
| Pinball Loss | Quantile-regression loss used for evaluating quantile predictions. |
| Baseline MAE/RMSE | Errors of a naive baseline that predicts tomorrow's close as today's close. |
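These metrics are short enough to write out directly. The following pure-Python sketch uses the standard definitions; the function names are illustrative and are not necessarily those in ModelTraining/model.py.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], quantile: float) -> float:
    """Under-predictions cost `quantile`, over-predictions cost 1 - quantile."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(y_true)


# Naive baseline: tomorrow's close is predicted to equal today's close.
closes = [100.0, 102.0, 101.0, 104.0]
actual, naive = closes[1:], closes[:-1]
baseline_mae = mae(actual, naive)
baseline_rmse = rmse(actual, naive)
print(baseline_mae, baseline_rmse)
```

A model is only useful if its MAE and RMSE come in below the baseline values on the same validation data.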

Important Notes

  • Generated model files are intentionally excluded from Git by .gitignore.
  • If GLOBAL.pkl does not exist, the web client will not work with the default API request.
  • Training on the full S&P 500 can take significantly longer than the demo mode.
  • Predictions depend on external data from Yahoo Finance, so network issues or unavailable tickers may cause errors.
  • Pickle model files should only be loaded from trusted sources.
  • This is an educational project and not a production trading system.
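On the pickle note above: one lightweight precaution is to record a SHA-256 digest when a model is trained and to compare it before loading. The sketch below is not part of the project and its function names are hypothetical. A matching digest only shows that the file is the one you recorded; it does not make unpickling an untrusted file safe.

```python
import hashlib
import pickle


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_trusted_model(path: str, expected_sha256: str):
    """Unpickle a model only if its digest matches the recorded value."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: {actual}")
    with open(path, "rb") as f:
        return pickle.load(f)
```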

Future Improvements

  • Add automated tests for feature engineering and API responses.
  • Add Docker support for easier deployment.
  • Add a configuration file for API URL, model type, and default prediction mode.
  • Add charts for historical prices and prediction ranges in the frontend.
  • Add model versioning and experiment tracking.
  • Add CI workflow for linting and test execution.
  • Add a proper LICENSE file before publishing the repository publicly.

License

No license file is currently included in the project. Before publishing or accepting contributions, add a license such as MIT, Apache-2.0, or another license that matches your intended use.

About

100/100 Course advanced programming :)
