A full-stack educational machine-learning project for predicting the next trading day's stock closing price. The project trains regression models on historical market data, exposes predictions through a Flask API, and provides a simple browser-based client for entering ticker symbols and viewing predicted price movement.
Disclaimer
This project is intended for learning, experimentation, and portfolio demonstration only. It is not financial advice and should not be used as the sole basis for investment decisions.
- Overview
- Key Features
- Tech Stack
- Project Structure
- How It Works
- Getting Started
- Training Models
- Running the API Server
- Using the Web Client
- API Reference
- Modeling Details
- Important Notes
- Future Improvements
The system predicts the next-day closing price of a stock using historical OHLCV data, S&P 500 market movement, and technical indicators. The default training flow creates one pooled global model across selected tickers and saves it as GLOBAL.pkl. The web client then calls the API with global=true and uses that global model for predictions.
The project also supports training separate per-ticker models such as AAPL.pkl, MSFT.pkl, and TSLA.pkl.
- Next trading day stock closing-price prediction.
- Historical market-data download with yfinance.
- S&P 500 daily return as a market-context feature.
- Technical indicators including moving averages, RSI, Bollinger Bands, volume ratios, spreads, and short-term returns.
- Default Quantile Gradient Boosting model with prediction ranges.
- Optional MLP neural-network regressor.
- Global pooled model across many tickers, with compact ticker identity hash features.
- Optional per-ticker model training.
- Flask API with CORS support.
- Lightweight static HTML/CSS/JavaScript client.
- Model evaluation using MAE, RMSE, and Pinball Loss for quantile models.
| Layer | Technologies |
|---|---|
| Machine Learning | Python, scikit-learn, NumPy, pandas |
| Market Data | yfinance |
| Backend API | Flask, Flask-CORS |
| Frontend | HTML, CSS, JavaScript |
| Model Storage | Pickle files (.pkl) |
| Optional Tuning | Optuna |
predictStockMachineLearning-main/
├── Client/
│ ├── CSS/
│ │ └── style.css
│ ├── JS/
│ │ └── script.js
│ └── index.html
├── ModelTraining/
│ ├── features.py
│ ├── model.py
│ ├── predict.py
│ └── train.py
├── Server/
│ ├── requirements.txt
│ └── server.py
├── requirements.txt
├── .gitignore
└── README.md
| Path | Purpose |
|---|---|
| ModelTraining/features.py | Builds the feature set used during both training and prediction. |
| ModelTraining/model.py | Contains model wrappers and metric functions. |
| ModelTraining/train.py | Trains global or per-ticker models and saves them as .pkl files. |
| ModelTraining/predict.py | Loads trained models and generates next-day predictions. |
| Server/server.py | Exposes the prediction API on localhost:8080. |
| Client/index.html | Browser UI for submitting ticker symbols. |
| Client/JS/script.js | Calls the Flask API and renders prediction results. |
- Historical stock data is downloaded from Yahoo Finance through yfinance.
- S&P 500 historical data is downloaded and converted into daily returns.
- Technical indicators are calculated from each ticker's historical price and volume data.
- The training script creates supervised examples where today's features are mapped to tomorrow's closing price or tomorrow's return.
- A model is trained and saved under ModelTraining/models/.
- The Flask server loads the trained model and exposes prediction endpoints.
- The web client sends ticker requests to the API and displays current price, predicted price, expected change, and model metrics.
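The step that maps today's features to tomorrow's target can be sketched in a few lines. This is a minimal, standard-library illustration rather than the project's code: `daily_returns` and `make_supervised` are hypothetical names, the project itself works with pandas, and a simple (not log) return is assumed.

```python
def daily_returns(closes):
    """Convert a price series into simple daily returns (the first day has none)."""
    return [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def make_supervised(feature_rows, closes, target="close"):
    """Pair day t's features with day t+1's close (or day t+1's return).

    feature_rows[t] and closes[t] describe the same trading day. The last
    day is dropped because its next-day target is not known yet.
    """
    if len(feature_rows) != len(closes):
        raise ValueError("feature_rows and closes must have the same length")
    X, y = [], []
    for t in range(len(closes) - 1):
        X.append(feature_rows[t])
        if target == "return":
            y.append(closes[t + 1] / closes[t] - 1.0)
        else:
            y.append(closes[t + 1])
    return X, y


closes = [100.0, 102.0, 101.0, 104.0]
features = [[c] for c in closes]  # toy feature: the close itself
X, y = make_supervised(features, closes, target="close")
X_ret, y_ret = make_supervised(features, closes, target="return")
```

Dropping the final row matters: it is exactly the row the prediction step later feeds to the trained model.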
- Python 3.10 or newer recommended.
- Internet connection for downloading market data.
- A modern browser for the frontend client.
git clone <your-repository-url>
cd predictStockMachineLearning-main
On macOS/Linux:
python -m venv .venv
source .venv/bin/activate
On Windows PowerShell:
python -m venv .venv
.venv\Scripts\Activate.ps1
Install the dependencies:
pip install -r requirements.txt
The following command trains one global model on a small demo set: AAPL, MSFT, GOOGL, and TSLA.
python ModelTraining/train.py --demo --target return
The trained model is saved to:
ModelTraining/models/GLOBAL.pkl
Start the API server:
python Server/server.py
The API will run at:
http://localhost:8080
Open this file directly in your browser:
Client/index.html
Enter a ticker symbol such as AAPL, MSFT, GOOGL, or TSLA and click Predict.
By default, if no --tickers or --demo flag is provided, the script attempts to train on the full S&P 500 list.
python ModelTraining/train.py
For a faster demo run:
python ModelTraining/train.py --demo
To train a global model on a custom set of tickers:
python ModelTraining/train.py --tickers AAPL MSFT NVDA AMZN --global-model
To train separate per-ticker models:
python ModelTraining/train.py --tickers AAPL MSFT TSLA --per-ticker
This creates files such as:
ModelTraining/models/AAPL.pkl
ModelTraining/models/MSFT.pkl
ModelTraining/models/TSLA.pkl
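To train on the next-day return instead of the absolute price:
python ModelTraining/train.py --demo --target return
Training on returns can be more stable than predicting absolute prices directly, because returns sit on a comparable scale across tickers and price levels. During prediction, the return is converted back into an estimated price.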
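Converting a predicted return back into a price is a one-line calculation. The sketch below assumes the model predicts a simple return (not a log return), which the README does not specify. `return_to_price` is a hypothetical helper name.

```python
def return_to_price(last_close, predicted_return):
    """Turn a predicted simple return into a next-day price estimate.

    Assumes a simple return, i.e. (next_close / last_close) - 1.
    """
    return last_close * (1.0 + predicted_return)


# A predicted +0.8% return on a 195.64 close
estimate = return_to_price(195.64, 0.008)
```

If the project used log returns instead, the conversion would be `last_close * math.exp(predicted_return)`.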
To train the optional MLP model:
python ModelTraining/train.py --demo --model mlp
Custom hidden layers can be passed as a comma-separated list:
python ModelTraining/train.py --demo --model mlp --mlp-hidden 64,32 --mlp-max-iter 1200
To evaluate per-ticker models with walk-forward validation:
python ModelTraining/train.py --tickers AAPL MSFT --per-ticker --walk-forward
To tune hyperparameters with Optuna:
python ModelTraining/train.py --demo --optuna-trials 25
Optuna is included in the root requirements.txt. Hyperparameter tuning is currently supported for the Quantile Gradient Boosting model.
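For readers unfamiliar with walk-forward validation, the idea is to always train on the past and test on the block that follows. The sketch below shows one common variant, an expanding training window. The README does not describe the fold sizes or window scheme behind --walk-forward, so `walk_forward_splits` and its parameters are illustrative assumptions.

```python
def walk_forward_splits(n_samples, n_folds=3, min_train=None):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each fold trains on every row before its test block, so no future
    rows leak into training.
    """
    if min_train is None:
        min_train = n_samples // (n_folds + 1)
    test_size = (n_samples - min_train) // n_folds
    if min_train < 1 or test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# 10 chronologically ordered rows, 3 folds, at least 4 training rows
splits = list(walk_forward_splits(10, n_folds=3, min_train=4))
```

Unlike a shuffled k-fold split, every test block here lies strictly after its training rows, which matches how the model is used in practice.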
Start the server from the project root:
python Server/server.py
The server exposes:
GET http://localhost:8080/health
GET http://localhost:8080/stock?ticker=AAPL&global=true
The server loads models from:
ModelTraining/models/
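A sketch of how a model file might be resolved and loaded from that directory is shown here. The file naming (GLOBAL.pkl, AAPL.pkl) comes from this README. The functions `model_path` and `load_model` are hypothetical and may not match what predict.py or server.py actually do.

```python
import pickle
from pathlib import Path

MODELS_DIR = Path("ModelTraining") / "models"


def model_path(ticker, use_global, models_dir=MODELS_DIR):
    """Return the .pkl path for the global model or for one ticker's model."""
    name = "GLOBAL" if use_global else ticker.strip().upper()
    return Path(models_dir) / f"{name}.pkl"


def load_model(ticker, use_global, models_dir=MODELS_DIR):
    """Load a pickled model. Only call this on files you trained yourself."""
    path = model_path(ticker, use_global, models_dir)
    if not path.exists():
        raise FileNotFoundError(f"No trained model at {path}; run train.py first")
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Failing early with a clear message is useful here because a missing GLOBAL.pkl is the most likely reason for a failed request on a fresh checkout.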
The frontend is a static client located in Client/index.html. It sends requests to:
http://localhost:8080/stock?ticker=<TICKER>&global=true
Because the client uses global=true, make sure ModelTraining/models/GLOBAL.pkl exists before using the UI.
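The same request can be made from a script, which is handy for checking the API without the browser. The following is a rough Python equivalent of what the client sends, not a port of script.js. `build_stock_url` and `fetch_prediction` are hypothetical names, and `fetch_prediction` needs the server to be running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://localhost:8080"


def build_stock_url(ticker, use_global=True, base=API_BASE):
    """Build the /stock URL, adding global=true only when requested."""
    params = {"ticker": ticker.strip().upper()}
    if use_global:
        params["global"] = "true"
    return f"{base}/stock?{urlencode(params)}"


def fetch_prediction(ticker, use_global=True, timeout=10):
    """Call the running server and return the decoded JSON body."""
    with urlopen(build_stock_url(ticker, use_global), timeout=timeout) as resp:
        return json.load(resp)
```

With the server started, `fetch_prediction("AAPL")` should return a dictionary with the fields listed in the API reference.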
GET /health
Example response:
{
  "status": "ok"
}
GET /stock?ticker=AAPL&global=true
Query parameters:
| Parameter | Required | Description |
|---|---|---|
| ticker | Yes | Stock ticker symbol, for example AAPL. |
| global | No | Use the global model when set to true, 1, yes, or y. If omitted, the server attempts to load a per-ticker model. |
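The accepted values for global can be checked with a small helper. This sketch uses the four truthy strings listed in the table. Treating them case-insensitively and ignoring surrounding whitespace is an assumption, and `parse_global_flag` is a hypothetical name rather than a function from server.py.

```python
TRUTHY_VALUES = {"true", "1", "yes", "y"}


def parse_global_flag(raw):
    """Return True when the query value asks for the global model.

    `raw` is the query-string value, or None when the parameter is omitted.
    Case-insensitive matching is an assumption, not documented behavior.
    """
    if raw is None:
        return False
    return raw.strip().lower() in TRUTHY_VALUES
```

Any other value, including an empty string, falls through to the per-ticker model path.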
Example response:
{
  "ticker": "AAPL",
  "last_close": 195.64,
  "last_date": "2026-06-08",
  "prediction": 197.21,
  "range_low": 192.10,
  "range_high": 201.45,
  "change": 1.57,
  "change_pct": 0.80,
  "mae": 3.42,
  "rmse": 4.91
}
Response fields:
| Field | Description |
|---|---|
| ticker | Normalized ticker symbol. |
| last_close | Latest available closing price. |
| last_date | Date of the latest available market data. |
| prediction | Predicted next-day closing price. |
| range_low | Lower quantile prediction, when available. |
| range_high | Upper quantile prediction, when available. |
| change | Difference between prediction and latest close. |
| change_pct | Percentage change between prediction and latest close. |
| mae | Mean Absolute Error measured during validation. |
| rmse | Root Mean Squared Error measured during validation. |
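The change and change_pct fields follow directly from last_close and prediction. The sketch below reproduces the numbers in the example response. Rounding to two decimals is an assumption inferred from that example, and `change_fields` is a hypothetical helper.

```python
def change_fields(last_close, prediction):
    """Compute the absolute and percentage change relative to the latest close."""
    change = prediction - last_close
    change_pct = change / last_close * 100.0
    # Two-decimal rounding is assumed from the example response.
    return round(change, 2), round(change_pct, 2)


change, change_pct = change_fields(195.64, 197.21)
```

For the example values this gives a change of 1.57 and a percentage change of 0.8.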
The default model is a Quantile Gradient Boosting regressor. It trains separate models for multiple quantiles, usually:
0.1, 0.5, 0.9
The median quantile (0.5) is used as the main prediction. The lower and upper quantiles provide an estimated prediction range.
The project also includes a simple MLP regressor based on scikit-learn's MLPRegressor. It uses feature scaling and supports configurable hidden layers.
The default feature set includes:
- Close
- SP500_Return
- SMA_5, SMA_20, SMA_50
- EMA_5, EMA_20, EMA_50
- RSI_14
- BB_Upper_20, BB_Lower_20
- Volume, Volume_MA_20, Volume_Ratio
- High_Low_Spread
- Return_1d, Return_3d, Return_5d
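To make a few of these features concrete, here is a standard-library sketch of a simple moving average, Bollinger Bands, an n-day return, and a volume ratio. It is not taken from features.py. The function names are hypothetical, and the band width (two standard deviations) and the use of population standard deviation are assumptions that may differ from the project's definitions.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average over the most recent `window` values."""
    if len(values) < window:
        raise ValueError("not enough data for this window")
    return mean(values[-window:])


def bollinger_bands(values, window=20, num_std=2.0):
    """Return (lower, upper) bands: SMA minus/plus num_std standard deviations.

    Uses population standard deviation; a pandas rolling std would use the
    sample version by default, so results can differ slightly.
    """
    if len(values) < window:
        raise ValueError("not enough data for this window")
    recent = values[-window:]
    mid, sd = mean(recent), pstdev(recent)
    return mid - num_std * sd, mid + num_std * sd


def pct_return(values, days):
    """Simple return of the latest value over the value `days` steps earlier."""
    return values[-1] / values[-1 - days] - 1.0


def volume_ratio(volumes, window=20):
    """Latest volume divided by its moving average."""
    return volumes[-1] / sma(volumes, window)


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma_5 = sma(closes, 5)
lower, upper = bollinger_bands(closes, window=5)
ret_1d = pct_return(closes, 1)
```

All of these use only data up to the current day, which keeps the features valid inputs for a next-day prediction.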
For global models, additional ticker hash features are added by default so that one pooled model can learn ticker-specific patterns without creating one model file per stock.
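One way to build compact ticker identity features is to hash the symbol and keep a few bits of the digest. The README does not describe the project's actual hashing scheme, so the function below is purely illustrative: `ticker_hash_features`, the choice of SHA-256, and the default of 8 features are all assumptions.

```python
import hashlib


def ticker_hash_features(ticker, n_features=8):
    """Map a ticker to a fixed-length list of 0/1 features.

    A cryptographic digest is used instead of the built-in hash(), which is
    salted per process and would give different features at training and
    prediction time.
    """
    digest = hashlib.sha256(ticker.strip().upper().encode("utf-8")).digest()
    bits = int.from_bytes(digest, "big")
    return [(bits >> i) & 1 for i in range(n_features)]


aapl_features = ticker_hash_features("AAPL")
```

With only a few bits, two tickers can collide on the same vector. That is the trade-off for keeping the feature count small compared with one-hot encoding every symbol.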
The project reports:
| Metric | Meaning |
|---|---|
| MAE | Average absolute prediction error in price units. |
| RMSE | Square-root average squared error; penalizes larger errors more heavily. |
| Pinball Loss | Quantile-regression loss used for evaluating quantile predictions. |
| Baseline MAE/RMSE | Naive baseline that predicts tomorrow's close as today's close. |
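The metrics in the table can be written out in plain Python to show exactly what each one measures. These are textbook definitions, not the implementations in model.py, and the function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pinball_loss(y_true, y_pred, quantile):
    """Average pinball (quantile) loss for one quantile level in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


def naive_baseline(closes):
    """Return (actual, predicted) where each prediction is the previous close."""
    return closes[1:], closes[:-1]


y_true = [10.0, 12.0, 14.0]
y_pred = [11.0, 12.0, 12.0]
actual, predicted = naive_baseline([100.0, 102.0, 101.0])
baseline_mae = mae(actual, predicted)
```

At quantile 0.5 the pinball loss is half the MAE. A model is only worth keeping if its MAE and RMSE beat the naive baseline on the same validation rows.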
- Generated model files are intentionally excluded from Git by .gitignore.
- If GLOBAL.pkl does not exist, the web client will not work with the default API request.
- Training on the full S&P 500 can take significantly longer than the demo mode.
- Predictions depend on external data from Yahoo Finance, so network issues or unavailable tickers may cause errors.
- Pickle model files should only be loaded from trusted sources.
- This is an educational project and not a production trading system.
- Add automated tests for feature engineering and API responses.
- Add Docker support for easier deployment.
- Add a configuration file for API URL, model type, and default prediction mode.
- Add charts for historical prices and prediction ranges in the frontend.
- Add model versioning and experiment tracking.
- Add CI workflow for linting and test execution.
- Add a proper LICENSE file before publishing the repository publicly.
No license file is currently included in the project. Before publishing or accepting contributions, add a license such as MIT, Apache-2.0, or another license that matches your intended use.