A Machine Learning Project for Financial Time-Series Analysis
This repository supports a project exploring machine learning techniques for financial time-series analysis by building, evaluating, backtesting, and visualising next-day closing-price predictions for any stock. The repository combines data collection and preparation, feature engineering, an ensemble of predictive models (including CNN components), hyperparameter optimisation, backtesting, and an interactive Dash app for exploration.
This project is licensed under the MIT License. It uses the Wharton Research Data Services (WRDS) library, which is restricted to non-commercial academic use. Users must have a valid WRDS account and comply with WRDS Terms of Use. No WRDS data is included in this repository.
- Academic Purpose
- Research Overview
- Quick start
- Utilisation
- Installation
- Project structure
- Important workflow notes
- Results
- License
This project was developed for research purposes, such as exploring machine learning techniques in financial time-series analysis. It is not intended for commercial use, investment advice, or real-world trading. All components, including data handling, are restricted to non-commercial academic contexts to comply with data provider terms.
This project investigates the application of machine learning to financial time-series data, using NVIDIA (NVDA) stock as a case study. The objectives include:
- Analyzing historical OHLCV (Open, High, Low, Close, Volume) data and supporting tickers.
- Developing and evaluating an ensemble of machine learning models, including convolutional neural network components, to study next-day closing price patterns.
- Conducting hyperparameter optimization using frameworks such as Ray Tune and Optuna, with experiment tracking via Weights & Biases.
- Evaluating model performance using regression and financial metrics, including backtesting simulations for research purposes.
- Providing an interactive interface (built with Dash) to explore data, metrics, and visualizations, accessible via app/app.py and executed through run.py.
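As an illustration of the "regression and financial metrics" objective above, here is a minimal sketch in plain Python. The choice of metrics (RMSE, MAE, and directional accuracy) and the function names are assumptions made for this example; the README does not state which metrics the project actually computes.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted closing prices."""
    _check(actual, predicted)
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted closing prices."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def directional_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of days where the predicted move away from the previous actual
    close has the same sign as the realised move (hypothetical metric choice)."""
    _check(actual, predicted)
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    hits = 0
    for prev, a, p in zip(actual[:-1], actual[1:], predicted[1:]):
        if (a - prev) * (p - prev) > 0:
            hits += 1
    return hits / (len(actual) - 1)
```

Error metrics such as RMSE and MAE describe how far predictions are from realised prices, while a direction-based measure is closer to what a backtest rewards, so the two kinds of metric can disagree.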
- Clone the repository and change into the project directory:
git clone https://github.com/rayanjoshi/nocturne_bloom.git
cd nocturne_bloom
- Install dependencies using uv:
pip install uv
# or on macOS: brew install uv
uv sync --all-extras
- Provide required secrets in a .env at the repository root:
Use the provided .env.example as a template. You can either copy it and fill values, or populate secrets from the Dash UI (recommended):
cp .env.example .env
- Run the Dash app:
uv run run.py
Then open the Dash UI in your browser at the printed local address (default: http://127.0.0.1:8050).
Configure the stock and data range in configs/data_loader.yaml. Keys used by the data loader:
- TICKER: stock ticker symbol (e.g., "NVDA")
- PERMNO / GVKEY: optional identifiers for CRSP / Compustat lookups (use when available)
- START_DATE / END_DATE: inclusive dates in YYYY-MM-DD format; START_DATE is often set one quarter before the first date of interest so that PE/PB ratios can be calculated
Example:
data_loader:
  TICKER: "NVDA"
  PERMNO: 86580
  GVKEY: 117768
  START_DATE: "2004-10-31" # quarter-prior date to allow PE/PB calculations
  END_DATE: "2022-12-31"
Notes:
- A valid WRDS account is required for queries that use PERMNO/GVKEY; the project will not function without appropriate WRDS credentials.
- After changing the config, re-run the Data Preparation workflow (or restart the Dash app and trigger data prep) to refresh processed data and predictions.
- If you switch tickers, consider clearing cache/data/processed to avoid mixing prior results with new runs.
- Dates must follow YYYY-MM-DD; if PERMNO/GVKEY are missing, the data loader attempts a ticker-based lookup if supported by your WRDS access.
- All use must comply with WRDS terms; no raw WRDS data is stored in this repository.
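The rules in the notes above can be checked before triggering data preparation. The following is a minimal sketch, assuming the YAML has already been loaded into a dictionary with a top-level data_loader key as in the example; the function name validate_data_loader_config is hypothetical and is not part of the repository.

```python
from datetime import datetime


def validate_data_loader_config(cfg: dict) -> dict:
    """Check the data_loader keys described above and return parsed values.

    Hypothetical helper for illustration; the project's loader may validate
    its configuration differently.
    """
    section = cfg["data_loader"]

    ticker = section.get("TICKER")
    if not isinstance(ticker, str) or not ticker.strip():
        raise ValueError("TICKER must be a non-empty string")
    parsed = {"TICKER": ticker.strip()}

    for key in ("START_DATE", "END_DATE"):
        raw = str(section.get(key))
        try:
            value = datetime.strptime(raw, "%Y-%m-%d").date()
        except ValueError as exc:
            raise ValueError(f"{key} must be present and use YYYY-MM-DD") from exc
        # strptime tolerates unpadded fields such as 2004-1-5; reject those too.
        if value.isoformat() != raw:
            raise ValueError(f"{key} must use zero-padded YYYY-MM-DD")
        parsed[key] = value

    if parsed["START_DATE"] > parsed["END_DATE"]:
        raise ValueError("START_DATE must not be after END_DATE")

    # PERMNO and GVKEY are optional identifiers; keep None when absent.
    for key in ("PERMNO", "GVKEY"):
        parsed[key] = section.get(key)
    return parsed
```

Failing early on a malformed date is cheaper than discovering the problem after a WRDS query has already run.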
- The project uses uv as the package manager.
- This repository lists dependencies in pyproject.toml and a uv.lock file. Installing from uv.lock will give a stable, reproducible environment.
- The lockfile has pinned dependencies due to being generated on an Intel Mac.
- The project uses PyTorch/PyTorch Lightning, Ray Tune/Optuna, Dash + dash-bootstrap-components, and financial/data helper libraries such as wrds, pandas-ta and backtrader.
nocturne_bloom
├─ LICENSE
├─ README.md
├─ app
│  ├─ __init__.py
│  ├─ app.py
│  ├─ assets
│  │  ├─ css
│  │  │  └─ custom.css
│  │  └─ model_architecture.png
│  ├─ cache
│  │  └─ cache.db
│  ├─ layout
│  │  ├─ __init__.py
│  │  └─ sidebar.py
│  └─ pages
│     ├─ __init__.py
│     ├─ backtesting.py
│     ├─ data_preparation.py
│     ├─ home.py
│     └─ training.py
├─ assets
│  └─ results.png
├─ configs
│  ├─ backtest.yaml
│  ├─ config.yaml
│  ├─ data_loader.yaml
│  ├─ data_module.yaml
│  ├─ feature_engineering.yaml
│  ├─ model_ensemble.yaml
│  └─ trainer.yaml
├─ data
│  ├─ predictions
│  ├─ preprocessing
│  ├─ processed
│  └─ raw
├─ models
│  └─ scalers
├─ pyproject.toml
├─ run.py
├─ scripts
│  ├─ __init__.py
│  ├─ feature_correlation.py
│  ├─ feature_importance.py
│  ├─ logging_config.py
│  ├─ run_backtest.py
│  ├─ train_model.py
│  └─ tune_model.py
├─ src
│  ├─ WRDS_query.sql
│  ├─ __init__.py
│  ├─ data_loader.py
│  ├─ data_module.py
│  ├─ feature_engineering.py
│  └─ model_ensemble.py
└─ uv.lock
The project's workflows can be accessed via the interactive interface.
- Data Preparation: Use the Data Preparation page to input WRDS credentials and initiate data processing. This executes src/data_loader.py, src/feature_engineering.py, and src/data_module.py to prepare financial time-series data.
- Hyperparameter Optimization: The Training page runs scripts/tune_model.py using Ray Tune or Optuna to explore model configurations.
- Model Training: The Training page executes scripts/train_model.py to train the machine learning models.
- Evaluation and Backtesting: The Backtesting page runs scripts/run_backtest.py to evaluate model performance and stores results in data/predictions/.
If you prefer the CLI, these scripts live under scripts/ and can be invoked with the repository Python environment. Example:
python scripts/train_model.py
python scripts/run_backtest.py
A .env file at the repository root is required to store credentials securely. A safe-to-commit template is included as .env.example.
The interactive interface allows input of credentials (e.g., WRDS_USERNAME, WRDS_PASSWORD, WANDB_API_KEY) through the Data Preparation and Training pages. These inputs are written to the .env file for use by the application and scripts. Credentials are stored in plain text and should be handled securely to prevent unauthorized access.
When you submit credentials in the Dash UI, the app's callbacks read the existing .env (if present), update or append the submitted keys (WRDS_USERNAME, WRDS_PASSWORD, WANDB_API_KEY), and rewrite the file at the repository root as simple key=value lines.
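The read, update-or-append, and rewrite behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the app's actual callback code: the function name update_env_file is hypothetical, and details such as how comments or duplicate keys are treated are guesses.

```python
from pathlib import Path


def update_env_file(env_path: Path, updates: dict[str, str]) -> None:
    """Update or append KEY=value lines in a .env file, keeping other lines.

    Hypothetical helper: existing keys are replaced in place, new keys are
    appended at the end, and comment lines are left untouched.
    """
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    remaining = dict(updates)
    new_lines = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the first occurrence of a submitted key.
            new_lines.append(f"{key}={remaining.pop(key)}")
        else:
            new_lines.append(line)
    # Append submitted keys that were not already present in the file.
    new_lines.extend(f"{key}={value}" for key, value in remaining.items())
    env_path.write_text("\n".join(new_lines) + "\n")
```

Because the values land on disk in plain text, restricting the file's permissions (for example `chmod 600 .env` on Unix-like systems) and keeping .env out of version control are sensible precautions.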
- Hyperparameter tuning (Ray Tune / Optuna) and model training are CPU/GPU heavy. Run these on a machine with a GPU or increase timeouts accordingly.
- The UI executes training/tuning via subprocess calls to scripts in scripts/. Expect those operations to take minutes to hours depending on configuration.
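As a rough illustration of the subprocess pattern mentioned in the notes above, the sketch below launches a script in a child process with an optional timeout. The helper name run_script and the exact arguments are assumptions; the README does not show how the Dash callbacks actually build the command.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_script(
    args: Sequence[str], timeout_s: Optional[float] = None
) -> subprocess.CompletedProcess:
    """Run the current Python interpreter with the given arguments.

    Hypothetical helper for illustration. Output is captured as text, and
    subprocess.TimeoutExpired is raised if timeout_s is exceeded.
    """
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,  # collect stdout/stderr instead of streaming them
        text=True,            # decode bytes to str
        timeout=timeout_s,
        check=False,          # inspect returncode instead of raising on failure
    )


# Example usage (not executed here): allow training up to six hours.
# result = run_script(["scripts/train_model.py"], timeout_s=6 * 60 * 60)
# if result.returncode != 0:
#     print(result.stderr)
```

A generous timeout matters here because tuning and training can run for hours; a value that is too short would terminate a healthy run.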
This graph was prepared using data accessed via Wharton Research Data Services (WRDS). WRDS and its third-party suppliers retain all rights to the underlying data, which is considered valuable intellectual property and trade secrets. The figure is provided solely for academic and educational purposes and may not be reproduced, distributed, or used for commercial purposes without explicit permission from WRDS. It includes only derived visualizations and does not contain raw WRDS data.
This repository is released under the MIT License — see LICENSE.
Disclaimer: All code in this repository is licensed under the MIT License. Visualizations or results derived from WRDS data are included for academic and educational purposes only. No raw WRDS data is included. WRDS and its data providers retain all rights to the underlying data. Use of WRDS data is subject to their terms of service.
