Current technical reference:
docs/MODCREDB_TECHNICAL_GUIDE.md
This README is a quick-start reference. The technical guide explains the database purpose, input files, SQLite schema, evidence model, search and FIMO behavior, rebuild workflow, validation, and known limitations.
Local first-version web database for transcription factor PWM/model files.
The code is kept separate from the large private data files. Build/import commands expect a local data directory containing the source TSV and archives.
The first version is intentionally small and easy to move later. It uses:
- SQLite for the local database.
- Python standard library HTTP server for local browsing.
- Jinja2 templates for HTML.
- MEME motif files stored inside SQLite for fast motif pages and downloads.
- Structure/model archive paths indexed in SQLite, without unpacking the large models.tar.gz archive.
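One way to index an archive without unpacking it is to walk its tar headers with Python's tarfile module and record member names in SQLite. The sketch below assumes a hypothetical archive_member table; the real importer's schema and indexing code may differ.

```python
import io
import os
import sqlite3
import tarfile
import tempfile


def index_archive(archive_path: str, conn: sqlite3.Connection) -> int:
    """Record regular-file members of a .tar.gz in SQLite without extracting them."""
    # Hypothetical table; not the project's actual schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive_member ("
        "archive TEXT, member TEXT, size INTEGER, PRIMARY KEY (archive, member))"
    )
    count = 0
    # Iterating a TarFile reads each header and skips over the file contents.
    with tarfile.open(archive_path, "r:gz") as tar:
        for info in tar:
            if info.isfile():
                conn.execute(
                    "INSERT OR REPLACE INTO archive_member VALUES (?, ?, ?)",
                    (archive_path, info.name, info.size),
                )
                count += 1
    conn.commit()
    return count


def read_member(archive_path: str, member: str) -> bytes:
    """Read one file on demand; gzip has no random access, so this decompresses from the start."""
    with tarfile.open(archive_path, "r:gz") as tar:
        fileobj = tar.extractfile(member)  # raises KeyError if the name is absent
        if fileobj is None:
            raise ValueError(f"{member} is not a regular file")
        return fileobj.read()


if __name__ == "__main__":
    # Build a tiny demo archive so the sketch runs without the real models.tar.gz.
    tmp = tempfile.mkdtemp()
    demo = os.path.join(tmp, "demo_models.tar.gz")
    payload = b"HEADER demo model\n"
    with tarfile.open(demo, "w:gz") as tar:
        info = tarfile.TarInfo("models/demo.pdb")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    conn = sqlite3.connect(":memory:")
    print(index_archive(demo, conn), read_member(demo, "models/demo.pdb"))
```

Indexing only names and sizes keeps the import fast; the cost of gzip's sequential access is paid later, when a single file is read.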
SQLite databases, raw archives, extracted PDB/model files, and UniProt-enriched
production data are not committed to GitHub. Use .gitignore as the source of
truth for excluded local artifacts.
For local import, provide a directory containing:
TF_PWM_chart_final.tsv
jaspar.tar.gz
cisbp.tar.gz
pwms.tar.gz
models.tar.gz
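Before a long import, a quick pre-flight check can confirm that the directory holds every file listed above. This is a hypothetical helper, not part of import_db.py; it only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List

# File names copied from the list above.
REQUIRED_FILES = (
    "TF_PWM_chart_final.tsv",
    "jaspar.tar.gz",
    "cisbp.tar.gz",
    "pwms.tar.gz",
    "models.tar.gz",
)


def missing_source_files(data_dir: str) -> List[str]:
    """Return the required source files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    missing = missing_source_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("missing:", ", ".join(missing) if missing else "none")
```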
HOCOMOCO can be added to a staging database with import_hocomoco.py after
downloading the HOCOMOCO v11 CORE human mononucleotide MEME file.
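To inspect a downloaded MEME file (such as the HOCOMOCO file) before staging it, a small parser for the MEME minimal motif format can list motif IDs and matrix widths. This sketch handles only the MOTIF and letter-probability matrix lines it needs; it is not the parser import_hocomoco.py uses, and the sample motif is made up.

```python
import re

# Made-up sample in MEME minimal motif format.
SAMPLE_MEME = """MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF DEMO_1 demo
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
 0.10 0.20 0.30 0.40
 0.70 0.10 0.10 0.10
"""


def parse_meme_motifs(text: str) -> dict:
    """Return {motif_id: list of [A, C, G, T] probability rows}."""
    motifs = {}
    current = None
    rows_left = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("MOTIF"):
            current = stripped.split()[1]
            motifs[current] = []
            rows_left = 0
        elif stripped.startswith("letter-probability matrix") and current is not None:
            # The w= field gives the number of matrix rows that follow.
            match = re.search(r"\bw=\s*(\d+)", stripped)
            rows_left = int(match.group(1)) if match else 0
        elif rows_left and stripped:
            motifs[current].append([float(x) for x in stripped.split()])
            rows_left -= 1
    return motifs


if __name__ == "__main__":
    for motif_id, rows in parse_meme_motifs(SAMPLE_MEME).items():
        print(motif_id, "width", len(rows))
```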
From this directory:
python3 import_db.py --data-dir /path/to/source_data
That creates:
data/tf_webdb.sqlite
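To check what an import produced, the standard sqlite3 module can list the tables in the new database and their row counts. This sketch opens the file read-only and makes no assumptions about table names.

```python
import sqlite3


def table_row_counts(db_path: str) -> dict:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # mode=ro avoids silently creating an empty database if the path is wrong.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        # Names come from sqlite_master; quoting keeps names with spaces working.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    for table, count in table_row_counts("data/tf_webdb.sqlite").items():
        print(f"{table}\t{count}")
```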
For a quicker UI-only test that skips the large model archive index:
python3 import_db.py --skip-model-index
After the first import, enrich TF accessions with gene names, protein names, organisms, reviewed status, taxonomy IDs, and sequence lengths:
python3 enrich_uniprot.py
For a small test run:
python3 enrich_uniprot.py --limit 20
The enrichment script only fetches missing annotations by default. Use --all when you intentionally want to refresh everything.
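The default "missing only" behavior can be expressed as a filter on annotation columns, with --all dropping the filter and --limit capping the batch. The table and column names below (tf_annotation_demo, gene_name, protein_name) are hypothetical stand-ins; the real enrich_uniprot.py may select rows differently.

```python
import sqlite3
from typing import List, Optional


def accessions_to_fetch(conn: sqlite3.Connection, refresh_all: bool = False,
                        limit: Optional[int] = None) -> List[str]:
    """Pick accessions to send to UniProt, mirroring the --all and --limit options."""
    where = "" if refresh_all else "WHERE gene_name IS NULL OR protein_name IS NULL"
    sql = f"SELECT accession FROM tf_annotation_demo {where} ORDER BY accession LIMIT ?"
    # In SQLite a negative LIMIT means no upper bound.
    return [row[0] for row in conn.execute(sql, (limit if limit is not None else -1,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tf_annotation_demo (accession TEXT, gene_name TEXT, protein_name TEXT)")
    conn.executemany("INSERT INTO tf_annotation_demo VALUES (?, ?, ?)",
                     [("P1", "G1", "Prot1"), ("P2", None, None)])
    print(accessions_to_fetch(conn))
```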
Start the local web app:
python3 app.py --host 127.0.0.1 --port 8090 --db data/tf_webdb.sqlite
For local curation only, expose the internal debug page and exception details with:
python3 app.py --host 127.0.0.1 --port 8090 --db data/tf_webdb.sqlite --enable-debug --show-errors
Keep --enable-debug and --show-errors disabled in production.
Then open:
http://127.0.0.1:8090/
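The app.py flags above map naturally onto argparse options that default to off. This sketch only illustrates that pattern: the default host, port, and database path are assumptions, and the loopback check is a hypothetical extra safeguard, not a documented feature of app.py.

```python
import argparse

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Local TF database web app (sketch)")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8090)
    parser.add_argument("--db", default="data/tf_webdb.sqlite")
    # store_true flags stay False unless passed explicitly.
    parser.add_argument("--enable-debug", action="store_true",
                        help="serve the local-only /debug curation page")
    parser.add_argument("--show-errors", action="store_true",
                        help="include exception details in error responses")
    return parser


def check_debug_flags(args: argparse.Namespace) -> None:
    """Hypothetical guard: refuse debug output when binding to a non-loopback host."""
    if (args.enable_debug or args.show_errors) and args.host not in LOOPBACK_HOSTS:
        raise SystemExit("--enable-debug/--show-errors are for local use only")


if __name__ == "__main__":
    args = build_parser().parse_args()
    check_debug_flags(args)
    print(args)
```

Making the risky options opt-in means a plain production launch cannot expose them by accident.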
The /scan page uses MEME/FIMO. Install MEME Suite or load the MEME module,
then make sure the fimo command is available in PATH before starting the
web app:
command -v fimo
fimo --version
Main pages:
- /: home page with database stats.
- /search: searchable TF list.
- /tf/<TF_ID>: TF detail page with motif evidence grouped by evidence type.
- /motif?source=<source>&id=<motif_id>: motif page with PWM visualization, MEME download, and external source links when available.
- /model-summaries/<TF_ID>: model summary rows parsed from ModCRE .summary.txt files.
- /model?id=<MODEL_FILE_ID>: interactive 3D viewer for indexed PDB model files.
- /scan: FIMO scanner for pasted DNA/FASTA sequences, with motif search/selection and example input.
- /docs: documentation landing page with database statistics and import coverage.
- /evidence: explanation of evidence levels and interpretation.
- /debug: optional local-only curation page for missing motif files and failed model counts, enabled with --enable-debug.
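Routes that carry parameters in the path (/tf/<TF_ID>, /model-summaries/<TF_ID>) or in the query string (/motif, /model) can be dispatched with the standard urllib.parse helpers. This is a hypothetical sketch of that mapping; the handler names it returns are illustrative and not taken from app.py.

```python
from urllib.parse import parse_qs, unquote, urlsplit

STATIC_PAGES = {"search", "scan", "docs", "evidence", "debug"}


def route(raw_path: str, debug_enabled: bool = False) -> tuple:
    """Map a request path to a (hypothetical handler name, parameters) pair."""
    parts = urlsplit(raw_path)
    # parse_qs returns lists; keep the first value of each parameter.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if not segments:
        return ("home", {})
    if len(segments) == 1 and segments[0] in STATIC_PAGES:
        # /debug only exists when the app was started with --enable-debug.
        if segments[0] == "debug" and not debug_enabled:
            return ("not_found", {})
        return (segments[0], {})
    if len(segments) == 2 and segments[0] in ("tf", "model-summaries"):
        return (segments[0].replace("-", "_"), {"tf_id": segments[1]})
    if segments == ["motif"] and "source" in query and "id" in query:
        return ("motif", {"source": query["source"], "motif_id": query["id"]})
    if segments == ["model"] and "id" in query:
        return ("model", {"model_file_id": query["id"]})
    return ("not_found", {})


if __name__ == "__main__":
    print(route("/motif?source=jaspar&id=MA0001.1"))
```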
- Failed models are indexed with status = failed but hidden from normal TF pages.
- The importer trims the whitespace-padded AlphaFold header in the TSV to AlphaFold.
- Missing local motifs are kept as motif references and flagged in import_issue.
- UniProt annotations are optional. Pages still work before enrichment, but search becomes more useful afterwards because users can search gene, protein, and organism names.
- ModCRE summary rows are stored in model_summary, with links to PDB/PIR model files when the model filename can be reconstructed.
- The app can download active PDB model files from the archive, but this can be slow because models.tar.gz is gzip-compressed and has no random access, so reading one file means decompressing the stream from the start of the archive.
- The 3D viewer currently uses NGL from a public CDN and streams PDB files from models.tar.gz. This is fine for local testing, but the public server should use a local NGL copy and unpacked or object-stored PDB files for speed.
- The scanner writes the selected motifs to a temporary MEME file, runs MEME/FIMO against the pasted sequence, and reports FIMO p-values and q-values. The web app process must be started in an environment where fimo is available in PATH.
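The scanner flow in the last note can be sketched as: render the selected matrices in MEME minimal motif format, write them and the pasted FASTA to a temporary directory, and call fimo. This assumes MEME Suite 5.x, where --oc writes tabular results (including p-values and q-values) to fimo.tsv; the threshold, file names, and nsites/E placeholders are illustrative, not the app's actual settings.

```python
import os
import shutil
import subprocess
import tempfile

MEME_HEADER = (
    "MEME version 4\n\n"
    "ALPHABET= ACGT\n\n"
    "strands: + -\n\n"
    "Background letter frequencies\n"
    "A 0.25 C 0.25 G 0.25 T 0.25\n\n"
)


def to_meme(motifs: dict) -> str:
    """Render {motif_id: rows of [A, C, G, T] probabilities} as MEME motif text."""
    parts = [MEME_HEADER]
    for motif_id, rows in motifs.items():
        parts.append(f"MOTIF {motif_id}\n")
        # nsites= and E= are placeholder values for this sketch.
        parts.append(f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 20 E= 0\n")
        parts.extend(" ".join(f"{p:.6f}" for p in row) + "\n" for row in rows)
        parts.append("\n")
    return "".join(parts)


def fimo_command(meme_path: str, fasta_path: str, out_dir: str, thresh: float = 1e-4) -> list:
    """Build the fimo argument list; --oc overwrites out_dir if it already exists."""
    return ["fimo", "--oc", out_dir, "--thresh", str(thresh), meme_path, fasta_path]


def run_fimo(motifs: dict, fasta_text: str, thresh: float = 1e-4) -> str:
    """Scan fasta_text with the given motifs and return the contents of fimo.tsv."""
    if shutil.which("fimo") is None:
        raise RuntimeError("fimo is not on PATH; install or load MEME Suite first")
    with tempfile.TemporaryDirectory() as tmp:
        meme_path = os.path.join(tmp, "selected.meme")
        fasta_path = os.path.join(tmp, "query.fa")
        out_dir = os.path.join(tmp, "fimo_out")
        with open(meme_path, "w") as fh:
            fh.write(to_meme(motifs))
        with open(fasta_path, "w") as fh:
            fh.write(fasta_text)
        subprocess.run(fimo_command(meme_path, fasta_path, out_dir, thresh),
                       check=True, capture_output=True, text=True)
        with open(os.path.join(out_dir, "fimo.tsv")) as fh:
            return fh.read()
```

Checking shutil.which first turns a missing MEME installation into a clear error instead of a FileNotFoundError from subprocess.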
Planned improvements:
- Vendor the NGL JavaScript locally instead of loading it from a CDN.
- Add export formats for scanner results and a queued job mode for longer sequences.
- Replace the tiny local server with Flask/FastAPI when deploying publicly.
- Unpack or object-store model files on the university server for fast download and visualization.
Public provenance metadata is stored in config/source_releases.tsv and loaded into the source_release SQLite table by import_db.py or migrate_source_releases.py. Unknown release, license, citation, or workflow details are deliberately marked as pending confirmation until the PI/Baldo confirms them.
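Loading a provenance TSV like this and listing what still awaits confirmation can be sketched with csv and sqlite3. The column names (source, release_version, license, citation) and the source_release_demo table are hypothetical; the real config/source_releases.tsv and source_release table may use different fields.

```python
import csv
import io
import sqlite3

FIELDS = ("source", "release_version", "license", "citation")  # hypothetical columns


def load_source_releases(tsv_text: str, conn: sqlite3.Connection) -> int:
    """Load tab-separated provenance rows into a demo table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_release_demo ("
        "source TEXT PRIMARY KEY, release_version TEXT, license TEXT, citation TEXT)"
    )
    rows = [tuple((rec.get(f) or "").strip() for f in FIELDS)
            for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")]
    conn.executemany("INSERT OR REPLACE INTO source_release_demo VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def pending_fields(conn: sqlite3.Connection) -> list:
    """List (source, column) pairs that are blank or still marked as pending."""
    out = []
    for row in conn.execute("SELECT * FROM source_release_demo ORDER BY source"):
        for col, value in zip(FIELDS[1:], row[1:]):
            if not value or value.lower().startswith("pending"):
                out.append((row[0], col))
    return out
```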
Motif readiness for scanning and logo rendering is stored in the motif_file table, in matrix_status and related QC columns. New imports classify these fields automatically. Existing SQLite databases can be updated without a rebuild by running:
python3 migrate_matrix_qc.py --db path/to/database.sqlite
Only motifs with matrix_status = 'usable' are selected for FIMO scanning.
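A readiness check of this kind can be sketched as a function that assigns a status to each matrix, plus a query that keeps only usable rows. The status names other than 'usable', the tolerance, and the motif_file_demo table are hypothetical; migrate_matrix_qc.py may apply different or additional rules.

```python
import math
import sqlite3


def classify_matrix(rows, tol: float = 0.02) -> str:
    """Assign a hypothetical matrix_status to rows of [A, C, G, T] probabilities."""
    if not rows:
        return "empty"
    for row in rows:
        if len(row) != 4:
            return "bad_shape"
        if any(math.isnan(p) or p < 0 for p in row):
            return "invalid_values"
        # Each position should be a probability distribution over A, C, G, T.
        if abs(sum(row) - 1.0) > tol:
            return "not_normalized"
    return "usable"


def scannable_motif_ids(conn: sqlite3.Connection) -> list:
    """Return only motifs whose status allows FIMO scanning."""
    return [row[0] for row in conn.execute(
        "SELECT motif_id FROM motif_file_demo "
        "WHERE matrix_status = 'usable' ORDER BY motif_id"
    )]
```

Storing the status as a column lets the scanner filter with a single WHERE clause instead of re-validating every matrix per request.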