🎯 Semi-Automated Logo Detection in Brand Advertisement Videos

An application for semi-automated logo detection in brand advertisement videos using multimodal machine learning.

Overview

This application detects brand logos in non-English advertisement videos using multimodal ML techniques. The overall pipeline is:

Run the Whisper model on the advertisement's audio to translate it from the source language (for example, Italian) into an English transcript.
Apply an LLM like Qwen 2.5 to obtain all brand names mentioned in the audio transcript.
Run shot detection on the video to obtain the most distinct, relevant keyframes.
On each keyframe, run a zero-shot object detection model such as OWLv2 with a text prompt to extract as many logo-like regions as possible, i.e. crops that may contain an actual logo.
Candidate regions are filtered using a combination of heuristic filters (area, aspect ratio, texture, edge density, color variance) and CLIP-based filtering to remove false positives.
The remaining crops/regions are then run through the following brand assignment techniques:
- Use a CLIP model to assign each region a brand from a list of the top ~2000 brands (Netflix, Apple, etc.) taken from a public Kaggle dataset.
- Use Optical Character Recognition (OCR) along with fuzzy string matching to assign leftover regions a brand from the brand names extracted from the audio.
- Use a FAISS vector store to assign leftover regions the brand of the nearest-matching logo in the store. The vector store is currently pre-populated with the LogoDet-3K dataset.
Optionally, a Qwen post-filtering step labels each detected logo as Correct / Incorrect / Other.
The FAISS vector store enables continual learning of new logos over time via human labelling, logo scraping, etc.
A Gradio application allows the user to upload a video, run the pipeline, and view the matched regions/logos with corresponding brand names and timestamps.
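The heuristic filtering step can be illustrated with a minimal sketch covering only the area and aspect-ratio checks. The `Box` type, the function names, and every threshold below are assumptions chosen for illustration; the project's actual filters (including texture, edge density, and color variance) are not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned candidate region in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return max(0.0, self.x2 - self.x1)

    @property
    def height(self) -> float:
        return max(0.0, self.y2 - self.y1)


def keep_region(box, frame_w, frame_h,
                min_area_frac=0.001, max_area_frac=0.5, max_aspect=6.0):
    """Return True if the box passes the area and aspect-ratio checks.

    All thresholds are illustrative assumptions, not the project's values.
    """
    if box.width == 0 or box.height == 0:
        return False
    # Area as a fraction of the frame: drop specks and near-full-frame boxes.
    area_frac = (box.width * box.height) / (frame_w * frame_h)
    if not (min_area_frac <= area_frac <= max_area_frac):
        return False
    # Long-side / short-side ratio: drop thin strips such as subtitle bars.
    aspect = max(box.width / box.height, box.height / box.width)
    return aspect <= max_aspect


def filter_regions(boxes, frame_w, frame_h):
    """Keep only the boxes that pass keep_region, preserving input order."""
    return [b for b in boxes if keep_region(b, frame_w, frame_h)]
```

For a 1280x720 frame, a 200x80 box would pass, while a 4x4 speck, a 1280x20 strip, and a full-frame box would all be rejected under these assumed thresholds.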

Advantages

No need for manual training/fine-tuning of object detection models on custom logos.
Local and regional brands are detected using the audio transcript together with OCR and fuzzy matching.
The CLIP model works well for detecting popular global brands.
Vector stores like FAISS enable continual learning and detection of new logos over time.
The overall pipeline is agnostic to the domain (ads, sports, etc.) and the source language.
Multiple embedding model options for the FAISS store: CLIP, DINOv2, SigLIP2, or a DINOv2+CLIP hybrid, each with separate benchmarks to guide model selection.
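The OCR and fuzzy-matching idea behind local brand detection can be sketched with the standard library alone. This is a minimal illustration, assuming `difflib` similarity and a threshold of 0.8; the function names, the threshold, and the example brand list are hypothetical, and the project may use a different fuzzy-matching library or scoring rule.

```python
import difflib
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_brand(ocr_text, brands, threshold=0.8):
    """Return (brand, score) for the best match at or above threshold, else None.

    `brands` stands in for the brand names extracted from the audio transcript.
    The threshold is an illustrative assumption.
    """
    query = normalize(ocr_text)
    if not query:
        return None
    best_brand, best_score = None, 0.0
    for brand in brands:
        # ratio() is 2*M/T: matched characters over total length of both strings.
        score = difflib.SequenceMatcher(None, query, normalize(brand)).ratio()
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score >= threshold:
        return best_brand, best_score
    return None
```

With transcript brands `["Barilla", "Lavazza"]`, a noisy OCR reading such as `"BARILA"` would still resolve to `Barilla`, while unrelated text returns `None` so the region can fall through to the vector-store lookup.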

Streamlit — FAISS Logo Manager (`app/streamlit_faiss_app.py`)

A companion UI for managing and evaluating the logo vector store.

Logo Search: Query the FAISS index with an image to find the nearest matching logo, with optional Test-Time Augmentation (TTA) for more robust retrieval.
Logo Ingestion: Add new logo images to the index (with optional augmentation) to expand coverage over time.
Metrics: Track retrieval quality and accept/reject results to maintain a feedback log.
Embedding model selection: Switch between CLIP and SigLIP2 indexes from the sidebar.
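Logo Search and Logo Ingestion come down to two operations on the index: adding a labelled embedding and querying for the nearest one. The sketch below deliberately swaps FAISS for a brute-force cosine search in pure Python so that it runs without dependencies; `TinyLogoIndex`, `average_views`, and the toy 3-dimensional vectors are hypothetical. Averaging the embeddings of several augmented views is one simple way to apply TTA at query time, and is an assumption about how the app combines views.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


class TinyLogoIndex:
    """Brute-force stand-in for a vector store (not FAISS itself)."""

    def __init__(self):
        self._vectors = []
        self._labels = []

    def add(self, embedding, brand):
        """Ingest one logo embedding with its brand label."""
        self._vectors.append(l2_normalize(embedding))
        self._labels.append(brand)

    def search(self, embedding, k=1):
        """Return the k most similar entries as (brand, cosine score) pairs."""
        query = l2_normalize(embedding)
        scored = [
            (sum(q * v for q, v in zip(query, vec)), label)
            for vec, label in zip(self._vectors, self._labels)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(label, score) for score, label in scored[:k]]


def average_views(embeddings):
    """Average the embeddings of augmented views of one image (simple TTA)."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
```

Because new entries are simply appended, a logo added through ingestion is searchable on the next query, which is the property that makes human-in-the-loop labelling useful here.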

Usage

Installation

Clone the repository and set up a virtual environment (optional but recommended):

git clone <repo-url>
cd Logo-Detection
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows

Install dependencies for your platform:

# macOS
pip install -r requirements-macos.txt

# Windows
pip install -r requirements-win.txt

Running the Streamlit app

cd app
streamlit run streamlit_faiss_app.py

Benchmarking

The benchmarks/ directory contains a full evaluation framework for comparing embedding models on logo retrieval:

Leave-One-Out (LOO) evaluation: benchmarks/run_loo_benchmark.py
Crop-level evaluation: benchmarks/run_crop_benchmark.py
Supported models: CLIP, DINOv2, SigLIP2, and a DINOv2+CLIP hybrid.
Results (CSV + plots) are saved to benchmarks/results/.
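As a rough illustration of leave-one-out evaluation: each embedding is held out in turn, queried against all the others, and counted as correct if its nearest neighbour carries the same brand label. The sketch below computes top-1 accuracy in pure Python; the function names are hypothetical, and the metrics that `benchmarks/run_loo_benchmark.py` actually reports are not described in this README.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def loo_top1_accuracy(embeddings, labels):
    """Fraction of items whose nearest other item shares their label."""
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    if len(embeddings) < 2:
        raise ValueError("need at least two items for leave-one-out")
    correct = 0
    for i, query in enumerate(embeddings):
        # Search every item except the held-out query itself.
        best_j = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
        )
        correct += labels[best_j] == labels[i]
    return correct / len(embeddings)
```

Note that a brand with a single image in the set can never be retrieved correctly under this scheme, so such singletons lower the score regardless of the embedding model.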

Notebooks

Code for testing object detection models and VLMs is in the Notebooks/ directory:

| Notebook | Description |
| --- | --- |
| `Phase1_Obj_det_Models_Testing.ipynb` | Comparison of object detection models for logo region proposal |
| `Phase1-Qwen3-Testing.ipynb` | Evaluation of Qwen models for brand name extraction |

Note: Notebooks are designed to run on Google Colab due to GPU requirements.
