TEmBed - Tabular Embedding Test Bed

A comprehensive benchmark suite for evaluating tabular embeddings across four representation levels: cell, row, column, and table, using a diverse collection of tasks and datasets.

Section 1: Benchmark Tasks

1) Tasks on Row Level

Row Similarity Search

Description: Given an input table and a row, find the most similar row from the input table to the given row.

Approaches: Implement the row_embedding_component to provide row embeddings, or implement row_similarity_search_component to return a ranked list directly. Set run_similarity_search_based_on in the approach's supported_tasks config accordingly.

Triplet-Based Evaluation (More Similar Than)

Description: Given a triplet of rows (anchor, positive, negative), evaluate whether the anchor is more similar to the positive than to the negative row.

Approaches: Implement the row_embedding_component.

Tabular Prediction

Description: Use row embeddings as features for downstream supervised tasks (classification or regression).

Approaches: Implement the row_embedding_component for embedding-based prediction, or implement predictive_ml_component to run the approach's own ML model.

2) Tasks on Column Level

Column Similarity Search

Description: Given a query column, retrieve and rank the most semantically similar columns from a data lake.

Approaches: Implement the column_embedding_component.

Column Type Annotation

Description: Given a table column, predict its semantic type from a fixed label vocabulary.

Approaches: Implement the column_embedding_component.

3) Tasks on Cell Level

Cell Level Semantic Retrieval

Description: Given a query cell, retrieve the top-k most semantically similar cells across a collection of tables.

Approaches: Implement the cell_embedding_component.

4) Tasks on Table Level

Table Retrieval

Description: Given a query table, retrieve and rank the most semantically similar tables from a collection.

Approaches: Implement the table_embedding_component.

Section 2: Installation

Check out this repository (including submodules):
```
git clone --recurse-submodules <repo-url>
```
Copy the setup template and edit the SETUP_* flags at the top to select which approaches to install:
```
cp setup_benchmark.sh.template setup_benchmark.sh
```
Run the setup script:
```
bash setup_benchmark.sh
```
This creates the benchmark_env conda environment (Python 3.13) and installs benchmark_src and approaches as editable packages. If you enabled additional approaches, it also creates per-approach conda environments (benchmark_env_hytrel, benchmark_env_gritlm, benchmark_env_tabicl).
(Optional) Create a .env file at the repo root for approaches that require Hugging Face model access:
```
HF_TOKEN=hf_...
HF_HOME=/path/to/hf/cache
```
Required for sap_rpt_oss and tabpfn after accepting their model license terms on Hugging Face.

Section 3: How to Add Your Approach

Copy approaches/benchmark_approaches_src/<approach_name>/ (the template folder) to a new folder and rename the class in approach.py.

Create configs/approaches/<your_approach>.yaml declaring approach_name, module_path, class_name, any hyperparameters, and a supported_tasks block listing which tasks your approach supports and their task-specific defaults:

approach_name: my_approach
module_path: "approaches/benchmark_approaches_src/my_approach"
class_name: MyApproach
my_param: some_value

supported_tasks:
  row_similarity_search:
    run_similarity_search_based_on: row_embeddings
  predictive_ml:
    run_task_based_on: row_embeddings

Implement only the component files for the capabilities you support (delete the rest). Each component must satisfy its interface in benchmark_src/approach_interfaces/.
Add your approach to a run config under configs/runs/ (see Section 4).
Use logging instead of print — the orchestrator captures log output to run.log in each result directory.

Section 4: How to Run the Benchmark

Always set PYTHONPATH first (the shell scripts below do this automatically):

export PYTHONPATH=$(pwd):$PYTHONPATH

For a full reference on all configuration options, see configs/README.md.

Run configs

Create or edit a file under configs/runs/<name>.yaml:

benchmark_output_dir: my_results
approaches:
  - name: my_approach
    params:
      my_param: some_value
    tasks: [row_similarity_search, predictive_ml]   # optional whitelist; omit to run all supported
    task_datasets:                                   # optional: override the global dataset list
      row_similarity_search: [Amazon-Google]

The same approach can appear multiple times with different params — output directories are automatically differentiated by a slug derived from the params (e.g. my_approach/embedding_model=all-MiniLM-L6-v2/row_similarity_search/Amazon-Google/).

Running

# Run all jobs defined in the run config
conda activate benchmark_env
python run_experiments.py <run_config_name>

# Stop on the first failure instead of continuing
python run_experiments.py <run_config_name> --stop-on-error

# Write results to a custom directory
python run_experiments.py <run_config_name> --results-dir results_testing

Results are written to results/<benchmark_output_dir>/<approach>/<params>/<task>/<dataset>/results.json. A run is skipped if results.json already exists — delete it to force a re-run.

Smoke test (before committing)

bash run_test_before_commit.sh <conda_env_name>

Runs one small dataset per task type and fails on any error.

Aggregating results

Results are automatically aggregated at the end of every run_experiments.py run. To re-run manually:

python benchmark_src/results_processing/gather_results.py results/<benchmark_output_dir>

Section 5: Repository Structure

├── configs/
│   ├── global_datasets.yaml              # dataset registry per task
│   ├── approaches/
│   │   └── <approach>.yaml               # approach params + supported_tasks
│   └── runs/
│       └── <run>.yaml                    # what to run: approaches + param overrides
├── approaches/
│   └── benchmark_approaches_src/
│       └── <approach>/
│           ├── approach.py               # main approach class
│           └── <task>_component.py       # one file per supported capability
├── configs/
│   ├── approaches/                       # per-approach params + supported_tasks
│   ├── runs/                             # run configs (what to run + overrides)
│   ├── task/                             # task-level defaults (top_k, metrics, etc.)
│   ├── dataset/                          # dataset-specific settings for creation scripts
│   └── global_datasets.yaml             # canonical dataset list per task
├── benchmark_src/
│   ├── approach_interfaces/              # ABCs for all component types
│   ├── tasks/                            # one run_*_benchmark.py per task
│   ├── utils/                            # metrics, result aggregation, etc.
│   └── results_processing/              # gather_results.py, ranking, plots
├── run_experiments.py                    # main orchestrator entry point
├── run_test_before_commit.sh
└── run_paper_experiments.sh

Name		Name	Last commit message	Last commit date
Latest commit History 240 Commits
ContextAwareJoin @ 21f8ff4		ContextAwareJoin @ 21f8ff4
approaches		approaches
benchmark_src		benchmark_src
cache/dataset_creation_resources		cache/dataset_creation_resources
configs		configs
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
process_results.py		process_results.py
pyproject.toml		pyproject.toml
reqs_benchmark.txt		reqs_benchmark.txt
run_experiments.py		run_experiments.py
run_paper_experiments.sh		run_paper_experiments.sh
run_test_before_commit.sh		run_test_before_commit.sh
setup_benchmark.sh.template		setup_benchmark.sh.template

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TEmBed - Tabular Embedding Test Bed

Section 1: Benchmark Tasks

1) Tasks on Row Level

Row Similarity Search

Triplet-Based Evaluation (More Similar Than)

Tabular Prediction

2) Tasks on Column Level

Column Similarity Search

Column Type Annotation

3) Tasks on Cell Level

Cell Level Semantic Retrieval

4) Tasks on Table Level

Table Retrieval

Section 2: Installation

Section 3: How to Add Your Approach

Section 4: How to Run the Benchmark

Run configs

Running

Smoke test (before committing)

Aggregating results

Section 5: Repository Structure

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

TEmBed - Tabular Embedding Test Bed

Section 1: Benchmark Tasks

1) Tasks on Row Level

Row Similarity Search

Triplet-Based Evaluation (More Similar Than)

Tabular Prediction

2) Tasks on Column Level

Column Similarity Search

Column Type Annotation

3) Tasks on Cell Level

Cell Level Semantic Retrieval

4) Tasks on Table Level

Table Retrieval

Section 2: Installation

Section 3: How to Add Your Approach

Section 4: How to Run the Benchmark

Run configs

Running

Smoke test (before committing)

Aggregating results

Section 5: Repository Structure

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages