Add TwelveLabs provider for Marengo embeddings #820

Open
mohit-twelvelabs wants to merge 2 commits into crmne:main from mohit-twelvelabs:feat/twelvelabs-integration
@mohit-twelvelabs

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

What this does

Adds an opt-in :twelvelabs provider that wires TwelveLabs' Marengo multimodal embedding model into RubyLLM's existing synchronous embed interface:

RubyLLM.configure { |c| c.twelvelabs_api_key = ENV["TWELVELABS_API_KEY"] }
RubyLLM.embed("a cat playing piano", model: "marengo3.0", provider: :twelvelabs)
# => Embedding with a 512-dimensional float vector

Marengo produces embeddings in a shared space across video, image, audio, and text, which is useful for video search/RAG built on RubyLLM. This PR adds the text path (synchronous, 512-dim) as the smallest first slice.

It follows the existing provider conventions exactly: a Provider subclass with a Protocol that includes an Embeddings module (mirroring ChatCompletions::Embeddings), assume_models_exist? since TwelveLabs models aren't in the models.dev registry, and config registered via configuration_options. The TwelveLabs /embed endpoint takes multipart/form-data, so the payload uses Faraday::Multipart::ParamPart (the faraday-multipart dep is already present).
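For readers less familiar with multipart/form-data, here is a rough sketch of what such a request body looks like on the wire. It uses only the Ruby standard library instead of Faraday::Multipart::ParamPart, which is what this PR relies on, so it illustrates the format and is not the provider's code. The helper `build_multipart_body` and the field names `model_name` and `text` are assumptions made for illustration; they are not taken from the diff.

```ruby
require "securerandom"

# Hypothetical helper: encodes plain text fields as a multipart/form-data body.
# The PR delegates this work to Faraday::Multipart; this only shows the shape.
def build_multipart_body(fields, boundary: SecureRandom.hex(16))
  parts = fields.map do |name, value|
    "--#{boundary}\r\n" \
      "Content-Disposition: form-data; name=\"#{name}\"\r\n" \
      "\r\n" \
      "#{value}\r\n"
  end
  # The closing delimiter is the boundary followed by two extra dashes.
  body = parts.join + "--#{boundary}--\r\n"
  { content_type: "multipart/form-data; boundary=#{boundary}", body: body }
end

# Field names below are assumptions, not confirmed by this PR.
request = build_multipart_body(
  { "model_name" => "marengo3.0", "text" => "a cat playing piano" },
  boundary: "example-boundary"
)
puts request[:content_type]
puts request[:body]
```

Each field becomes its own part with a Content-Disposition header, which is why a JSON-style payload hash cannot be sent to this endpoint as-is.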

Non-breaking / opt-in: no defaults change, nothing is touched unless you pass provider: :twelvelabs.

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

  • I opened an issue before writing code and received maintainer approval
  • Linked issue: #___

I'm aware of the issue-first policy and the high bar for core providers (and the community-gem path for emerging ones). I'm opening this as a concrete, reviewable reference rather than a large speculative drop. Happy to first move this to an approved issue, or repackage it as a ruby_llm-twelvelabs community gem — whatever you prefer. Please feel free to close if it's not a fit for core.

Quality check

  • I tested my changes thoroughly
    • For provider changes: recorded VCR cassettes for the new specs
    • New specs pass: bundle exec rspec spec/ruby_llm/providers/twelve_labs
  • I updated documentation (embeddings guide + configuration reference)
  • I didn't modify auto-generated files manually (models.json, aliases.json)

Verified against the live API as well: a marengo3.0 text embedding returns a real 512-dim float vector. Three specs cover the single-vector path, the single-element-array (nested) path, and rejection of multi-input (the endpoint accepts one text per call). rubocop is clean on all changed files.
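The three spec cases above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: `single_text_input` and `shape_vectors` are hypothetical helper names, and the use of `ArgumentError` is assumed, since the description does not say which error the provider raises.

```ruby
# Hypothetical helper: accepts a String or a one-element Array and
# rejects anything else, since the endpoint takes one text per call.
def single_text_input(text)
  inputs = Array(text)
  unless inputs.size == 1
    raise ArgumentError, "expected exactly one text input, got #{inputs.size}"
  end
  inputs.first
end

# Hypothetical helper: returns the vector in the same shape as the input.
# A bare string yields a flat vector, a one-element array a nested one.
def shape_vectors(text, vector)
  text.is_a?(Array) ? [vector] : vector
end

vector = [0.1, 0.2, 0.3] # stand-in for a 512-dimensional response

p shape_vectors("a cat", vector)   # flat vector
p shape_vectors(["a cat"], vector) # nested, one vector inside an array

begin
  single_text_input(["a cat", "a dog"])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```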

API changes

  • New public methods/classes

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

The provider-coverage specs enumerate every registered provider:
"covers every registered provider" asserts api_base_cases lists them
all, and "file protocol resolution" instantiates each via config_for.
Adding the :twelvelabs provider without updating these fixtures left it
absent from api_base_cases and unconfigured in config_for, so both specs
failed (the latter raising ConfigurationError on new). Add the
twelvelabs api_base case and config_for branch.
@codecov

codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 87.93103% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.94%. Comparing base (8595adb) to head (9c52a2c).

Files with missing lines                             Patch %   Lines
lib/ruby_llm/providers/twelvelabs/capabilities.rb    71.42%    4 Missing ⚠️
lib/ruby_llm/providers/twelvelabs.rb                 94.11%    1 Missing ⚠️
lib/ruby_llm/providers/twelvelabs/api.rb             85.71%    1 Missing ⚠️
lib/ruby_llm/providers/twelvelabs/embeddings.rb      94.73%    0 Missing and 1 partial ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #820      +/-   ##
==========================================
+ Coverage   81.90%   81.94%   +0.04%     
==========================================
  Files         169      173       +4     
  Lines        7713     7771      +58     
  Branches     1284     1288       +4     
==========================================
+ Hits         6317     6368      +51     
- Misses        884      890       +6     
- Partials      512      513       +1     

@mohit-twelvelabs

Author

CI is green now (commit 9c52a2c). The failure was in provider_spec.rb's provider-coverage specs, which enumerate every registered provider — "covers every registered provider" expects each one listed in api_base_cases, and the file-protocol spec instantiates each via config_for. I'd added :twelvelabs without updating those two fixtures, so it was missing from api_base_cases and unconfigured in config_for (raising ConfigurationError on new). Added the twelvelabs api_base case and a config_for branch; all 23 checks pass.
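As a rough illustration of why the coverage spec failed, the check boils down to a set difference between the registered providers and the fixture keys. Everything in this sketch is a placeholder: the provider list, the URLs, and the helper `uncovered_providers` are hypothetical and do not reproduce provider_spec.rb.

```ruby
# Placeholder data: not the real registry or the real fixture.
registered_providers = %i[openai anthropic twelvelabs]
api_base_cases = {
  openai: "https://api.openai.example",
  anthropic: "https://api.anthropic.example"
}

# Hypothetical helper: providers that are registered but have no fixture entry.
def uncovered_providers(registered, cases)
  registered - cases.keys
end

missing = uncovered_providers(registered_providers, api_base_cases)
puts "missing from api_base_cases: #{missing.inspect}"
# prints: missing from api_base_cases: [:twelvelabs]

# Adding the fixture entry closes the gap, which is what the second commit does.
api_base_cases[:twelvelabs] = "https://api.twelvelabs.example"
puts "still missing: #{uncovered_providers(registered_providers, api_base_cases).inspect}"
# prints: still missing: []
```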

The issue-first question still stands — happy to move this to an approved issue or repackage as a ruby_llm-twelvelabs community gem, whichever you prefer.
