Scope check
Due diligence
What problem does this solve?
Ollama recently added new image generation models: https://ollama.com/blog/image-generation
RubyLLM already supports Ollama as a provider and supports image generation through the paint method, but none of the new image models are currently in the model registry, so using them together requires a patch.
These models currently run only on macOS, but they let developers iterate rapidly with powerful free models and avoid the pay-to-play limitations of frontier image generation models.
Proposed solution
Register the new models x/z-image-turbo and x/flux2-klein in the model registry, and add an adapter that formats parameters correctly for the underlying Ollama API call, including the supported additional parameters (e.g., width, height, and steps). Add defensive checks so that engineers using this on an incompatible platform understand the limitations.
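To make the proposal more concrete, here is a rough sketch of what the parameter formatting and platform guard could look like. Everything in it is an assumption for discussion: the module name OllamaImageSketch, the error class, the method names, and the payload shape are hypothetical and do not exist in RubyLLM, and the real request format would need to be checked against the Ollama API.

```ruby
require "rbconfig"

# Hypothetical sketch only: none of these names exist in RubyLLM today.
module OllamaImageSketch
  # The two models named in this proposal.
  IMAGE_MODELS = %w[x/z-image-turbo x/flux2-klein].freeze

  # The additional params named in this proposal.
  SUPPORTED_OPTIONS = %i[width height steps].freeze

  class UnsupportedPlatformError < StandardError; end

  module_function

  # True when the host OS string looks like macOS (Darwin).
  def macos?(host_os = RbConfig::CONFIG["host_os"])
    host_os.to_s.match?(/darwin/i)
  end

  # Fail early with a clear message on platforms the models do not support.
  def ensure_supported_platform!(host_os = RbConfig::CONFIG["host_os"])
    return if macos?(host_os)

    raise UnsupportedPlatformError,
          "Ollama image generation models currently run only on macOS " \
          "(detected: #{host_os})"
  end

  # Build a request hash. The key names are an assumed shape, not a
  # confirmed Ollama request format.
  def build_payload(model:, prompt:, **options)
    raise ArgumentError, "unknown image model: #{model}" unless IMAGE_MODELS.include?(model)

    unknown = options.keys - SUPPORTED_OPTIONS
    raise ArgumentError, "unsupported options: #{unknown.join(', ')}" unless unknown.empty?

    # Drop nil options so only explicitly set values are sent.
    { model: model, prompt: prompt }.merge(options.compact)
  end
end
```

With something like this in place, the provider could call ensure_supported_platform! before sending a request and pass the result of build_payload(model: "x/flux2-klein", prompt: "a red fox", width: 512, height: 512) to the HTTP layer. Rejecting unknown options up front is one possible design choice; silently ignoring them would be another.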
Why this belongs in RubyLLM
RubyLLM acts as an interface between Ruby applications and LLM providers. It already supports Ollama for text generation and already supports image generation for frontier models, so adding this capability seems like a natural step. It would let developers prototype rapidly with free, local models and ship features faster.
Note: I have been using the x/flux2-klein model with a small patch to RubyLLM and have had great results, so I would be happy to open a PR for this enhancement and collaborate on a good solution.
Thanks for your time and consideration - RubyLLM is a great library, so I would be honored to contribute!