Feature Request: Latin/Spanish OCR model + MeloTTS Spanish for MaixCAM2 #196

@elchiche10

Platform: MaixCAM2 / MaixPy v4


Context

I am building an assistive vision system for blind and visually impaired users using MaixCAM2. The device uses Depth-Anything-V2 for obstacle detection, YOLO11n for object identification, SmolVLM for scene description, and PP-OCR for reading signs and text. All audio feedback is via MeloTTS. The target users are Spanish speakers (Latin America).


Request 1: Latin/Spanish PP-OCR recognition model

The current pp_ocr_en.mud model fails to correctly recognize Spanish text. It confuses letters like Ñ, accented vowels, and common Spanish character combinations.
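One way to see why this happens: a recognition model can only output characters that are in its dictionary. The sketch below is a rough illustration, not the actual contents of pp_ocr_en.mud. It assumes the English dictionary is roughly printable ASCII (an assumption, not confirmed), and `missing_chars` is a hypothetical helper that lists the characters of a Spanish sign the model could never emit.

```python
import string
import unicodedata

# Assumption: an English-only recognition dictionary is roughly printable ASCII.
# The real dictionary shipped with pp_ocr_en.mud may differ.
english_charset = set(string.ascii_letters + string.digits + string.punctuation + " ")


def missing_chars(text, charset):
    """Return characters of `text` absent from `charset`, in order of first appearance."""
    # Normalize so accented letters are single precomposed code points.
    text = unicodedata.normalize("NFC", text)
    seen = []
    for ch in text:
        if ch not in charset and ch not in seen:
            seen.append(ch)
    return seen


sample = "¿Dónde está el baño?"
print(missing_chars(sample, english_charset))
```

Any character reported here has to be replaced by the closest-looking ASCII character, which matches the confusions described above (for example Ñ read as N).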

There is already a pre-converted ONNX model available at:
https://huggingface.co/docato/PaddleOCR_Mobile_Modelslatin_PP-OCRv3_mobile_rec_infer.onnx

This model covers Latin-script languages including Spanish, French, Portuguese, etc. Since you already have the Pulsar2 pipeline set up for PP-OCR models on AX630, compiling this to .axmodel + .mud should follow the same process as the existing English model.

Request: Please provide pp_ocr_latin.mud (or similar) in the MaixCAM2 model zoo, using the Latin PP-OCRv3 recognition model compiled for AX630.


Request 2: MeloTTS Spanish model

The current melotts-zh.mud only supports Chinese and basic English. There is no Spanish TTS option available on MaixCAM2.

The upstream Spanish model exists at:
https://huggingface.co/myshell-ai/MeloTTS-Spanish

You already have the full conversion pipeline done for the Chinese model (melotts-zh.mud). The Spanish model uses the same MeloTTS architecture, so porting it should follow the same ONNX → INT8 quantization → .axmodel + .mud process.

Request: Please provide melotts-es.mud in the MaixCAM2 model zoo, following the same structure as melotts-zh.mud.
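For readers unfamiliar with the INT8 quantization step mentioned above, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It only illustrates the general idea. The quantization done by Pulsar2 is calibration-based and more involved, and the function names and sample weights here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each restored value differs from the original by at most half a quantization step, which is why a converted model usually needs a listening test: small errors are expected, and how audible they are has to be checked on the device.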


Why this matters

Spanish is spoken by 500+ million people, and Latin America represents a huge potential user base for accessibility and assistive technology applications. Both additions would make MaixCAM2 significantly more useful in non-English-speaking markets, and should need relatively little additional work given that the conversion pipelines already exist.

Thank you for the excellent work on MaixPy and MaixCAM2!
