Cohere Releases Tiny Aya: A 3B-Parameter Small Language Model That Supports 70 Languages and Runs Locally, Even on Mobile

Cohere Labs has released Tiny Aya, a family of small language models (SLMs) that rethinks how multilingual capability is delivered. While most models chase quality by scaling up parameters, Tiny Aya uses a 3.35B-parameter architecture to deliver high-quality translation and generation across 70 languages.

The release includes five models: Tiny Aya Base (pre-trained), Tiny Aya Global (instruction-tuned), and three regional variants: Earth (Africa/West Asia), Fire (South Asia), and Water (Asia-Pacific/Europe).

Architecture

Tiny Aya is built on a compact decoder-only Transformer architecture. Key details, summarized in the configuration sketch after this list, include:

  • Parameters: 3.35B total (2.8B non-embedding)
  • Layers: 36
  • Vocabulary: a 262k-token vocabulary designed for balanced multilingual representation.
  • Attention: Sliding-window attention interleaved with full attention (3:1 ratio), using Grouped-Query Attention (GQA).
  • Context: 8192 input and output tokens.
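
The hyperparameters above can be collected into a small configuration object. The sketch below is purely illustrative; the field names (e.g., `sliding_to_full_ratio`) are assumptions for readability, not Cohere's actual code or config schema.

```python
from dataclasses import dataclass

@dataclass
class TinyAyaConfig:
    # Values taken from the article; field names are hypothetical.
    n_params_total: float = 3.35e9          # 3.35B total parameters
    n_params_non_embedding: float = 2.8e9   # non-embedding parameters
    n_layers: int = 36                      # decoder-only Transformer blocks
    vocab_size: int = 262_000               # multilingual tokenizer
    context_length: int = 8192              # input + output tokens
    attention: str = "sliding_window+full"  # interleaved attention pattern
    sliding_to_full_ratio: int = 3          # 3 sliding-window layers per full-attention layer
    kv_sharing: str = "GQA"                 # Grouped-Query Attention
    activation: str = "SwiGLU"              # no biases in dense layers
    lr_schedule: str = "WSD"                # Warmup-Stable-Decay pre-training schedule

print(TinyAyaConfig())
```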

The model was pre-trained on 6T tokens using a Warmup-Stable-Decay (WSD) learning-rate schedule. For training stability, the team used SwiGLU activations and removed all biases from the dense layers.
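For readers unfamiliar with WSD, a minimal sketch of the schedule's shape is shown below: a linear warmup, a long constant (stable) phase at the peak learning rate, then a decay down to a small floor. The phase fractions and peak value here are illustrative placeholders, not the values used to train Tiny Aya.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_frac: float = 0.01, decay_frac: float = 0.1,
           final_lr_frac: float = 0.1) -> float:
    """Warmup-Stable-Decay (WSD) schedule with illustrative hyperparameters."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    if step < warmup_steps:          # linear warmup to the peak LR
        return peak_lr * step / max(1, warmup_steps)
    if step < stable_end:            # long stable phase at the peak LR
        return peak_lr
    # linear decay from the peak LR down to a small final LR
    progress = (step - stable_end) / max(1, decay_steps)
    return peak_lr * (1.0 - progress * (1.0 - final_lr_frac))

# Example: learning rate at a few points of a 100k-step run
for s in (0, 500, 50_000, 95_000, 99_999):
    print(s, round(wsd_lr(s, 100_000), 6))
```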

Advanced Post-training: FUSION and SimMerge

To close the gap for low-resource languages, Cohere implemented a synthetic-data post-training pipeline, sketched in the code after this list:

  1. Fusion-of-N (FUSION): Prompts are sent to a ‘teacher pool’ (Command A, Gemma3-27B-IT, DeepSeek-V3). A judge LLM, the Fuser, then extracts and synthesizes the strongest parts of their responses.
  2. Regional specialization: Models are fine-tuned for five regional clusters (e.g., South Asia, Africa).
  3. SimMerge: To prevent ‘catastrophic forgetting’ of global safety behavior, the regional checkpoints are merged back into a global model using SimMerge, which selects the best merging operators based on signal similarity.
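
The Fusion-of-N idea can be pictured as a simple two-stage loop: collect candidate answers from several teacher models, then ask a judge (the Fuser) to synthesize one best response for training. In the sketch below, `generate` and the model names are placeholders standing in for whatever inference API is actually used; this is not Cohere's pipeline code.

```python
# Illustrative Fusion-of-N sketch; model names mirror the article, the rest is hypothetical.
TEACHERS = ["command-a", "gemma3-27b-it", "deepseek-v3"]
FUSER = "judge-llm"  # the "Fuser" that synthesizes a single answer

def generate(model: str, prompt: str) -> str:
    """Placeholder for a real LLM inference call."""
    return f"[{model}] answer to: {prompt[:40]}..."

def fusion_of_n(prompt: str) -> str:
    # Stage 1: collect candidate completions from the teacher pool.
    candidates = [generate(m, prompt) for m in TEACHERS]

    # Stage 2: the Fuser reads all candidates and synthesizes the strongest
    # parts of each into one high-quality synthetic training example.
    fuser_prompt = (
        f"Question:\n{prompt}\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
        + "\n\nCombine the strongest parts of the candidates into one best answer."
    )
    return generate(FUSER, fuser_prompt)

print(fusion_of_n("Translate 'good morning' into Yoruba."))
```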

Performance benchmarks

Tiny Aya Global consistently beats similarly sized or larger competitors on multilingual tasks:

  • Translation: Outperforms Gemma3-4B in 46 out of 61 languages on WMT24++.
  • Reasoning: On Global-MGSM (math) for the African-language subset, Tiny Aya achieves 39.2% accuracy, ahead of Gemma3-4B (17.6%) and Qwen3-4B (6.25%).
  • Safety: Achieves the highest safe-response rate (91.1%) on MultiJail.
  • Language fidelity: The model reaches 94% language accuracy, meaning it rarely switches to English when asked to respond in another language.

On-Device Deployment

Tiny Aya is optimized for edge computing. With 4-bit quantization (Q4_K_M), the model fits in 2.14 GB of memory.

  • iPhone 13: 10 tokens/s.
  • iPhone 17 Pro: 32 tokens/s.

This quantization scheme costs only about 1.4 points of generation quality, making the model a viable option for offline, private, local AI applications.
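
A Q4_K_M GGUF checkpoint of this size can be run with standard llama.cpp tooling. The snippet below uses the `llama-cpp-python` bindings; the file name `tiny-aya-global-Q4_K_M.gguf` is a placeholder and the chat usage is an assumption, so check the official model card for the exact artifact and recommended settings.

```python
# Illustrative local-inference sketch (pip install llama-cpp-python).
# The GGUF file name is a placeholder; download the real Q4_K_M weights
# from the official release and adjust the path.
from llama_cpp import Llama

llm = Llama(
    model_path="tiny-aya-global-Q4_K_M.gguf",  # ~2.14 GB 4-bit checkpoint
    n_ctx=8192,       # matches the model's 8,192-token context window
    n_threads=4,      # tune for the target phone or laptop CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Translate to English: Habari ya asubuhi."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```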

Key Takeaways

  • Efficient multilingual capability: Tiny Aya is a 3.35B-parameter model family that delivers high-quality translation and generation across 70 languages, showing that massive scale is not required for strong multilingual performance when models are designed around the right data mix.
  • Innovative training pipeline: The models were post-trained with a Fusion-of-N (FUSION) strategy in which a pool of teacher models (such as Command A and DeepSeek-V3) produces synthetic data and a judge model combines the strongest parts of their outputs, ensuring a high-quality training signal even for low-resource languages.
  • Regional specialization through merging: Cohere released dedicated regional variants, Tiny Aya Earth, Fire, and Water, aimed at regions such as Africa, South Asia, and Asia-Pacific. They were created by merging fine-tuned regional checkpoints with the global model using SimMerge, preserving safety while improving local-language capability.
  • Strong benchmark performance: Tiny Aya Global surpasses competitors such as Gemma3-4B in translation quality for 46 out of 61 languages on WMT24++. It also narrows the gap in mathematical reasoning for African languages, reaching 39.2% accuracy versus 17.6% for Gemma3-4B.
  • Optimized for on-device use: The model is highly portable and runs well on edge devices, reaching ~10 tokens/s on an iPhone 13 and ~32 tokens/s on an iPhone 17 Pro with Q4_K_M quantization. This 4-bit format preserves quality with only a ~1.4-point degradation.

Check out the technical details, paper, model weights, and playground. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our newsletter. You can also join us on Telegram.

