Agentic Foundation
Daisy Nova
Production-grade agentic foundation model. Powers planning, tool-use and multi-step workflows across every domain agent.
- Parameters: Mixture-of-Experts
- Context: 1M tokens
- Languages: EN · HI · MR
- Residency: Maharashtra

Sovereignty · Onshore by design
Maharashtra-resident compute, state-held HSM keys and an append-only ledger — CAG-ready, RTI-disclosable, reversible.
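
One common way to make a ledger append-only and auditable is a hash chain, in which every entry commits to the hash of the entry before it. The sketch below illustrates that idea only. The `AuditLedger` class, its field names and the sample records are assumptions made for illustration, not the platform's actual ledger design.

```python
import hashlib
import json


class AuditLedger:
    """Hypothetical append-only ledger: each entry stores the previous entry's hash."""

    GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, record):
        # Canonical JSON (sorted keys) so the same record always hashes identically.
        payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        """Add a record to the end of the chain and return its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {
            "prev": prev_hash,
            "record": record,
            "hash": self._digest(prev_hash, record),
        }
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


# Illustrative records; actor and model identifiers are made up.
ledger = AuditLedger()
ledger.append({"actor": "officer-001", "action": "inference", "model": "daisy-nova"})
ledger.append({"actor": "officer-002", "action": "inference", "model": "daisy-nova"})
print(ledger.verify())  # True
```

Because each hash covers the previous one, changing any historical record invalidates every later entry, which is what lets an auditor detect tampering without trusting the operator.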

G2G · Mantralaya Copilot
The copilot summarises every annexure, surfaces precedent and flags policy conflicts — with full CAG-ready lineage.

G2C · Aarogya Agent
Aarogya Agent screens symptoms against PMJAY protocols and books a teleconsult when the PHC is out of reach.

G2C · Vidya Tutor
Vidya Tutor explains a math step ten different ways — and tells the teacher exactly where each child is stuck.
Foundation models
A family of foundation models tuned for Indian context, Indian languages and Indian regulation — trained, fine-tuned and served on sovereign Maharashtra infrastructure.

FAMILY
10 frontier models
One MoE core. Nine specialists. All onshore.
LANGUAGES
मराठी · हिंदी · English
+ 18 Marathi dialects, 12 Indic scripts.
RESIDENCY
maharashtra-r1
Weights, embeddings & logs never leave the state.
PROVENANCE
Daisy Nova lineage
Proven in commercial Insure AI in North America.
What is a frontier model?
Hundreds of billions of parameters trained on trillions of tokens. Enough capacity to hold law, language, code, science and Marathi dialects in a single model.
One base model that transfers to thousands of downstream tasks — reasoning, retrieval, planning, vision, voice — without retraining from scratch.
Capabilities that only appear past a certain scale: tool-use, multi-step reasoning, chain-of-thought, cross-lingual transfer, in-context learning.
Foundation · Frontier · Specialist
Daisy Nova · MahaAI Marathi Frontier
Trained from scratch on raw, multilingual, multimodal corpora. These are the base weights every other system inherits from. Expensive to build, cheap to reuse.
Voice · Vision · Code · Compliance
Mid-training and post-training on the foundation base to unlock a specific modality or skill at state-of-the-art quality — streaming speech, document OCR, code synthesis, legal reasoning.
Krishi · Aarogya · Bhumi · Lokshahi
Small, distilled, instruction-tuned models — fine-tuned on departmental knowledge graphs. Cheap to run on the edge, governed by the same audit and residency rules.
Anatomy of a MahaAI foundation model
01 · Devanagari-aware byte-pair tokenizer with a 256K vocabulary. Marathi compounds, Sanskrit roots, code and emoji share one address space, so Indic users pay no script penalty.
02 · Decoder-only transformer with Mixture-of-Experts routing on Daisy Nova; 70B dense on Marathi Frontier. RoPE positions, grouped-query attention and a sliding-window cache for 1M context.
03 · 4.2T Marathi tokens, 3.8T Hindi, 9T English, 1.1T code, 600B legal and gazette text. Provenance-tagged, deduplicated, PII-scrubbed, licence-cleared.
04 · SFT on 1.8M Marathi instruction pairs curated with SCERT and IIT Bombay. DPO plus constitutional rules drawn from the Indian Constitution, DPDP Act and ministry SOPs.
05 · Bharat-Bench, IndicGenBench, MMLU-MR and an internal 'Mantralaya Hard' eval. Red-teamed against caste, communal, electoral and procurement-fraud attack surfaces.
06 · Quantised to FP8/INT4 on Maharashtra-resident GPU clusters. Speculative decoding, paged KV cache, per-tenant isolation. Every inference logged to the sovereign audit ledger.
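
To make the INT4 quantisation mentioned in the serving step concrete, here is a minimal sketch of symmetric per-tensor quantisation to signed 4-bit integers. The page does not say which quantisation scheme is used, so the scheme, the function names and the sample weights are all assumptions. Production stacks typically quantise per channel or per group and pack two 4-bit values into each byte.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantisation to signed 4-bit integers in [-8, 7].

    Assumed scheme for illustration: one scale for the whole tensor, chosen so
    that the largest absolute weight maps to 7.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp into the 4-bit range.
    levels = [max(-8, min(7, round(w / scale))) for w in weights]
    return levels, scale


def dequantize_int4(levels, scale):
    """Map 4-bit integer levels back to approximate floating-point weights."""
    return [level * scale for level in levels]


# Made-up weights, only to exercise the round trip.
weights = [0.70, -0.35, 0.10, 0.0, -0.70]
levels, scale = quantize_int4(weights)
restored = dequantize_int4(levels, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(levels)
print(max_error <= scale / 2 + 1e-12)  # rounding error is at most half a step
```

The trade-off is visible in the round trip: storage drops to 4 bits per weight, and each restored weight can differ from the original by up to half a quantisation step.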
Training pipeline
01 · Curate, dedupe, licence-clear, PII-scrub. Provenance hashed for every shard.
02 · Trillion-token runs on sovereign GPU mesh. Checkpoints signed, weights never exported.
03 · Long-context, multimodal and tool-use extensions on top of the base.
04 · SFT, DPO, constitutional AI with Indian legal and cultural rule packs.
05 · Quantise, route, log. Every token traceable to a policy and an officer.
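
The first pipeline step (dedupe, then hash provenance per shard) can be sketched roughly as follows. This is an illustration under stated assumptions: the `build_shard_manifest` function, the manifest fields and the sample documents are hypothetical, and it removes only exact duplicates after whitespace and case normalisation, whereas large-scale pipelines usually add near-duplicate detection as well.

```python
import hashlib


def build_shard_manifest(shard_id, source, documents):
    """Drop exact duplicates from a shard and hash what remains.

    Hypothetical helper: returns (manifest, unique_documents).
    """
    seen = set()
    unique_docs = []
    for doc in documents:
        # Normalise whitespace and case so trivially different copies collide.
        normalised = " ".join(doc.split()).lower()
        fingerprint = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_docs.append(doc)

    # Hash the surviving documents in order; the separator byte keeps
    # ["ab", "c"] and ["a", "bc"] from producing the same digest.
    shard_hash = hashlib.sha256()
    for doc in unique_docs:
        shard_hash.update(doc.encode("utf-8"))
        shard_hash.update(b"\x00")

    manifest = {
        "shard_id": shard_id,
        "source": source,
        "kept": len(unique_docs),
        "dropped": len(documents) - len(unique_docs),
        "sha256": shard_hash.hexdigest(),
    }
    return manifest, unique_docs


# Made-up sample documents for illustration.
docs = [
    "Government Resolution No. 12",
    "government   resolution no. 12",
    "७/१२ उतारा",
]
manifest, unique_docs = build_shard_manifest("shard-0001", "gazette", docs)
print(manifest["kept"], manifest["dropped"])  # 2 1
```

Recording the hash alongside the source tag means a later audit can confirm that the shard used in training is byte-for-byte the shard that was cleared.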
The model family
Agentic Foundation
Production-grade agentic foundation model. Powers planning, tool-use and multi-step workflows across every domain agent.
Language
First Marathi-native frontier LLM — trained on 4.2T tokens of Marathi corpus, vetted with SCERT and IIT Bombay linguists.
Speech
Sub-300 ms voice agent stack for call-centres, IVR and field-officer apps. Robust to rural dialect variation.
Multimodal
Document understanding for 7/12 extracts, claim forms, satellite crop imagery and CCTV review.
Code
Code assistant fine-tuned for DigiLocker, India-Stack APIs, and the state's legacy COBOL/RPG estate.
Specialist
Rule-pack model that turns gazette notifications, GRs and SOPs into machine-checkable compliance graphs.
Benchmarks
- IndicGenBench-MR: 92.4 (best-in-class Marathi generation)
- MMLU (EN): 88.1 (general knowledge, 5-shot)
- Bhasha-Voice: 94.7 (ASR accuracy, rural Marathi)
- Bhumi-OCR: 96.2 (7/12 extract field accuracy)
For ministries & departments
Pilot a Daisy-powered agent inside your ministry in 90 days, with audit-grade decision logs and full Maharashtra data residency.