[Image: Indian officer in a Maharashtra data center reviewing a holographic audit ledger beside a saffron AI guardian]

Sovereignty · Onshore by design

Onshore weights. Onshore keys. Onshore audit.

Maharashtra-resident compute, state-held HSM keys and an append-only ledger — CAG-ready, RTI-disclosable, reversible.

[Image: Senior IAS officer at Mantralaya Mumbai reviewing a citizen file with a junior officer]

G2G · Mantralaya Copilot

Mantralaya clears a 240-page inter-ministry file in one sitting.

The copilot summarises every annexure, surfaces precedent and flags policy conflicts — with full CAG-ready lineage.

[Image: Rural Primary Health Centre doctor with an ASHA worker consulting an elderly patient]

G2C · Aarogya Agent

An ASHA worker in Gadchiroli gets a second opinion in 30 seconds.

Aarogya Agent screens symptoms against PMJAY protocols and books a teleconsult when the PHC is out of reach.

[Image: Marathi schoolgirl with a Zilla Parishad teacher reviewing a textbook in a rural classroom]

G2C · Vidya Tutor

Every Zilla Parishad student gets a Marathi tutor that never tires.

Vidya Tutor explains a math step ten different ways — and tells the teacher exactly where each child is stuck.

Foundation models

Frontier models, Marathi-first, Maharashtra-resident.

A family of foundation models tuned for Indian context, Indian languages and Indian regulation — trained, fine-tuned and served on sovereign Maharashtra infrastructure.

[Figure: MahaAI foundation model family diagram]

FAMILY

10 frontier models

One MoE core. Nine specialists. All onshore.

LANGUAGES

मराठी (Marathi) · हिंदी (Hindi) · English

+ 18 Marathi dialects, 12 Indic scripts.

RESIDENCY

maharashtra-r1

Weights, embeddings & logs never leave the state.

PROVENANCE

Daisy Nova lineage

Proven in commercial Insure AI deployments in North America.

What is a frontier model

A frontier model is the most capable system of its generation — built at the limits of compute, data and method.

Scale

Hundreds of billions of parameters trained on trillions of tokens. Enough capacity to hold law, language, code, science and Marathi dialects in a single model.

Generality

One base model that transfers to thousands of downstream tasks — reasoning, retrieval, planning, vision, voice — without retraining from scratch.

Emergence

Capabilities that only appear past a certain scale: tool-use, multi-step reasoning, chain-of-thought, cross-lingual transfer, in-context learning.

Foundation · Frontier · Specialist

Three tiers in the MahaAI family — each with a job.

TIER 01

Foundation models

Daisy Nova · MahaAI Marathi Frontier

Trained from scratch on raw, multilingual, multimodal corpora. These are the base weights every other system inherits from. Expensive to build, cheap to reuse.

TIER 02

Frontier capabilities

Voice · Vision · Code · Compliance

Mid-training and post-training on the foundation base to unlock a specific modality or skill at state-of-the-art quality — streaming speech, document OCR, code synthesis, legal reasoning.

TIER 03

Domain specialists

Krishi · Aarogya · Bhumi · Lokshahi

Small, distilled, instruction-tuned models — fine-tuned on departmental knowledge graphs. Cheap to run on the edge, governed by the same audit and residency rules.

Anatomy of a MahaAI foundation model

Six parts. Each one engineered, audited and resident in Maharashtra.

01

Tokenizer

Devanagari-aware byte-pair tokenizer with a 256K-entry vocabulary. Marathi compounds, Sanskrit roots, code and emoji share one vocabulary, so Indic users pay no script penalty (extra tokens per word).
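To make the "script penalty" concrete, here is a minimal sketch that compares the raw UTF-8 byte cost of the same place name in Devanagari and Latin script. It is an illustration only and not the MahaAI tokenizer: the helper name `utf8_cost` is hypothetical, and the point is simply that a tokenizer falling back to bytes starts from a longer sequence for Devanagari, which a script-aware vocabulary is meant to offset.

```python
# Illustration only: UTF-8 byte cost of the same place name in two scripts.
# This is NOT the MahaAI tokenizer; it shows the byte-level baseline that a
# Devanagari-aware vocabulary is meant to improve on.

def utf8_cost(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

# "Maharashtra" in Devanagari, written with escapes so the code points are explicit.
devanagari = "\u092e\u0939\u093e\u0930\u093e\u0937\u094d\u091f\u094d\u0930"
latin = "Maharashtra"

for label, word in (("Devanagari", devanagari), ("Latin", latin)):
    chars, nbytes = utf8_cost(word)
    print(f"{label}: {chars} code points, {nbytes} bytes")
```

Every code point in the Devanagari block takes three bytes in UTF-8, so the Devanagari spelling is roughly three times longer at the byte level even though it has fewer code points.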

02

Architecture

Decoder-only transformer with Mixture-of-Experts routing on Daisy Nova; 70B dense on Marathi Frontier. RoPE positional embeddings, grouped-query attention and a sliding-window cache for contexts of up to 1M tokens on Daisy Nova.
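As a rough picture of what Mixture-of-Experts routing means, the sketch below implements generic top-k gating: a router scores every expert, the top k are kept, and their weights are renormalised. The expert count, the value of k and the function name `top_k_route` are assumptions for illustration; the source does not describe Daisy Nova's actual router.

```python
import math

def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalise their softmax weights."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and rescale so their weights sum to 1.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}

# Hypothetical router scores for one token over four experts.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(weights)
```

Only the selected experts run for that token, which is why an MoE model can hold many parameters while spending a fraction of them per token.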

03

Pre-training corpus

4.2T Marathi tokens, 3.8T Hindi, 9T English, 1.1T code, 600B legal and gazette text. Provenance-tagged, deduplicated, PII-scrubbed, licence-cleared.
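The corpus figures above can be turned into a total and a per-source share with simple arithmetic. The sketch assumes the five buckets are disjoint (for example, that legal and gazette text is not also counted under a language bucket), which the source does not state.

```python
# Token counts from the text above, in billions of tokens.
corpus_b = {
    "Marathi": 4200,
    "Hindi": 3800,
    "English": 9000,
    "Code": 1100,
    "Legal and gazette": 600,
}

# Assumption: the five buckets do not overlap, so they can be summed.
total_b = sum(corpus_b.values())
shares = {name: count / total_b for name, count in corpus_b.items()}

print(f"Total: {total_b / 1000:.1f}T tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the mix comes to 18.7T tokens, with Marathi at a little over a fifth of the total.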

04

Alignment

SFT on 1.8M Marathi instruction pairs curated with SCERT and IIT Bombay. DPO plus constitutional rules drawn from the Indian Constitution, DPDP Act and ministry SOPs.
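For readers unfamiliar with DPO (Direct Preference Optimization), the sketch below computes the standard per-pair DPO loss from sequence log-probabilities under the policy and a frozen reference model. The function name, the example numbers and `beta=0.1` are illustrative assumptions, not values from the MahaAI training run.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(z)) is the same as log(1 + exp(-z)).
    return math.log1p(math.exp(-beta * margin))

# Hypothetical log-probabilities for one preference pair.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_chosen=-13.0, ref_rejected=-14.0))
```

The loss falls as the policy shifts probability toward the preferred answer relative to the reference, and equals log 2 when the policy and the reference agree.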

05

Safety & evals

Bharat-Bench, IndicGenBench, MMLU-MR and an internal 'Mantralaya Hard' eval. Red-teamed against caste, communal, electoral and procurement-fraud attack surfaces.

06

Serving

Quantised to FP8/INT4 and served on Maharashtra-resident GPU clusters with speculative decoding, a paged KV cache and per-tenant isolation. Every inference is logged to the sovereign audit ledger.
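One common way to build an append-only audit ledger is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows that idea under stated assumptions: the record fields (`tenant`, `model`, `tokens`), the `GENESIS` constant and the function names are hypothetical, and the source does not say how the sovereign ledger is actually implemented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def append_entry(ledger: list[dict], record: dict) -> dict:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    entry = {"prev": prev_hash, "record": record, "hash": digest}
    ledger.append(entry)
    return entry

def verify(ledger: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        payload = json.dumps(entry["record"], sort_keys=True, ensure_ascii=False)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical inference records.
ledger: list[dict] = []
append_entry(ledger, {"tenant": "dept-a", "model": "daisy-nova", "tokens": 128})
append_entry(ledger, {"tenant": "dept-b", "model": "mahaai-voice", "tokens": 64})
print(verify(ledger))                 # chain is intact
ledger[0]["record"]["tokens"] = 999   # simulate tampering with an old entry
print(verify(ledger))                 # chain no longer verifies
```

A production ledger would also need signed entries and protected storage; the chain alone only makes tampering detectable.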

Training pipeline

From raw Marathi text to a deployable agent in five stages.

01

Corpus

Curate, dedupe, licence-clear, PII-scrub. Provenance hashed for every shard.

02

Pre-train

Trillion-token runs on sovereign GPU mesh. Checkpoints signed, weights never exported.

03

Mid-train

Long-context, multimodal and tool-use extensions on top of the base.

04

Align

SFT, DPO, constitutional AI with Indian legal and cultural rule packs.

05

Serve

Quantise, route, log. Every token traceable to a policy and an officer.
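The corpus stage above mentions deduplication and a provenance hash for every shard. A minimal sketch of that step, assuming exact-match deduplication and SHA-256 as the hash (neither is specified in the source), could look like this. The shard names and the `build_manifest` helper are hypothetical.

```python
import hashlib

def build_manifest(shards: dict[str, bytes]) -> dict[str, str]:
    """Map shard name -> SHA-256 of its content, skipping exact duplicates."""
    seen: set[str] = set()
    manifest: dict[str, str] = {}
    for name in sorted(shards):
        digest = hashlib.sha256(shards[name]).hexdigest()
        if digest in seen:
            continue  # byte-identical to an earlier shard, so drop it
        seen.add(digest)
        manifest[name] = digest
    return manifest

# Hypothetical shards; shard-001 is a byte-for-byte copy of shard-000.
shards = {
    "shard-000": b"Government Resolution, sample text A",
    "shard-001": b"Government Resolution, sample text A",
    "shard-002": b"Gazette notification, sample text B",
}
manifest = build_manifest(shards)
for name, digest in manifest.items():
    print(name, digest[:16])
```

This catches only exact duplicates; near-duplicate detection would need a fuzzy method such as MinHash, which is outside this sketch.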

The model family

Six models in production. Each tuned for a job the state actually does.

Agentic Foundation

Daisy Nova

Production-grade agentic foundation model. Powers planning, tool-use and multi-step workflows across every domain agent.

Parameters: Mixture-of-Experts
Context: 1M tokens
Languages: EN · HI · MR
Residency: Maharashtra

Language

MahaAI Marathi Frontier

First Marathi-native frontier LLM, trained on a 4.2T-token Marathi corpus and vetted with SCERT and IIT Bombay linguists.

Parameters: 70B dense
Context: 256K tokens
Languages: Marathi-first, 11 Indic
Residency: Maharashtra

Speech

MahaAI Voice

Sub-300 ms voice agent stack for call-centres, IVR and field-officer apps. Robust to rural dialect variation.

Parameters: Streaming ASR + TTS
Context: Realtime
Languages: MR · HI · EN + 18 dialects
Residency: Maharashtra

Multimodal

MahaAI Vision

Document understanding for 7/12 extracts, claim forms, satellite crop imagery and CCTV review.

Parameters: Vision-language
Context: Image + 128K
Languages: Multilingual OCR
Residency: Maharashtra

Code

MahaAI Code

Code assistant fine-tuned for DigiLocker, India-Stack APIs, and the state's legacy COBOL/RPG estate.

Parameters: 34B dense
Context: 200K tokens
Languages: Polyglot
Residency: Maharashtra

Specialist

MahaAI Compliance

Rule-pack model that turns gazette notifications, GRs and SOPs into machine-checkable compliance graphs.

Parameters: Distilled 13B
Context: 512K tokens
Languages: Legal EN/MR
Residency: Maharashtra

Benchmarks

State-of-the-art on Indic benchmarks, competitive with global frontier models on English.

IndicGenBench-MR: 92.4 (best-in-class Marathi generation)

MMLU (EN): 88.1 (general knowledge, 5-shot)

Bhasha-Voice: 94.7 (ASR accuracy, rural Marathi)

Bhumi-OCR: 96.2 (7/12 extract field accuracy)

For ministries & departments

Bring sovereign AI to your department.

Pilot a Daisy-powered agent inside your ministry in 90 days, with audit-grade decision logs and full Maharashtra data residency.