Models, datasets, and experiments.

Model Collections

2 threads

5 checkpoints

124M-parameter LLM checkpoints covering from-scratch training, continued pre-training, and SFT.

mrinaal-124m-base
123.6M-parameter decoder-only LM trained from scratch on 2B FineWeb-Edu tokens. RoPE, RMSNorm, SwiGLU, tied embeddings.
mrinaal-124m-base-v2
Continued pre-training checkpoint: 1B additional tokens on a mixed recipe atop mrinaal-124m-base.
mrinaal-124m-instruct-smoltalk-50k
Instruction-tuned checkpoint: mrinaal-124m-base-v2 fine-tuned with SFT on the first 50k valid SmolTalk examples.
mrinaal-124m-base-v3-mathmix
Continued pre-training checkpoint: 1.5B additional math-heavy mixed-recipe tokens atop mrinaal-124m-base-v2; best val loss 2.6333.
mrinaal-124m-instruct-v3-mathmix-smoltalk-150k
Instruction-tuned checkpoint: mrinaal-124m-base-v3-mathmix fine-tuned with SFT on 150k valid SmolTalk examples; best val loss 1.6581.
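
For readers unfamiliar with two of the components named for mrinaal-124m-base, here is a minimal pure-Python sketch of RMSNorm and the SwiGLU gating step. It is not the checkpoint's actual code: the function names, the eps default, and the list-based vectors are assumptions made for illustration only.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root mean square, then apply a per-feature gain.

    eps is an assumed stabilizer value, not taken from the checkpoints.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """Elementwise SiLU(gate) * up, the gating inside a SwiGLU feed-forward block.

    In a full block, gate and up would come from two separate linear
    projections of the same input; those projections are omitted here.
    """
    return [silu(g) * u for g, u in zip(gate, up)]
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so the output keeps the input's direction and only its scale changes.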

3 checkpoints

LoRA cold-start SFT teaching structured reasoning to Nanbeige4-3B-Base from distilled frontier traces.
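
As a rough illustration of the LoRA idea used in these checkpoints, the sketch below computes the effective weight W + (alpha / r) * B A from a frozen base matrix and a low-rank adapter. The function names, nested-list matrices, and the alpha scaling convention are assumptions for illustration; the rank, target modules, and hyperparameters actually used are not stated here.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen base weight, shape (out_features, in_features)
    a: adapter matrix A, shape (r, in_features)
    b: adapter matrix B, shape (out_features, r)
    alpha: scaling hyperparameter (assumed convention: scale = alpha / r)
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

When B is initialized to zeros, as is common for LoRA, the adapter is a no-op and training begins from the base model's behavior.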

3 entries

[01]

OpenEnv RL env where agents act as crisis commanders: verify reports, allocate resources, and publish sitreps.

[02]

OpenEnv RL env for biology experiment planning under partial observability, noisy outputs, and budget limits.

[03]

OpenEnv RL env where agents clean malformed JSON to a target schema across four difficulty levels.
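
To make the JSON-cleaning task concrete, here is a hypothetical scoring sketch: it parses an agent's output and returns the fraction of schema fields that are present with the expected type. This is not the environment's real reward function or the OpenEnv API; schema_score and the dict-of-types schema format are invented for illustration.

```python
import json

def schema_score(candidate, schema):
    """Score a candidate JSON string against a flat schema.

    schema maps field names to expected Python types, e.g. {"id": int}.
    Returns 0.0 for unparseable input or a non-object top level,
    otherwise the fraction of schema fields with a matching type.
    Note: bool is a subclass of int in Python, so True passes an int check.
    """
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict) or not schema:
        return 0.0
    matched = sum(1 for key, typ in schema.items() if isinstance(obj.get(key), typ))
    return matched / len(schema)
```

A graded score like this gives partial credit for partially repaired records, which is one plausible way to shape rewards across difficulty levels.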