Models, datasets, and experiments.

Hosted on Hugging Face. Some entries link to writeups.

Model Collections

124M-Base-Experiments

124M-parameter LLM checkpoints produced by from-scratch training, continued pre-training, and SFT.

mrinaal-124m-base
mrinaal-124m-base-v2
mrinaal-124m-instruct-smoltalk-50k
mrinaal-124m-base-v3-mathmix
mrinaal-124m-instruct-v3-mathmix-smoltalk-150k

Nanbeige4-3B Cold Start Reasoning LoRA Experiments

LoRA cold-start SFT that teaches Nanbeige4-3B-Base structured reasoning using distilled frontier-model traces.

Nanbeige4-3B Cold Start Reasoning LoRA (GLM 12K)
Nanbeige4-3B Cold Start Reasoning LoRA
Nanbeige4-3B Cold Start Reasoning LoRA (Opus Epoch 3)

RL Environments

CrisisOps

OpenEnv RL env where agents act as crisis commanders: they verify reports, allocate resources, and publish sitreps.

DryLabSim

OpenEnv RL env for biology experiment planning under partial observability, noisy outputs, and budget limits.

JSON Cleaning Environment

OpenEnv RL env where agents clean malformed JSON to a target schema across four difficulty levels.