Models, datasets, and experiments.

Hosted on Hugging Face. Some entries link to writeups.

Model Collections

124M-Base-Experiments

Checkpoints from my first 124M-parameter LLM pre-training project, covering training from scratch, continued pre-training, and SFT experiments.

mrinaal-124m-base
mrinaal-124m-base-v2
mrinaal-124m-instruct-smoltalk-50k
mrinaal-124m-base-v3-mathmix
mrinaal-124m-instruct-v3-mathmix-smoltalk-150k

Nanbeige4-3B Cold Start Reasoning LoRA Experiments

Two LoRA cold-start SFT experiments teaching structured think/answer reasoning to Nanbeige4-3B-Base using distilled traces from frontier models.

Nanbeige4-3B Cold Start Reasoning LoRA (GLM 12K)
Nanbeige4-3B Cold Start Reasoning LoRA
Nanbeige4-3B Cold Start Reasoning LoRA (Opus Epoch 3)

RL Environments

CrisisOps

An OpenEnv RL environment where LLM agents act as crisis command operators, verifying noisy reports, allocating scarce resources, and publishing situation reports across long-horizon disaster response tasks.

DryLabSim

An OpenEnv-compatible RL environment for biological experiment planning, where agents run dry-lab and wet-lab pipelines under partial observability, noisy outputs, budget limits, and deterministic grading.

JSON Cleaning Environment

An OpenEnv RL environment where LLM agents clean malformed JSON to match a target schema, with four difficulty levels and deterministic scoring.
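
To make "deterministic scoring" concrete, here is a minimal sketch of how a cleaned JSON string could be graded against a target schema. It is not the environment's actual grader: the schema format (field name mapped to a Python type), the `TARGET_SCHEMA` fields, and the `score_cleaned_json` function are all hypothetical, chosen only to illustrate that the same output always receives the same score.

```python
import json

# Hypothetical target schema: field name -> expected Python type.
# The real environment's schema format is not described here.
TARGET_SCHEMA = {"name": str, "age": int, "email": str}


def score_cleaned_json(candidate: str, schema: dict) -> float:
    """Return a score in [0, 1]: the fraction of schema fields that are
    present with the expected type. Output that does not parse, or that
    is not a JSON object, scores 0."""
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0

    matched = 0
    for key, expected in schema.items():
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            continue
        if isinstance(value, expected):
            matched += 1
    return matched / len(schema)


if __name__ == "__main__":
    clean = '{"name": "Ada", "age": 36, "email": "ada@example.com"}'
    broken = "{name: Ada, age: 36"  # unquoted keys, missing closing brace
    print(score_cleaned_json(clean, TARGET_SCHEMA))   # 1.0
    print(score_cleaned_json(broken, TARGET_SCHEMA))  # 0.0
```

Because the score depends only on the parsed output and the schema, with no model-based judging or randomness, repeated evaluations of the same agent response are reproducible, which is the property the description above refers to.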