Models, datasets, and experiments.
Hosted on Hugging Face. Some include linked writeups.
Model Collections
124M-Base-Experiments
124M-parameter LLM checkpoints covering from-scratch training, continued pre-training, and SFT.
Nanbeige4-3B Cold Start Reasoning LoRA Experiments
LoRA cold-start SFT that teaches structured reasoning to Nanbeige4-3B-Base using distilled frontier traces.
RL Environments
CrisisOps
OpenEnv RL env where agents act as crisis commanders: verifying reports, allocating resources, and publishing sitreps.
DryLabSim
OpenEnv RL env for biology experiment planning under partial observability, noisy outputs, and budget limits.
JSON Cleaning Environment
OpenEnv RL env where agents clean malformed JSON to a target schema across four difficulty levels.