Models, datasets, and experiments.
Hosted on Hugging Face. Some entries link to writeups.
Model Collections
124M-Base-Experiments
Checkpoints from my first 124M-parameter LLM pre-training project, covering from-scratch training, continued pre-training, and SFT experiments.
Nanbeige4-3B Cold Start Reasoning LoRA Experiments
Two LoRA cold-start SFT experiments teaching structured think/answer reasoning to Nanbeige4-3B-Base using distilled traces from frontier models.
RL Environments
CrisisOps
An OpenEnv RL environment where LLM agents act as crisis command operators, verifying noisy reports, allocating scarce resources, and publishing situation reports across long-horizon disaster response tasks.
DryLabSim
An OpenEnv-compatible RL environment for biological experiment planning, where agents run dry-lab and wet-lab pipelines under partial observability, noisy outputs, budget limits, and deterministic grading.
JSON Cleaning Environment
An OpenEnv RL environment where LLM agents clean malformed JSON to match a target schema, with four difficulty levels and deterministic scoring.