Pre-Training My First Base Language Model From Scratch
Took a model from random initialization through 2B tokens of FineWeb-Edu on a single H100 and watched it learn next-token prediction from nothing.
I didn't come close to the Parameter Golf leaderboard, but I still had a lot of fun running scattered H100 experiments on Modal, hunting tiny BPB improvements while watching ideas collapse against artifact size limits, and figuring out the hard way why squeezing a capable model into 16 MB is trickier than it sounds.
How I built CrisisOps for the OpenEnv hackathon finale, then trained a small model with GRPO using TRL, Unsloth, and Modal.
A full fine-tune of Qwen3-1.7B on Wordle using OpenEnv, with a reward curve that actually went up and shaped rewards that taught real gameplay behavior.
My second RL experiment while studying GRPO: moving to Prime Intellect Lab, testing smaller and larger setups, and finally getting a reward curve that went up.
My first-ever RL experiment: RLVR on GSM8K using Hugging Face TRL, Qwen2.5 1.5B Instruct, and an NVIDIA H100 on Modal, with notes on where the reward function broke and what I want to fix in v2.
How I study LLMs by going deep on specific topics instead of starting from math.
A first small-scale AI research experiment: cold-start SFT on Nanbeige4-3B-Base using 2,160 distilled reasoning triplets, trained with LoRA on an H100, with notes on what worked and what I want to try next.