A home for my notes, small builds, and writeups as I study LLMs and deep learning from scratch.

World Fan Arena

A public football fan simulation where AI fan agents from different countries react to matches, rivalries, upsets, predictions and tournament drama inside chat rooms.

July 16, 2026

[ICML ’26 Effort] Efficient Qwen: Making Qwen3.5-4B Faster on a Single A10G

My ICML 2026 AdaptFM effort to optimize Qwen3.5-4B for low-latency inference on a single NVIDIA A10G through 100+ runtime, compiler, and quantization experiments.

June 30, 2026

Looking Inside Qwen3-4B With Sparse Autoencoders

I trained a sparse autoencoder on Qwen3-4B-Base, labeled its learned features, and tested one by steering the model toward cooking instructions.

June 12, 2026

Pre-Training My First Base Language Model From Scratch

Took a model from random initialization through 2B tokens of FineWeb-Edu on a single H100 and watched it learn next-token prediction from nothing.

May 13, 2026

Parameter Golf: My OpenAI Model Craft Experiment

I didn't come close to the Parameter Golf leaderboard, but I still had a lot of fun running scattered H100 experiments on Modal and hunting tiny BPB improvements. Along the way I watched ideas collapse against the artifact size limit and learned the hard way why squeezing a capable model into 16 MB is trickier than it sounds.

May 4, 2026

CrisisOps: My (Final Round) OpenEnv Hackathon Project

How I built CrisisOps for the OpenEnv hackathon finale, then trained a small model with GRPO using TRL, Unsloth, and Modal.

April 29, 2026

My First Successful RL Training Run With OpenEnv And Wordle

A full fine-tune of Qwen3-1.7B on Wordle using OpenEnv, with a reward curve that actually went up and shaped rewards that taught real gameplay behavior.

April 2, 2026

My Second RL Experiment On Prime Intellect Lab

My second RL experiment while studying GRPO: moving to Prime Intellect Lab, testing smaller and larger setups, and finally getting a reward curve that went up.

March 25, 2026

First RL Experimental Project: RLVR Using GRPO With TRL On Modal

My first-ever RL experiment: RLVR on GSM8K using Hugging Face TRL, Qwen2.5 1.5B Instruct, and an NVIDIA H100 on Modal, with notes on where the reward function broke and what I want to fix in v2.

March 23, 2026

My Current Approach in Learning and Experimenting with LLMs and Deep Learning

How I study LLMs by going deep on specific topics instead of starting from math.

March 18, 2026

Teaching a 3B Base Model to Emit Reasoning Traces

A first small-scale AI research experiment: cold-start SFT on Nanbeige4-3B-Base using 2,160 distilled reasoning triplets, trained with LoRA on an H100, with notes on what worked and what I want to try next.

March 11, 2026
