Stories by gpjt

Thoughts on Role Confusion

Flax debugging: making a hash of things

10Gb/s Ethernet: switching to a Broadcom SFP+ module

Jax: Commitment Issues

Jax Back Ends and Devices

Using Safetensors with Flax

First Looking into Jax

10Gb/s Ethernet: using mini-heatsinks with a 10GBASE-T SFP+ module

10Gb/s Ethernet: what I did to get it working in my home

10Gb Ethernet: what I had to (re)learn

LLM from scratch, part 33 – what I learned from the appendices

LLM from scratch (32l) – Interventions: updated instruction fine-tuning results

How an LLM becomes more coherent as we train it

LLM from scratch, part 32k – Interventions: gradient accumulation

Provision: LLM-powered server setup from Markdown

LLM from scratch, part 32j – trying to train a better model in the cloud

Writing an LLM from scratch, part 32i – Interventions: what is in the noise?

Writing an LLM from scratch, part 32h – Interventions: full fat float32

Writing an LLM from scratch, part 32g – Interventions: weight tying

Writing an LLM from scratch, part 32f – Interventions: weight decay