Stories by EarlyOom

Replace OCR with Vision Language Models

Show HN: Visually parse an entire YouTube video frame by frame

Ask HN: What are folks using to train/fine-tune Vision Language Models

A Node.js SDK for calling Vision Language Models

Run structured extraction on documents/images locally with Ollama and Pydantic

Show HN: VLM Run, Extract JSON from images, videos and documents in a simple API

Fine-grained Visual Transcription for YouTube videos

"Ok Computer, why are you slow?"

Show HN: NOS – A fast, and ergonomic PyTorch inference server