Stories by skidrow
Creating custom kernels for the AMD MI300
2 points
skidrow
2025-07-25T17:58:31Z
huggingface.co
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
skidrow
2025-07-25T17:58:18Z
www.spatters.ca
Creating custom kernels for the AMD MI300
1 point
skidrow
2025-07-24T15:58:49Z
huggingface.co
Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 points
skidrow
2025-07-24T15:58:35Z
www.spatters.ca
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
skidrow
2025-07-18T08:26:22Z
www.spatters.ca
Compiler Explorer: An Essential Kernel Playground for CUDA Developers
2 points
skidrow
2025-07-18T08:24:55Z
developer.nvidia.com
Creating custom kernels for the AMD MI300
1 point
skidrow
2025-07-18T08:23:35Z
huggingface.co
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points
skidrow
2025-04-19T14:42:57Z
research.colfax-intl.com
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024)
147 points
skidrow
2025-04-19T14:42:48Z
alexarmbr.github.io
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points
skidrow
2025-04-18T10:08:47Z
research.colfax-intl.com
Implementing a Fast Tensor Core Matmul on the Ada Architecture
1 point
skidrow
2025-04-18T10:06:19Z
www.spatters.ca
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores
2 points
skidrow
2025-04-18T10:04:55Z
alexarmbr.github.io
1 point
skidrow
2025-04-02T17:10:55Z
news.ycombinator.com
Understanding Peak, Max-Achievable and Delivered FLOPs
1 point
skidrow
2025-04-01T16:49:59Z
rocm.blogs.amd.com
DeepSeek-R1 and FP8 Mixed-Precision Training
1 point
skidrow
2025-04-01T16:47:32Z
research.colfax-intl.com
Outperforming cuBLAS on H100: A Worklog
3 points
skidrow
2025-04-01T16:44:17Z
cudaforfun.substack.com
Optimizing Matrix Multiplication on RDNA3
118 points
skidrow
2025-03-25T09:55:21Z
seb-v.github.io
Outperforming cuBLAS on H100: A Worklog
1 point
skidrow
2025-03-25T09:55:05Z
cudaforfun.substack.com
Mastering LLM Techniques: Inference Optimization
2 points
skidrow
2025-03-24T19:03:58Z
developer.nvidia.com
Optimizing Matrix Multiplication on RDNA3
2 points
skidrow
2025-03-24T19:02:27Z
seb-v.github.io