Developer Articles | TechForDev

Why your diffusion model is slow at batch size 1 (and what actually helps)

Elise Moreau • May 22, 2026 • 4 min read

TL;DR: Single-image diffusion inference is bottlenecked by kernel launch overhead and attention...

#machinelearning #llm #mlops #pytorch

The bf16 grad accumulator that killed our SDXL LoRA training

Elise Moreau • 2d ago • 4 min read

TL;DR: Our SDXL LoRA fine-tune for a Photoroom product photography model trained for six days while....

#machinelearning #pytorch #mlops #computervision

My high-res image-to-video kept OOMing — turns out I was decoding outside no_grad

shinji shimizu • 3d ago • 4 min read

A 96GB GPU couldn't run 1024x768 I2V (83.5 GiB peak). The 54 GiB wasn't the model — it was an autogr...

#pytorch #ai #machinelearning #python

Quantising event-camera networks to run under 1MB on a Cortex-M7

Marco Rinaldi • May 22, 2026 • 4 min read

TL;DR: I shrank a gesture-recognition model for a Prophesee EVK4 event camera from 4.2MB down to...

#machinelearning #computervision #mlops #pytorch

Prefix caching in vLLM under multi-tenant agent traffic

Marcus Chen • 3d ago • 4 min read

TL;DR: We turned on vLLM's prefix cache for our agent workloads at Nexus Labs and watched TTFT drop....

#llm #mlops #infrastructure #pytorch

LLM-as-judge variance broke our DPO training signal for 3 weeks

Marcus Chen • 2d ago • 4 min read

TL;DR: Our DPO pipeline used a single LLM as the preference judge. Training reward climbed every run...

#machinelearning #mlops #llm #pytorch

QAT vs PTQ on our edge vision model: 6 months of A/B data

Marco Rinaldi • 1d ago • 4 min read

TL;DR: We ran post-training quantisation (PTQ) and quantisation-aware training (QAT) side by side on...

#machinelearning #computervision #mlops #pytorch

I Built a Diagnostic Toolkit for PyTorch Because I Was Tired of Guessing Why Models Fail

Aditya Mehra • 3d ago • 2 min read

Every time a PyTorch model refuses to learn, the debugging process looks the same: Stare at the...

#pytorch #python #machinelearning #opensource

Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)

Alan West • 4d ago • 5 min read

Your GPU sits at 15% utilization and bigger batches don't help? Here's how to diagnose whether you'r...

#pytorch #performance #machinelearning #gpu

Distilling SAM 2 into a 6MB student for industrial inspection

Marco Rinaldi • 2d ago • 4 min read

TL;DR: We took Meta's SAM 2 small (around 224M params) and distilled it into a 6.3MB student that...

#computervision #machinelearning #pytorch #mlops
