Blog

Deep dives into inference performance, power telemetry, the compile-and-serve layer, and tokens per watt. Written by the engineers doing the work.

Adaptive SM clocking for energy-efficient LLM serving

LLM serving is moving from occasional jobs to persistent service infrastructure. At that scale, GPU power is no longer a background detail. It sets thermal limi …

9 min read
Jun 1, 2026