Adaptive SM clocking for energy-efficient LLM serving

jml@carbonforge.ai (Jean-Maxime Larouche) — Mon, 01 Jun 2026 17:18:12 GMT

LLM serving is moving from occasional jobs to persistent, always-on service infrastructure. Once it runs continuously at scale, GPU power is no longer a background detail: it sets thermal limits, caps how much inference fits on a given site, and determines where new capacity can be deployed [15]. Integrated over time, that power draw becomes energy, and energy is what drives operating cost [14].
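The step from power to energy to cost is simple arithmetic, and it helps to keep it concrete. Below is a minimal sketch that assumes you already have timestamped power samples for a GPU (for example, polled once per second from whatever telemetry you use). It integrates them with the trapezoidal rule and converts the result to a bill at an electricity price you supply. The function names and the flat per-kWh tariff are illustrative assumptions, not part of any particular tool.

```python
from typing import Sequence

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper: assumes samples are paired and timestamps never decrease.
    Returns energy in joules.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_cost(joules: float, price_per_kwh: float) -> float:
    """Convert joules to kWh and apply a flat tariff (an assumed pricing model)."""
    return joules / JOULES_PER_KWH * price_per_kwh


if __name__ == "__main__":
    # Illustrative only: one GPU held at a constant 700 W for one hour,
    # sampled once per second, priced at an assumed 0.10 per kWh.
    times = [float(t) for t in range(3601)]
    power = [700.0] * len(times)
    e = energy_joules(times, power)
    print(f"{e:.0f} J = {e / JOULES_PER_KWH:.2f} kWh, cost {energy_cost(e, 0.10):.3f}")
```

For that constant 700 W hour the sketch reports 2,520,000 J, or 0.70 kWh. Real serving traces are not flat, which is why integrating the samples, rather than multiplying a single reading by wall-clock time, is the safer default.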
