<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>blog</title>
    <link>https://carbonforge.ai/blog</link>
    <description>Field notes on AI inference at the operating point. Measurement, methodology, and observations on tokens per watt in production.</description>
    <language>en</language>
    <pubDate>Wed, 03 Jun 2026 13:48:29 GMT</pubDate>
    <dc:date>2026-06-03T13:48:29Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Adaptive SM clocking for energy-efficient LLM serving</title>
      <link>https://carbonforge.ai/blog/adaptive-sm-clocking-for-energy-efficient-llm-serving</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://carbonforge.ai/blog/adaptive-sm-clocking-for-energy-efficient-llm-serving" title="" class="hs-featured-image-link"&gt; &lt;img src="https://carbonforge.ai/hubfs/blog/Memory-bound%20decode-less%20power-no%20throughput%20loss.png" alt="Adaptive SM clocking for energy-efficient LLM serving" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;p&gt;LLM serving is moving from occasional jobs to persistent service infrastructure. At that scale, GPU power is no longer a background detail. It sets thermal limits, caps how much inference fits on a given site, and decides where new capacity can be deployed &lt;a href="https://powering-intelligence.epri.com/executive-summary.html"&gt;[15]&lt;/a&gt;. Integrated over time, that draw becomes energy, and energy is what drives operating cost &lt;a href="https://buildings.lbl.gov/publications/2024-lbnl-data-center-energy-usage-report"&gt;[14]&lt;/a&gt;.&lt;/p&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://carbonforge.ai/blog/adaptive-sm-clocking-for-energy-efficient-llm-serving" title="" class="hs-featured-image-link"&gt; &lt;img src="https://carbonforge.ai/hubfs/blog/Memory-bound%20decode-less%20power-no%20throughput%20loss.png" alt="Adaptive SM clocking for energy-efficient LLM serving" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;p&gt;LLM serving is moving from occasional jobs to persistent service infrastructure. At that scale, GPU power is no longer a background detail. It sets thermal limits, caps how much inference fits on a given site, and decides where new capacity can be deployed &lt;a href="https://powering-intelligence.epri.com/executive-summary.html"&gt;[15]&lt;/a&gt;. Integrated over time, that draw becomes energy, and energy is what drives operating cost &lt;a href="https://buildings.lbl.gov/publications/2024-lbnl-data-center-energy-usage-report"&gt;[14]&lt;/a&gt;.&lt;/p&gt;  
&lt;img src="https://track-na3.hubspot.com/__ptq.gif?a=342884351&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fcarbonforge.ai%2Fblog%2Fadaptive-sm-clocking-for-energy-efficient-llm-serving&amp;amp;bu=https%253A%252F%252Fcarbonforge.ai%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <category>Tokens per watt</category>
      <category>LLM inference</category>
      <category>GPU power</category>
      <pubDate>Mon, 01 Jun 2026 17:18:12 GMT</pubDate>
      <author>jml@carbonforge.ai (Jean-Maxime Larouche)</author>
      <guid>https://carbonforge.ai/blog/adaptive-sm-clocking-for-energy-efficient-llm-serving</guid>
      <dc:date>2026-06-01T17:18:12Z</dc:date>
    </item>
  </channel>
</rss>
