CarbonForge — Inference optimization

1 · Measure

Power Telemetry

Sub-millisecond power and latency, with kernel-level attribution.

2 · Optimize

Optimization Engine

Searches for the operating point that captures what your monitoring misses.

3 · Re-lock

Runtime Controller

Re-locks the operating point when models, traffic, or hardware change.

Jean-Maxime Larouche

Co-Founder · Tech
Serial AI Entrepreneur

Laurent Maisonnave

Co-Founder · GTM
Serial AI Entrepreneur

Pierre-Luc Bacon

Co-Founder & Chief Scientist
Reinforcement learning, Mila

Sacha Lepretre

CTO · AI platforms
Ex-LuxCarta, CAE, Mila

Christophe Dubach

Scientific Advisor
Compilers and GPU codegen
McGill, CIFAR AI Chair

Richard Reiner

Strategic Advisor
Enterprise and infrastructure
Former Data Center CTO, Intel

Lower cost per million tokens on your GPUs

Inference is where AI hits its cost ceiling

How the CarbonForge loop works

Power Telemetry

Optimization Engine

Runtime Controller

Run the complete Loop on your fleet. More tokens per GPU under the same power envelope.

Built at Mila

Leadership

Advisors

Become an early adopter partner

Locked pricing on first deployments

Priority input on the roadmap

Direct collaboration with the team

Subscribe to updates
