Practical Outcomes
- Teams can compare model/runtime options using a single efficiency number, not just speed.
- On-prem and air-gapped deployments can forecast power costs more accurately.
- Procurement can evaluate hardware tradeoffs with measured efficiency data.
- Capacity planning improves when energy use is normalized per token.
Tokens per second tells you how fast. Joules per token tells you how efficient.
Why Energy Matters
Edge deployments face constraints that many data center deployments do not:
- Battery life: Mobile and disconnected systems run on limited power
- Thermal limits: Fanless devices throttle under sustained load
- Cost: Energy bills scale with usage in always-on applications
Raw throughput ignores these constraints. A model that runs 50 tokens/second at 40 watts (0.8 J/token) isn't more useful than one running 30 tokens/second at 15 watts (0.5 J/token) if you're battery-constrained.
The Metric
Joules per token normalizes energy consumption by output:
J/token = (Power × Time) / Tokens
        = (Watts × Seconds) / Tokens
Lower is better. It's hardware-comparable (with caveats) and captures the efficiency of the full stack: model, runtime, and hardware.
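Since J/token = (Watts × Seconds) / Tokens, at steady state it reduces to watts divided by tokens per second, which makes tradeoffs like the battery example above easy to check. A minimal sketch in Python:

```python
def joules_per_token(watts, seconds, tokens):
    """J/token = (Watts x Seconds) / Tokens."""
    return (watts * seconds) / tokens

# Equivalent per-second form: J/token = Watts / (tokens per second).
# The two scenarios below reuse the hypothetical numbers from the
# throughput comparison earlier in this article.
fast = joules_per_token(watts=40, seconds=1, tokens=50)  # 0.8 J/token
slow = joules_per_token(watts=15, seconds=1, tokens=30)  # 0.5 J/token
print(f"{fast:.2f} vs {slow:.2f} J/token")
```

The faster model costs 60% more energy per token, which is exactly the distinction raw throughput hides.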
Measuring on macOS
Apple provides powermetrics for power measurement. Here's our protocol:
1. Stabilize Thermal State
# Wait 5 minutes at idle
sleep 300
Thermal state affects power consumption. Hot chips throttle and use power differently than cold chips.
2. Capture Baseline
sudo powermetrics --samplers cpu_power \
    --sample-interval 100 \
    --sample-count 300 \
    --output-file baseline.txt
30 seconds of idle power (300 samples at 100 ms each) tells you the floor.
3. Run Workload
Start a second powermetrics capture (same flags, writing to workload.txt), then run inference while it samples:
./inference --input benchmark.txt --output results.txt
Use a fixed input for repeatability, and record the wall-clock runtime for step 4.
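The workload capture has to overlap the inference run, since step 4 reads workload.txt. A sketch of that orchestration, assuming the same powermetrics flags as the baseline capture and the ./inference invocation shown above:

```python
import subprocess
import time

def measure_workload(cmd, log="workload.txt", interval_ms=100):
    """Sample power with powermetrics while `cmd` runs.

    Returns wall-clock seconds for the workload (this is what
    measure_runtime() in step 4 stands for). powermetrics needs sudo.
    """
    # Start powermetrics in the background, writing samples to `log`.
    meter = subprocess.Popen(
        ["sudo", "powermetrics", "--samplers", "cpu_power",
         "--sample-interval", str(interval_ms),
         "--output-file", log])
    start = time.monotonic()
    try:
        # e.g. cmd = ["./inference", "--input", "benchmark.txt",
        #             "--output", "results.txt"]
        subprocess.run(cmd, check=True)
    finally:
        meter.terminate()  # stop sampling as soon as inference finishes
        meter.wait()
    return time.monotonic() - start
```

Stopping the meter immediately after the workload matters: trailing idle samples would drag the workload average down toward baseline.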
4. Calculate
baseline_watts = parse_average("baseline.txt")
workload_watts = parse_average("workload.txt")
inference_watts = workload_watts - baseline_watts
tokens = count_tokens("results.txt")
time_seconds = measure_runtime()
joules = inference_watts * time_seconds
j_per_token = joules / tokens
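The parse_average helper is the only nontrivial piece of the calculation. A sketch, assuming the powermetrics log contains per-sample lines like "CPU Power: 472 mW" (the exact label varies by chip and macOS version, so verify against your own logs):

```python
import re

# Assumed powermetrics line format; adjust for your chip/macOS version.
MW_PATTERN = re.compile(r"CPU Power:\s+(\d+)\s+mW")

def parse_average(path):
    """Average power in watts across all samples in a powermetrics log."""
    with open(path) as f:
        samples_mw = [int(m.group(1)) for m in MW_PATTERN.finditer(f.read())]
    return sum(samples_mw) / len(samples_mw) / 1000.0  # mW -> W

def j_per_token(baseline_watts, workload_watts, time_seconds, tokens):
    """Baseline-corrected joules per generated token."""
    return (workload_watts - baseline_watts) * time_seconds / tokens
```

Subtracting the baseline is what isolates inference energy from whatever else the machine was doing at idle.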
Repeatability Rules
For results to be comparable:
- Same hardware - Don't compare M1 to M2
- Same thermal state - Always stabilize first
- Same input - Use identical benchmark data
- Same power source - Stay on the adapter; battery mode behaves differently
- Report conditions - Ambient temperature, chip variant, macOS version
We see <5% variance across 10 runs when following this protocol.
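That repeatability claim is easy to check yourself: compute the relative standard deviation of J/token across runs. A sketch with hypothetical measurements:

```python
from statistics import mean, pstdev

# Hypothetical J/token results from 10 repeated runs of the protocol.
runs = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 0.49, 0.50, 0.50]

# Relative standard deviation (coefficient of variation).
rel_std = pstdev(runs) / mean(runs)
print(f"variance across runs: {rel_std:.1%}")
```

If this figure creeps above 5%, the usual culprits are thermal state and background load, in that order.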
What It Doesn't Tell You
Joules per token doesn't tell you:
- Quality of outputs
- Latency per request
- Memory efficiency
- Cost per token (unless you know your electricity rate)
It is one metric among many. Use it alongside others.
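The cost caveat above is a one-line conversion once you do know your electricity rate. A sketch, using a hypothetical 0.5 J/token and $0.15/kWh:

```python
J_PER_KWH = 3_600_000  # 1 kWh = 3.6e6 J

def usd_per_million_tokens(j_per_token, usd_per_kwh):
    """Electricity cost of generating one million tokens."""
    return j_per_token * 1_000_000 / J_PER_KWH * usd_per_kwh

# Hypothetical numbers: 0.5 J/token at $0.15/kWh.
cost = usd_per_million_tokens(0.5, 0.15)
print(f"${cost:.4f} per million tokens")
```

The per-token electricity cost is tiny in absolute terms; it only becomes a planning input for always-on or fleet-scale deployments.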
Bottom Line
Energy efficiency is measurable and comparable with a consistent method. Joules per token provides a normalized metric for edge deployment planning. On macOS, measurement is straightforward when thermal stabilization and baseline correction are done correctly.