This article reflects MLNavigator Research Group work. Deployment lives in AdapterOS.


Joules Per Token: A Practical Metric

November 19, 2025

Tokens per second tells you how fast. Joules per token tells you how efficient.

Why Energy Matters

Edge deployments face constraints that data center deployments don't:

  • Battery life: Mobile and disconnected systems run on limited power
  • Thermal limits: Fanless devices throttle under sustained load
  • Cost: Energy bills scale with usage in always-on applications

Raw throughput ignores these constraints. On a battery budget, a model generating 50 tokens/second at 40 watts is less useful than one generating 30 tokens/second at 15 watts, because the second produces more tokens per unit of energy.

The Metric

Joules per token normalizes energy consumption by output:

J/token = (Power × Time) / Tokens
        = (Watts × Seconds) / Tokens

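Lower is better. The number captures the efficiency of the full stack: model, runtime, and hardware. That makes it comparable across hardware only with caveats, since a difference between two machines can come from any of the three layers.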
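As a quick check of the formula, the sketch below computes joules per token for the two configurations mentioned earlier (50 tokens/second at 40 watts, 30 tokens/second at 15 watts). The function name and the one-second window are illustrative choices, not part of any tool.

```python
def joules_per_token(watts: float, seconds: float, tokens: int) -> float:
    """Energy per generated token: (power x time) / tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return (watts * seconds) / tokens


# One second of generation for each configuration.
fast = joules_per_token(watts=40.0, seconds=1.0, tokens=50)
slow = joules_per_token(watts=15.0, seconds=1.0, tokens=30)

print(f"50 tok/s at 40 W: {fast:.2f} J/token")
print(f"30 tok/s at 15 W: {slow:.2f} J/token")
```

The faster configuration spends 0.8 J per token and the slower one 0.5 J, so the slower one yields 60% more tokens per joule.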

Measuring on macOS

Apple provides powermetrics for power measurement. Here's our protocol:

1. Stabilize Thermal State

# Wait 5 minutes at idle
sleep 300

Thermal state affects power consumption. Hot chips throttle and use power differently than cold chips.

2. Capture Baseline

# 300 samples at 100 ms intervals = 30 seconds of idle power
sudo powermetrics --samplers cpu_power \
    --sample-rate 100 \
    --sample-count 300 \
    -o baseline.txt

300 samples at 100 ms intervals give 30 seconds of idle power, which tells you the floor.
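The capture then has to be reduced to a single average. The sketch below is one way to do that. It assumes the file contains lines of the form `CPU Power: <N> mW`, which is how the cpu_power sampler typically reports on Apple Silicon. Check a real capture before relying on it. The function name `parse_average_watts` and the sample text are hypothetical.

```python
import re
from statistics import mean


def parse_average_watts(text: str, label: str = "CPU Power") -> float:
    """Average every '<label>: <N> mW' reading in a capture, returned in watts."""
    pattern = re.compile(rf"^{re.escape(label)}:\s*([\d.]+)\s*mW", re.MULTILINE)
    readings_mw = [float(m.group(1)) for m in pattern.finditer(text)]
    if not readings_mw:
        raise ValueError(f"no '{label}' readings found")
    return mean(readings_mw) / 1000.0  # milliwatts -> watts


# Hypothetical excerpt, for illustration only (not measured data).
sample = """\
CPU Power: 120 mW
GPU Power: 0 mW
CPU Power: 80 mW
"""
print(parse_average_watts(sample))
```

If the workload runs mostly on the GPU or Neural Engine, a different label (for example a combined-power line, if your capture has one) may be the better thing to average. Use the same label for the baseline and the workload capture.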

3. Run Workload

./inference --input benchmark.txt --output results.txt

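Use a fixed input for repeatability. Record power for the duration of the run with the same powermetrics settings as the baseline, writing to workload.txt, and note the wall-clock runtime. The calculation in the next step needs both.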
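One way to sample power while the workload runs, and time the workload in the same step, is sketched below. The helper `run_with_sampler` is hypothetical. It is generic over the two commands, and the powermetrics invocation in the trailing comment assumes the script itself runs with root privileges, since powermetrics requires them.

```python
import subprocess
import time


def run_with_sampler(sampler_cmd: list[str], workload_cmd: list[str]) -> float:
    """Run workload_cmd while sampler_cmd runs alongside; return workload seconds."""
    sampler = subprocess.Popen(sampler_cmd)
    try:
        start = time.perf_counter()
        subprocess.run(workload_cmd, check=True)  # raises if the workload fails
        elapsed = time.perf_counter() - start
    finally:
        sampler.terminate()  # stop sampling once the workload exits
        sampler.wait()
    return elapsed


# Example call (assumption: script started under sudo, file names as in the steps above):
# seconds = run_with_sampler(
#     ["powermetrics", "--samplers", "cpu_power", "--sample-rate", "100", "-o", "workload.txt"],
#     ["./inference", "--input", "benchmark.txt", "--output", "results.txt"],
# )
```

The sampler starts slightly before the timer, so the capture may include a few idle samples at either end. For short workloads it may be worth trimming them before averaging.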

4. Calculate

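# Average power in watts over each powermetrics capture
baseline_watts = parse_average("baseline.txt")
workload_watts = parse_average("workload.txt")
# Power attributable to inference, net of the idle floor
inference_watts = workload_watts - baseline_watts

tokens = count_tokens("results.txt")  # output tokens generated
time_seconds = measure_runtime()      # wall-clock duration of the workload

joules = inference_watts * time_seconds
j_per_token = joules / tokens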

Repeatability Rules

For results to be comparable:

  1. Same hardware - Don't compare M1 to M2
  2. Same thermal state - Always stabilize first
  3. Same input - Use identical benchmark data
  4. Same power source - Run on the power adapter; battery mode behaves differently
  5. Report conditions - Ambient temperature, chip variant, macOS version

We see less than 5% run-to-run variation across 10 runs when following this protocol.
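To check your own spread across runs, one option is the coefficient of variation (sample standard deviation divided by the mean). This is an assumed reading of the 5% figure, which may refer to a different statistic. The run values below are made up for illustration.

```python
from statistics import mean, stdev


def relative_spread(values: list[float]) -> float:
    """Coefficient of variation: sample standard deviation / mean."""
    if len(values) < 2:
        raise ValueError("need at least two runs")
    return stdev(values) / mean(values)


# Hypothetical J/token results from repeated runs (illustrative, not measured).
runs = [0.50, 0.51, 0.49, 0.50, 0.52]
cv = relative_spread(runs)
print(f"relative spread: {cv:.1%}")
```

A spread well above your usual figure is a hint that one of the rules above was broken, most often thermal state.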

What It Doesn't Tell You

Joules per token doesn't tell you:

  • Quality of outputs
  • Latency per request
  • Memory efficiency
  • Cost per token (unless you know your electricity rate)

It's one metric among many. Use it alongside others.
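If you do know your electricity rate, converting joules per token to cost is simple arithmetic, since 1 kWh equals 3.6 million joules. The sketch below shows the conversion. The function name and the input values are hypothetical.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def cost_per_million_tokens(j_per_token: float, price_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens."""
    kwh_per_token = j_per_token / JOULES_PER_KWH
    return kwh_per_token * price_per_kwh * 1_000_000


# Hypothetical inputs: 0.5 J/token and a rate of 0.20 per kWh (any currency).
print(cost_per_million_tokens(0.5, 0.20))
```

This covers only the energy measured for inference above the idle baseline. It leaves out idle draw, hardware, and cooling.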

Conclusion

Energy efficiency is measurable and comparable with proper methodology. Joules per token provides a normalized metric for edge deployment planning. The measurement is straightforward on macOS with appropriate stabilization and baseline correction.