Working paper. Content reflects ongoing research and may change.

Joules Per Token: An Energy Measurement Methodology for Apple Silicon

September 20, 2025

Draft methodology note outlining how we measure inference energy on Apple silicon using macOS powermetrics. The protocol is intended for repeatable internal benchmarking; tooling and results are still in development.

Status: Draft. Metrics below are illustrative and may change.

Motivation

Energy efficiency is increasingly important for edge AI deployments. However, comparing efficiency across hardware and software configurations requires a standardized measurement methodology.

We focus on Apple silicon because:

  1. Unified memory reduces data movement energy
  2. Integrated power management enables precise measurement
  3. Growing adoption in sensitive deployments

Measurement Protocol

Prerequisites

  • macOS 13.0 or later
  • Administrative access for powermetrics
  • Thermal stabilization (5 min idle)
  • Power adapter connected

Procedure

# 1. Thermal stabilization
sleep 300

# 2. Baseline measurement (100 ms interval x 300 samples = 30 s)
sudo powermetrics --samplers cpu_power \
    --sample-interval 100 \
    --sample-count 300 > baseline.txt

# 3. Run inference workload (expected to write its generated-token count to tokens.txt)
./inference --input benchmark.txt &
INFERENCE_PID=$!

# 4. Measure during inference (100 ms interval x 1000 samples = 100 s; size sample-count to span the full run)
sudo powermetrics --samplers cpu_power \
    --sample-interval 100 \
    --sample-count 1000 > workload.txt

wait $INFERENCE_PID

# 5. Calculate
python3 calculate_joules.py baseline.txt workload.txt tokens.txt

Calculation

E_inference = (P_workload × t_workload) - (P_baseline × t_workload)
J_per_token = E_inference / token_count

where P_workload and P_baseline are the mean power draws from the workload and baseline captures, t_workload is the wall-clock duration of the workload measurement in seconds, and token_count is the number of tokens generated.
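
The procedure calls calculate_joules.py, which is not included in this note. The following is a minimal sketch of that step, assuming powermetrics emits per-sample lines of the form "CPU Power: N mW" (field names vary by macOS version and sampler set) and that tokens.txt holds the generated-token count and the workload duration; both file formats are assumptions for illustration.

# calculate_joules.py -- illustrative sketch, not the final tooling.
# Assumes powermetrics lines like "CPU Power: 4528 mW" and a tokens.txt
# containing the token count on line 1 and the workload seconds on line 2.
import re
import sys

POWER_RE = re.compile(r"CPU Power:\s*([\d.]+)\s*mW")

def mean_power_watts(path):
    # Average all per-sample CPU power readings in a powermetrics capture.
    samples = []
    with open(path) as f:
        for line in f:
            m = POWER_RE.search(line)
            if m:
                samples.append(float(m.group(1)) / 1000.0)  # mW -> W
    if not samples:
        raise SystemExit(f"no power samples found in {path}")
    return sum(samples) / len(samples)

def main():
    if len(sys.argv) != 4:
        raise SystemExit("usage: calculate_joules.py baseline.txt workload.txt tokens.txt")
    baseline_path, workload_path, tokens_path = sys.argv[1:4]

    p_baseline = mean_power_watts(baseline_path)
    p_workload = mean_power_watts(workload_path)

    with open(tokens_path) as f:
        token_count = int(f.readline())
        t_workload = float(f.readline())  # seconds

    # E_inference = (P_workload - P_baseline) * t_workload
    e_inference = (p_workload - p_baseline) * t_workload
    print(f"P_baseline  = {p_baseline:.2f} W")
    print(f"P_workload  = {p_workload:.2f} W")
    print(f"E_inference = {e_inference:.1f} J")
    print(f"J_per_token = {e_inference / token_count:.3f}")

if __name__ == "__main__":
    main()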

Results (Illustrative)

Illustrative internal runs on M2 Pro (12-core). Not a public benchmark:

Model     Tokens/sec   Watts   Joules/token
7B Q4     42.3         18.5    0.44
7B Q8     28.7         22.1    0.77
13B Q4    24.1         21.3    0.88
13B Q8    15.8         24.7    1.56
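
As a rough consistency check, assuming the Watts column reports the average power attributed to the workload, joules per token should be approximately watts divided by tokens per second; for the 7B Q4 row, 18.5 W / 42.3 tokens/s ≈ 0.44 J/token.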

Repeatability

Across 10 runs under identical conditions:

  • Mean run-to-run variation: 3.2%
  • Maximum run-to-run variation: 4.8%
  • Thermal drift impact: <2% when stabilized
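
The variation figures above can be reproduced from per-run joules-per-token values. The sketch below uses relative standard deviation as the variation statistic and hypothetical run values; both choices are assumptions for illustration.

# repeatability.py -- illustrative sketch; the run values are hypothetical and
# relative standard deviation is assumed as the "variation" statistic.
import statistics

# Joules-per-token from 10 hypothetical runs of the same configuration.
runs = [0.44, 0.45, 0.43, 0.44, 0.46, 0.44, 0.43, 0.45, 0.44, 0.45]

mean = statistics.mean(runs)
rel_stdev = statistics.stdev(runs) / mean * 100            # spread around the mean, %
max_dev = max(abs(x - mean) for x in runs) / mean * 100    # worst single-run deviation, %

print(f"mean J/token     : {mean:.3f}")
print(f"relative std dev : {rel_stdev:.1f}%")
print(f"max deviation    : {max_dev:.1f}%")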

Conclusion

Joules per token can provide a hardware-normalized efficiency metric for comparing inference configurations. The methodology and tooling are still in active development; contact us if you need the full protocol details.

About MLNavigator Research Group

We explore verifiable, offline AI systems and publish working notes as the research develops. If you want to discuss collaboration or pilots, reach out.
