Status: Draft. Metrics below are illustrative and may change.
## Motivation

Energy efficiency is increasingly important for edge AI deployments. However, comparing efficiency across hardware and software configurations requires a standardized measurement methodology.
We focus on Apple silicon because:
- Unified memory reduces data movement energy
- Integrated power management enables precise measurement
- Growing adoption in sensitive deployments
## Measurement Protocol

### Prerequisites
- macOS 13.0 or later
- Administrative access for powermetrics
- Thermal stabilization (5 min idle)
- Power adapter connected
### Procedure
```bash
# 1. Thermal stabilization (5 min idle)
sleep 300

# 2. Baseline measurement: 300 samples at 100 ms = 30 s
sudo powermetrics --samplers cpu_power \
    --sample-rate 100 \
    --sample-count 300 > baseline.txt

# 3. Run inference workload in the background
./inference --input benchmark.txt &
INFERENCE_PID=$!

# 4. Measure during inference: 1000 samples at 100 ms = 100 s
#    (adjust the sample count so sampling spans the full run)
sudo powermetrics --samplers cpu_power \
    --sample-rate 100 \
    --sample-count 1000 > workload.txt
wait $INFERENCE_PID

# 5. Calculate joules per token (tokens.txt is assumed to be
#    written by the inference binary)
python3 calculate_joules.py baseline.txt workload.txt tokens.txt
```
### Calculation

Subtracting the baseline removes idle power draw from the total:

E_inference = (P_workload - P_baseline) × t_workload
J_per_token = E_inference / token_count

where P_baseline and P_workload are the mean powers from baseline.txt and workload.txt, and t_workload is the duration of the workload measurement.
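The calculate_joules.py invoked in step 5 is not yet published, so the following is a minimal sketch of what it could look like. It assumes, and these are assumptions rather than documented behavior, that powermetrics emits lines of the form `CPU Power: 1234 mW` and that tokens.txt contains a single integer token count.

```python
#!/usr/bin/env python3
# calculate_joules.py -- illustrative sketch, not the final tool.
# Assumes powermetrics lines like "CPU Power: 1234 mW"; the exact
# output format can vary across macOS releases.
import re
import sys

SAMPLE_S = 0.1  # 100 ms sampling interval, matching the protocol


def mean_power_watts(path):
    """Average CPU power (W) over all samples in a powermetrics log."""
    mw = [float(m.group(1))
          for line in open(path)
          if (m := re.search(r"CPU Power:\s*([\d.]+)\s*mW", line))]
    return sum(mw) / len(mw) / 1000.0


baseline_w = mean_power_watts(sys.argv[1])            # baseline.txt
workload_w = mean_power_watts(sys.argv[2])            # workload.txt
token_count = int(open(sys.argv[3]).read().strip())   # tokens.txt (assumed: one integer)

# t_workload = number of workload samples x sampling interval
n_samples = sum(1 for line in open(sys.argv[2]) if "CPU Power:" in line)
t_workload = n_samples * SAMPLE_S

# E_inference = (P_workload - P_baseline) x t_workload
e_inference = (workload_w - baseline_w) * t_workload
print(f"P_baseline = {baseline_w:.2f} W, P_workload = {workload_w:.2f} W")
print(f"E_inference = {e_inference:.1f} J over {t_workload:.0f} s")
print(f"J/token = {e_inference / token_count:.3f}")
```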
## Results (Illustrative)

The numbers below are from illustrative internal runs on an M2 Pro (12-core); they are not a public benchmark:
| Model | Tokens/sec | Watts | Joules/token |
|---|---|---|---|
| 7B Q4 | 42.3 | 18.5 | 0.44 |
| 7B Q8 | 28.7 | 22.1 | 0.77 |
| 13B Q4 | 24.1 | 21.3 | 0.88 |
| 13B Q8 | 15.8 | 24.7 | 1.56 |
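As a cross-check, Joules/token is average watts divided by tokens/sec; for the first row, 18.5 W ÷ 42.3 tokens/s ≈ 0.44 J/token.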
## Repeatability

Across 10 runs under identical conditions:
- Mean run-to-run variation: 3.2%
- Max run-to-run variation: 4.8%
- Thermal drift impact: <2% when stabilized
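How that variation is computed is not specified above; a plausible sketch follows, assuming "variation" means each run's relative deviation from the mean J/token. The ten values are placeholders, not measured data.

```python
# repeatability.py -- sketch; the per-run values are hypothetical
# placeholders, and "variation" is assumed to mean relative
# deviation from the mean (the exact statistic is not specified).
from statistics import mean

runs_j_per_token = [0.44, 0.45, 0.43, 0.44, 0.46,
                    0.44, 0.43, 0.45, 0.44, 0.42]

mu = mean(runs_j_per_token)
rel_dev = [abs(r - mu) / mu for r in runs_j_per_token]

print(f"mean variation: {100 * mean(rel_dev):.1f}%")
print(f"max variation:  {100 * max(rel_dev):.1f}%")
```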
## Conclusion

Joules per token can provide a hardware-normalized efficiency metric. Both the methodology and the tooling are still in active development; contact us if you need the full protocol details.