IFO4-S-007 · Draft · AI/ML

AI/ML Compute Cost Governance

Version: v0.3.0
Last Updated: February 28, 2026
Requirements: 7
References: 5

This standard is a draft. It is under active development and should not be used for compliance purposes. Practitioner input is welcome.

Standard Summary

Defines governance requirements for organizations that incur material AI and machine learning infrastructure costs, covering training compute, inference serving, foundation model API consumption, and GPU cluster management. The standard is under active development by the AI/ML FinOps Working Group.

Authors & Contributors
Rohan Mehta, AI/ML FinOps Working Group Chair
Dr. W. Okonkwo, ML Infrastructure Lead
Daniel Park, GPU Economics Researcher

Rationale

AI compute costs are opaque, variable, and growing faster than any other cloud spend category. The absence of governance frameworks leads to experimental workloads running without budget controls, inference costs that are invisible to business owners, and foundation model API usage that lacks any accountability structure.

Scope

Intended scope covers all AI and ML workloads that run on cloud infrastructure including GPU instances, TPUs, specialized AI accelerators, and managed ML platforms (AWS SageMaker, Azure ML, GCP Vertex AI, Databricks). Also covers third-party foundation model API consumption with material monthly spend. Threshold for applicability: organizations with AI-related cloud spend exceeding $50,000 per month.

Requirements

7 requirements. MUST indicates a mandatory requirement; SHOULD indicates a recommended practice.

01
Draft

All AI training jobs above $5,000 estimated cost MUST require pre-approval with documented business justification.
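A pre-approval gate of this kind can be encoded as a simple launch check. The sketch below is illustrative, not part of the standard: the $5,000 threshold and the requirement for a documented justification come from the requirement text, while the `TrainingJobRequest` fields and the approver check are assumptions about how an organization might record approvals.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD_USD = 5_000  # threshold stated in the requirement


@dataclass
class TrainingJobRequest:
    """Hypothetical record for a proposed training job."""
    job_id: str
    estimated_cost_usd: float
    justification: str = ""   # documented business justification
    approved_by: str = ""     # named approver (an assumed field)


def may_launch(req: TrainingJobRequest) -> bool:
    """Jobs at or under the threshold launch freely; above it, launch
    requires both a non-empty justification and a named approver."""
    if req.estimated_cost_usd <= APPROVAL_THRESHOLD_USD:
        return True
    return bool(req.justification.strip()) and bool(req.approved_by.strip())
```

In practice this check would sit in the job-submission path of the scheduler or ML platform, so that no expensive run can start without a recorded approval.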

02
Draft

GPU cluster utilization MUST be monitored at minimum hourly; idle GPU-hours MUST be reported weekly.
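Idle GPU-hours can be derived directly from the hourly utilization samples the requirement mandates. A minimal sketch, assuming one average utilization reading per GPU per hour and an illustrative 5% idle cutoff (the standard does not define the cutoff):

```python
def idle_gpu_hours(hourly_utilization, idle_threshold=0.05):
    """Count idle GPU-hours for a weekly report.

    hourly_utilization: {gpu_id: [util_0, util_1, ...]} with one average
    utilization sample (0.0-1.0) per hour per GPU. An hour counts as idle
    when utilization is at or below the threshold (5% is illustrative,
    not mandated by the standard).
    """
    return sum(
        1
        for samples in hourly_utilization.values()
        for util in samples
        if util <= idle_threshold
    )
```

A weekly report would run this over the last 168 hourly samples per GPU and multiply by the hourly GPU rate to express the waste in dollars.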

03
Draft

Inference serving costs MUST be attributed to the product or service that benefits from the model.

04
Draft

Foundation model API costs MUST be tracked by model, by team, and by use case.
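Tracking by model, team, and use case amounts to tagging each API call record with those three dimensions and aggregating. A minimal sketch, assuming a flat list of cost records (the record schema and the model/team names in the usage example are hypothetical):

```python
from collections import defaultdict


def cost_by_dimension(records, dimension):
    """Total foundation model API spend grouped by one tracked dimension.

    records: iterable of dicts with keys 'model', 'team', 'use_case',
    and 'cost_usd' (an assumed schema); dimension: which key to group by.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)
```

Running the same aggregation three times, once per dimension, yields the model-level, team-level, and use-case-level views the requirement calls for.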

05
Draft

Token efficiency metrics (output quality per dollar) SHOULD be tracked for generative AI applications.
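One way to operationalize "output quality per dollar" is to divide an evaluated quality score by the token cost of producing the output. The sketch below assumes a 0-100 quality score from whatever offline evaluation the team trusts, and illustrative per-1k-token prices; neither is specified by the standard.

```python
def quality_per_dollar(quality_score, input_tokens, output_tokens,
                       usd_per_1k_input, usd_per_1k_output):
    """Token efficiency: evaluated output quality bought per dollar.

    quality_score: result of the team's own offline eval (0-100 assumed);
    prices are illustrative per-1k-token rates, not from the standard.
    """
    cost = (input_tokens / 1000) * usd_per_1k_input \
         + (output_tokens / 1000) * usd_per_1k_output
    return quality_score / cost if cost > 0 else float("inf")
```

Tracked per application over time, this metric makes it visible when a cheaper model or a shorter prompt delivers comparable quality at lower spend.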

06
Draft

A kill-switch capability MUST exist for any training job that exceeds 150% of its estimated cost.
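The decision rule behind such a kill switch is small; the engineering work is in the billing feed and the scheduler integration. A minimal sketch of the rule alone, with the 150% factor taken from the requirement:

```python
KILL_THRESHOLD = 1.5  # 150% of estimated cost, per the requirement


def should_kill(estimated_cost_usd, accrued_cost_usd):
    """True once accrued spend exceeds 150% of the job's estimate.

    A real controller would poll billing or metering data on a schedule
    and call the cluster scheduler's cancel API when this returns True;
    this function only encodes the decision rule.
    """
    return accrued_cost_usd > KILL_THRESHOLD * estimated_cost_usd
```

Note that cloud billing data often lags by hours, so a production kill switch typically estimates accrued cost from instance-hours and list prices rather than waiting for the billing feed.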

07
Draft

Model ownership records MUST identify the team responsible for each model in production.

Full Description

AI and machine learning workloads represent the fastest-growing and least-governed category of cloud expenditure. Training a large language model can cost hundreds of thousands of dollars in a single run, and inference serving at scale can come to exceed all other cloud infrastructure costs within months. Yet most FinOps practices were designed for deterministic web application workloads, not probabilistic AI experiments.

IFO4-S-007 is being designed to address this gap. The draft standard distinguishes between three AI cost categories: experimental compute (training runs, fine-tuning jobs, hyperparameter searches), production inference (serving trained models to end users or systems), and foundation model API consumption (API calls to provider-hosted models such as OpenAI, Anthropic, Google Gemini, and Meta Llama endpoints).

Each category requires different governance approaches. Experimental compute requires pre-approval gates, budget caps, and kill-switch capabilities. Production inference requires unit cost monitoring, autoscaling governance, and model version cost attribution. Foundation model API consumption requires provider cost visibility, token efficiency tracking, and prompt optimization governance.

This draft is v0.3.0. It does not yet address AI agents, multi-model orchestration costs, vector database costs, or the financial implications of model distillation and quantization. These areas are under active research by the working group.

References

MLCommons: ML Training Cost Benchmarks (2024)

Andreessen Horowitz: AI Cost Trends (2024)

NVIDIA: GPU Cost Optimization for AI Workloads (2024)

IFO4 AI/ML FinOps Working Group: Interim Guidance (2025)

Hugging Face: Open LLM Cost Calculator (2024)