
Model Access Control: Who Can Use What

A financial services company discovered a $180,000 surprise bill after developers used GPT-4o for development tasks that GPT-4o-mini would have handled perfectly. Their access control policy was a single API key shared across 47 engineers. This guide shows you how to prevent that scenario by implementing robust model access control with RBAC for AI.

Model access control sits at the intersection of security, compliance, and cost optimization. Without it, you face three critical risks:

  1. Budget Explosion: Unrestricted access to premium models (like GPT-4o at $5.00/$15.00 per 1M tokens) for tasks that require only budget models (like GPT-4o-mini at $0.15/$0.60 per 1M tokens)
  2. Compliance Violations: Using models without proper data handling certifications for sensitive workloads
  3. Capability Mismatch: Developers accidentally using models with insufficient context windows or reasoning capabilities

According to industry data, organizations without model access controls spend 3-5x more on LLM APIs than those with granular policies. The average mid-size company (50-200 engineers) wastes $50,000-$150,000 annually on inappropriate model usage.

Traditional RBAC (Role-Based Access Control) needs adaptation for AI systems. You’re not just controlling file access—you’re controlling computational resources with variable costs and capabilities.

Instead of simple “allow/deny,” AI access control should be capability-based:

| Capability | Description | Example Models |
|---|---|---|
| Basic Reasoning | Simple Q&A, classification | GPT-4o-mini, Haiku-3.5 |
| Advanced Reasoning | Complex analysis, multi-step problems | GPT-4o, Claude-3.5-Sonnet |
| Vision | Image analysis, OCR | GPT-4o, Claude-3.5-Sonnet |
| Large Context | Greater than 100K token processing | Claude-3.5-Sonnet (200K) |
| Code Generation | Programming tasks | GPT-4o, Claude-3.5-Sonnet |

Your RBAC system should automatically route each request to the cheapest model that satisfies its capability requirements; the TypeScript and Python routers below show one way to implement this.

Implementing model access control requires a three-layer approach: capability mapping, role definition, and dynamic routing. Here’s how to structure it:

First, map your models to capabilities and costs:

```yaml
models:
  gpt-4o-mini:
    capabilities: [basic_reasoning, classification]
    cost: { input: 0.15, output: 0.60 }  # USD per 1M tokens
    context: 128000
    tier: budget
  gpt-4o:
    capabilities: [advanced_reasoning, vision, code_generation]
    cost: { input: 5.00, output: 15.00 }
    context: 128000
    tier: premium
  claude-3-5-sonnet:
    capabilities: [advanced_reasoning, large_context, code_generation]
    cost: { input: 3.00, output: 15.00 }
    context: 200000
    tier: premium
  haiku-3.5:
    capabilities: [basic_reasoning, classification]
    cost: { input: 1.25, output: 5.00 }
    context: 200000
    tier: budget
```
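Keeping the catalog in a standalone file lets you update pricing without redeploying application code. A minimal loading sketch, assuming the YAML above is saved as `models.yaml` (the filename and validation rules are illustrative):

```python
# Load and sanity-check the model catalog at startup.
# models.yaml is an assumed filename; adjust to your deployment layout.
import yaml  # pip install pyyaml

REQUIRED_KEYS = {"capabilities", "cost", "context", "tier"}

def load_catalog(path: str = "models.yaml") -> dict:
    with open(path) as f:
        catalog = yaml.safe_load(f)["models"]
    for name, cfg in catalog.items():
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(f"Model {name} is missing keys: {missing}")
    return catalog

catalog = load_catalog()
```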

Define roles by required capabilities rather than specific models:

```yaml
roles:
  developer:
    capabilities: [basic_reasoning, classification]
    max_cost_per_request: 0.01   # USD
    allowed_tiers: [budget]
  data_analyst:
    capabilities: [advanced_reasoning, large_context]
    max_cost_per_request: 0.50
    allowed_tiers: [budget, premium]
  ml_researcher:
    capabilities: [advanced_reasoning, vision, code_generation, large_context]
    max_cost_per_request: 2.00
    allowed_tiers: [budget, premium]
  finance_team:
    capabilities: [basic_reasoning]
    max_cost_per_request: 0.005
    allowed_tiers: [budget]
    data_residency: EU
```
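Because roles reference capabilities rather than models, it is worth verifying at startup that every role can actually be satisfied by at least one model in its allowed tiers. A minimal sketch, assuming the two YAML files above have been loaded into `catalog` and `roles` dicts:

```python
def validate_roles(catalog: dict, roles: dict) -> None:
    """Fail fast if a role lists a capability no allowed-tier model provides."""
    for role_name, role in roles.items():
        tiers = set(role["allowed_tiers"])
        # Union of capabilities across every model in the role's allowed tiers
        reachable = set()
        for model in catalog.values():
            if model["tier"] in tiers:
                reachable.update(model["capabilities"])
        unmet = set(role["capabilities"]) - reachable
        if unmet:
            raise ValueError(
                f"Role {role_name} requires unreachable capabilities: {unmet}"
            )
```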

The router selects the cheapest model that satisfies all requirements:

```typescript
interface AccessRequest {
  userRole: string;
  requiredCapabilities: string[];
  estimatedTokens: number;
  dataSensitivity: 'low' | 'medium' | 'high';
}

interface RouterDecision {
  selectedModel: string;
  estimatedCost: number;
  reasoning: string;
}

// getRoleConfig, getEligibleModels, and calculateCost are lookups over the
// YAML configuration defined above.
function routeModel(request: AccessRequest): RouterDecision {
  const roleConfig = getRoleConfig(request.userRole);
  const eligibleModels = getEligibleModels(roleConfig, request.requiredCapabilities);

  if (eligibleModels.length === 0) {
    throw new Error('No model satisfies capability requirements');
  }

  // Sort by estimated cost for this request, cheapest first (copy to avoid mutation)
  const sorted = [...eligibleModels].sort((a, b) =>
    calculateCost(a, request.estimatedTokens) - calculateCost(b, request.estimatedTokens)
  );

  const selected = sorted[0];
  const cost = calculateCost(selected, request.estimatedTokens);

  if (cost > roleConfig.max_cost_per_request) {
    throw new Error(`Request exceeds cost limit: ${cost} > ${roleConfig.max_cost_per_request}`);
  }

  return {
    selectedModel: selected.name,
    estimatedCost: cost,
    reasoning: `Selected ${selected.name} as cheapest model satisfying [${request.requiredCapabilities.join(', ')}]`
  };
}
```

Here’s a complete implementation for a FastAPI service with model access control:

```python
from enum import Enum
from typing import Dict, List, Set

from fastapi import FastAPI, HTTPException, Security
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel


class ModelTier(Enum):
    BUDGET = "budget"
    PREMIUM = "premium"


class Capability(Enum):
    BASIC_REASONING = "basic_reasoning"
    ADVANCED_REASONING = "advanced_reasoning"
    CLASSIFICATION = "classification"
    VISION = "vision"
    LARGE_CONTEXT = "large_context"
    CODE_GENERATION = "code_generation"


class ModelConfig(BaseModel):
    name: str
    capabilities: Set[Capability]
    cost_per_million_tokens: Dict[str, float]  # keys: "input", "output"
    context_window: int
    tier: ModelTier


class RoleConfig(BaseModel):
    name: str
    allowed_capabilities: Set[Capability]
    max_cost_per_request: float
    allowed_tiers: Set[ModelTier]
    data_residency: str = "any"


class AccessRequest(BaseModel):
    userRole: str
    requiredCapabilities: List[Capability]
    estimatedTokens: int
    dataSensitivity: str = "low"  # reserved for residency/compliance checks


# Model catalog with pricing in USD per 1M tokens
MODELS = {
    "gpt-4o-mini": ModelConfig(
        name="gpt-4o-mini",
        capabilities={Capability.BASIC_REASONING, Capability.CLASSIFICATION},
        cost_per_million_tokens={"input": 0.15, "output": 0.60},
        context_window=128000,
        tier=ModelTier.BUDGET,
    ),
    "gpt-4o": ModelConfig(
        name="gpt-4o",
        capabilities={
            Capability.ADVANCED_REASONING,
            Capability.VISION,
            Capability.CODE_GENERATION,
        },
        cost_per_million_tokens={"input": 5.00, "output": 15.00},
        context_window=128000,
        tier=ModelTier.PREMIUM,
    ),
    "claude-3-5-sonnet": ModelConfig(
        name="claude-3-5-sonnet",
        capabilities={
            Capability.ADVANCED_REASONING,
            Capability.LARGE_CONTEXT,
            Capability.CODE_GENERATION,
        },
        cost_per_million_tokens={"input": 3.00, "output": 15.00},
        context_window=200000,
        tier=ModelTier.PREMIUM,
    ),
    "haiku-3.5": ModelConfig(
        name="haiku-3.5",
        capabilities={Capability.BASIC_REASONING, Capability.CLASSIFICATION},
        cost_per_million_tokens={"input": 1.25, "output": 5.00},
        context_window=200000,
        tier=ModelTier.BUDGET,
    ),
}

ROLES = {
    "developer": RoleConfig(
        name="developer",
        allowed_capabilities={Capability.BASIC_REASONING, Capability.CLASSIFICATION},
        max_cost_per_request=0.01,
        allowed_tiers={ModelTier.BUDGET},
    ),
    "data_analyst": RoleConfig(
        name="data_analyst",
        allowed_capabilities={Capability.ADVANCED_REASONING, Capability.LARGE_CONTEXT},
        max_cost_per_request=0.50,
        allowed_tiers={ModelTier.BUDGET, ModelTier.PREMIUM},
    ),
    "ml_researcher": RoleConfig(
        name="ml_researcher",
        allowed_capabilities={
            Capability.ADVANCED_REASONING,
            Capability.VISION,
            Capability.CODE_GENERATION,
            Capability.LARGE_CONTEXT,
        },
        max_cost_per_request=2.00,
        allowed_tiers={ModelTier.BUDGET, ModelTier.PREMIUM},
    ),
    "finance_team": RoleConfig(
        name="finance_team",
        allowed_capabilities={Capability.BASIC_REASONING},
        max_cost_per_request=0.005,
        allowed_tiers={ModelTier.BUDGET},
        data_residency="EU",
    ),
}


class ModelRouter:
    def route(
        self,
        role_name: str,
        required_capabilities: Set[Capability],
        estimated_input_tokens: int,
        estimated_output_tokens: int,
    ) -> Dict:
        if role_name not in ROLES:
            raise HTTPException(status_code=403, detail="Unknown role")
        role = ROLES[role_name]

        # Validate that the role is allowed every requested capability
        if not required_capabilities.issubset(role.allowed_capabilities):
            missing = required_capabilities - role.allowed_capabilities
            raise HTTPException(
                status_code=403,
                detail=f"Role lacks required capabilities: {missing}",
            )

        # Find models that provide the capabilities within the role's allowed tiers
        eligible = [
            model
            for model in MODELS.values()
            if required_capabilities.issubset(model.capabilities)
            and model.tier in role.allowed_tiers
        ]
        if not eligible:
            raise HTTPException(
                status_code=400,
                detail="No model satisfies all required capabilities",
            )

        # Estimate per-request cost and select the cheapest eligible model
        def calculate_cost(model: ModelConfig) -> float:
            input_cost = (
                model.cost_per_million_tokens["input"] * estimated_input_tokens / 1_000_000
            )
            output_cost = (
                model.cost_per_million_tokens["output"] * estimated_output_tokens / 1_000_000
            )
            return input_cost + output_cost

        selected = min(eligible, key=calculate_cost)
        cost = calculate_cost(selected)

        if cost > role.max_cost_per_request:
            raise HTTPException(
                status_code=400,
                detail=f"Estimated cost ${cost:.4f} exceeds limit ${role.max_cost_per_request}",
            )

        return {
            "model": selected.name,
            "estimated_cost": cost,
            "capabilities": [c.value for c in required_capabilities],
        }


# FastAPI integration
app = FastAPI()
security = HTTPBearer()


@app.post("/route")
async def route_request(
    request: AccessRequest,
    credentials: HTTPAuthorizationCredentials = Security(security),
):
    router = ModelRouter()
    try:
        return router.route(
            role_name=request.userRole,
            required_capabilities=set(request.requiredCapabilities),
            estimated_input_tokens=request.estimatedTokens,
            # Assume output is roughly half the input unless measured otherwise
            estimated_output_tokens=int(request.estimatedTokens * 0.5),
        )
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
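A quick smoke test against the service. This is a hypothetical client call: the port and bearer token are placeholders, and any non-empty token passes because the example does not validate credentials.

```python
# Assumes the FastAPI service above is running on localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/route",
    headers={"Authorization": "Bearer dev-token"},  # placeholder token
    json={
        "userRole": "data_analyst",
        "requiredCapabilities": ["advanced_reasoning"],
        "estimatedTokens": 20000,
    },
)
print(resp.json())
# Expected: claude-3-5-sonnet is the cheapest eligible premium model,
# at roughly $0.21 for 20K input + 10K estimated output tokens.
```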

The Problem: Assigning specific models to roles instead of capabilities. When model pricing changes or new models are released, you must manually update every role.

The Fix: Use capability-based permissions. A “developer” role should have capabilities: [basic_reasoning] rather than allowed_models: [gpt-4o-mini]. This lets your router automatically adopt cheaper or better models as they become available.

The Problem: Routing without pre-flight cost checks. One unbounded request can consume your entire monthly budget.

The Fix: Always validate estimated cost against role limits before routing. Implement circuit breakers that block requests exceeding configured thresholds.
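One way to enforce this is a lightweight circuit breaker in front of the router. A minimal in-memory sketch; the daily budget figures are illustrative, and a production deployment would persist spend in Redis or a database:

```python
import time
from collections import defaultdict

# Illustrative per-role daily budgets in USD
DAILY_BUDGET_USD = {"developer": 5.00, "data_analyst": 50.00}

# (role, day) -> dollars spent; in-memory for the sketch only
_spend = defaultdict(float)

def check_and_record(role: str, estimated_cost: float) -> None:
    """Block the request before routing if it would push the role over budget."""
    key = (role, time.strftime("%Y-%m-%d"))
    budget = DAILY_BUDGET_USD.get(role, 0.0)
    if _spend[key] + estimated_cost > budget:
        raise RuntimeError(f"Daily budget ${budget:.2f} exhausted for role {role}")
    _spend[key] += estimated_cost
```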

The Problem: Gradually adding capabilities to roles until budget models have premium access. A “developer” role starts with basic reasoning but slowly gains vision and code generation capabilities.

The Fix: Quarterly capability audits. Review which roles actually need which capabilities. Use your access logs to verify capability usage patterns.
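Access logs make the audit mechanical. A sketch, assuming each log record is a dict with `role` and `capabilities` fields (the record shape is an assumption):

```python
from collections import Counter

def capability_usage(log_records: list[dict]) -> dict[str, Counter]:
    """Count how often each role actually exercised each capability."""
    usage: dict[str, Counter] = {}
    for rec in log_records:
        usage.setdefault(rec["role"], Counter()).update(rec["capabilities"])
    return usage

# Capabilities granted to a role but absent from its usage counts are
# candidates for removal at the quarterly review.
```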

The Problem: Routing large-context requests to models with insufficient context windows, causing silent truncation or failures.

The Fix: Include context window in capability matching. A request requiring 150K tokens should never route to a 128K context model, even if other capabilities match.
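Context-window matching is a one-line addition to the eligibility filter. A sketch reusing the `ModelConfig` and `RoleConfig` types from the implementation above; `estimated_input_tokens` is assumed to be computed by the caller:

```python
def eligible_with_context(models, required_capabilities, role, estimated_input_tokens):
    """Drop models whose context window cannot hold the request."""
    return [
        m for m in models
        if required_capabilities.issubset(m.capabilities)
        and m.tier in role.allowed_tiers
        and m.context_window >= estimated_input_tokens  # prevents silent truncation
    ]
```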

The Problem: Can’t trace which role used which model for which request, making cost attribution and compliance audits impossible.

The Fix: Log every routing decision with role, model, capabilities, estimated vs. actual cost, and reasoning. Store these logs for at least 90 days.
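A structured log line per routing decision is enough for both cost attribution and compliance review. A minimal sketch using the standard library; the field names are illustrative, and retention is handled by your log pipeline:

```python
import json
import logging

audit_log = logging.getLogger("model_router.audit")

def log_decision(role, model, capabilities, estimated_cost, actual_cost, reasoning):
    """Emit one JSON record per routing decision; retain for 90+ days."""
    audit_log.info(json.dumps({
        "role": role,
        "model": model,
        "capabilities": sorted(capabilities),
        "estimated_cost_usd": round(estimated_cost, 6),
        "actual_cost_usd": round(actual_cost, 6),
        "reasoning": reasoning,
    }))
```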

| Capability Required | Budget Model | Premium Model | Cost Difference |
|---|---|---|---|
| Basic Reasoning | GPT-4o-mini ($0.15/$0.60) | GPT-4o ($5/$15) | 33x input, 25x output |
| Advanced Reasoning | Haiku-3.5 ($1.25/$5.00) | Claude-3.5-Sonnet ($3/$15) | 2.4x input, 3x output |
| Vision | N/A | GPT-4o ($5/$15) | Premium required |
| Large Context (200K) | Haiku-3.5 ($1.25/$5.00) | Claude-3.5-Sonnet ($3/$15) | 2.4x input, 3x output |
| Code Generation | N/A | GPT-4o ($5/$15) | Premium required |

| Role | Max Cost/Request | Allowed Tiers | Key Capabilities |
|---|---|---|---|
| Developer | $0.01 | Budget | Basic reasoning, classification |
| Data Analyst | $0.50 | Budget, Premium | Advanced reasoning, large context |
| ML Researcher | $2.00 | Budget, Premium | All capabilities |
| Finance Team | $0.005 | Budget | Basic reasoning, EU data residency |
