API reference
TensorCost exposes a REST gateway at https://api.tensorcost.com/v1/ that fronts the 14 backend microservices. All endpoints are tenant-scoped and require authentication.
Programmatic agents — Claude Desktop, internal LLM agents, partners — should prefer the MCP server over the REST API. MCP gives scope-guarded tools with built-in tenant binding.
Base URL and versioning
https://api.tensorcost.com/v1/
The gateway is versioned at the path prefix. /v1/ is the only currently-supported version; older /api/ paths are deprecated and will be removed after 2026-09-30.
Authentication
All endpoints require a Bearer JWT in the Authorization header.
Authorization: Bearer eyJhbG...
Tokens are minted by identity-service. Three flows:
- Browser SSO: Cognito federation (RS256), verified against the user pool’s JWKS.
- CLI / API client: POST /v1/auth/login with email + password + tenant_id.
- Service-to-service: backend-issued HS256 tokens with embedded tenantId. Used by partner integrations.
Login
POST /v1/auth/login
{
"email": "user@company.com",
"password": "...",
"tenant_id": "uuid-of-tenant"
}
Response:
{
"success": true,
"data": {
"accessToken": "eyJhbG...",
"refreshToken": "eyJhbG...",
"user": { "id": 1, "email": "user@company.com", "role": "admin" }
}
}
Refresh
POST /v1/auth/refresh-tokens
{ "refreshToken": "eyJhbG..." }
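The login and refresh calls above can be sketched with the Python standard library. This is a minimal sketch, not the official SDK: the helper names (`build_login_request`, `build_refresh_request`, `extract_tokens`) are hypothetical, and the `Content-Type: application/json` header is an assumption, since the reference only shows JSON bodies. The functions build requests without sending them, so a caller would pass the result to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Tuple

BASE_URL = "https://api.tensorcost.com/v1"


def build_login_request(email: str, password: str, tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/auth/login with the documented body."""
    body = json.dumps(
        {"email": email, "password": password, "tenant_id": tenant_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def build_refresh_request(refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/auth/refresh-tokens."""
    body = json.dumps({"refreshToken": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/refresh-tokens",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def extract_tokens(envelope: dict) -> Tuple[str, str]:
    """Pull (accessToken, refreshToken) out of a decoded login response."""
    if not envelope.get("success"):
        raise RuntimeError(envelope.get("error", "login failed"))
    data = envelope["data"]
    return data["accessToken"], data["refreshToken"]
```

The access token then goes into the `Authorization: Bearer ...` header of every later request. The shape of the refresh response is not shown in this reference, so the sketch does not parse it.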
Tenant scoping
Every request is implicitly scoped to the tenant embedded in the JWT. Cross-tenant reads are unreachable by construction — Postgres Row Level Security enforces it at the storage layer (see architecture).
Users with multi-tenant access pass X-Active-Tenant-Id: <uuid> to switch context per request.
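As an illustration of per-request tenant switching, the sketch below builds a GET request that carries the bearer token and, optionally, the X-Active-Tenant-Id header. The helper name `build_get` is hypothetical; only the two header names and the base URL come from this page.

```python
import urllib.request
from typing import Optional

BASE_URL = "https://api.tensorcost.com/v1"


def build_get(path: str, access_token: str,
              active_tenant_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET, optionally switching tenant."""
    headers = {"Authorization": f"Bearer {access_token}"}
    if active_tenant_id is not None:
        # Only meaningful for users with multi-tenant access.
        headers["X-Active-Tenant-Id"] = active_tenant_id
    return urllib.request.Request(f"{BASE_URL}{path}", headers=headers, method="GET")
```

Leaving `active_tenant_id` unset keeps the request scoped to the tenant embedded in the JWT.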
Endpoints by service
The gateway groups endpoints under their owning microservice. The table below is the public surface — internal-only routes (e.g. audit-service direct ingestion, mcp-server tool dispatch) are not listed.
cost-service — /v1/cost/
| Method + path | Purpose |
|---|---|
| GET /v1/cost/summary | Tenant-wide totals across GPU + managed-inference + agent workloads |
| GET /v1/cost/by-source | Cost broken down by source: bedrock, azure_openai, vertex, openai_api, anthropic_api, gpu_agent |
| GET /v1/cost/by-tag | Group by application / team / environment / owner |
| GET /v1/cost/forecast | 30/60/90-day cost forecast with confidence interval |
| GET /v1/cost/budgets | Budget hierarchy and current burn rate |
| GET /v1/cost/savings-ledger | Realized + projected savings, attributed to recommendation IDs |
| GET /v1/cost/recommendations | Active recommendations across all recommenders |
| POST /v1/cost/recommendations/:id/accept | Accept a recommendation; opens an action queue entry if remediation is wired |
| POST /v1/cost/recommendations/:id/dismiss | Dismiss with reason; feeds the inference feedback loop |
gpu-service — /v1/gpu/
| Method + path | Purpose |
|---|---|
| GET /v1/gpu/instances | Fleet inventory, filterable by state, instance_type, gpu_type, region, tags |
| GET /v1/gpu/instances/:id | Detail view + latest metrics |
| GET /v1/gpu/instances/:id/metrics | Time-series with hours and limit query params |
| GET /v1/gpu/instances/idle | Instances below the idle threshold |
| GET /v1/gpu/mig | MIG slice topology by host |
| GET /v1/gpu/agents | Agent fleet: connection status, last-seen, version |
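As an example of the query parameters in the table above, the following sketch builds the URL for the instance metrics endpoint. The function name and the default values for `hours` and `limit` are assumptions for illustration; the reference names the two parameters but not their defaults or bounds.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.tensorcost.com/v1"


def metrics_url(instance_id: str, hours: int = 24, limit: int = 500) -> str:
    """Build the URL for GET /v1/gpu/instances/:id/metrics.

    The defaults (24 hours, 500 points) are illustrative, not documented.
    """
    query = urlencode({"hours": hours, "limit": limit})
    # Percent-encode the id so it cannot break out of its path segment.
    return f"{BASE_URL}/gpu/instances/{quote(instance_id, safe='')}/metrics?{query}"
```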
ai-service — /v1/ai/
| Method + path | Purpose |
|---|---|
| GET /v1/ai/spend | Per-event managed-inference spend; filter by source, model_id, application, team, time range |
| GET /v1/ai/spend/summary | Totals by provider, model, application, team |
| GET /v1/ai/unit-economics | Cost per 1K input/output tokens, cost per request, cache-hit rate |
| GET /v1/ai/recommenders/routing | Model-routing recommendations |
| GET /v1/ai/recommenders/cache | Prompt-cache recommendations |
| GET /v1/ai/recommenders/provisioned-throughput | Provisioned-throughput break-even analysis |
| GET /v1/ai/anomalies | Detected anomalies (runaway loops, cost spikes, retry storms) |
| GET /v1/ai/agents/:id/cost | Per-agent attribution |
| GET /v1/ai/workflows/:id/cost | Per-workflow attribution |
alert-service — /v1/alert/
| Method + path | Purpose |
|---|---|
| GET /v1/alert/alerts | Active and historical alerts |
| GET /v1/alert/alerts/summary | Counts by severity and type |
| PATCH /v1/alert/alerts/:id/resolve | Mark resolved |
| PATCH /v1/alert/alerts/:id/ignore | Ignore (audit-trailed) |
| POST /v1/alert/alerts/:id/acknowledge | Stops the escalation chain |
| GET /v1/alert/rules | Alert rules |
| POST /v1/alert/rules | Create a rule (see configuration for the body shape) |
| GET /v1/alert/escalation-policies | Escalation policies |
| GET /v1/alert/incidents | Incident timeline |
enforcement-service — /v1/enforcement/
| Method + path | Purpose |
|---|---|
| GET /v1/enforcement/policies | Policy list |
| GET /v1/enforcement/policies/templates | Pre-built templates |
| POST /v1/enforcement/policies/from-template/:templateId | Clone + customize |
| POST /v1/enforcement/policies/simulate | Dry run against historical data |
| POST /v1/enforcement/policies | Create a policy |
| PATCH /v1/enforcement/policies/:id/toggle | Enable / disable |
| GET /v1/enforcement/actions | Action queue |
| POST /v1/enforcement/actions/:id/approve | Approve a queued action |
| POST /v1/enforcement/actions/:id/cancel | Cancel a queued action |
integration-service — /v1/integration/
| Method + path | Purpose |
|---|---|
| GET /v1/integration/connections | Connected cloud accounts and inference providers |
| POST /v1/integration/connections | Create a connection (Bedrock, Azure OpenAI, etc.) |
| POST /v1/integration/connections/:id/validate | Re-run STS / health check |
| GET /v1/integration/connections/:id/sync-history | Per-step ingestion history |
| POST /v1/integration/connections/:id/rotate-secret | Trigger secret rotation |
tenant-service — /v1/tenant/
| Method + path | Purpose |
|---|---|
| GET /v1/tenant | Current tenant settings |
| GET /v1/tenant/members | Members + roles |
| POST /v1/tenant/members | Invite a member |
| PATCH /v1/tenant/members/:id/role | Change a member’s role |
| GET /v1/tenant/branding | Branding (logo, colors, custom domain) |
| GET /v1/tenant/feature-flags | Live flag values for the current user; see feature flags |
identity-service — /v1/identity/
| Method + path | Purpose |
|---|---|
| GET /v1/identity/sso | SAML / OIDC provider configuration |
| POST /v1/identity/agent-credentials | Mint a new agent HMAC key (returns plaintext once) |
| DELETE /v1/identity/agent-credentials/:keyId | Revoke an agent credential |
| GET /v1/identity/audit | Identity-domain audit trail |
notification-service — /v1/notification/
| Method + path | Purpose |
|---|---|
| GET /v1/notification/channels | Configured channels |
| POST /v1/notification/channels | Create a Slack / PagerDuty / Teams / email / webhook channel |
| POST /v1/notification/channels/:id/test | Send a test notification |
report-service — /v1/report/
| Method + path | Purpose |
|---|---|
| GET /v1/report/exports/:resource | One-shot export. format is csv or pdf; resource is one of instances, cost, ai-spend, alerts, metrics, savings |
| GET /v1/report/scheduled | Scheduled export jobs |
| POST /v1/report/scheduled | Create a scheduled export |
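The one-shot export route can be sketched as a small URL builder that rejects unsupported values before any request is made. The function and constant names are hypothetical, and passing `format` as a query parameter is an assumption read from the table row above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tensorcost.com/v1"

# Values listed for the export endpoint in the table above.
EXPORT_RESOURCES = {"instances", "cost", "ai-spend", "alerts", "metrics", "savings"}
EXPORT_FORMATS = {"csv", "pdf"}


def export_url(resource: str, fmt: str = "csv") -> str:
    """Build the URL for GET /v1/report/exports/:resource, validating inputs first."""
    if resource not in EXPORT_RESOURCES:
        raise ValueError(f"unsupported export resource: {resource!r}")
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    # Assumption: format is sent as a query parameter named "format".
    return f"{BASE_URL}/report/exports/{resource}?{urlencode({'format': fmt})}"
```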
Common response shape
Successful responses:
{
"success": true,
"data": { ... },
"count": 1
}
Errors:
{
"success": false,
"error": "Human-readable message",
"code": "VALIDATION_FAILED",
"trace_id": "7a0db7e045..."
}
The trace_id correlates to the OpenTelemetry trace and is searchable in our internal observability stack — include it in support tickets.
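A client can unwrap this envelope in one place so that every error carries its code and trace_id. The sketch below is illustrative only: `TensorCostError` and `unwrap` are hypothetical names, not part of any published SDK, and the input is assumed to be the already-decoded JSON body.

```python
from typing import Any, Optional


class TensorCostError(Exception):
    """Raised when the response envelope reports success: false."""

    def __init__(self, message: str, code: Optional[str], trace_id: Optional[str]):
        super().__init__(f"{message} (code={code}, trace_id={trace_id})")
        self.code = code
        self.trace_id = trace_id


def unwrap(envelope: dict) -> Any:
    """Return envelope['data'] on success, otherwise raise TensorCostError."""
    if envelope.get("success") is True:
        return envelope.get("data")
    raise TensorCostError(
        envelope.get("error", "unknown error"),
        envelope.get("code"),
        envelope.get("trace_id"),
    )
```

Keeping `trace_id` on the exception makes it easy to log or paste into a support ticket.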
HTTP status codes
| Code | Meaning |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Validation error (body or query parameters) |
| 401 | Missing or invalid token |
| 403 | Authenticated but insufficient role / scope |
| 404 | Resource not found within the current tenant |
| 409 | Conflict (e.g. duplicate connection) |
| 429 | Rate limited |
| 500 | Server error |
Rate limits
Default: 1,000 requests per minute per tenant. Enterprise tier raises this to 10,000 RPM. Limits are enforced at api-gateway via Redis token-bucket and surface in response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1719438962
Per-tenant limits also apply to the gRPC ingress (per ADR-0010 CC-4) and to the managed-inference adapters’ upstream pulls (we don’t burst your CUR / CloudWatch budget).
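One way a client might use the rate-limit headers is to compute how long to pause once the bucket is empty. This sketch assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but the reference does not state. The function name is hypothetical, and `headers` is treated as a plain dict with the exact header casing shown above.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request; 0.0 if budget remains."""
    now = time.time() if now is None else now
    # If the header is missing, assume budget remains rather than stalling.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is epoch seconds.
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)
```

A caller could sleep for the returned duration after a 429 response, or before a request when the remaining count reaches zero.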
Webhooks
Every event in the real-time event catalog can also be delivered as a webhook. Configure delivery under Integrations → Webhooks. Each delivery is signed with HMAC-SHA256 (header X-TensorCost-Signature: t=<unix>,v1=<sig>); verify the signature on receipt.
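Signature verification on the receiving side might look like the sketch below. Two details are assumptions, because this page gives only the header format: that the signed string is the timestamp, a dot, and the raw request body, and that the signature is a hex digest. Confirm both against the webhook documentation before relying on this. The five-minute replay tolerance and the function names are likewise illustrative.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def parse_signature_header(header: str) -> Dict[str, str]:
    """Split 't=<unix>,v1=<sig>' into {'t': ..., 'v1': ...}."""
    parts = {}
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        parts[key] = value
    return parts


def verify_webhook(secret: bytes, body: bytes, header: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check an X-TensorCost-Signature header against the raw request body."""
    parts = parse_signature_header(header)
    timestamp, signature = parts.get("t"), parts.get("v1")
    if not timestamp or not signature or not timestamp.isdigit():
        return False
    now = time.time() if now is None else now
    # Reject stale deliveries to limit replay (tolerance is an assumption).
    if abs(now - int(timestamp)) > tolerance_s:
        return False
    # Assumption: signed payload is "<t>.<raw body>", hex-encoded digest.
    signed = timestamp.encode("ascii") + b"." + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

Verify against the raw bytes of the request body, before any JSON parsing, since re-serialization can change the bytes and invalidate the signature.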
SDKs
- TypeScript: @tensorcost/sdk on npm. Mirrors the REST surface with type-safe clients per service.
- Python: tensorcost on PyPI. Same coverage; designed for notebooks and CI scripts.
- Go: github.com/vaadh-labs/tensorcost-go for backend integrations.
All SDKs read TENSORCOST_API_TOKEN from env or accept an explicit apiToken constructor argument.
Deprecated endpoints
The pre-/v1/ paths (/api/instances, /api/costs, etc.) remain available but emit a Deprecation header and will be removed after 2026-09-30. New integrations should target /v1/.
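To find callers that still hit the old paths, a client wrapper could check each response for the Deprecation header. The sketch below only checks that the header is present, since its value format is not specified here; the function name is hypothetical.

```python
import warnings
from typing import Mapping


def warn_if_deprecated(path: str, headers: Mapping[str, str]) -> bool:
    """Emit a DeprecationWarning if the response carried a Deprecation header."""
    # Header names are case-insensitive, so compare in lowercase.
    deprecated = any(name.lower() == "deprecation" for name in headers)
    if deprecated:
        warnings.warn(
            f"{path} is deprecated and will be removed after 2026-09-30; "
            "migrate to the /v1/ equivalent",
            DeprecationWarning,
            stacklevel=2,
        )
    return deprecated
```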