API reference

TensorCost exposes a REST gateway at https://api.tensorcost.com/v1/ that fronts the 14 backend microservices. All endpoints are tenant-scoped and require authentication.
Programmatic agents — Claude Desktop, internal LLM agents, partners — should prefer the MCP server over the REST API. MCP gives scope-guarded tools with built-in tenant binding.

Base URL and versioning

https://api.tensorcost.com/v1/
The gateway is versioned at the path prefix. /v1/ is the only currently-supported version; older /api/ paths are deprecated and will be removed after 2026-09-30.

Authentication

All endpoints require a Bearer JWT in the Authorization header.
Authorization: Bearer eyJhbG...
Tokens are minted by identity-service. Three flows:
  • Browser SSO: Cognito federation (RS256), verified against the user pool’s JWKS.
  • CLI / API client: POST /v1/auth/login with email + password + tenant_id.
  • Service-to-service: backend-issued HS256 tokens with an embedded tenantId. Used by partner integrations.

Login

POST /v1/auth/login
Request body:
{
  "email": "user@company.com",
  "password": "...",
  "tenant_id": "uuid-of-tenant"
}
Response:
{
  "success": true,
  "data": {
    "accessToken": "eyJhbG...",
    "refreshToken": "eyJhbG...",
    "user": { "id": 1, "email": "user@company.com", "role": "admin" }
  }
}
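The login exchange above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, the Content-Type header is an assumption (the page only shows a JSON body), and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.tensorcost.com/v1"  # base URL from this page


def build_login_request(email: str, password: str, tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/auth/login request."""
    body = json.dumps(
        {"email": email, "password": password, "tenant_id": tenant_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        method="POST",
        # Assumption: the gateway expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )


def auth_header_from_login(response_body: str) -> dict:
    """Turn a login response body into the Authorization header for later calls."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "login failed"))
    return {"Authorization": f"Bearer {payload['data']['accessToken']}"}
```

To send the request, pass the result of build_login_request to urllib.request.urlopen and feed the decoded response body to auth_header_from_login.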

Refresh

POST /v1/auth/refresh-tokens
{ "refreshToken": "eyJhbG..." }

Tenant scoping

Every request is implicitly scoped to the tenant embedded in the JWT. Cross-tenant reads are unreachable by construction — Postgres Row Level Security enforces it at the storage layer (see architecture). Users with multi-tenant access pass X-Active-Tenant-Id: <uuid> to switch context per request.
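A minimal sketch of building request headers with an optional tenant switch. The function name is hypothetical, and validating the tenant ID client-side as a UUID is an assumption based on the <uuid> placeholder above; the gateway remains the authority on which tenants a token may access.

```python
import uuid
from typing import Optional


def tenant_headers(access_token: str, active_tenant_id: Optional[str] = None) -> dict:
    """Return headers for a call, optionally switching the active tenant."""
    headers = {"Authorization": f"Bearer {access_token}"}
    if active_tenant_id is not None:
        # uuid.UUID raises ValueError on malformed input, so typos fail
        # locally; str() yields the canonical lowercase hyphenated form.
        headers["X-Active-Tenant-Id"] = str(uuid.UUID(active_tenant_id))
    return headers
```

Omitting active_tenant_id leaves the request scoped to the tenant embedded in the JWT.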

Endpoints by service

The gateway groups endpoints under their owning microservice. The table below is the public surface — internal-only routes (e.g. audit-service direct ingestion, mcp-server tool dispatch) are not listed.

cost-service (/v1/cost/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/cost/summary | Tenant-wide totals across GPU + managed-inference + agent workloads |
| GET /v1/cost/by-source | Cost broken down by source: bedrock, azure_openai, vertex, openai_api, anthropic_api, gpu_agent |
| GET /v1/cost/by-tag | Group by application / team / environment / owner |
| GET /v1/cost/forecast | 30/60/90-day cost forecast with confidence interval |
| GET /v1/cost/budgets | Budget hierarchy and current burn rate |
| GET /v1/cost/savings-ledger | Realized + projected savings, attributed to recommendation IDs |
| GET /v1/cost/recommendations | Active recommendations across all recommenders |
| POST /v1/cost/recommendations/:id/accept | Accept a recommendation; opens an action queue entry if remediation is wired |
| POST /v1/cost/recommendations/:id/dismiss | Dismiss with reason; feeds the inference feedback loop |

gpu-service (/v1/gpu/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/gpu/instances | Fleet inventory, filterable by state, instance_type, gpu_type, region, tags |
| GET /v1/gpu/instances/:id | Detail view + latest metrics |
| GET /v1/gpu/instances/:id/metrics | Time-series with hours and limit query params |
| GET /v1/gpu/instances/idle | Instances below the idle threshold |
| GET /v1/gpu/mig | MIG slice topology by host |
| GET /v1/gpu/agents | Agent fleet: connection status, last-seen, version |
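As an illustration of the hours and limit query parameters on the metrics endpoint, the sketch below builds the request URL. The function name and the default values (24 hours, 100 points) are assumptions for illustration; this page does not state the server-side defaults or maximums.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.tensorcost.com/v1"  # base URL from this page


def instance_metrics_url(instance_id: str, hours: int = 24, limit: int = 100) -> str:
    """Build the URL for GET /v1/gpu/instances/:id/metrics."""
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    query = urlencode({"hours": hours, "limit": limit})
    # Percent-encode the ID so unusual characters cannot alter the path.
    return f"{BASE_URL}/gpu/instances/{quote(instance_id, safe='')}/metrics?{query}"
```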

ai-service (/v1/ai/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/ai/spend | Per-event managed-inference spend; filter by source, model_id, application, team, time range |
| GET /v1/ai/spend/summary | Totals by provider, model, application, team |
| GET /v1/ai/unit-economics | Cost per 1K input/output tokens, cost per request, cache-hit rate |
| GET /v1/ai/recommenders/routing | Model-routing recommendations |
| GET /v1/ai/recommenders/cache | Prompt-cache recommendations |
| GET /v1/ai/recommenders/provisioned-throughput | Provisioned-throughput break-even analysis |
| GET /v1/ai/anomalies | Detected anomalies (runaway loops, cost spikes, retry storms) |
| GET /v1/ai/agents/:id/cost | Per-agent attribution |
| GET /v1/ai/workflows/:id/cost | Per-workflow attribution |

alert-service (/v1/alert/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/alert/alerts | Active and historical alerts |
| GET /v1/alert/alerts/summary | Counts by severity and type |
| PATCH /v1/alert/alerts/:id/resolve | Mark resolved |
| PATCH /v1/alert/alerts/:id/ignore | Ignore (audit-trailed) |
| POST /v1/alert/alerts/:id/acknowledge | Stops the escalation chain |
| GET /v1/alert/rules | Alert rules |
| POST /v1/alert/rules | Create a rule (see configuration for the body shape) |
| GET /v1/alert/escalation-policies | Escalation policies |
| GET /v1/alert/incidents | Incident timeline |

enforcement-service (/v1/enforcement/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/enforcement/policies | Policy list |
| GET /v1/enforcement/policies/templates | Pre-built templates |
| POST /v1/enforcement/policies/from-template/:templateId | Clone + customize |
| POST /v1/enforcement/policies/simulate | Dry run against historical data |
| POST /v1/enforcement/policies | Create a policy |
| PATCH /v1/enforcement/policies/:id/toggle | Enable / disable |
| GET /v1/enforcement/actions | Action queue |
| POST /v1/enforcement/actions/:id/approve | Approve a queued action |
| POST /v1/enforcement/actions/:id/cancel | Cancel a queued action |

integration-service (/v1/integration/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/integration/connections | Connected cloud accounts and inference providers |
| POST /v1/integration/connections | Create a connection (Bedrock, Azure OpenAI, etc.) |
| POST /v1/integration/connections/:id/validate | Re-run STS / health check |
| GET /v1/integration/connections/:id/sync-history | Per-step ingestion history |
| POST /v1/integration/connections/:id/rotate-secret | Trigger secret rotation |

tenant-service (/v1/tenant/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/tenant | Current tenant settings |
| GET /v1/tenant/members | Members + roles |
| POST /v1/tenant/members | Invite a member |
| PATCH /v1/tenant/members/:id/role | Change a member’s role |
| GET /v1/tenant/branding | Branding (logo, colors, custom domain) |
| GET /v1/tenant/feature-flags | Live flag values for the current user (see feature flags) |

identity-service (/v1/identity/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/identity/sso | SAML / OIDC provider configuration |
| POST /v1/identity/agent-credentials | Mint a new agent HMAC key (returns plaintext once) |
| DELETE /v1/identity/agent-credentials/:keyId | Revoke an agent credential |
| GET /v1/identity/audit | Identity-domain audit trail |

notification-service (/v1/notification/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/notification/channels | Configured channels |
| POST /v1/notification/channels | Create a Slack / PagerDuty / Teams / email / webhook channel |
| POST /v1/notification/channels/:id/test | Send a test notification |

report-service (/v1/report/)

| Method + path | Purpose |
| --- | --- |
| GET /v1/report/exports/:resource | One-shot export. format is csv or pdf; resource is one of instances, cost, ai-spend, alerts, metrics, savings |
| GET /v1/report/scheduled | Scheduled export jobs |
| POST /v1/report/scheduled | Create a scheduled export |
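A short sketch of building a one-shot export URL with client-side validation. It assumes the format query parameter takes csv or pdf and that resource is one of the six values listed for GET /v1/report/exports/:resource; the function name and the csv default are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tensorcost.com/v1"  # base URL from this page

# Assumed allowed values, taken from the export row in the table above.
EXPORT_FORMATS = {"csv", "pdf"}
EXPORT_RESOURCES = {"instances", "cost", "ai-spend", "alerts", "metrics", "savings"}


def export_url(resource: str, fmt: str = "csv") -> str:
    """Build the URL for GET /v1/report/exports/:resource."""
    if resource not in EXPORT_RESOURCES:
        raise ValueError(f"unknown export resource: {resource!r}")
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unknown export format: {fmt!r}")
    return f"{BASE_URL}/report/exports/{resource}?{urlencode({'format': fmt})}"
```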

Common response shape

Successful responses:
{
  "success": true,
  "data": { ... },
  "count": 1
}
Errors:
{
  "success": false,
  "error": "Human-readable message",
  "code": "VALIDATION_FAILED",
  "trace_id": "7a0db7e045..."
}
The trace_id correlates to the OpenTelemetry trace and is searchable in our internal observability stack — include it in support tickets.
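One way to handle the envelope on the client is to unwrap data on success and raise an exception that carries code and trace_id on failure. The class and function names below are hypothetical and the fallback values for missing fields are assumptions; this is a sketch, not part of any published SDK.

```python
import json
from typing import Any, Optional


class TensorCostError(Exception):
    """Error envelope returned by the gateway (hypothetical client-side type)."""

    def __init__(self, message: str, code: str, trace_id: Optional[str]):
        super().__init__(f"{code}: {message} (trace_id={trace_id})")
        self.code = code
        self.trace_id = trace_id


def unwrap(response_body: str) -> Any:
    """Return the data field of a successful response, or raise TensorCostError."""
    envelope = json.loads(response_body)
    if envelope.get("success"):
        return envelope.get("data")
    raise TensorCostError(
        envelope.get("error", "unknown error"),
        envelope.get("code", "UNKNOWN"),
        envelope.get("trace_id"),
    )
```

Keeping trace_id on the exception makes it easy to log or attach to a support ticket.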

HTTP status codes

| Code | Meaning |
| --- | --- |
| 200 | Success |
| 201 | Created |
| 400 | Validation error (body or query parameters) |
| 401 | Missing or invalid token |
| 403 | Authenticated but insufficient role / scope |
| 404 | Resource not found within the current tenant |
| 409 | Conflict (e.g. duplicate connection) |
| 429 | Rate limited |
| 500 | Server error |

Rate limits

Default: 1,000 requests per minute per tenant. Enterprise tier raises this to 10,000 RPM. Limits are enforced at api-gateway via Redis token-bucket and surface in response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1719438962
Per-tenant limits also apply to the gRPC ingress (per ADR-0010 CC-4) and to the managed-inference adapters’ upstream pulls (we don’t burst your CUR / CloudWatch budget).
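A client can use the rate-limit headers to decide how long to pause before the next call. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds (the example value above looks like one, but the page does not say so explicitly); the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request; 0.0 if budget remains."""
    now = time.time() if now is None else now
    # Treat a missing Remaining header as "budget available".
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is Unix epoch seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A caller would typically sleep for the returned duration after a 429, or before a request when the remaining budget reaches zero.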

Webhooks

Every event in the real-time event catalog can also be delivered as a webhook. Configure under Integrations → Webhooks, sign with HMAC-SHA256 (header X-TensorCost-Signature: t=<unix>,v1=<sig>), and verify on receipt.
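Receipt-side verification might look like the sketch below. The page specifies only the algorithm (HMAC-SHA256) and the header layout (t=<unix>,v1=<sig>). The rest is assumed for illustration and should be checked against the webhook settings before relying on it: the signed payload ("<t>." followed by the raw request body), the hex encoding of the signature, and the 300-second tolerance window.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: bytes, signature_header: str, body: bytes,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check an X-TensorCost-Signature header against the raw request body."""
    try:
        fields = dict(item.strip().split("=", 1) for item in signature_header.split(","))
        timestamp = int(fields["t"])
        received = fields["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # outside the (assumed) replay window
    # Assumption: the signature covers "<t>." + raw body, hex-encoded.
    signed = f"{timestamp}.".encode("ascii") + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Verify against the raw bytes of the request body, before any JSON parsing or re-serialization, since re-encoding can change the bytes that were signed.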

SDKs

  • TypeScript: @tensorcost/sdk on npm. Mirrors the REST surface with type-safe clients per service.
  • Python: tensorcost on PyPI. Same coverage; designed for notebooks and CI scripts.
  • Go: github.com/vaadh-labs/tensorcost-go for backend integrations.
All SDKs read TENSORCOST_API_TOKEN from env or accept an explicit apiToken constructor argument.
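For scripts that call the REST API directly, the same token lookup can be mirrored with a small helper. This sketch does not use or describe the SDK's actual interface: the function name is hypothetical, and letting an explicit argument take precedence over the environment variable is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_token(api_token: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Return the explicit token if given, else TENSORCOST_API_TOKEN from env."""
    env = os.environ if env is None else env
    # Assumption: an explicit argument wins over the environment variable.
    token = api_token or env.get("TENSORCOST_API_TOKEN")
    if not token:
        raise RuntimeError("no API token: pass one explicitly or set TENSORCOST_API_TOKEN")
    return token
```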

Deprecated endpoints

The pre-/v1/ paths (/api/instances, /api/costs, etc.) remain available but emit a Deprecation header and will be removed after 2026-09-30. New integrations should target /v1/.
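To find remaining callers of the old paths before the removal date, a client can check responses for the Deprecation header. The sketch below tests only for the header's presence, since this page does not specify its value; the function name is hypothetical.

```python
from typing import Mapping, Optional


def deprecation_notice(path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a warning string if the response carries a Deprecation header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "deprecation" not in lowered:
        return None
    return (
        f"{path} is deprecated (Deprecation: {lowered['deprecation']}); "
        "migrate to /v1/ before 2026-09-30"
    )
```

Logging the returned string once per path gives a simple inventory of integrations that still need to move.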