API reference

TensorCost exposes a REST gateway at https://api.tensorcost.com/v1/ that fronts the 14 backend microservices. All endpoints are tenant-scoped and require authentication.

Programmatic agents — Claude Desktop, internal LLM agents, partners — should prefer the MCP server over the REST API. MCP gives scope-guarded tools with built-in tenant binding.

Base URL and versioning

https://api.tensorcost.com/v1/

The gateway is versioned at the path prefix. /v1/ is the only currently supported version; older /api/ paths are deprecated and will be removed after 2026-09-30.

Authentication

All endpoints require a Bearer JWT in the Authorization header.

Authorization: Bearer eyJhbG...
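As a minimal sketch of the header above, the following builds (but does not send) an authenticated GET request using only the Python standard library. The helper name `build_get` and the `Accept` header are illustrative assumptions, not part of the documented API.

```python
import urllib.request

BASE_URL = "https://api.tensorcost.com/v1/"


def build_get(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a path under /v1/."""
    return urllib.request.Request(
        BASE_URL + path.lstrip("/"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the gateway returns JSON; this header is not documented.
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_get("cost/summary", "eyJhbG...")
print(request.full_url)  # https://api.tensorcost.com/v1/cost/summary
```

Passing the result to `urllib.request.urlopen(request)` would send it; the sketch stops short of that so it can be run without network access.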

Tokens are minted by identity-service. Three flows:

Browser SSO — Cognito federation (RS256), verified against the user pool’s JWKS.
CLI / API client — POST /v1/auth/login with email + password + tenant_id.
Service-to-service — backend-issued HS256 tokens with embedded tenantId. Used by partner integrations.

Login

POST /v1/auth/login

Request body:

{
  "email": "user@company.com",
  "password": "...",
  "tenant_id": "uuid-of-tenant"
}

Response:

{
  "success": true,
  "data": {
    "accessToken": "eyJhbG...",
    "refreshToken": "eyJhbG...",
    "user": { "id": 1, "email": "user@company.com", "role": "admin" }
  }
}
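The login exchange above might be driven from a script roughly as follows. This is a sketch under assumptions: the `Content-Type: application/json` header is not stated in this reference, and the helper names `build_login_request` and `extract_tokens` are hypothetical.

```python
import json
import urllib.request
from typing import Tuple

LOGIN_URL = "https://api.tensorcost.com/v1/auth/login"


def build_login_request(email: str, password: str, tenant_id: str) -> urllib.request.Request:
    """Build the POST /v1/auth/login request with the documented body fields."""
    body = json.dumps(
        {"email": email, "password": password, "tenant_id": tenant_id}
    ).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        # Assumption: the gateway expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body: str) -> Tuple[str, str]:
    """Return (accessToken, refreshToken) from a successful login response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "login failed"))
    data = payload["data"]
    return data["accessToken"], data["refreshToken"]
```

Keeping request construction separate from response parsing makes both halves easy to exercise against recorded responses before pointing them at the live gateway.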

Refresh

POST /v1/auth/refresh-tokens

{ "refreshToken": "eyJhbG..." }
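A client needs some rule for deciding when to call the refresh endpoint. One possible approach, sketched below, reads the standard JWT `exp` claim from the access token and refreshes shortly before it lapses. That the access token carries an `exp` claim is an assumption (this reference does not say), and the 60-second skew and all function names are illustrative.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Read the `exp` claim (Unix seconds) from a JWT payload WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")


def needs_refresh(token: str, now: Optional[float] = None, skew_seconds: int = 60) -> bool:
    """True when the token expires within `skew_seconds` (hypothetical policy)."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim: nothing to base a decision on
    now = time.time() if now is None else now
    return now >= exp - skew_seconds


def build_refresh_body(refresh_token: str) -> bytes:
    """Serialize the documented body for POST /v1/auth/refresh-tokens."""
    return json.dumps({"refreshToken": refresh_token}).encode("utf-8")
```

Decoding the payload this way is only a scheduling hint for the client; it is not a substitute for the signature verification the server performs.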

Tenant scoping

Every request is implicitly scoped to the tenant embedded in the JWT. Cross-tenant reads are unreachable by construction — Postgres Row Level Security enforces it at the storage layer (see architecture). Users with multi-tenant access pass X-Active-Tenant-Id: <uuid> to switch context per request.
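For users with multi-tenant access, the per-request switch could look like the sketch below. The function name `tenant_headers` is hypothetical; the only documented pieces are the Bearer token and the `X-Active-Tenant-Id` header. The UUID is validated locally so that a malformed value fails before any request is made.

```python
import uuid
from typing import Dict, Optional


def tenant_headers(token: str, active_tenant_id: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, optionally switching the active tenant."""
    headers = {"Authorization": f"Bearer {token}"}
    if active_tenant_id is not None:
        # uuid.UUID raises ValueError on malformed input and normalizes the format.
        headers["X-Active-Tenant-Id"] = str(uuid.UUID(active_tenant_id))
    return headers


# The tenant UUID below is a made-up example value.
print(tenant_headers("eyJhbG...", "123E4567-E89B-12D3-A456-426614174000"))
```

Omitting the second argument leaves the request scoped to the tenant embedded in the JWT.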

Endpoints by service

The gateway groups endpoints under their owning microservice. The tables below are the public surface; internal-only routes (e.g. audit-service direct ingestion, mcp-server tool dispatch) are not listed.

`cost-service` — `/v1/cost/`

Method + path	Purpose
`GET /v1/cost/summary`	Tenant-wide totals across GPU + managed-inference + agent workloads
`GET /v1/cost/by-source`	Cost broken down by source: `bedrock`, `azure_openai`, `vertex`, `openai_api`, `anthropic_api`, `gpu_agent`
`GET /v1/cost/by-tag`	Group by `application` / `team` / `environment` / `owner`
`GET /v1/cost/forecast`	30/60/90-day cost forecast with confidence interval
`GET /v1/cost/budgets`	Budget hierarchy and current burn rate
`GET /v1/cost/savings-ledger`	Realized + projected savings, attributed to recommendation IDs
`GET /v1/cost/recommendations`	Active recommendations across all recommenders
`POST /v1/cost/recommendations/:id/accept`	Accept a recommendation; opens an action queue entry if remediation is wired
`POST /v1/cost/recommendations/:id/dismiss`	Dismiss with reason; feeds the inference feedback loop

`gpu-service` — `/v1/gpu/`

Method + path	Purpose
`GET /v1/gpu/instances`	Fleet inventory, filterable by `state`, `instance_type`, `gpu_type`, `region`, `tags`
`GET /v1/gpu/instances/:id`	Detail view + latest metrics
`GET /v1/gpu/instances/:id/metrics`	Time-series with `hours` and `limit` query params
`GET /v1/gpu/instances/idle`	Instances below the idle threshold
`GET /v1/gpu/mig`	MIG slice topology by host
`GET /v1/gpu/agents`	Agent fleet — connection status, last-seen, version

`ai-service` — `/v1/ai/`

Method + path	Purpose
`GET /v1/ai/spend`	Per-event managed-inference spend; filter by `source`, `model_id`, `application`, `team`, time range
`GET /v1/ai/spend/summary`	Totals by provider, model, application, team
`GET /v1/ai/unit-economics`	Cost per 1K input/output tokens, cost per request, cache-hit rate
`GET /v1/ai/recommenders/routing`	Model-routing recommendations
`GET /v1/ai/recommenders/cache`	Prompt-cache recommendations
`GET /v1/ai/recommenders/provisioned-throughput`	Provisioned-throughput break-even analysis
`GET /v1/ai/anomalies`	Detected anomalies (runaway loops, cost spikes, retry storms)
`GET /v1/ai/agents/:id/cost`	Per-agent attribution
`GET /v1/ai/workflows/:id/cost`	Per-workflow attribution

`alert-service` — `/v1/alert/`

Method + path	Purpose
`GET /v1/alert/alerts`	Active and historical alerts
`GET /v1/alert/alerts/summary`	Counts by severity and type
`PATCH /v1/alert/alerts/:id/resolve`	Mark resolved
`PATCH /v1/alert/alerts/:id/ignore`	Ignore (audit-trailed)
`POST /v1/alert/alerts/:id/acknowledge`	Stops the escalation chain
`GET /v1/alert/rules`	Alert rules
`POST /v1/alert/rules`	Create a rule (see configuration for the body shape)
`GET /v1/alert/escalation-policies`	Escalation policies
`GET /v1/alert/incidents`	Incident timeline

`enforcement-service` — `/v1/enforcement/`

Method + path	Purpose
`GET /v1/enforcement/policies`	Policy list
`GET /v1/enforcement/policies/templates`	Pre-built templates
`POST /v1/enforcement/policies/from-template/:templateId`	Clone + customize
`POST /v1/enforcement/policies/simulate`	Dry run against historical data
`POST /v1/enforcement/policies`	Create a policy
`PATCH /v1/enforcement/policies/:id/toggle`	Enable / disable
`GET /v1/enforcement/actions`	Action queue
`POST /v1/enforcement/actions/:id/approve`	Approve a queued action
`POST /v1/enforcement/actions/:id/cancel`	Cancel a queued action

`integration-service` — `/v1/integration/`

Method + path	Purpose
`GET /v1/integration/connections`	Connected cloud accounts and inference providers
`POST /v1/integration/connections`	Create a connection (Bedrock, Azure OpenAI, etc.)
`POST /v1/integration/connections/:id/validate`	Re-run STS / health check
`GET /v1/integration/connections/:id/sync-history`	Per-step ingestion history
`POST /v1/integration/connections/:id/rotate-secret`	Trigger secret rotation

`tenant-service` — `/v1/tenant/`

Method + path	Purpose
`GET /v1/tenant`	Current tenant settings
`GET /v1/tenant/members`	Members + roles
`POST /v1/tenant/members`	Invite a member
`PATCH /v1/tenant/members/:id/role`	Change a member’s role
`GET /v1/tenant/branding`	Branding (logo, colors, custom domain)
`GET /v1/tenant/feature-flags`	Live flag values for the current user — see feature flags

`identity-service` — `/v1/identity/`

Method + path	Purpose
`GET /v1/identity/sso`	SAML / OIDC provider configuration
`POST /v1/identity/agent-credentials`	Mint a new agent HMAC key (returns plaintext once)
`DELETE /v1/identity/agent-credentials/:keyId`	Revoke an agent credential
`GET /v1/identity/audit`	Identity-domain audit trail

`notification-service` — `/v1/notification/`

Method + path	Purpose
`GET /v1/notification/channels`	Configured channels
`POST /v1/notification/channels`	Create a Slack / PagerDuty / Teams / email / webhook channel
`POST /v1/notification/channels/:id/test`	Send a test notification

`report-service` — `/v1/report/`

Method + path	Purpose
`GET /v1/report/exports/:resource`	One-shot export. `format=csv|pdf`; `resource` is `instances`, `cost`, `ai-spend`, `alerts`, `metrics`, `savings`
`GET /v1/report/scheduled`	Scheduled export jobs
`POST /v1/report/scheduled`	Create a scheduled export

Common response shape

Successful responses:

{
  "success": true,
  "data": { ... },
  "count": 1
}

Errors:

{
  "success": false,
  "error": "Human-readable message",
  "code": "VALIDATION_FAILED",
  "trace_id": "7a0db7e045..."
}

The trace_id correlates to the OpenTelemetry trace and is searchable in our internal observability stack — include it in support tickets.
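A client can unwrap the two envelope shapes above in one place. The sketch below is one way to do it; the exception class `TensorCostError` and the function `unwrap` are hypothetical names, not part of any published SDK.

```python
import json
from typing import Any, Optional


class TensorCostError(Exception):
    """Carries the documented error fields so callers can log the trace_id."""

    def __init__(self, message: str, code: Optional[str], trace_id: Optional[str]):
        super().__init__(message)
        self.code = code
        self.trace_id = trace_id


def unwrap(response_body: str) -> Any:
    """Return `data` from a successful envelope, or raise TensorCostError."""
    payload = json.loads(response_body)
    if payload.get("success"):
        return payload.get("data")
    raise TensorCostError(
        payload.get("error", "unknown error"),
        payload.get("code"),
        payload.get("trace_id"),
    )
```

Keeping `trace_id` on the exception means it is at hand when a failure has to be reported in a support ticket.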

HTTP status codes

Code	Meaning
`200`	Success
`201`	Created
`400`	Validation error (body or query parameters)
`401`	Missing or invalid token
`403`	Authenticated but insufficient role / scope
`404`	Resource not found within the current tenant
`409`	Conflict (e.g. duplicate connection)
`429`	Rate limited
`500`	Server error
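The status codes above suggest a small client-side decision table. The policy below is an assumption for illustration only (for example, that a 401 should trigger a token refresh and a 500 is worth retrying); this reference documents what the codes mean, not how clients should react.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    REAUTHENTICATE = "reauthenticate"
    WAIT_AND_RETRY = "wait_and_retry"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    FAIL = "fail"


def classify(status: int) -> Action:
    """Map a documented HTTP status code to a hypothetical client action."""
    if status in (200, 201):
        return Action.OK
    if status == 401:
        return Action.REAUTHENTICATE      # missing or invalid token
    if status == 429:
        return Action.WAIT_AND_RETRY      # rate limited
    if status == 500:
        return Action.RETRY_WITH_BACKOFF  # server error
    return Action.FAIL                    # 400, 403, 404, 409: retrying will not help
```

Retrying after a 500 is generally only safe for idempotent calls such as the GET endpoints; a retried POST may repeat its side effect.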

Rate limits

Default: 1,000 requests per minute per tenant. Enterprise tier raises this to 10,000 RPM. Limits are enforced at api-gateway via Redis token-bucket and surface in response headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1719438962

Per-tenant limits also apply to the gRPC ingress (per ADR-0010 CC-4) and to the managed-inference adapters’ upstream pulls (we don’t burst your CUR / CloudWatch budget).
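A client can use the rate-limit headers to pace itself. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this reference does not state outright; the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if budget remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0  # nothing to go on; the caller may apply its own backoff
    now = time.time() if now is None else now
    # Assumption: the reset value is Unix epoch seconds.
    return max(0.0, int(reset) - now)


exhausted = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1719438962",
}
print(seconds_until_reset(exhausted, now=1719438950))  # 12.0
```

Calling `time.sleep()` with the returned value after a 429 keeps the client under the per-tenant budget without guessing at a backoff interval.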

Webhooks

Every event in the real-time event catalog can also be delivered as a webhook. Configure webhooks under Integrations → Webhooks. Each delivery is signed with HMAC-SHA256 (header X-TensorCost-Signature: t=<unix>,v1=<sig>); verify the signature on receipt.
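Receipt-side verification could look like the sketch below. Two details are assumptions, because this reference gives only the header format: that the signed bytes are the timestamp, a literal ".", and the raw request body, and that the signature is hex-encoded. Confirm both against the webhook settings before relying on this. The five-minute tolerance and the function names are also illustrative.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple


def parse_signature_header(header: str) -> Tuple[int, str]:
    """Split 't=<unix>,v1=<sig>' into (timestamp, signature)."""
    parts = dict(item.strip().split("=", 1) for item in header.split(","))
    return int(parts["t"]), parts["v1"]


def verify_webhook(secret: bytes, header: str, body: bytes,
                   now: Optional[float] = None, tolerance: int = 300) -> bool:
    """Check the X-TensorCost-Signature header against the raw request body."""
    try:
        timestamp, signature = parse_signature_header(header)
    except (KeyError, ValueError):
        return False  # malformed header
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance:
        return False  # stale delivery: reject to limit replay attacks
    # ASSUMPTION: signed payload is "<t>.<raw body>" with a hex digest.
    signed = f"{timestamp}.".encode("utf-8") + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters matched.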

SDKs

TypeScript — @tensorcost/sdk on npm. Mirrors the REST surface with type-safe clients per service.
Python — tensorcost on PyPI. Same coverage; designed for notebooks and CI scripts.
Go — github.com/vaadh-labs/tensorcost-go for backend integrations.

All SDKs read TENSORCOST_API_TOKEN from env or accept an explicit apiToken constructor argument.

Deprecated endpoints

The pre-/v1/ paths (/api/instances, /api/costs, etc.) remain available but emit a Deprecation header and will be removed after 2026-09-30. New integrations should target /v1/.
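During migration it can help to surface any call that still hits a legacy path. The sketch below checks response headers for the `Deprecation` header mentioned above; the function name is hypothetical, and the header value `true` in the example is made up, since this reference does not say what value the gateway sends.

```python
from typing import Mapping, Optional


def deprecation_notice(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a log message if the response carried a Deprecation header."""
    for name, value in headers.items():
        if name.lower() == "deprecation":  # header names are case-insensitive
            return (
                f"{url} returned Deprecation: {value}; "
                "legacy paths are scheduled for removal after 2026-09-30"
            )
    return None


notice = deprecation_notice(
    "https://api.tensorcost.com/api/instances",
    {"Deprecation": "true"},  # example value only
)
print(notice)
```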
