Real-time events
The TensorCost backend streams live events to the dashboard, partner integrations, and any authenticated client over Socket.IO. Events cover the full lifecycle of the platform — new metrics arriving, instances changing state, alerts firing, ML training, and the runaway-loop detector pausing an agent.
Why a separate WebSocket endpoint
The REST API sits behind API Gateway, whose REST product doesn’t support WebSocket upgrades. Socket.IO traffic therefore uses a dedicated subdomain that targets the Application Load Balancer directly.
| Traffic | Hostname | Path | Port |
|---|---|---|---|
| REST API | api.tensorcost.com | /v1/* | 443 |
| WebSocket | ws.tensorcost.com | /ws/socket.io/* | 443 |
| gRPC (agents) | grpc.<region>.tensorcost.com | — | 50051 (TLS) |
All three share an ACM certificate with multiple SANs.
Connecting
import { io } from 'socket.io-client';

const socket = io('https://ws.tensorcost.com', {
  // Must match the server path in the routing table above; the client
  // default is '/socket.io'.
  path: '/ws/socket.io',
  auth: {
    token: 'YOUR_JWT', // Cognito ID token or backend-issued HS256
    activeTenantId: 'tenant-...', // optional: for users with multi-tenant access
  },
  transports: ['websocket', 'polling'],
});

// Every platform event arrives on the single 'event' channel.
socket.on('event', (event) => {
  console.log(event.type, event.data);
});
Authentication
- Cognito (RS256): verified against the user pool’s JWKS. The token is resolved to a user via cognito_sub, and tenant membership joins the socket to the right room.
- Backend-issued (HS256): verified against the shared secret, with tenantId (or tenant_id) embedded directly in the token.
Clients are only subscribed to their own tenant’s room. Cross-tenant events are unreachable by construction.
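Because a backend-issued token may carry the tenant as either tenantId or tenant_id, a consumer that inspects verified claims has to accept both spellings. The following is a minimal sketch of that normalisation; `TokenClaims` and `resolveTenantId` are hypothetical names, and preferring `tenantId` when both are present is an assumption rather than documented behaviour.

```typescript
// Hypothetical claim shape: only the two tenant spellings named above are modelled.
interface TokenClaims {
  tenantId?: string;
  tenant_id?: string;
  [claim: string]: unknown;
}

// Returns the tenant a token is scoped to, or undefined if neither claim is
// present. Call this only after the signature has been verified.
function resolveTenantId(claims: TokenClaims): string | undefined {
  // Assumption: tenantId wins when both spellings are present.
  const tenant = claims.tenantId ?? claims.tenant_id;
  return typeof tenant === 'string' && tenant.length > 0 ? tenant : undefined;
}
```

A token with no tenant claim resolves to `undefined`, which a caller would typically treat as a rejected handshake.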
Event envelope
Every message arrives on the event channel with a uniform shape:
{
"type": "ml.training.completed",
"data": {
"model_type": "gpu_anomaly",
"duration_ms": 7512,
"model_id": 42
},
"timestamp": "2026-04-29T16:41:38.412Z",
"trace_id": "7a0db7e045..."
}
| Field | Description |
|---|---|
| type | Dotted event name; the prefix is the domain (instance.*, recommendation.*, etc.) |
| data | Event-specific payload |
| timestamp | ISO 8601 publish time |
| trace_id | OTel trace ID for cross-system correlation |
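On the client, the envelope can be modelled as a small type with a runtime guard. This is a sketch built only from the four fields listed above; the names `EventEnvelope`, `isEventEnvelope`, and `eventDomain` are illustrative, and treating every field as always present is an assumption.

```typescript
// Envelope fields as listed in the table above. `data` stays generic because
// its shape depends on the event type.
interface EventEnvelope<T = unknown> {
  type: string;
  data: T;
  timestamp: string;
  trace_id: string;
}

// Runtime check for messages arriving on the 'event' channel.
function isEventEnvelope(value: unknown): value is EventEnvelope {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.type === 'string' &&
    'data' in v &&
    typeof v.timestamp === 'string' &&
    typeof v.trace_id === 'string'
  );
}

// The domain is everything before the first dot: 'ml.training.completed' -> 'ml'.
function eventDomain(type: string): string {
  const dot = type.indexOf('.');
  return dot === -1 ? type : type.slice(0, dot);
}
```

Guarding at the socket boundary keeps malformed messages out of downstream handlers, which can then narrow `data` according to `type`.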
Event catalog
| Prefix | Example events | Typical use |
|---|---|---|
| instance.* | instance.created, instance.stopped, instance.state_changed | Invalidate the instances list, refresh fleet topology |
| metrics.* | metrics.collected, metrics.processed, metrics.anomaly_detected | Refresh sparklines, flash an anomaly badge |
| cost.* | cost.updated, cost.threshold_exceeded, cost.forecast_generated, cost.budget_breach_predicted | Update cost charts and budget gauges |
| alert.* | alert.created, alert.resolved, alert.escalated, alert.suppressed | Inbox UIs, PagerDuty bridges |
| agent.* | agent.connected, agent.disconnected, agent.error, agent.health_check | Agent fleet health |
| recommendation.* | recommendation.created, recommendation.accepted, recommendation.dismissed | Recommendations feed |
| runaway_loop.* | runaway_loop.detected, runaway_loop.paused, runaway_loop.resolved | Agent cost pager |
| inference.* | inference.spike_detected, inference.cache_miss_surge | Bedrock / Azure OpenAI dashboards |
| action.* | action.queued, action.approved, action.executing, action.completed, action.rolled_back | Enforcement queue |
| notification.* | notification.requested, notification.sent, notification.failed | Trace alert delivery |
| incident.*, security_incident.* | incident.created, security_incident.escalated | Incident response timeline |
| spot.* | spot.interruption_detected, spot.fallback_initiated | AWS spot handling |
| scaling.* | scaling.action_triggered, scaling.completed | Auto-scaling surface |
| ml.* | ml.training.started, ml.training.completed, ml.training.failed | Live retraining progress |
| tenant.*, policy.*, system.* | Housekeeping | Audit log enrichment, admin UIs |
Using events in a React app
The dashboard shell pairs Socket.IO with TanStack Query (and legacy RTK Query during the migration). When the backend publishes an event, the client invalidates the matching query keys so that any subscribed component refetches:
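const PREFIX_TAG_MAP: Record<string, string[]> = {
  'instance.': ['instance'],
  'metrics.': ['metric'],
  'cost.': ['cost'],
  'alert.': ['alert'],
  'recommendation.': ['recommendation'],
  'runaway_loop.': ['agent', 'recommendation'],
  'ml.': ['ml-model'],
};

// One possible tagsForEvent: the first matching prefix wins.
function tagsForEvent(type: string): string[] | undefined {
  const prefix = Object.keys(PREFIX_TAG_MAP).find((p) => type.startsWith(p));
  return prefix ? PREFIX_TAG_MAP[prefix] : undefined;
}

socket.on('event', (e: { type: string }) => {
  // Invalidate each tag separately. A single queryKey of ['agent', 'recommendation']
  // would only match queries whose key begins with both segments.
  // Assumes query keys start with the tag, e.g. ['instance', id].
  tagsForEvent(e.type)?.forEach((tag) => queryClient.invalidateQueries({ queryKey: [tag] }));
});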
For more targeted UI (a progress banner, a toast), components can subscribe via a hook like useRunawayLoopEvents() that filters the stream and exposes only the relevant slice.
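The filtering half of such a hook can be sketched without React. The helper below is an assumption about how the stream might be narrowed, not the dashboard’s actual implementation: `subscribeToPrefix`, `PlatformEvent`, and `EventSource` are hypothetical names, and the structural `EventSource` type stands in for a Socket.IO client socket, which exposes comparable `on` and `off` methods.

```typescript
// Minimal structural types so the sketch runs without socket.io-client or React.
interface PlatformEvent {
  type: string;
  data: unknown;
}
type Listener = (event: PlatformEvent) => void;
interface EventSource {
  on(channel: 'event', listener: Listener): void;
  off(channel: 'event', listener: Listener): void;
}

// Forwards only events whose type starts with `prefix`. Returns an unsubscribe
// function, which is the shape a React effect cleanup expects.
function subscribeToPrefix(source: EventSource, prefix: string, handler: Listener): () => void {
  const listener: Listener = (event) => {
    if (event.type.startsWith(prefix)) handler(event);
  };
  source.on('event', listener);
  return () => source.off('event', listener);
}
```

A hook in the style of useRunawayLoopEvents() could call `subscribeToPrefix(socket, 'runaway_loop.', setLatest)` inside an effect and return the unsubscribe function as its cleanup.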
Reliability guarantees
- In-process priority. Events dispatch to local subscribers synchronously in the publish path with no Redis dependency. A single-instance deployment works end-to-end with no broker.
- Redis for durability (when configured). Events are also LPUSH’d onto per-type queues for cross-instance replay. A bounded LRU of recently-dispatched event IDs prevents double delivery.
- Best-effort on Redis failure. If Redis is unreachable, events still reach all in-process subscribers (and thus the Socket.IO bridge). Only cross-instance replay is lost.
- Per-type ordering. Events within a single type are delivered in order. There is no ordering guarantee across types.
- Per-tenant rate limits. The Socket.IO emit rate is capped per tenant to protect against noisy neighbors. The default is 100 events/sec/tenant; the cap can be raised on the enterprise tier.
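The bounded LRU used for de-duplication can be illustrated with a small class. This is a sketch of the general technique, not the backend’s code: `RecentIdSet` is a hypothetical name, and it assumes each event carries a unique string ID, which the envelope shown earlier does not expose.

```typescript
// Bounded set of recently seen IDs that evicts the least recently seen first.
// Map preserves insertion order, so the first key is always the oldest.
class RecentIdSet {
  private readonly ids = new Map<string, true>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is seen, false for a duplicate.
  markSeen(id: string): boolean {
    if (this.ids.has(id)) {
      // Re-insert to refresh recency so a repeating ID stays in the window.
      this.ids.delete(id);
      this.ids.set(id, true);
      return false;
    }
    this.ids.set(id, true);
    if (this.ids.size > this.capacity) {
      const oldest = this.ids.keys().next().value as string;
      this.ids.delete(oldest);
    }
    return true;
  }
}
```

A dispatcher would call `markSeen(eventId)` before delivering a replayed event and skip delivery when it returns false. The capacity bounds memory, at the cost of forgetting IDs older than the window.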
Webhooks
Every event in the catalog can also be delivered as a webhook, HMAC-signed in the X-TensorCost-Signature: t=<unix>,v1=<sig> header. Configure webhooks under Integrations → Webhooks. They complement Socket.IO for systems that prefer push-once-and-acknowledge delivery (PagerDuty, ServiceNow, internal eventing pipelines).
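A receiver should verify the signature before trusting the payload. The sketch below parses the `t=<unix>,v1=<sig>` header and recomputes the HMAC. Several details are assumptions, because this page does not specify them: the hash (SHA-256), the encoding (hex), the signed string (`<t>.<raw body>`), and the 300-second replay tolerance. Confirm the real scheme against your webhook settings before relying on it. All function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Splits 't=<unix>,v1=<sig>' into its two fields.
function parseSignatureHeader(header: string): { t: string; v1: string } | undefined {
  const fields = new Map<string, string>();
  for (const part of header.split(',')) {
    const eq = part.indexOf('=');
    if (eq > 0) fields.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  const t = fields.get('t');
  const v1 = fields.get('v1');
  return t && v1 ? { t, v1 } : undefined;
}

// ASSUMPTION: signature = hex(HMAC-SHA256(secret, `${t}.${rawBody}`)).
function computeSignature(secret: string, t: string, rawBody: string): string {
  return createHmac('sha256', secret).update(`${t}.${rawBody}`).digest('hex');
}

function verifyWebhook(
  secret: string,
  header: string,
  rawBody: string,
  nowSec: number,
  toleranceSec = 300, // ASSUMPTION: replay window, not a documented value
): boolean {
  const parsed = parseSignatureHeader(header);
  if (!parsed) return false;
  const ts = Number(parsed.t);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > toleranceSec) return false;
  const expected = Buffer.from(computeSignature(secret, parsed.t, rawBody), 'hex');
  const received = Buffer.from(parsed.v1, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run against the raw request body, before any JSON parsing, since re-serialising the payload can change its bytes.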
Debugging connection issues
The dashboard browser console prints [SocketService] connection error when the handshake fails. Common causes:
- Wrong hostname. If the socket points at api.tensorcost.com, the upgrade is rejected. Verify that VITE_WS_URL (or your env-equivalent) targets ws.tensorcost.com.
- Certificate SAN mismatch. A new WS subdomain must be in the ALB cert’s SAN list. Reissue the ACM certificate if you added one.
- Expired JWT. Sockets don’t auto-refresh tokens. After ~1 hour, reconnect with a fresh token. The dashboard handles this automatically on route navigation.
- CORS origin. The backend’s CORS_ORIGIN must include the frontend origin.
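For the expired-JWT case, a client can read the token’s `exp` claim and reconnect shortly before it lapses. The sketch below decodes the payload without verifying it, which is acceptable only for scheduling because the server remains the authority on validity. `jwtExpiry`, `needsRefresh`, and the 60-second leeway are illustrative assumptions.

```typescript
// Reads the `exp` claim (seconds since epoch) from a JWT without verifying it.
function jwtExpiry(token: string): number | undefined {
  const payload = token.split('.')[1];
  if (!payload) return undefined;
  try {
    // JWT segments are base64url; convert to standard base64 for atob.
    const b64 = payload.replace(/-/g, '+').replace(/_/g, '/');
    const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
    const claims = JSON.parse(atob(padded)) as { exp?: unknown };
    return typeof claims.exp === 'number' ? claims.exp : undefined;
  } catch {
    return undefined;
  }
}

// True when the token is unreadable or expires within `leewaySec` of `nowSec`.
function needsRefresh(token: string, nowSec: number, leewaySec = 60): boolean {
  const exp = jwtExpiry(token);
  return exp === undefined || exp - nowSec <= leewaySec;
}
```

When `needsRefresh` returns true, the client can obtain a new token, assign it to `socket.auth`, and reconnect.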
MCP
TensorCost ships a built-in MCP server so Claude Desktop, internal LLM agents, and partner integrations can query the platform programmatically.
| Tool | Scope | Returns |
|---|---|---|
| cost.summary | cost:read | Tenant-wide totals, by-source breakdown, by-tag rollup |
| fleet.list_instances | gpu:read | GPU fleet inventory |
| workload.cost | workload:read | Per-application / per-team / per-environment cost |
| inference.recommendations | ai:read | Active managed-inference recommendations |
| agent.cost | agent:read | Per-agent and per-workflow attribution |
| alerts.recent | alert:read | Recent alerts, filterable by severity |
| connections.health | integration:read | Sync status per cloud / inference provider |
| recommendations.accept | recommendation:write | Accept a recommendation |
| policies.simulate | enforcement:read | Dry-run an enforcement policy |
| policies.create | enforcement:write | Create / update a policy |
Scope-guarded by RBAC
Every tool call is scope-checked via @tensorcost/rbac against the calling identity. Read tools are available on all plan tiers; write tools require mcp-write-tools-enabled and an explicit tenant grant.
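The rule can be pictured as a two-step check: the caller must hold the tool’s scope, and write tools additionally need both gates. The sketch below is illustrative only and does not reflect the @tensorcost/rbac API. `Caller`, `canCallTool`, and the two boolean fields are hypothetical stand-ins for mcp-write-tools-enabled and the tenant grant, and identifying write tools by a `:write` suffix is an assumption drawn from the table.

```typescript
// Scopes copied from the tool table above (a subset, for brevity).
const TOOL_SCOPES: Record<string, string> = {
  'cost.summary': 'cost:read',
  'fleet.list_instances': 'gpu:read',
  'alerts.recent': 'alert:read',
  'recommendations.accept': 'recommendation:write',
  'policies.create': 'enforcement:write',
};

// Hypothetical caller shape; the real identity model is not documented here.
interface Caller {
  scopes: string[];
  writeToolsEnabled: boolean; // stands in for mcp-write-tools-enabled
  tenantWriteGrant: boolean; // stands in for the explicit tenant grant
}

function canCallTool(caller: Caller, tool: string): boolean {
  const scope = TOOL_SCOPES[tool];
  // Unknown tools and missing scopes are denied.
  if (scope === undefined || !caller.scopes.includes(scope)) return false;
  // Write tools need both the feature switch and the tenant grant.
  if (scope.endsWith(':write')) return caller.writeToolsEnabled && caller.tenantWriteGrant;
  return true;
}
```

Denying unknown tools by default keeps a newly added tool inaccessible until it has been given a scope.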
Authenticating
MCP clients authenticate the same way as the REST API — Bearer JWT in the connection handshake. Service-to-service callers (e.g. an internal agent) use a backend-issued HS256 token with the embedded tenantId.
Use cases
- CFO / finance: point Claude Desktop at TensorCost MCP and ask “what drove last week’s Bedrock bill?” The model uses cost.summary + inference.recommendations to compose the answer.
- Platform engineer: wire your internal agent to MCP to assert budgets and pause runaway agents during incidents.
- Partner integration: build a Datadog / Grafana panel that consumes cost.summary and connections.health for embedded TensorCost views.
The MCP server is configurable per tenant under Settings → MCP. The same RBAC rules apply across REST, gRPC, and MCP.