Documentation Index
Fetch the complete documentation index at: https://docs.tensorcost.com/llms.txt
Use this file to discover all available pages before exploring further.
Architecture
TensorCost is a multi-tenant SaaS control plane for AI cost governance. The platform attributes every dollar across GPU fleets, managed inference, and agent workloads, surfaces recommendations, and (with customer opt-in) takes enforcement actions.
This page describes the system at the level a platform-engineering reviewer cares about: what runs where, how tenants are isolated, how data flows in, and how events flow out.
High-level topology
Backend — gateway plus 14 microservices
Every service is NestJS + TypeScript, communicates over gRPC (proto contracts in packages/proto), persists to its own Postgres schema, and emits events to Redis-backed pub/sub.
| Service | Schema | Responsibility |
|---|---|---|
| api-gateway | — | REST entry point, JWT auth, RBAC, rate limiting, request fan-out |
| tenant-service | tenant.* | Tenants, members, memberships, customer onboarding state machine |
| identity-service | identity.* | Cognito federation, agent credentials (HMAC keys), SSO/SAML |
| cost-service | cost.* | Cost rollups, budgets, savings ledger, recommendations queue |
| gpu-service | gpu.*, monitoring.* | GPU fleet inventory, agent gRPC ingress, MIG topology |
| ai-service | ai.* | Managed-inference adapters (Bedrock, Azure OpenAI, Vertex, OpenAI, Anthropic), recommenders, anomaly detection |
| inference-cost-service | ai.* (shared) | High-throughput ai_spend_events ingest path; targeted at OLAP shim per ADR-0011 |
| enforcement-service | enforcement.* | Policy engine, action queue, approval workflow, dry-run simulation |
| alert-service | alert.* | Alert rules, escalation policies, incidents, digests |
| integration-service | integration.* | Cloud-account connections, validation, secret rotation |
| notification-service | notification.* | Slack, PagerDuty, Teams, email, webhook delivery |
| report-service | report.* | Scheduled exports, savings methodology PDFs |
| mcp-server | — | MCP endpoint for Claude Desktop / agents (scope-guarded by RBAC) |
| embed-service | — | Embedded dashboards / widgets in customer surfaces |
| audit-service | audit.* | Cross-tenant audit ledger, data-access trail |
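The per-service emit path to the Redis-backed pub/sub can be sketched as follows. This is illustrative only: the channel-naming scheme, event shape, and the injected `publish` function are assumptions, not the actual @tensorcost contract.

```typescript
// Illustrative event shape; the real proto contracts live in packages/proto.
type TenantEvent = {
  tenantId: string;
  type: string;          // e.g. "cost.budget.exceeded" (hypothetical name)
  occurredAt: string;    // ISO-8601 timestamp
  payload: Record<string, unknown>;
};

// Tenant-scoped channel name: subscribers filtered by channel can never
// observe another tenant's events. Naming is an assumption for this sketch.
function channelFor(event: TenantEvent): string {
  return `events:${event.tenantId}:${event.type}`;
}

// `publish` stands in for whatever Redis client the service uses
// (e.g. ioredis `redis.publish(channel, message)`); injected to keep
// the sketch self-contained.
function emit(
  publish: (channel: string, message: string) => void,
  event: TenantEvent,
): void {
  publish(channelFor(event), JSON.stringify(event));
}
```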
Services run on AWS Fargate behind an Application Load Balancer (REST + WebSocket) and a Network Load Balancer (gRPC). Postgres is RDS Multi-AZ; Redis is ElastiCache cluster mode with replicas.
Frontend — shell plus 14 microfrontends
Module Federation (Vite) loads each microfrontend on demand into a single shell. Each MF owns a domain.
| Microfrontend | Owns |
|---|---|
| shell | Auth, navigation, RBAC gating, feature-flag context |
| cost-mf | Cost explorer, savings ledger, budgets, first-30-days view |
| gpu-mf | Fleet view, MIG topology, agent health |
| ai-mf | Managed-inference dashboard, model routing analytics |
| agents-mf | Agent workload attribution, runaway-loop alerts |
| integration-mf | Bedrock + cloud-account connection wizards |
| enforcement-mf | Policy editor, action queue, approval UI |
| alerts-mf | Alert rules + incidents + escalations |
| recommendations-mf | The recommendations feed |
| reports-mf | Scheduled reports, exports |
| customization-mf | Branding, custom domain, notification channels |
| tenant-admin-mf | Members, RBAC, SSO, billing |
| embed-mf | Embed widgets exposed to customer surfaces |
| mcp-admin-mf | MCP scope and tool-grant configuration |
| dev-tools-mf | Internal — feature-flag inspector, audit viewer |
Stack: React 18, MUI 5, Vite, Module Federation, RTK Query (migrating to TanStack Query). Real-time updates via socket.io joined to a tenant-scoped room.
Shared packages
Twenty-one packages in packages/ carry the cross-cutting concerns. The ones an integrator most often interacts with:
@tensorcost/proto — gRPC contracts.
@tensorcost/agent-sdk — IMDS auto-detect, HMAC signing, retry/backoff.
@tensorcost/db-utils — Sequelize defaults, RLS hook (runAsBypass), tenant-scoped repository.
@tensorcost/rbac — role + scope checks; the same primitive guards REST, gRPC, and MCP tools.
@tensorcost/audit-trail — append-only audit ledger writes.
@tensorcost/observability — OpenTelemetry init, structured logger, trace-correlated logs.
@tensorcost/feature-flags — LaunchDarkly client + useFeature() for the frontend.
@tensorcost/jobs — scheduled-job framework with per-tenant fairness.
@tensorcost-internal/mcp-framework — MCP tool-registration framework.
Multi-tenancy and Row-Level Security
Every customer is a tenant. Every row in every table that holds tenant data carries a tenant_id and is protected by Postgres Row Level Security.
- Current rollout: 21 of the 50 target tables are enabled (the cost, enforcement, integration, gpu, identity, and monitoring schemas are in flight).
- Bypass discipline: the only path that bypasses RLS is runAsBypass(tenantId, fn) from @tensorcost/db-utils, which sets app.tenant_id for the duration of the transaction. gRPC handlers verify the agent's tenantId via HMAC before any bypass call.
- Linting: any sequelize.query() that touches tenant data must be wrapped in a transaction or runAsBypass. A RUN-AS-BYPASS-LINT rule is on the way; until then, code review enforces the discipline.
The full RLS pattern, the four compensating controls for our gRPC ingress, and the agent-credential HMAC verifier are documented in SOC 2 readiness.
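The runAsBypass(tenantId, fn) contract described above can be sketched as below. This is a minimal illustration, not the @tensorcost/db-utils implementation: the `Tx` interface and the injected `withTransaction` are stand-ins for the real Sequelize wiring.

```typescript
// Stand-in for a transaction handle; the real code uses Sequelize.
interface Tx {
  query(sql: string, bind?: unknown[]): Promise<unknown>;
}

async function runAsBypass<T>(
  withTransaction: (work: (tx: Tx) => Promise<T>) => Promise<T>,
  tenantId: string,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  return withTransaction(async (tx) => {
    // set_config(..., true) is transaction-local (like SET LOCAL), so
    // app.tenant_id is visible to RLS policies only inside this
    // transaction and can never leak into another request's session.
    await tx.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    return fn(tx);
  });
}
```

The key property is ordering: the tenant context is pinned before any statement in `fn` runs, so every row the transaction touches is filtered by the verified tenantId.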
Agent ingest — gRPC, HMAC, IMDS
The unified GPU agent runs in customer environments and streams metrics over a long-lived bidirectional gRPC stream on TCP/50051. The pattern (formalized in ADR-0010):
- The agent boots and auto-detects EC2 metadata via IMDSv2 (with a ±300s clock-skew guard and a Redis-backed nonce replay defense).
- The first gRPC message is AgentHello with tenant ID, agent ID, nonce, timestamp_unix, and an HMAC-SHA256 signature over those fields.
- gpu-service verifies the HMAC with timingSafeEqual and rejects nonce reuse via SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with a 600s TTL.
- The verified tenantId is bound to the stream session; every subsequent DB write runs inside runAsBypass with that tenantId.
- The NLB listener accepts TLS only (wildcard ACM cert); plaintext gRPC is rejected before reaching the handler.
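The verification steps above can be sketched as follows. The field order inside the signed message is an assumption (the real contract is in packages/proto), and an in-process Set stands in for the Redis SETNX call.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative AgentHello shape; proto field names may differ.
interface AgentHello {
  tenantId: string;
  agentId: string;
  nonce: string;        // hex string
  timestampUnix: number;
  signatureHex: string; // HMAC-SHA256 over the fields above
}

const SKEW_SECONDS = 300;
// Stand-in for SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with 600s TTL.
const seenNonces = new Set<string>();

function verifyHello(hello: AgentHello, secret: string, nowUnix: number): boolean {
  // 1. Clock-skew guard: reject timestamps outside the ±300s window.
  if (Math.abs(nowUnix - hello.timestampUnix) > SKEW_SECONDS) return false;

  // 2. Recompute the HMAC over the signed fields (order is illustrative)
  //    and compare in constant time.
  const msg = `${hello.tenantId}|${hello.agentId}|${hello.nonce}|${hello.timestampUnix}`;
  const expected = createHmac("sha256", secret).update(msg).digest();
  const given = Buffer.from(hello.signatureHex, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;

  // 3. Replay defense: first use of a nonce wins; reuse is rejected.
  const key = `nonce:${hello.tenantId}:${hello.nonce}`;
  if (seenNonces.has(key)) return false;
  seenNonces.add(key);
  return true;
}
```

timingSafeEqual is what prevents an attacker from recovering the signature byte-by-byte through response-time differences; the explicit length check is required because timingSafeEqual throws on unequal-length buffers.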
The design budget for onboarding is ≤15 minutes: CloudFormation creates the IAM role + external ID, the customer pastes the role ARN back into the wizard, and the validator runs STS AssumeRole plus a sample CloudWatch read.
Managed-inference ingest — read-only, redacted
TensorCost never sends inference traffic to Bedrock or any other managed provider. The data flows in over four read-only paths:
| Source | Latency | What it gives us |
|---|---|---|
| Cost and Usage Reports (CUR 2.0) | 24h | Daily line items by model, region, account, tag |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | Invocations, InputTokenCount, OutputTokenCount, cache-hit fields |
| Bedrock invocation logs (S3 / CloudWatch Logs) | Near real-time | Per-request: model, tokens, latency, prompt-cache fields |
| Inference profiles | On-demand | Profile ARN → model + region routing + cost-allocation tags |
Raw prompts and responses are never stored. Hashes only. Customers concerned about content can disable request-metadata capture entirely and still get model/cost/latency attribution. Raw logs live in customer-owned S3 buckets; we read with read-only IAM scoped to the tenant.
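The "hashes only" discipline can be sketched as below: derive a stable fingerprint for deduplication and attribution without ever persisting content. The record shape and field names are illustrative, not the actual schema.

```typescript
import { createHash } from "node:crypto";

// SHA-256 fingerprint of the prompt; the prompt itself is discarded.
function fingerprintPrompt(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

// Illustrative stored record: model/cost/latency attribution survives,
// content does not. There is deliberately no field that can hold the text.
interface StoredInvocation {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  promptSha256: string;
}

function toStoredInvocation(
  model: string,
  prompt: string,
  inputTokens: number,
  outputTokens: number,
  latencyMs: number,
): StoredInvocation {
  return {
    model,
    inputTokens,
    outputTokens,
    latencyMs,
    promptSha256: fingerprintPrompt(prompt),
  };
}
```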
Real-time events
socket.io runs on a dedicated ws.tensorcost.com subdomain (REST API Gateway can’t WS-upgrade). Clients are joined to a tenant-scoped room at handshake; cross-tenant events are unreachable by construction.
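The room-scoping step can be sketched as below. The room-naming scheme and claim shape are assumptions; the important invariant is that the room is derived from verified handshake claims, never from anything the client supplied.

```typescript
// Illustrative claims extracted from a verified JWT at handshake.
interface HandshakeClaims {
  tenantId: string;
  userId: string;
}

// Room name derived from the tenant (naming is an assumption).
function tenantRoom(tenantId: string): string {
  return `tenant:${tenantId}`;
}

// On connect, join exactly one room computed from verified claims.
// Server-side emits then target io.to(tenantRoom(id)), so events for
// other tenants are unreachable by construction. `join` stands in for
// socket.join() to keep the sketch self-contained.
function onConnection(
  join: (room: string) => void,
  claims: HandshakeClaims,
): string {
  const room = tenantRoom(claims.tenantId);
  join(room);
  return room;
}
```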
The full event catalog and reliability model is in real-time events.
Observability
OpenTelemetry across the backend (Node.js auto-instrumentation) and the agent (Python grpc + requests instrumentation). Logs are JSON, tagged with trace_id / span_id. Default deployment shape is the AWS Distro for OpenTelemetry collector as an ECS sidecar forwarding to X-Ray + CloudWatch, but any OTLP-compatible backend works. See observability.
MCP — agents query TensorCost
A built-in MCP server exposes scope-guarded tools for cost queries, fleet inspection, workload attribution, inference analytics, and (where the tenant grants the scope) write tools. Claude Desktop, internal agents, and partner integrations all consume the same surface. Tool grants are RBAC-checked at every call.
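The per-call grant check can be sketched as below. Scope names and the tool shape are illustrative; the point is that the grant is re-evaluated on every call, so a revoked scope takes effect immediately rather than at the next session.

```typescript
// Illustrative scope names; the real set is tenant-configurable.
type Scope = "cost:read" | "fleet:read" | "enforcement:write";

interface ToolDef {
  name: string;
  requiredScope: Scope;
  run(args: Record<string, unknown>): unknown;
}

// Every dispatch checks the tenant's current grants before running the
// tool; there is no cached "allowed" decision to go stale.
function callTool(
  tool: ToolDef,
  grantedScopes: ReadonlySet<Scope>,
  args: Record<string, unknown>,
): unknown {
  if (!grantedScopes.has(tool.requiredScope)) {
    throw new Error(`scope ${tool.requiredScope} not granted for ${tool.name}`);
  }
  return tool.run(args);
}
```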
Where to read more