Architecture
TensorCost is a multi-tenant SaaS control plane for AI cost governance. The platform attributes every dollar across GPU fleets, managed inference, and agent workloads, surfaces recommendations, and (with customer opt-in) takes enforcement actions. This page describes the system at the level a platform-engineering reviewer cares about: what runs where, how tenants are isolated, how data flows in, and how events flow out.
High-level topology
Backend — gateway plus 14 microservices
Every service is NestJS + TypeScript, communicates over gRPC (proto contracts in packages/proto), persists to its own Postgres schema, and emits events to Redis-backed pub/sub.
| Service | Schema | Responsibility |
|---|---|---|
| api-gateway | — | REST entry point, JWT auth, RBAC, rate limiting, request fan-out |
| tenant-service | tenant.* | Tenants, members, memberships, customer onboarding state machine |
| identity-service | identity.* | Cognito federation, agent credentials (HMAC keys), SSO/SAML |
| cost-service | cost.* | Cost rollups, budgets, savings ledger, recommendations queue |
| gpu-service | gpu.*, monitoring.* | GPU fleet inventory, agent gRPC ingress, MIG topology |
| ai-service | ai.* | Managed-inference adapters (Bedrock, Azure OpenAI, Vertex, OpenAI, Anthropic), recommenders, anomaly detection |
| inference-cost-service | ai.* (shared) | High-throughput ai_spend_events ingest path; targeted at OLAP shim per ADR-0011 |
| enforcement-service | enforcement.* | Policy engine, action queue, approval workflow, dry-run simulation |
| alert-service | alert.* | Alert rules, escalation policies, incidents, digests |
| integration-service | integration.* | Cloud-account connections, validation, secret rotation |
| notification-service | notification.* | Slack, PagerDuty, Teams, email, webhook delivery |
| report-service | report.* | Scheduled exports, savings methodology PDFs |
| mcp-server | — | MCP endpoint for Claude Desktop / agents (scope-guarded by RBAC) |
| embed-service | — | Embedded dashboards / widgets in customer surfaces |
| audit-service | audit.* | Cross-tenant audit ledger, data-access trail |
Frontend — shell plus 14 microfrontends
Module Federation (Vite) loads each microfrontend on demand into a single shell. Each MF owns a domain.
| Microfrontend | Owns |
|---|---|
| shell | Auth, navigation, RBAC gating, feature-flag context |
| cost-mf | Cost explorer, savings ledger, budgets, first-30-days view |
| gpu-mf | Fleet view, MIG topology, agent health |
| ai-mf | Managed-inference dashboard, model routing analytics |
| agents-mf | Agent workload attribution, runaway-loop alerts |
| integration-mf | Bedrock + cloud-account connection wizards |
| enforcement-mf | Policy editor, action queue, approval UI |
| alerts-mf | Alert rules + incidents + escalations |
| recommendations-mf | The recommendations feed |
| reports-mf | Scheduled reports, exports |
| customization-mf | Branding, custom domain, notification channels |
| tenant-admin-mf | Members, RBAC, SSO, billing |
| embed-mf | Embed widgets exposed to customer surfaces |
| mcp-admin-mf | MCP scope and tool-grant configuration |
| dev-tools-mf | Internal — feature-flag inspector, audit viewer |
Live updates reach the microfrontends over socket.io, with each client joined to a tenant-scoped room.
Shared packages
Twenty-one packages in packages/ carry the cross-cutting concerns. The ones an integrator most often interacts with:
- @tensorcost/proto — gRPC contracts.
- @tensorcost/agent-sdk — IMDS auto-detect, HMAC signing, retry/backoff.
- @tensorcost/db-utils — Sequelize defaults, RLS hook (runAsBypass), tenant-scoped repository.
- @tensorcost/rbac — role + scope checks; the same primitive guards REST, gRPC, and MCP tools.
- @tensorcost/audit-trail — append-only audit ledger writes.
- @tensorcost/observability — OpenTelemetry init, structured logger, trace-correlated logs.
- @tensorcost/feature-flags — LaunchDarkly client + useFeature() for the frontend.
- @tensorcost/jobs — scheduled-job framework with per-tenant fairness.
- @tensorcost-internal/mcp-framework — MCP tool-registration framework.
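To illustrate how one primitive could guard REST, gRPC, and MCP handlers alike, here is a minimal sketch of a scope check. The names `Principal`, `requireScope`, `guarded`, and the scope string `cost:read` are hypothetical and do not reflect the actual @tensorcost/rbac API.

```typescript
// Hypothetical shapes for illustration; not the real @tensorcost/rbac API.
interface Principal {
  tenantId: string;
  role: string;
  scopes: ReadonlySet<string>;
}

class ScopeDeniedError extends Error {
  readonly scope: string;
  constructor(scope: string) {
    super(`missing scope: ${scope}`);
    this.name = "ScopeDeniedError";
    this.scope = scope;
  }
}

// Throws unless the principal holds the required scope.
function requireScope(principal: Principal, scope: string): void {
  if (!principal.scopes.has(scope)) {
    throw new ScopeDeniedError(scope);
  }
}

// Wraps any handler (REST controller, gRPC method, MCP tool) with the same check.
function guarded<A, R>(
  scope: string,
  handler: (principal: Principal, args: A) => R,
): (principal: Principal, args: A) => R {
  return (principal, args) => {
    requireScope(principal, scope);
    return handler(principal, args);
  };
}

// Example handler: the scope name and return value are placeholders.
const readBudget = guarded(
  "cost:read",
  (p: Principal, args: { budgetId: string }) => `${p.tenantId}/${args.budgetId}`,
);
```

Because the check wraps the handler instead of living inside a transport-specific middleware, the same guarded function can be registered behind any of the three entry points.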
Multi-tenancy and Row-Level Security
Every customer is a tenant. Every row in every table that holds tenant data carries a tenant_id and is protected by Postgres Row Level Security.
- Current rollout: 21 of 50 target tables enabled (cost, enforcement, integration, gpu, identity, monitoring schemas in flight).
- Bypass discipline: the only path that bypasses RLS is runAsBypass(tenantId, fn) from @tensorcost/db-utils, which sets app.tenant_id for the duration of the transaction. gRPC handlers verify the agent’s tenantId via HMAC before any bypass call.
- Linting: any sequelize.query() that touches tenant data must be wrapped in a transaction or runAsBypass. A RUN-AS-BYPASS-LINT rule is on the way; until then, code review enforces the discipline.
Agent ingest — gRPC, HMAC, IMDS
The unified GPU agent runs in customer environments and streams metrics over a long-lived bidirectional gRPC stream on TCP/50051. The pattern (formalized in ADR-0010):
- Agent boots, auto-detects EC2 metadata via IMDSv2 (with a ±300s skew guard and a Redis-backed nonce replay defense).
- First gRPC message is AgentHello with tenant ID, agent ID, nonce, timestamp_unix, and an HMAC-SHA256 signature over those fields.
- gpu-service verifies HMAC with timingSafeEqual, rejects nonce reuse via SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with 600s TTL.
- Verified tenantId is bound to the stream session; every subsequent DB write runs inside runAsBypass with that tenantId.
- The NLB listener accepts TLS only (wildcard ACM cert); plaintext gRPC is rejected before reaching the handler.
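The verification steps above can be sketched with Node's built-in crypto module. Several details here are assumptions made for illustration: the `AgentHello` field names, the presence of `keyId` in the message, the newline-joined canonical encoding, and the in-memory `NonceStore`, which stands in for the Redis SETNX call. ADR-0010 is the authority on the actual wire format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical message shape based on the fields listed above.
interface AgentHello {
  tenantId: string;
  agentId: string;
  keyId: string; // assumed: identifies which HMAC key signed the message
  nonce: Buffer;
  timestampUnix: number;
  signature: Buffer; // HMAC-SHA256 over the other fields
}

const MAX_SKEW_SECONDS = 300;
const NONCE_TTL_SECONDS = 600;

// Assumed canonical encoding; the real field order and framing may differ.
function canonicalBytes(h: Omit<AgentHello, "signature">): Buffer {
  const parts = [h.tenantId, h.agentId, h.keyId, h.nonce.toString("hex"), String(h.timestampUnix)];
  return Buffer.from(parts.join("\n"), "utf8");
}

function sign(secret: Buffer, h: Omit<AgentHello, "signature">): Buffer {
  return createHmac("sha256", secret).update(canonicalBytes(h)).digest();
}

// In-memory stand-in for a Redis set-if-absent with expiry: true only on first use.
class NonceStore {
  private readonly expiries = new Map<string, number>();

  claim(key: string, nowUnix: number, ttlSeconds: number): boolean {
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > nowUnix) return false;
    this.expiries.set(key, nowUnix + ttlSeconds);
    return true;
  }
}

type Verdict = "ok" | "clock_skew" | "bad_signature" | "replay";

function verifyHello(hello: AgentHello, secret: Buffer, nonces: NonceStore, nowUnix: number): Verdict {
  // 1. Skew guard: reject timestamps more than 300s from the server clock.
  if (Math.abs(nowUnix - hello.timestampUnix) > MAX_SKEW_SECONDS) return "clock_skew";

  // 2. Constant-time signature check. timingSafeEqual throws on a length
  //    mismatch, so lengths are compared first.
  const expected = sign(secret, hello);
  if (hello.signature.length !== expected.length || !timingSafeEqual(hello.signature, expected)) {
    return "bad_signature";
  }

  // 3. Replay defense: the nonce can be claimed once within its TTL.
  const key = `nonce:${hello.tenantId}:${hello.keyId}:${hello.nonce.toString("hex")}`;
  return nonces.claim(key, nowUnix, NONCE_TTL_SECONDS) ? "ok" : "replay";
}
```

The nonce is claimed only after the signature verifies, so an unauthenticated sender cannot fill the nonce store. The 600s TTL is twice the skew window, so a captured message cannot become replayable again while its timestamp is still acceptable.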
Managed-inference ingest — read-only, redacted
Bedrock and friends never see an outbound call from us. The data flows in over four read-only paths:
| Source | Latency | What it gives us |
|---|---|---|
| Cost and Usage Reports (CUR 2.0) | 24h | Daily line items by model, region, account, tag |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | Invocations, InputTokenCount, OutputTokenCount, cache-hit fields |
| Bedrock invocation logs (S3 / CloudWatch Logs) | Near real-time | Per-request: model, tokens, latency, prompt-cache fields |
| Inference profiles | On-demand | Profile ARN → model + region routing + cost-allocation tags |
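As a rough illustration of what the per-request path enables, the sketch below rolls invocation records up into per-model usage. The `InvocationRecord` shape and its field names are hypothetical simplifications of the "model, tokens, latency" fields in the table; real invocation logs carry more fields, and this is not the platform's ingest code.

```typescript
// Hypothetical, simplified record shape derived from the table above.
interface InvocationRecord {
  modelId: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

interface ModelUsage {
  requests: number;
  inputTokens: number;
  outputTokens: number;
  totalLatencyMs: number;
}

// Aggregates per-request records into one usage row per model.
function rollUpByModel(records: readonly InvocationRecord[]): Map<string, ModelUsage> {
  const usage = new Map<string, ModelUsage>();
  for (const r of records) {
    const u = usage.get(r.modelId) ?? {
      requests: 0,
      inputTokens: 0,
      outputTokens: 0,
      totalLatencyMs: 0,
    };
    u.requests += 1;
    u.inputTokens += r.inputTokens;
    u.outputTokens += r.outputTokens;
    u.totalLatencyMs += r.latencyMs;
    usage.set(r.modelId, u);
  }
  return usage;
}
```

Mean latency per model is `totalLatencyMs / requests`. Keeping the sum rather than the mean lets partial rollups from different batches be merged by simple addition.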
Real-time events
socket.io runs on a dedicated ws.tensorcost.com subdomain (REST API Gateway can’t WS-upgrade). Clients are joined to a tenant-scoped room at handshake; cross-tenant events are unreachable by construction.
The full event catalog and reliability model are described in real-time events.
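A minimal sketch of the tenant-scoped room pattern follows. To stay self-contained it declares small structural interfaces covering only the two socket.io operations it relies on (`socket.join(room)` and `io.to(room).emit(event, payload)`); the `tenant:` room prefix and the function names are assumptions, not the platform's actual naming.

```typescript
// Structural stand-ins for the socket.io calls used here, so the sketch
// compiles without the library installed.
interface RoomSocket {
  join(room: string): void;
}
interface RoomEmitter {
  to(room: string): { emit(event: string, payload: unknown): void };
}

// Hypothetical room naming scheme.
function tenantRoom(tenantId: string): string {
  return `tenant:${tenantId}`;
}

// Called once at handshake, after the caller's tenant has been verified.
function joinTenantRoom(socket: RoomSocket, verifiedTenantId: string): void {
  socket.join(tenantRoom(verifiedTenantId));
}

// Publishers address a tenant room, never an individual socket or a broadcast.
function publishToTenant(
  io: RoomEmitter,
  tenantId: string,
  event: string,
  payload: unknown,
): void {
  io.to(tenantRoom(tenantId)).emit(event, payload);
}
```

Since a client only ever joins the room derived from its verified tenant ID, and publishers only ever emit to a room, an event for one tenant has no delivery path to another tenant's sockets.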
Observability
OpenTelemetry instruments the backend (Node.js auto-instrumentation) and the agent (Python grpc + requests instrumentation). Logs are JSON, tagged with trace_id / span_id. The default deployment shape is the AWS Distro for OpenTelemetry collector as an ECS sidecar forwarding to X-Ray + CloudWatch, but any OTLP-compatible backend works. See observability.
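The shape of a trace-correlated log line can be sketched as below. The `formatLog` function, the `SpanIds` type, and every field other than trace_id and span_id are hypothetical; this is not the @tensorcost/observability logger, and in a real service the IDs would come from the active OpenTelemetry span rather than a parameter.

```typescript
// Hypothetical minimal span context passed in explicitly for the sketch.
interface SpanIds {
  traceId: string;
  spanId: string;
}

type LogLevel = "debug" | "info" | "warn" | "error";

// Produces one JSON log line; trace fields are added only when a span is active.
function formatLog(
  level: LogLevel,
  message: string,
  span: SpanIds | undefined,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...(span ? { trace_id: span.traceId, span_id: span.spanId } : {}),
    ...fields,
  });
}
```

With trace_id on every line, a log search can pivot directly from a slow request's trace to the log lines that request produced.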
MCP — agents query TensorCost
A built-in MCP server exposes scope-guarded tools for cost queries, fleet inspection, workload attribution, inference analytics, and (where the tenant grants the scope) write tools. Claude Desktop, internal agents, and partner integrations all consume the same surface. Tool grants are RBAC-checked at every call.
Where to read more
- Agent installation — the customer-side install flow.
- Bedrock integration — the lead managed-inference adapter.
- SOC 2 readiness — the security and compliance posture.
- API reference — gateway REST surface.