
Documentation Index

Fetch the complete documentation index at: https://docs.tensorcost.com/llms.txt

Use this file to discover all available pages before exploring further.

Architecture

TensorCost is a multi-tenant SaaS control plane for AI cost governance. The platform attributes every dollar across GPU fleets, managed inference, and agent workloads, surfaces recommendations, and (with customer opt-in) takes enforcement actions. This page describes the system at the level a platform-engineering reviewer cares about: what runs where, how tenants are isolated, how data flows in, and how events flow out.

High-level topology

Backend — gateway plus 14 microservices

Every service is NestJS + TypeScript, communicates over gRPC (proto contracts in packages/proto), persists to its own Postgres schema, and emits events to Redis-backed pub/sub.
| Service | Schema | Responsibility |
| --- | --- | --- |
| api-gateway | (none) | REST entry point, JWT auth, RBAC, rate limiting, request fan-out |
| tenant-service | tenant.* | Tenants, members, memberships, customer onboarding state machine |
| identity-service | identity.* | Cognito federation, agent credentials (HMAC keys), SSO/SAML |
| cost-service | cost.* | Cost rollups, budgets, savings ledger, recommendations queue |
| gpu-service | gpu.*, monitoring.* | GPU fleet inventory, agent gRPC ingress, MIG topology |
| ai-service | ai.* | Managed-inference adapters (Bedrock, Azure OpenAI, Vertex, OpenAI, Anthropic), recommenders, anomaly detection |
| inference-cost-service | ai.* (shared) | High-throughput ai_spend_events ingest path; targeted at OLAP shim per ADR-0011 |
| enforcement-service | enforcement.* | Policy engine, action queue, approval workflow, dry-run simulation |
| alert-service | alert.* | Alert rules, escalation policies, incidents, digests |
| integration-service | integration.* | Cloud-account connections, validation, secret rotation |
| notification-service | notification.* | Slack, PagerDuty, Teams, email, webhook delivery |
| report-service | report.* | Scheduled exports, savings methodology PDFs |
| mcp-server | (none) | MCP endpoint for Claude Desktop / agents (scope-guarded by RBAC) |
| embed-service | (none) | Embedded dashboards / widgets in customer surfaces |
| audit-service | audit.* | Cross-tenant audit ledger, data-access trail |
Services run on AWS Fargate behind an Application Load Balancer (REST + WebSocket) and a Network Load Balancer (gRPC). Postgres is RDS Multi-AZ; Redis is ElastiCache cluster mode with replicas.

Frontend — shell plus 14 microfrontends

Module Federation (Vite) loads each microfrontend on demand into a single shell. Each MF owns a domain.
| Microfrontend | Owns |
| --- | --- |
| shell | Auth, navigation, RBAC gating, feature-flag context |
| cost-mf | Cost explorer, savings ledger, budgets, first-30-days view |
| gpu-mf | Fleet view, MIG topology, agent health |
| ai-mf | Managed-inference dashboard, model routing analytics |
| agents-mf | Agent workload attribution, runaway-loop alerts |
| integration-mf | Bedrock + cloud-account connection wizards |
| enforcement-mf | Policy editor, action queue, approval UI |
| alerts-mf | Alert rules + incidents + escalations |
| recommendations-mf | The recommendations feed |
| reports-mf | Scheduled reports, exports |
| customization-mf | Branding, custom domain, notification channels |
| tenant-admin-mf | Members, RBAC, SSO, billing |
| embed-mf | Embed widgets exposed to customer surfaces |
| mcp-admin-mf | MCP scope and tool-grant configuration |
| dev-tools-mf | Internal — feature-flag inspector, audit viewer |
Stack: React 18, MUI 5, Vite, Module Federation, RTK Query (migrating to TanStack Query). Real-time updates arrive over socket.io, with each client joined to a tenant-scoped room.

Shared packages

Twenty-one packages in packages/ carry the cross-cutting concerns. The ones an integrator most often interacts with:
  • @tensorcost/proto — gRPC contracts.
  • @tensorcost/agent-sdk — IMDS auto-detect, HMAC signing, retry/backoff.
  • @tensorcost/db-utils — Sequelize defaults, RLS hook (runAsBypass), tenant-scoped repository.
  • @tensorcost/rbac — role + scope checks; the same primitive guards REST, gRPC, and MCP tools.
  • @tensorcost/audit-trail — append-only audit ledger writes.
  • @tensorcost/observability — OpenTelemetry init, structured logger, trace-correlated logs.
  • @tensorcost/feature-flags — LaunchDarkly client + useFeature() for the frontend.
  • @tensorcost/jobs — scheduled-job framework with per-tenant fairness.
  • @tensorcost-internal/mcp-framework — MCP tool-registration framework.
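The tenant-scoped repository from @tensorcost/db-utils can be sketched like this. The class and method names below are illustrative, not the package's actual API; it uses an in-memory table as a stand-in for Sequelize, but the invariant is the real one: every read is filtered by tenant_id and every write is stamped with it.

```typescript
// Illustrative sketch of a tenant-scoped repository (not the real
// @tensorcost/db-utils API). Reads are implicitly filtered by tenant_id,
// mirroring what Postgres RLS enforces server-side; writes stamp tenant_id
// so a caller cannot insert cross-tenant rows.
type Row = { tenant_id: string; [key: string]: unknown };

class TenantScopedRepository {
  constructor(private tenantId: string, private table: Row[]) {}

  findAll(where: Record<string, unknown> = {}): Row[] {
    return this.table.filter(
      (row) =>
        row.tenant_id === this.tenantId &&
        Object.entries(where).every(([k, v]) => row[k] === v),
    );
  }

  insert(row: Record<string, unknown>): Row {
    const stamped: Row = { ...row, tenant_id: this.tenantId };
    this.table.push(stamped);
    return stamped;
  }
}
```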

Multi-tenancy and Row-Level Security

Every customer is a tenant. Every row in every table that holds tenant data carries a tenant_id and is protected by Postgres Row Level Security.
  • Current rollout: 21 of 50 target tables enabled (cost, enforcement, integration, gpu, identity, monitoring schemas in flight).
  • Bypass discipline: the only path that bypasses RLS is runAsBypass(tenantId, fn) from @tensorcost/db-utils, which sets app.tenant_id for the duration of the transaction. gRPC handlers verify the agent’s tenantId via HMAC before any bypass call.
  • Linting: any sequelize.query() that touches tenant data must be wrapped in a transaction or runAsBypass. A RUN-AS-BYPASS-LINT rule is on the way; until then, code review enforces the discipline.
The full RLS pattern, the four compensating controls for the gRPC ingress, and the agent-credential HMAC verifier are documented in SOC 2 readiness.
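The bypass discipline above can be sketched as follows. The fake client is an in-memory stand-in for a Postgres connection; in the real @tensorcost/db-utils, runAsBypass would run SET LOCAL app.tenant_id inside a transaction so RLS policies see the verified tenant, and the setting expires with the transaction.

```typescript
// Sketch of the runAsBypass(tenantId, fn) discipline. The FakeClient models
// a Postgres session; setting "app.tenant_id" stands in for
// BEGIN; SET LOCAL app.tenant_id = $1; ... COMMIT.
type FakeClient = { settings: Map<string, string | null> };

async function runAsBypass<T>(
  client: FakeClient,
  tenantId: string,
  fn: (client: FakeClient) => Promise<T>,
): Promise<T> {
  // Equivalent of SET LOCAL app.tenant_id = tenantId.
  client.settings.set("app.tenant_id", tenantId);
  try {
    return await fn(client);
  } finally {
    // SET LOCAL scope ends with the transaction; model that by clearing.
    client.settings.set("app.tenant_id", null);
  }
}
```

The important property is that the tenant context is scoped to the callback: code outside fn never runs with a tenant bound, so a forgotten reset cannot leak one tenant's bypass into another's query.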

Agent ingest — gRPC, HMAC, IMDS

The unified GPU agent runs in customer environments and streams metrics over a long-lived bidirectional gRPC stream on TCP/50051. The pattern (formalized in ADR-0010):
  1. Agent boots, auto-detects EC2 metadata via IMDSv2 (with a ±300s skew guard and a Redis-backed nonce replay defense).
  2. First gRPC message is AgentHello with tenant ID, agent ID, nonce, timestamp_unix, and an HMAC-SHA256 signature over those fields.
  3. gpu-service verifies HMAC with timingSafeEqual, rejects nonce reuse via SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with 600s TTL.
  4. Verified tenantId is bound to the stream session; every subsequent DB write runs inside runAsBypass with that tenantId.
  5. The NLB listener accepts TLS only (wildcard ACM cert); plaintext gRPC is rejected before reaching the handler.
The design budget for onboarding is ≤15 minutes. CloudFormation creates the IAM role and external ID; the customer pastes the role ARN back into the wizard; the validator runs STS AssumeRole plus a sample CloudWatch read.
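Steps 1–3 can be sketched in a few lines of Node crypto. The canonical-string layout and field names below follow the list above but are an assumption about ADR-0010's exact byte layout, and the in-memory Set stands in for the Redis SETNX nonce store with its 600s TTL.

```typescript
import { createHmac, timingSafeEqual, randomBytes } from "node:crypto";

// Sketch of AgentHello verification: skew guard, constant-time HMAC
// compare, and nonce replay defense. Field names and canonical layout are
// illustrative; the Set stands in for Redis SETNX with a 600s TTL.
type AgentHello = {
  tenantId: string;
  agentId: string;
  keyId: string;
  nonce: string;        // hex
  timestampUnix: number;
  signature: string;    // hex HMAC-SHA256
};

const SKEW_SECONDS = 300;
const seenNonces = new Set<string>(); // stand-in for SETNX nonce:{tenantId}:{keyId}:{hex(nonce)}

function sign(hello: Omit<AgentHello, "signature">, key: Buffer): string {
  const canonical = [hello.tenantId, hello.agentId, hello.nonce, hello.timestampUnix].join("\n");
  return createHmac("sha256", key).update(canonical).digest("hex");
}

function verifyHello(hello: AgentHello, key: Buffer, nowUnix: number): boolean {
  // ±300s skew guard (step 1).
  if (Math.abs(nowUnix - hello.timestampUnix) > SKEW_SECONDS) return false;

  // Constant-time signature compare (step 3).
  const expected = Buffer.from(sign(hello, key), "hex");
  const given = Buffer.from(hello.signature, "hex");
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;

  // Replay defense: reject nonce reuse (SETNX semantics).
  const nonceKey = `nonce:${hello.tenantId}:${hello.keyId}:${hello.nonce}`;
  if (seenNonces.has(nonceKey)) return false;
  seenNonces.add(nonceKey);
  return true;
}
```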

Managed-inference ingest — read-only, redacted

TensorCost never writes to Bedrock or any other managed-inference provider; ingestion is strictly read-only, over four paths:
| Source | Latency | What it gives us |
| --- | --- | --- |
| Cost and Usage Reports (CUR 2.0) | 24h | Daily line items by model, region, account, tag |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | Invocations, InputTokenCount, OutputTokenCount, cache-hit fields |
| Bedrock invocation logs (S3 / CloudWatch Logs) | Near real-time | Per-request: model, tokens, latency, prompt-cache fields |
| Inference profiles | On-demand | Profile ARN → model + region routing + cost-allocation tags |
Raw prompts and responses are never stored. Hashes only. Customers concerned about content can disable request-metadata capture entirely and still get model/cost/latency attribution. Raw logs live in customer-owned S3 buckets; we read with read-only IAM scoped to the tenant.
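The hashes-only rule can be sketched as a single redaction step at ingest. The record shape below is illustrative (not the actual ai_spend_events schema): the stored row keeps model, token, and latency attribution but only a SHA-256 digest of the prompt.

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-only capture: drop the raw prompt before storage, keep a
// digest plus the attribution fields. Field names are illustrative, not the
// actual ai_spend_events schema.
type InvocationLog = {
  model: string;
  prompt: string;
  inputTokens: number;
  latencyMs: number;
};

function redactForStorage(log: InvocationLog) {
  const { prompt, ...attribution } = log; // raw content never leaves this scope
  return {
    ...attribution,
    promptSha256: createHash("sha256").update(prompt, "utf8").digest("hex"),
  };
}
```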

Real-time events

socket.io runs on a dedicated ws.tensorcost.com subdomain (REST API Gateway can't WS-upgrade). Clients are joined to a tenant-scoped room at handshake; cross-tenant events are unreachable by construction. The full event catalog and reliability model are documented in real-time events.
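The "unreachable by construction" claim can be sketched with a toy room bus. The room-name convention and the in-memory emitter below are illustrative stand-ins for the actual socket.io setup; the point is that a client is only ever joined to its own tenant's room, and server-side emits always target a single tenant's room.

```typescript
// Sketch of tenant-scoped rooms: join happens once at handshake (after the
// tenant claim is verified), and emits are addressed per tenant. Room-name
// convention is an assumption.
const tenantRoom = (tenantId: string) => `tenant:${tenantId}`;

class RoomBus {
  private rooms = new Map<string, ((event: string) => void)[]>();

  // Called at handshake, after the client's tenant claim is verified.
  join(tenantId: string, onEvent: (event: string) => void): void {
    const room = tenantRoom(tenantId);
    this.rooms.set(room, [...(this.rooms.get(room) ?? []), onEvent]);
  }

  // Server-side emits always target exactly one tenant's room.
  emitToTenant(tenantId: string, event: string): void {
    for (const listener of this.rooms.get(tenantRoom(tenantId)) ?? []) listener(event);
  }
}
```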

Observability

OpenTelemetry across the backend (Node.js auto-instrumentation) and the agent (Python grpc + requests instrumentation). Logs are JSON, tagged with trace_id / span_id. Default deployment shape is the AWS Distro for OpenTelemetry collector as an ECS sidecar forwarding to X-Ray + CloudWatch, but any OTLP-compatible backend works. See observability.
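A trace-correlated log line from the description above looks roughly like this. The trace_id / span_id field names follow the text; in the real services those values come from the active OpenTelemetry span context rather than being passed in by hand, so treat this as a shape sketch only.

```typescript
// Sketch of a structured, trace-correlated JSON log line. In production the
// correlation ids come from the active OTel span context.
function logLine(
  level: "info" | "warn" | "error",
  message: string,
  ctx: { traceId: string; spanId: string },
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    level,
    message,
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
    timestamp: new Date().toISOString(),
    ...fields,
  });
}
```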

MCP — agents query TensorCost

A built-in MCP server exposes scope-guarded tools for cost queries, fleet inspection, workload attribution, inference analytics, and (where the tenant grants the scope) write tools. Claude Desktop, internal agents, and partner integrations all consume the same surface. Tool grants are RBAC-checked at every call.
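The per-call grant check reduces to a scope lookup. The tool and scope names below are hypothetical; the real primitive lives in @tensorcost/rbac and is the same one that guards REST and gRPC.

```typescript
// Sketch of the per-call RBAC check on MCP tools. Tool and scope names are
// hypothetical; write tools require an explicit tenant grant.
type ToolGrant = { tool: string; requiredScope: string };

const tools: ToolGrant[] = [
  { tool: "cost.query", requiredScope: "cost:read" },
  { tool: "enforcement.apply", requiredScope: "enforcement:write" },
];

function canInvoke(grantedScopes: Set<string>, tool: ToolGrant): boolean {
  // No scope, no call -- checked on every invocation, not just at listing.
  return grantedScopes.has(tool.requiredScope);
}

function invokable(grantedScopes: Set<string>): string[] {
  return tools.filter((t) => canInvoke(grantedScopes, t)).map((t) => t.tool);
}
```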

Where to read more