
Documentation Index

Fetch the complete documentation index at: https://docs.tensorcost.com/llms.txt

Use this file to discover all available pages before exploring further.

Architecture

TensorCost is a multi-tenant SaaS control plane for AI cost governance. The platform attributes every dollar across GPU fleets, managed inference, and agent workloads, surfaces recommendations, and (with customer opt-in) takes enforcement actions. This page describes the system at the level a platform-engineering reviewer cares about: what runs where, how tenants are isolated, how data flows in, and how events flow out.

High-level topology

Backend — gateway plus 14 microservices

Every service is NestJS + TypeScript, communicates over gRPC (proto contracts in packages/proto), persists to its own Postgres schema, and emits events to Redis-backed pub/sub.
| Service | Schema | Responsibility |
| --- | --- | --- |
| api-gateway | (none) | REST entry point, JWT auth, RBAC, rate limiting, request fan-out |
| tenant-service | tenant.* | Tenants, members, memberships, customer onboarding state machine |
| identity-service | identity.* | Cognito federation, agent credentials (HMAC keys), SSO/SAML |
| cost-service | cost.* | Cost rollups, budgets, savings ledger, recommendations queue |
| gpu-service | gpu.*, monitoring.* | GPU fleet inventory, agent gRPC ingress, MIG topology |
| ai-service | ai.* | Managed-inference adapters (Bedrock, Azure OpenAI, Vertex, OpenAI, Anthropic), recommenders, anomaly detection |
| inference-cost-service | ai.* (shared) | High-throughput ai_spend_events ingest path; targeted at OLAP shim per ADR-0011 |
| enforcement-service | enforcement.* | Policy engine, action queue, approval workflow, dry-run simulation |
| alert-service | alert.* | Alert rules, escalation policies, incidents, digests |
| integration-service | integration.* | Cloud-account connections, validation, secret rotation |
| notification-service | notification.* | Slack, PagerDuty, Teams, email, webhook delivery |
| report-service | report.* | Scheduled exports, savings methodology PDFs |
| mcp-server | (none) | MCP endpoint for Claude Desktop / agents (scope-guarded by RBAC) |
| embed-service | (none) | Embedded dashboards / widgets in customer surfaces |
| audit-service | audit.* | Cross-tenant audit ledger, data-access trail |
Services run on AWS Fargate behind an Application Load Balancer (REST + WebSocket) and a Network Load Balancer (gRPC). Postgres is RDS Multi-AZ; Redis is ElastiCache cluster mode with replicas.

Frontend — shell plus 14 microfrontends

Module Federation (Vite) loads each microfrontend on demand into a single shell. Each MF owns a domain.
| Microfrontend | Owns |
| --- | --- |
| shell | Auth, navigation, RBAC gating, feature-flag context |
| cost-mf | Cost explorer, savings ledger, budgets, first-30-days view |
| gpu-mf | Fleet view, MIG topology, agent health |
| ai-mf | Managed-inference dashboard, model routing analytics |
| agents-mf | Agent workload attribution, runaway-loop alerts |
| integration-mf | Bedrock + cloud-account connection wizards |
| enforcement-mf | Policy editor, action queue, approval UI |
| alerts-mf | Alert rules + incidents + escalations |
| recommendations-mf | The recommendations feed |
| reports-mf | Scheduled reports, exports |
| customization-mf | Branding, custom domain, notification channels |
| tenant-admin-mf | Members, RBAC, SSO, billing |
| embed-mf | Embed widgets exposed to customer surfaces |
| mcp-admin-mf | MCP scope and tool-grant configuration |
| dev-tools-mf | Internal — feature-flag inspector, audit viewer |
Stack: React 18, MUI 5, Vite, Module Federation, RTK Query (migrating to TanStack Query). Real-time updates arrive over socket.io, with each client joined to a tenant-scoped room.

Shared packages

Twenty-one packages in packages/ carry the cross-cutting concerns. The ones an integrator most often interacts with:
  • @tensorcost/proto — gRPC contracts.
  • @tensorcost/agent-sdk — IMDS auto-detect, HMAC signing, retry/backoff.
  • @tensorcost/db-utils — Sequelize defaults, RLS hook (runAsBypass), tenant-scoped repository.
  • @tensorcost/rbac — role + scope checks; the same primitive guards REST, gRPC, and MCP tools.
  • @tensorcost/audit-trail — append-only audit ledger writes.
  • @tensorcost/observability — OpenTelemetry init, structured logger, trace-correlated logs.
  • @tensorcost/feature-flags — LaunchDarkly client + useFeature() for the frontend.
  • @tensorcost/jobs — scheduled-job framework with per-tenant fairness.
  • @tensorcost-internal/mcp-framework — MCP tool-registration framework.
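The tenant-scoped repository from @tensorcost/db-utils can be sketched like this. The class and method names below are illustrative, not the package's actual API; it uses an in-memory table as a stand-in for Sequelize, but the invariant is the real one: every read is filtered by tenant_id and every write is stamped with it.

```typescript
// Illustrative sketch of a tenant-scoped repository (not the real
// @tensorcost/db-utils API). Reads are implicitly filtered by tenant_id,
// mirroring what Postgres RLS enforces server-side; writes stamp tenant_id
// so a caller cannot insert cross-tenant rows.
type Row = { tenant_id: string; [key: string]: unknown };

class TenantScopedRepository {
  constructor(private tenantId: string, private table: Row[]) {}

  findAll(where: Record<string, unknown> = {}): Row[] {
    return this.table.filter(
      (row) =>
        row.tenant_id === this.tenantId &&
        Object.entries(where).every(([k, v]) => row[k] === v),
    );
  }

  insert(row: Record<string, unknown>): Row {
    const stamped: Row = { ...row, tenant_id: this.tenantId };
    this.table.push(stamped);
    return stamped;
  }
}
```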

Multi-tenancy and Row-Level Security

Every customer is a tenant. Every row in every table that holds tenant data carries a tenant_id and is protected by Postgres Row Level Security.
  • Current rollout: 21 of 50 target tables enabled (cost, enforcement, integration, gpu, identity, monitoring schemas in flight).
  • Bypass discipline: the only path that bypasses RLS is runAsBypass(tenantId, fn) from @tensorcost/db-utils, which sets app.tenant_id for the duration of the transaction. gRPC handlers verify the agent’s tenantId via HMAC before any bypass call.
  • Linting: any sequelize.query() that touches tenant data must be wrapped in a transaction or runAsBypass. A RUN-AS-BYPASS-LINT rule is on the way; until then, code review enforces the discipline.
The full RLS pattern, the four compensating controls for the gRPC ingress, and the agent-credential HMAC verifier are documented in SOC 2 readiness.
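The bypass discipline above can be sketched as follows. The fake client is an in-memory stand-in for a Postgres connection; in the real @tensorcost/db-utils, runAsBypass would run SET LOCAL app.tenant_id inside a transaction so RLS policies see the verified tenant, and the setting expires with the transaction.

```typescript
// Sketch of the runAsBypass(tenantId, fn) discipline. The FakeClient models
// a Postgres session; setting "app.tenant_id" stands in for
// BEGIN; SET LOCAL app.tenant_id = $1; ... COMMIT.
type FakeClient = { settings: Map<string, string | null> };

async function runAsBypass<T>(
  client: FakeClient,
  tenantId: string,
  fn: (client: FakeClient) => Promise<T>,
): Promise<T> {
  // Equivalent of SET LOCAL app.tenant_id = tenantId.
  client.settings.set("app.tenant_id", tenantId);
  try {
    return await fn(client);
  } finally {
    // SET LOCAL scope ends with the transaction; model that by clearing.
    client.settings.set("app.tenant_id", null);
  }
}
```

The important property is that the tenant context is scoped to the callback: code outside fn never runs with a tenant bound, so a forgotten reset cannot leak one tenant's bypass into another's query.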

Agent ingest — gRPC, HMAC, IMDS

The unified GPU agent runs in customer environments and streams metrics over a long-lived bidirectional gRPC stream on TCP/50051. The pattern (formalized in ADR-0010):
  1. Agent boots, auto-detects EC2 metadata via IMDSv2 (with a ±300s skew guard and a Redis-backed nonce replay defense).
  2. First gRPC message is AgentHello with tenant ID, agent ID, nonce, timestamp_unix, and an HMAC-SHA256 signature over those fields.
  3. gpu-service verifies HMAC with timingSafeEqual, rejects nonce reuse via SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with 600s TTL.
  4. Verified tenantId is bound to the stream session; every subsequent DB write runs inside runAsBypass with that tenantId.
  5. The NLB listener accepts TLS only (wildcard ACM cert); plaintext gRPC is rejected before reaching the handler.
The design budget for onboarding is ≤15 minutes. CloudFormation creates the IAM role and external ID; the customer pastes the role ARN back into the wizard; the validator runs STS AssumeRole plus a sample CloudWatch read.
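Steps 1–3 can be sketched in a few lines of Node crypto. The canonical-string layout and field names below follow the list above but are an assumption about ADR-0010's exact byte layout, and the in-memory Set stands in for the Redis SETNX nonce store with its 600s TTL.

```typescript
import { createHmac, timingSafeEqual, randomBytes } from "node:crypto";

// Sketch of AgentHello verification: skew guard, constant-time HMAC
// compare, and nonce replay defense. Field names and canonical layout are
// illustrative; the Set stands in for Redis SETNX with a 600s TTL.
type AgentHello = {
  tenantId: string;
  agentId: string;
  keyId: string;
  nonce: string;        // hex
  timestampUnix: number;
  signature: string;    // hex HMAC-SHA256
};

const SKEW_SECONDS = 300;
const seenNonces = new Set<string>(); // stand-in for SETNX nonce:{tenantId}:{keyId}:{hex(nonce)}

function sign(hello: Omit<AgentHello, "signature">, key: Buffer): string {
  const canonical = [hello.tenantId, hello.agentId, hello.nonce, hello.timestampUnix].join("\n");
  return createHmac("sha256", key).update(canonical).digest("hex");
}

function verifyHello(hello: AgentHello, key: Buffer, nowUnix: number): boolean {
  // ±300s skew guard (step 1).
  if (Math.abs(nowUnix - hello.timestampUnix) > SKEW_SECONDS) return false;

  // Constant-time signature compare (step 3).
  const expected = Buffer.from(sign(hello, key), "hex");
  const given = Buffer.from(hello.signature, "hex");
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;

  // Replay defense: reject nonce reuse (SETNX semantics).
  const nonceKey = `nonce:${hello.tenantId}:${hello.keyId}:${hello.nonce}`;
  if (seenNonces.has(nonceKey)) return false;
  seenNonces.add(nonceKey);
  return true;
}
```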

Managed-inference ingest — read-only, redacted

TensorCost never writes to Bedrock or any other managed-inference provider; ingestion is strictly read-only, over four paths:
| Source | Latency | What it gives us |
| --- | --- | --- |
| Cost and Usage Reports (CUR 2.0) | 24h | Daily line items by model, region, account, tag |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | Invocations, InputTokenCount, OutputTokenCount, cache-hit fields |
| Bedrock invocation logs (S3 / CloudWatch Logs) | Near real-time | Per-request: model, tokens, latency, prompt-cache fields |
| Inference profiles | On-demand | Profile ARN → model + region routing + cost-allocation tags |
Raw prompts and responses are never stored. Hashes only. Customers concerned about content can disable request-metadata capture entirely and still get model/cost/latency attribution. Raw logs live in customer-owned S3 buckets; we read with read-only IAM scoped to the tenant.
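The hashes-only rule can be sketched as a single redaction step at ingest. The record shape below is illustrative (not the actual ai_spend_events schema): the stored row keeps model, token, and latency attribution but only a SHA-256 digest of the prompt.

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-only capture: drop the raw prompt before storage, keep a
// digest plus the attribution fields. Field names are illustrative, not the
// actual ai_spend_events schema.
type InvocationLog = {
  model: string;
  prompt: string;
  inputTokens: number;
  latencyMs: number;
};

function redactForStorage(log: InvocationLog) {
  const { prompt, ...attribution } = log; // raw content never leaves this scope
  return {
    ...attribution,
    promptSha256: createHash("sha256").update(prompt, "utf8").digest("hex"),
  };
}
```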

Real-time events

socket.io runs on a dedicated ws.tensorcost.com subdomain (REST API Gateway can't WS-upgrade). Clients are joined to a tenant-scoped room at handshake; cross-tenant events are unreachable by construction. The full event catalog and reliability model are documented in real-time events.
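The "unreachable by construction" claim can be sketched with a toy room bus. The room-name convention and the in-memory emitter below are illustrative stand-ins for the actual socket.io setup; the point is that a client is only ever joined to its own tenant's room, and server-side emits always target a single tenant's room.

```typescript
// Sketch of tenant-scoped rooms: join happens once at handshake (after the
// tenant claim is verified), and emits are addressed per tenant. Room-name
// convention is an assumption.
const tenantRoom = (tenantId: string) => `tenant:${tenantId}`;

class RoomBus {
  private rooms = new Map<string, ((event: string) => void)[]>();

  // Called at handshake, after the client's tenant claim is verified.
  join(tenantId: string, onEvent: (event: string) => void): void {
    const room = tenantRoom(tenantId);
    this.rooms.set(room, [...(this.rooms.get(room) ?? []), onEvent]);
  }

  // Server-side emits always target exactly one tenant's room.
  emitToTenant(tenantId: string, event: string): void {
    for (const listener of this.rooms.get(tenantRoom(tenantId)) ?? []) listener(event);
  }
}
```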

Observability

OpenTelemetry across the backend (Node.js auto-instrumentation) and the agent (Python grpc + requests instrumentation). Logs are JSON, tagged with trace_id / span_id. Default deployment shape is the AWS Distro for OpenTelemetry collector as an ECS sidecar forwarding to X-Ray + CloudWatch, but any OTLP-compatible backend works. See observability.
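A trace-correlated log line from the description above looks roughly like this. The trace_id / span_id field names follow the text; in the real services those values come from the active OpenTelemetry span context rather than being passed in by hand, so treat this as a shape sketch only.

```typescript
// Sketch of a structured, trace-correlated JSON log line. In production the
// correlation ids come from the active OTel span context.
function logLine(
  level: "info" | "warn" | "error",
  message: string,
  ctx: { traceId: string; spanId: string },
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    level,
    message,
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
    timestamp: new Date().toISOString(),
    ...fields,
  });
}
```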

MCP — agents query TensorCost

A built-in MCP server exposes scope-guarded tools for cost queries, fleet inspection, workload attribution, inference analytics, and (where the tenant grants the scope) write tools. Claude Desktop, internal agents, and partner integrations all consume the same surface. Tool grants are RBAC-checked at every call.
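The per-call grant check reduces to a scope lookup. The tool and scope names below are hypothetical; the real primitive lives in @tensorcost/rbac and is the same one that guards REST and gRPC.

```typescript
// Sketch of the per-call RBAC check on MCP tools. Tool and scope names are
// hypothetical; write tools require an explicit tenant grant.
type ToolGrant = { tool: string; requiredScope: string };

const tools: ToolGrant[] = [
  { tool: "cost.query", requiredScope: "cost:read" },
  { tool: "enforcement.apply", requiredScope: "enforcement:write" },
];

function canInvoke(grantedScopes: Set<string>, tool: ToolGrant): boolean {
  // No scope, no call -- checked on every invocation, not just at listing.
  return grantedScopes.has(tool.requiredScope);
}

function invokable(grantedScopes: Set<string>): string[] {
  return tools.filter((t) => canInvoke(grantedScopes, t)).map((t) => t.tool);
}
```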

Where to read more