SOC 2 readiness and security posture

TensorCost is built and operated by Vaadhlabs. This page describes our customer-facing security posture: the controls in place today, the compliance certifications in flight, and the architecture details an enterprise security reviewer cares about.

Our SOC 2 Type I audit is in progress, targeting completion in Q3 2026, with Type II following after a 6–9 month observation window. ISO 27001 is on the 2027 roadmap and FedRAMP Moderate on the 2028 roadmap. The trust portal at trust.tensorcost.com is still being built; until it launches, request the security package directly from security@tensorcost.com.

Compliance posture at a glance

Framework	Status	Target
SOC 2 Type I	In progress	Q3 2026
SOC 2 Type II	Scheduled	Q1 2027 (6–9 months post Type I)
ISO 27001	Roadmap	2027
HIPAA BAA	On request	Available for Enterprise tier
FedRAMP Moderate	Roadmap	2028
Penetration test	Annual + on-major-release	Latest summary on request

Trust service criteria — what we have, what’s in flight

Security (mandatory)

In place:

Multi-tenant Row Level Security on Postgres: 21 tables enforced today, with a target of 50, rolling out wave by wave. Code review enforces runAsBypass discipline until a dedicated lint rule lands. See Multi-tenant RLS below.
JWT auth with refresh tokens; Cognito federation for browser SSO; SAML / OIDC for enterprise SSO.
Three built-in roles (member, admin, owner); RBAC primitive shared across REST, gRPC, and MCP.
HMAC-SHA256 signatures plus Redis-backed nonce replay protection on every agent gRPC stream (see Agent ingress).
TLS 1.3 in transit for every customer-facing endpoint (REST, WS, gRPC).
AES-256 encryption at rest (RDS, S3, EBS); per-tenant KMS-derived keys for sensitive credential storage.
AWS STS AssumeRole + external ID for cross-account reads; 15-minute temp credentials, never persisted.
Read-only IAM on customer accounts: no bedrock:Invoke*, s3:Put*, or logs:Put* actions anywhere.
Customer-owned S3 buckets for raw Bedrock invocation logs. We never copy raw logs to our account.
Redaction at ingestion — raw prompts and responses are hashed; only hashes enter ai_spend_events. See redaction at ingestion.
Rate limiting at the gateway; per-tenant rate limiting on the gRPC ingress is in flight (see below).
CloudTrail, GuardDuty, AWS Config enabled across the platform AWS organization.
Dependabot, secret scanning, required PR reviews + status checks on every repo.
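The read-only IAM claim lends itself to a mechanical check. The following is a minimal Python sketch, not our tooling: find_write_grants and SENSITIVE_ACTIONS are hypothetical, the action list is only an illustrative subset of mutating actions, and Deny and NotAction statements are not modeled. It matches each allowed action pattern against concrete write actions using IAM-style case-insensitive wildcards, so a broad grant such as s3:* is caught even though it never spells out s3:PutObject.

```python
from fnmatch import fnmatchcase

# Illustrative subset of mutating actions the onboarding role must never be granted.
SENSITIVE_ACTIONS = [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream",
    "s3:PutObject",
    "logs:PutLogEvents",
]


def _as_list(value):
    """IAM accepts either a single value or a list for Statement and Action."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_write_grants(policy):
    """Return (granted_pattern, sensitive_action) pairs that an Allow statement permits.

    IAM action names are case-insensitive and may use * and ? wildcards, so each
    granted pattern is matched against every concrete sensitive action.
    Deny statements and NotAction are ignored in this sketch.
    """
    hits = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        for granted in _as_list(stmt.get("Action")):
            for action in SENSITIVE_ACTIONS:
                if fnmatchcase(action.lower(), granted.lower()):
                    hits.append((granted, action))
    return hits


read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["bedrock:Get*", "bedrock:List*", "cloudwatch:GetMetricData", "ce:GetCostAndUsage"],
        "Resource": "*",
    }],
}
too_broad_policy = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
}

print(find_write_grants(read_only_policy))  # []
print(find_write_grants(too_broad_policy))  # [('s3:*', 's3:PutObject')]
```

Matching concrete actions against granted patterns (rather than the reverse) is what lets the check flag wildcard grants.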

In flight:

Trust portal scaffolding at trust.tensorcost.com.
Per-tenant gRPC token-bucket rate limit (ADR-0010 CC-4).
mTLS upgrade path for the agent ingress (post-Envoy migration).
AI-service RLS rollout (next wave: 17 tables across the ai.* schema).
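The per-tenant token-bucket limit listed above is still in flight, so the following is only an illustration of the technique, not the planned implementation. TokenBucket and PerTenantLimiter are hypothetical names, and the rate and capacity values are made up. State lives in process memory here; a limit enforced across several ingress instances would need a shared store (Redis is one option), which this sketch omits.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second and holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a new stream can burst
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + max(0.0, now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerTenantLimiter:
    """One bucket per (tenant_id, stream_id); state is in-process only."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, tenant_id, stream_id):
        key = (tenant_id, stream_id)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[key].allow()


class FakeClock:
    """Deterministic clock so the example is reproducible."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
limiter = PerTenantLimiter(rate=2.0, capacity=3.0, clock=clock)
burst = [limiter.allow("tenant-a", "stream-1") for _ in range(4)]
print(burst)  # [True, True, True, False]
print(limiter.allow("tenant-b", "stream-1"))  # True: separate bucket
clock.t = 0.5  # 0.5 s at 2 tokens/s refills one token
print(limiter.allow("tenant-a", "stream-1"))  # True
```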

Availability

Tier	SLA	DR
Free / design partner	Best-effort	—
Growth	99.9%	RPO 24h, RTO 4h
Enterprise	99.95%	RPO 1h, RTO 1h
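To make the SLA tiers concrete, the sketch below converts an availability percentage into permitted downtime. It assumes a 30-day measurement window, which the table does not specify; a contractual SLA may instead measure over a calendar month or exclude scheduled maintenance.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of unavailability an availability SLA permits over `days` days."""
    return days * MINUTES_PER_DAY * (100 - sla_percent) / 100


for tier, sla in [("Growth", 99.9), ("Enterprise", 99.95)]:
    print(f"{tier}: {allowed_downtime_minutes(sla):.1f} min per 30 days")
# Growth: 43.2 min per 30 days
# Enterprise: 21.6 min per 30 days
```

At 99.9%, Growth permits about 43.2 minutes of downtime per 30 days; at 99.95%, Enterprise permits about 21.6 minutes.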

RDS Multi-AZ with automated backups (35-day retention).
Redis ElastiCache cluster mode with replicas.
ALB cross-zone load balancing; NLB targets in three AZs.
Deployed in us-east-1 (the default region); an EU-region rollout is planned.
Public status page at status.tensorcost.com (an embeddable widget is coming).
Documented incident response procedure with PagerDuty rotation.

Confidentiality

All tenant data scoped by tenant_id and enforced by RLS.
Sensitive config values (webhook URLs, routing keys, cloud credentials) encrypted at the column level using a per-tenant KMS-derived key. Surfaced as masked values in API responses; full values only via decrypt-scoped endpoints.
Customizable data retention (30–365 days for raw metrics; 90–730 days for recommendations; 7 years for audit).
Tenant offboarding flow with 30-day soft-delete; hard delete preserves audit-trail rows. See tenant offboarding.
Every employee and contractor with production access signs an NDA.
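The customizable retention ranges in the Confidentiality list can be enforced with a bounds check whenever a tenant updates its settings. A minimal sketch, assuming settings arrive as a dict of day counts; the setting names and validate_retention are hypothetical.

```python
# Customer-configurable bounds from the Confidentiality section, in days.
# Audit retention (7 years) is fixed, so it is deliberately not configurable here.
RETENTION_BOUNDS = {
    "raw_metrics_days": (30, 365),
    "recommendations_days": (90, 730),
}


def validate_retention(settings):
    """Return human-readable errors for out-of-range or unknown retention settings."""
    errors = []
    for name, value in settings.items():
        if name not in RETENTION_BOUNDS:
            errors.append(f"{name} is not configurable")
            continue
        low, high = RETENTION_BOUNDS[name]
        if not (isinstance(value, int) and low <= value <= high):
            errors.append(f"{name}={value!r} must be an integer between {low} and {high}")
    return errors


print(validate_retention({"raw_metrics_days": 90, "recommendations_days": 365}))  # []
print(validate_retention({"raw_metrics_days": 14, "audit_days": 30}))
```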

Processing integrity

Analysis audit log captures every anomaly-detection decision: baseline statistics used, current value, each method’s score, composite confidence, classification.
Inference feedback records every recommendation outcome (accepted / rejected / modified) with reason + post-action measurements; feeds back into ML training.
Action queue tracks every enforcement action through pending → approved → executing → completed/failed with full audit trail.
Event persistence via the event store (ADR-0011) — cross-instance durable event ledger.
Daily ingest reconciliation — agent-emitted vs backend-received counts, with a delta alert.
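The daily ingest reconciliation can be sketched as a per-tenant comparison of agent-emitted and backend-received counts. The 0.1% alert threshold, DailyCount, and reconcile are hypothetical; the real threshold is not stated here. Both directions alert: a positive delta means events were lost, and a surplus of received events usually indicates duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyCount:
    tenant_id: str
    emitted: int   # events the agents report having sent
    received: int  # events the backend persisted

    @property
    def delta(self):
        return self.emitted - self.received

    @property
    def delta_ratio(self):
        return 0.0 if self.emitted == 0 else self.delta / self.emitted


def reconcile(emitted, received, max_ratio=0.001):
    """Return the per-tenant counts whose gap should raise a delta alert."""
    alerts = []
    for tenant_id in sorted(emitted.keys() | received.keys()):
        count = DailyCount(tenant_id, emitted.get(tenant_id, 0), received.get(tenant_id, 0))
        # Events arriving for a tenant whose agents reported none are always suspicious.
        unexpected = count.emitted == 0 and count.received > 0
        if unexpected or abs(count.delta_ratio) > max_ratio:
            alerts.append(count)
    return alerts


emitted = {"t1": 10_000, "t2": 5_000}
received = {"t1": 9_995, "t2": 4_900, "t3": 3}
for alert in reconcile(emitted, received):
    print(alert.tenant_id, alert.delta)
# t2 100
# t3 -3
```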

Privacy

Privacy policy and DPA template on request.
Personal data collected: email, name, IP / user-agent on auth, timezone preference.
Subprocessor list available on request and republished annually.
Cookie policy on the marketing site.
DSAR process documented; DSARs handled within 30 days.

Multi-tenant RLS

Every customer is a tenant; every row in every table that holds tenant data carries a tenant_id column and is protected by Postgres Row Level Security.

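```sql
-- Pattern applied across 21 tables today (target: 50)
ALTER TABLE cost.savings_ledger ENABLE ROW LEVEL SECURITY;

-- With no FOR clause the policy covers all commands; with no WITH CHECK,
-- the USING expression also constrains inserted and updated rows.
CREATE POLICY tenant_isolation
  ON cost.savings_ledger
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```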

Application code sets app.tenant_id per request via a Sequelize hook; cross-tenant reads are unreachable by construction. The only path that bypasses RLS is runAsBypass(tenantId, fn) from @tensorcost/db-utils, used by:

gRPC handlers (after the agent’s tenantId is verified by HMAC).
Cron-driven sweeps (anomaly detection, recommenders).
Cross-service joins.
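Application code sets app.tenant_id through a Sequelize hook. As a language-neutral illustration, here is a minimal Python sketch of the same per-request pattern over a DB-API connection; tenant_scope and the recording fakes are hypothetical. The key design choice is set_config('app.tenant_id', ..., true), which is transaction-local: the setting disappears at commit or rollback, so a pooled connection cannot carry one tenant's context into the next request.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def tenant_scope(conn, tenant_id):
    """Yield a cursor whose transaction has app.tenant_id set for RLS.

    set_config(name, value, true) is transaction-local, like SET LOCAL, so the
    value is discarded at COMMIT/ROLLBACK and cannot leak across pooled requests.
    """
    tenant_uuid = str(uuid.UUID(tenant_id))  # reject malformed ids before any SQL runs
    cur = conn.cursor()
    try:
        # DB-API drivers (e.g. psycopg) open a transaction implicitly here.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_uuid,))
        yield cur
        conn.commit()
    except BaseException:
        conn.rollback()
        raise
    finally:
        cur.close()


# Recording fakes so the example runs without a database.
class RecordingCursor:
    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=None):
        self.log.append((sql, params))

    def close(self):
        self.log.append(("close", None))


class RecordingConnection:
    def __init__(self):
        self.log = []

    def cursor(self):
        return RecordingCursor(self.log)

    def commit(self):
        self.log.append(("commit", None))

    def rollback(self):
        self.log.append(("rollback", None))


conn = RecordingConnection()
tenant = "3f2b6c1e-8a4d-4e6f-9b7a-2c1d0e9f8a7b"
with tenant_scope(conn, tenant) as cur:
    cur.execute("SELECT * FROM cost.savings_ledger")
for entry in conn.log:
    print(entry)
```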

A RUN-AS-BYPASS-LINT rule is on the way; until it lands, code review enforces the discipline. A botched RLS migration silently returns zero rows, which during a customer demo is indistinguishable from a feature that “doesn’t work”. Each table’s rollout therefore requires both the migration and a wrap-and-verify pass before merge: every reader of the table is wrapped in the correct tenant context (or runAsBypass) and verified to return the expected rows.

Agent ingress

Per ADR-0010, the unified GPU agent connects to TensorCost over a long-lived gRPC stream on TCP/50051, fronted by an AWS Network Load Balancer with TLS-only listeners. The task security group accepts 0.0.0.0/0 on TCP/50051; four compensating controls (three in place, CC-4 in flight) mitigate this exposure:

Control	Implementation
CC-1: HMAC + replay guard	Every `AgentHello` carries a tenant-bound HMAC-SHA256 signature. The server enforces a `±300s` clock-skew window and rejects reused nonces via Redis (`SET nonce:{tenantId}:{keyId}:{hex(nonce)} 1 NX EX 600`, an atomic set-if-absent with a 600s TTL, twice the skew window). Signatures are compared with `timingSafeEqual`, and verification uses a per-deployment pepper held in Secrets Manager.
CC-2: Per-stream tenant binding	The verified `AgentHello` populates `session.tenantId`; every DB write runs inside `runAsBypass` with the resolved `tenantId` so RLS policies cannot leak cross-tenant.
CC-3: TLS-only at the NLB	Plaintext gRPC connections fail before reaching the handler. mTLS is the planned upgrade path once we have an Envoy / ingress gateway.
CC-4: Per-tenant per-stream rate limit	In flight; tracked as a follow-up. Today’s compensating control is the NLB connection-per-source-IP soft limit.
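CC-1 can be sketched as follows. This is a minimal Python illustration, not the production verifier: the message encoding in hello_mac is hypothetical, the key is treated as opaque because how the tenant credential combines with the Secrets Manager pepper is not described here, and NonceStore stands in for Redis SET with NX and EX. The TTL is twice the skew window because a hello is accepted up to 300 s on either side of the server clock, so remembering its nonce for 600 s covers every moment it could be replayed.

```python
import hashlib
import hmac

SKEW_S = 300
NONCE_TTL_S = 2 * SKEW_S  # a nonce must outlive every timestamp the skew window accepts


class NonceStore:
    """In-memory stand-in for Redis `SET key 1 NX EX ttl` (redis-py: r.set(key, 1, nx=True, ex=ttl))."""

    def __init__(self):
        self._expires_at = {}

    def set_nx_ex(self, key, ttl_s, now):
        if self._expires_at.get(key, float("-inf")) > now:
            return False  # still live: nonce already used
        self._expires_at[key] = now + ttl_s
        return True


def hello_mac(key, tenant_id, key_id, ts, nonce):
    # Hypothetical canonical encoding; ids are assumed not to contain "|".
    msg = f"{tenant_id}|{key_id}|{ts}|{nonce.hex()}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_hello(key, tenant_id, key_id, ts, nonce, mac, nonces, now):
    if abs(now - ts) > SKEW_S:
        return False  # outside the ±300 s window
    if not hmac.compare_digest(hello_mac(key, tenant_id, key_id, ts, nonce), mac):
        return False  # constant-time comparison, like Node's timingSafeEqual
    # Claim the nonce only after the MAC verifies, so forged hellos cannot burn nonces.
    return nonces.set_nx_ex(f"nonce:{tenant_id}:{key_id}:{nonce.hex()}", NONCE_TTL_S, now)


key = bytes(32)  # placeholder key material
nonces = NonceStore()
t0 = 1_700_000_000
nonce = bytes.fromhex("00112233445566778899aabbccddeeff")
mac = hello_mac(key, "tenant-a", "key-1", t0, nonce)

print(verify_hello(key, "tenant-a", "key-1", t0, nonce, mac, nonces, now=t0 + 5))  # True
print(verify_hello(key, "tenant-a", "key-1", t0, nonce, mac, nonces, now=t0 + 6))  # False: replay
print(verify_hello(key, "tenant-b", "key-1", t0, nonce, mac, nonces, now=t0 + 7))  # False: wrong tenant
```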

The GPU_GRPC_ALLOW_UNREGISTERED_AGENTS=true escape hatch has been removed; every agent must present a verified AgentHello.

Redaction at ingestion

Raw prompts and responses never enter our storage.

Bedrock invocation logs are read, with read-only access, from customer-owned S3 buckets (or CloudWatch Logs).
The ingestion parser computes prompt_hash and response_hash (SHA-256) and writes only those hashes.
Unit tests assert no raw prompt or response text escapes the parser.
Customers concerned about content can disable request-metadata capture entirely and still receive model/cost/latency-level attribution and recommendations.
Anonymized aggregates that feed our public benchmark report are gated behind explicit per-tenant consent.
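A minimal sketch of the hashing step in Python. It assumes invocation-log records shaped like Bedrock model invocation logs (modelId, input.inputBodyJson, input.inputTokenCount, and their output counterparts); verify those field names against the AWS log schema before relying on them. redact_invocation is hypothetical. Hashing a canonical JSON serialization means identical payloads produce identical hashes regardless of key order.

```python
import hashlib
import json


def _sha256_json(value):
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def redact_invocation(record):
    """Keep attribution fields and SHA-256 hashes; drop prompt and response bodies."""
    return {
        "model_id": record["modelId"],
        "input_tokens": record["input"]["inputTokenCount"],
        "output_tokens": record["output"]["outputTokenCount"],
        "prompt_hash": _sha256_json(record["input"]["inputBodyJson"]),
        "response_hash": _sha256_json(record["output"]["outputBodyJson"]),
    }


record = {
    "modelId": "example-model-id",
    "input": {"inputBodyJson": {"prompt": "summarize our internal roadmap"}, "inputTokenCount": 12},
    "output": {"outputBodyJson": {"completion": "Here is the summary"}, "outputTokenCount": 5},
}
row = redact_invocation(record)
print(sorted(row))  # ['input_tokens', 'model_id', 'output_tokens', 'prompt_hash', 'response_hash']
```

A unit test in the spirit of the one described above only needs to serialize the output and assert that no raw prompt or response text appears in it.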

Customer onboarding — least-privilege IAM

The CloudFormation (CFN) onboarding stack ships the minimum read-only IAM policy required and supports two modes:

Mode	What the role can do	Where the stack deploys
`SingleAccount` (default)	Read-only Bedrock + CloudWatch + EC2 + Cost Explorer in one account	The single account being monitored
`Organization`	All of `SingleAccount` plus `sts:AssumeRole` into `OrganizationAccountAccessRole` (or `AWSControlTowerExecution`) per member account	The Organization’s management account

Variants supported in the wizard:

Consolidated billing with separate payer — payer-account CUR read + member-account Bedrock log read.
SCP-restricted environments — RolePathPrefix parameter for Organizations that mandate a custom path.
AWS Control Tower / Landing Zone — defaults to AWSControlTowerExecution jump role.
Cross-account CUR — separate bucket-policy snippet shipped as a distinct artifact (security teams routinely review these independently).
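Cross-account access rests on STS AssumeRole with an external ID. The sketch below builds the two halves in Python: the trust policy a customer role would carry, and the arguments a caller would pass to boto3's assume_role. The principal ARN uses AWS's documentation example account ID, and build_trust_policy, assume_role_kwargs, and the session name are hypothetical; the CFN stack ships the real values.

```python
import json

# Hypothetical principal: AWS's documentation example account ID, not TensorCost's.
EXAMPLE_PRINCIPAL_ARN = "arn:aws:iam::111122223333:root"


def build_trust_policy(external_id, principal_arn=EXAMPLE_PRINCIPAL_ARN):
    """Trust policy: only `principal_arn`, presenting `external_id`, may assume the role."""
    if not external_id:
        raise ValueError("an external ID is required to prevent confused-deputy access")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def assume_role_kwargs(role_arn, external_id):
    """Arguments for boto3's STS assume_role; 900 s (15 minutes) is the STS minimum duration."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": "tensorcost-readonly",  # hypothetical session name
        "ExternalId": external_id,
        "DurationSeconds": 900,
    }


print(json.dumps(build_trust_policy("example-external-id"), indent=2))
```

A caller would then run boto3.client("sts").assume_role(**assume_role_kwargs(role_arn, external_id)) and use the returned temporary credentials in memory only.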

The full onboarding flow, including day-by-day expectations, remediations for the five most common day-1 failures, and Organization-mode verification snippets, lives in our internal customer-onboarding runbook (request it via your Slack Connect channel).

Secret rotation

Per the secret-rotation runbook, platform and customer secrets rotate on the following cadences:

Secret	Cadence	Owner	Mechanism
Agent HMAC pepper (`AGENT_HMAC_PEPPER`)	Quarterly	Platform	Secrets Manager rotation lambda; old + new accepted for 24h overlap
Database master credentials	Quarterly	Platform	RDS-managed rotation
Cognito signing keys	Annually	Platform	Cognito-managed
LaunchDarkly SDK keys	On suspected compromise + annually	Platform	LD console + Secrets Manager
Customer agent credentials	Customer-driven	Customer	`POST /v1/identity/agent-credentials` for new key, then revoke old
Customer integration credentials	90 days suggested	Customer	`POST /v1/integration/connections/:id/rotate-secret`
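The 24-hour overlap for the agent HMAC pepper can be sketched as a verifier that tries the current pepper and, only inside the overlap window, the previous one. A minimal Python sketch: PepperSet and verify_with_rotation are hypothetical, and the pepper is used directly as the HMAC key for brevity, which may not match how the production verifier combines it with tenant keys.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

OVERLAP_S = 24 * 3600  # old and new pepper are both accepted for 24h after rotation


@dataclass(frozen=True)
class PepperSet:
    current: bytes
    previous: Optional[bytes]
    rotated_at: float  # epoch seconds when `current` became active


def candidate_peppers(peppers, now):
    """Peppers to try: current always, previous only during the overlap window."""
    keys = [peppers.current]
    if peppers.previous is not None and now - peppers.rotated_at < OVERLAP_S:
        keys.append(peppers.previous)
    return keys


def verify_with_rotation(peppers, msg, mac, now):
    # Compare against every candidate without short-circuiting.
    matched = False
    for pepper in candidate_peppers(peppers, now):
        expected = hmac.new(pepper, msg, hashlib.sha256).digest()
        matched |= hmac.compare_digest(expected, mac)
    return matched


old, new = b"o" * 32, b"n" * 32
peppers = PepperSet(current=new, previous=old, rotated_at=1_000.0)
mac_from_old_agent = hmac.new(old, b"hello", hashlib.sha256).digest()

print(verify_with_rotation(peppers, b"hello", mac_from_old_agent, now=1_000.0 + 3_600))      # True
print(verify_with_rotation(peppers, b"hello", mac_from_old_agent, now=1_000.0 + OVERLAP_S))  # False
```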

Customer-facing secrets (agent HMAC keys, OpenAI API keys we hold, etc.) are stored encrypted with a per-tenant KMS-derived key in integration.connection_secret (RLS-enforced).

Tenant offboarding

A documented state machine — active → offboarding_pending → offboarding_archive → deleted — covers every churn or GDPR Art. 17 request:

Soft delete (30 days) — tenant moves to offboarding_pending. Ingestion stops, dashboards become read-only, recommendations freeze. Data is retained for 30 days to allow recovery.
Archive (7 days) — tenant moves to offboarding_archive. A signed export bundle (CSV + JSON) is delivered to the tenant’s offboarding contact.
Hard delete — every tenant-scoped row outside audit.* is purged via chunked DELETE, schema by schema. Large tables such as ai.ai_spend_events are deleted in batches to keep transactions short and let autovacuum keep up. Customer-owned S3 buckets are not touched; those are the customer’s to manage.
Audit-trail preservation — audit.* rows survive offboarding indefinitely; required for compliance and legal-hold preservation.
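The offboarding lifecycle can be encoded as an explicit transition table, which makes illegal jumps (for example, active straight to deleted) fail loudly. A minimal Python sketch: the names are hypothetical, and the offboarding_pending to active recovery edge is an assumption drawn from the 30-day recovery window rather than a documented transition.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class TenantState(Enum):
    ACTIVE = "active"
    OFFBOARDING_PENDING = "offboarding_pending"
    OFFBOARDING_ARCHIVE = "offboarding_archive"
    DELETED = "deleted"


# Allowed transitions. PENDING -> ACTIVE models recovery during soft delete (assumed).
TRANSITIONS = {
    TenantState.ACTIVE: {TenantState.OFFBOARDING_PENDING},
    TenantState.OFFBOARDING_PENDING: {TenantState.OFFBOARDING_ARCHIVE, TenantState.ACTIVE},
    TenantState.OFFBOARDING_ARCHIVE: {TenantState.DELETED},
    TenantState.DELETED: set(),
}

# How long a tenant dwells in each timed state before the next step is due.
DWELL = {
    TenantState.OFFBOARDING_PENDING: timedelta(days=30),
    TenantState.OFFBOARDING_ARCHIVE: timedelta(days=7),
}


def transition(current, target):
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def next_step_due(state, entered_at):
    """When a (hypothetical) sweeper should advance a tenant; None for untimed states."""
    dwell = DWELL.get(state)
    return None if dwell is None else entered_at + dwell


state = transition(TenantState.ACTIVE, TenantState.OFFBOARDING_PENDING)
entered = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(state.value, next_step_due(state, entered).date())  # offboarding_pending 2026-01-31
```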

Customer-side teardown (revoking the IAM role, disabling Bedrock invocation logging, removing CFN stacks) is a customer responsibility; we ship a runbook and a Slack Connect handoff.

Penetration testing

Cadence	Scope
Annual	Full external + authenticated app + API
On major release	Targeted (the new surface)
On request	Customer-driven re-test; latest summary available

We use independent third-party testers and share the executive summary with enterprise prospects under NDA.

Background checks and security training

Background checks on every team member with production access.
Annual security awareness training (KnowBe4) for all staff.
Quarterly tabletop incident-response exercises.

What an enterprise prospect typically asks for

Security questionnaire (SIG / CAIQ) — completed on request, typically within 48–72 hours.
Architecture diagrams at three levels (exec, platform-lead, security-reviewer). Available on request.
Subprocessor list and DPA. Available on request.
SOC 2 bridge letter — issued while the Type I audit is in progress (Q2–Q3 2026).
Penetration-test executive summary — under NDA.
Sample audit-trail export — under NDA.

Email security@tensorcost.com for the security package or to start a security review.

Reporting a vulnerability

Please report suspected vulnerabilities by emailing security@tensorcost.com. Include:

A description of the issue and potential impact.
Reproduction steps (PoC, logs, screenshots).
Affected versions / commit hashes.
Suggested mitigations if any.

We acknowledge receipt within 3 business days and provide a status update within 7 business days after triage. Please do not open public issues for suspected vulnerabilities. We coordinate disclosure with the reporter and publish security advisories with remediation details when fixes ship.
