SOC 2 readiness and security posture
TensorCost is built and operated by Vaadhlabs. This page describes our customer-facing security posture: the controls in place today, the compliance certifications in flight, and the architecture details an enterprise security reviewer cares about.
We are SOC 2 Type I in progress, targeting completion in Q3 2026 with Type II following on a 6–9 month observation window. ISO 27001 is on the 2027 roadmap. FedRAMP Moderate is on the 2028 roadmap. The trust portal at trust.tensorcost.com is in build; for now, request the security package directly via security@tensorcost.com.
Compliance posture at a glance
| Framework | Status | Target |
|---|---|---|
| SOC 2 Type I | In progress | Q3 2026 |
| SOC 2 Type II | Scheduled | Q1 2027 (6–9 months post Type I) |
| ISO 27001 | Roadmap | 2027 |
| HIPAA BAA | On request | Available for Enterprise tier |
| FedRAMP Moderate | Roadmap | 2028 |
| Penetration test | Annual + on-major-release | Latest summary on request |
Trust service criteria — what we have, what’s in flight
Security (mandatory)
In place:
- Multi-tenant Row Level Security on Postgres. 21 tables enforced today, target 50; rolling out wave-by-wave with a `runAsBypass` discipline lint. See the RLS architecture below.
- JWT auth with refresh tokens; Cognito federation for browser SSO; SAML / OIDC for enterprise SSO.
- Three built-in roles (`member`, `admin`, `owner`); RBAC primitive shared across REST, gRPC, and MCP.
- HMAC-SHA256 + Redis nonce replay for every agent gRPC stream (see agent ingress).
- TLS 1.3 in transit for every customer-facing endpoint (REST, WS, gRPC).
- AES-256 encryption at rest (RDS, S3, EBS); per-tenant KMS-derived keys for sensitive credential storage.
- AWS STS AssumeRole + external ID for cross-account reads; 15-minute temp credentials, never persisted.
- Read-only IAM on customer accounts. No `bedrock:Invoke`, no `s3:Put`, no `logs:Put` anywhere.
- Customer-owned S3 buckets for raw Bedrock invocation logs. We never copy raw logs to our account.
- Redaction at ingestion: raw prompts and responses are hashed; only hashes enter `ai_spend_events`. See redaction at ingestion.
- Rate limiting at the gateway and per-tenant on the gRPC ingress.
- CloudTrail, GuardDuty, AWS Config enabled across the platform AWS organization.
- Dependabot, secret scanning, required PR reviews + status checks on every repo.
In flight:
- Trust portal scaffolding at trust.tensorcost.com.
- Per-tenant gRPC token-bucket rate limit (ADR-0010 CC-4).
- mTLS upgrade path for the agent ingress (post-Envoy migration).
- AI-service RLS rollout (next wave: 17 tables across the `ai.*` schema).
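The per-tenant token-bucket limit (ADR-0010 CC-4) can be pictured with a minimal Python sketch. Python is used here only for illustration; the `TenantRateLimiter` class, the capacity, and the refill rate are hypothetical and are not taken from ADR-0010.

```python
class TenantRateLimiter:
    """Illustrative per-tenant token bucket (hypothetical names and values)."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self._capacity = capacity        # maximum burst size per tenant
        self._refill = refill_per_sec    # sustained messages per second
        self._buckets = {}               # tenant_id -> (tokens, last_seen)

    def allow(self, tenant_id: str, now: float) -> bool:
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self._capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._refill)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed


limiter = TenantRateLimiter(capacity=2, refill_per_sec=1.0)
burst = [limiter.allow("tenant-a", now=0.0) for _ in range(3)]
```

Keeping one bucket per tenant means a noisy tenant exhausts only its own budget; other tenants on the same ingress are unaffected.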
Availability
| Tier | SLA | DR |
|---|---|---|
| Free / design partner | Best-effort | — |
| Growth | 99.9% | RPO 24h, RTO 4h |
| Enterprise | 99.95% | RPO 1h, RTO 1h |
- RDS Multi-AZ with automated backups (35-day retention).
- Redis ElastiCache cluster mode with replicas.
- ALB cross-zone load balancing; NLB targets in three AZs.
- Per-region deployment in `us-east-1` (default), with EU-region rollout planned.
- Public status page at status.tensorcost.com (embed coming).
- Documented incident response procedure with PagerDuty rotation.
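To make the SLA percentages in the table concrete, the following Python sketch converts an availability target into a downtime budget. The 30-day measurement window is an assumption for illustration; the contractual window is defined in the service agreement, not here.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of allowed downtime for a given SLA over a window of `days`."""
    return (1 - sla_percent / 100) * days * MINUTES_PER_DAY


growth_budget = downtime_budget_minutes(99.9)        # about 43.2 minutes
enterprise_budget = downtime_budget_minutes(99.95)   # about 21.6 minutes
```

Under that assumption, moving from 99.9% to 99.95% halves the permitted downtime per window.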
Confidentiality
- All tenant data scoped by `tenant_id` and enforced by RLS.
- Sensitive config values (webhook URLs, routing keys, cloud credentials) encrypted at the column level using a per-tenant KMS-derived key. Surfaced as masked values in API responses; full values only via `decrypt`-scoped endpoints.
- Customizable data retention (30–365 days for raw metrics; 90–730 days for recommendations; 7 years for audit).
- Tenant offboarding flow with 30-day soft-delete; hard delete preserves audit-trail rows. See tenant offboarding.
- NDA template signed by every employee and contractor with production access.
Processing integrity
- Analysis audit log captures every anomaly-detection decision: baseline statistics used, current value, each method’s score, composite confidence, classification.
- Inference feedback records every recommendation outcome (accepted / rejected / modified) with reason + post-action measurements; feeds back into ML training.
- Action queue tracks every enforcement action through pending → approved → executing → completed/failed with full audit trail.
- Event persistence via the event store (ADR-0011) — cross-instance durable event ledger.
- Daily ingest reconciliation — agent-emitted vs backend-received counts, with a delta alert.
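The daily ingest reconciliation can be sketched as a count comparison with a relative tolerance. This is a minimal Python illustration; the `reconcile` function and the 0.1% tolerance are hypothetical, since the actual alert threshold is not stated on this page.

```python
def reconcile(agent_emitted: int, backend_received: int, tolerance: float = 0.001):
    """Return (delta, should_alert) for one day's event counts.

    `tolerance` is the largest relative gap accepted without alerting
    (hypothetical default of 0.1%).
    """
    delta = agent_emitted - backend_received
    if agent_emitted == 0:
        # Nothing emitted: any received event is unexplained and worth an alert.
        return delta, backend_received != 0
    return delta, abs(delta) / agent_emitted > tolerance


small_gap = reconcile(agent_emitted=10_000, backend_received=9_995)
large_gap = reconcile(agent_emitted=10_000, backend_received=9_900)
```

A relative threshold keeps the alert meaningful across tenants with very different event volumes.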
Privacy
- Privacy policy and DPA template on request.
- Personal data collected: email, name, IP / user-agent on auth, timezone preference.
- Subprocessor list available on request and republished annually.
- Cookie policy on the marketing site.
- DSAR process documented; DSARs handled within 30 days.
Multi-tenant RLS
Every customer is a tenant; every row in every table that holds tenant data carries a `tenant_id` column and is protected by Postgres Row Level Security.
The application sets `app.tenant_id` per request via a Sequelize hook; cross-tenant reads are unreachable by construction. The only path that bypasses RLS is `runAsBypass(tenantId, fn)` from `@tensorcost/db-utils`, used by:
- gRPC handlers (after the agent’s `tenantId` is verified by HMAC).
- Cron-driven sweeps (anomaly detection, recommenders).
- Cross-service joins.
A `RUN-AS-BYPASS-LINT` rule is on the way; until then, code review enforces the discipline. A botched RLS migration silently returns zero rows, which during a customer demo is indistinguishable from a feature that “doesn’t work”, so each table’s rollout requires both the migration and a wrap+verify pass on every reader before merge.
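The "silently returns zero rows" failure mode is easier to see in a toy model. The Python sketch below mimics a policy of the assumed form `USING (tenant_id = current_setting('app.tenant_id', true))`; the actual policy text, the sample rows, and the `select_all` helper are illustrative assumptions, not the production schema.

```python
from typing import Optional

# Hypothetical table contents for two tenants.
ROWS = [
    {"tenant_id": "t1", "cost_usd": 10},
    {"tenant_id": "t2", "cost_usd": 20},
]


def select_all(rows, session_tenant_id: Optional[str]):
    """Model of an RLS-filtered SELECT.

    When the session setting is missing, the policy comparison evaluates to
    NULL, so no row qualifies. The query succeeds and returns nothing.
    """
    if session_tenant_id is None:
        return []
    return [row for row in rows if row["tenant_id"] == session_tenant_id]


wrapped_read = select_all(ROWS, "t1")      # reader runs with the tenant set
unwrapped_read = select_all(ROWS, None)    # reader missed by the rollout
```

The unwrapped reader raises no error, which is why a verify pass that asserts a non-empty result for a seeded tenant catches what an exception handler would not.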
Agent ingress
Per ADR-0010, the unified GPU agent connects to TensorCost over a long-lived gRPC stream on TCP/50051, fronted by an AWS Network Load Balancer with TLS-only listeners. The task security group accepts `0.0.0.0/0` on TCP/50051; four compensating controls make this safe:
| Control | Implementation |
|---|---|
| CC-1: HMAC + replay guard | Every AgentHello carries a tenant-bound HMAC-SHA256, ±300s skew window, and a Redis-backed nonce reuse rejection (SETNX nonce:{tenantId}:{keyId}:{hex(nonce)} with 600s TTL). Verified with timingSafeEqual against a per-deployment Secrets-Manager pepper. |
| CC-2: Per-stream tenant binding | The verified AgentHello populates session.tenantId; every DB write runs inside runAsBypass with the resolved tenantId so RLS policies cannot leak cross-tenant. |
| CC-3: TLS-only at the NLB | Plaintext gRPC connections fail before reaching the handler. mTLS is the planned upgrade path once we have an Envoy / ingress gateway. |
| CC-4: Per-tenant per-stream rate limit | In flight; tracked as a follow-up. Today’s compensating control is the NLB connection-per-source-IP soft limit. |
The `GPU_GRPC_ALLOW_UNREGISTERED_AGENTS=true` setting was removed; every agent must carry a verified hello.
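The CC-1 checks (skew window, constant-time HMAC comparison, nonce reuse rejection) can be sketched in Python as follows. Several details are assumptions made for illustration: the canonical message layout, the use of the pepper directly as the HMAC key, and the in-memory `NonceStore` standing in for Redis. Python's `hmac.compare_digest` plays the role that `timingSafeEqual` plays in Node.

```python
import hashlib
import hmac

SKEW_SECONDS = 300       # ±300s window from CC-1
NONCE_TTL_SECONDS = 600  # nonce key TTL from CC-1


class NonceStore:
    """In-memory stand-in for Redis SETNX with a TTL (illustration only)."""

    def __init__(self):
        self._expiry = {}

    def set_if_absent(self, key: str, now: float, ttl: int) -> bool:
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # nonce already seen inside its TTL
        self._expiry[key] = now + ttl
        return True


def sign_hello(pepper: bytes, tenant_id: str, key_id: str, timestamp: int, nonce: bytes) -> bytes:
    # Hypothetical canonical message; the real field order is not documented here.
    message = f"{tenant_id}|{key_id}|{timestamp}|".encode() + nonce
    return hmac.new(pepper, message, hashlib.sha256).digest()


def verify_hello(store, pepper, tenant_id, key_id, timestamp, nonce, signature, now) -> bool:
    if abs(now - timestamp) > SKEW_SECONDS:
        return False  # outside the skew window
    expected = sign_hello(pepper, tenant_id, key_id, timestamp, nonce)
    if not hmac.compare_digest(expected, signature):
        return False  # constant-time comparison failed
    key = f"nonce:{tenant_id}:{key_id}:{nonce.hex()}"
    return store.set_if_absent(key, now, NONCE_TTL_SECONDS)


store = NonceStore()
pepper = b"example-pepper"
sig = sign_hello(pepper, "t1", "k1", 1000, b"\x01\x02")
first = verify_hello(store, pepper, "t1", "k1", 1000, b"\x01\x02", sig, now=1100)
replay = verify_hello(store, pepper, "t1", "k1", 1000, b"\x01\x02", sig, now=1101)
```

The nonce TTL (600s) is twice the skew window, so a nonce cannot expire from the store while its timestamp is still acceptable.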
Redaction at ingestion
Raw prompts and responses never enter our storage.
- Bedrock invocation logs are read with read-only access from customer-owned S3 buckets (or CloudWatch Logs).
- The ingestion parser computes `prompt_hash` and `response_hash` (SHA-256) and writes those.
- Unit tests assert no raw prompt or response text escapes the parser.
- Customers concerned about content can disable request-metadata capture entirely and still receive model/cost/latency-level attribution and recommendations.
- Anonymized aggregates that feed our public benchmark report are gated behind explicit per-tenant consent.
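The hashing step in the list above can be sketched as a small Python function. Only `prompt_hash`, `response_hash`, and SHA-256 come from this page; the input field names (`prompt`, `response`, `model_id`) and the `redact_record` helper are hypothetical.

```python
import hashlib


def redact_record(record: dict) -> dict:
    """Replace raw prompt/response text with SHA-256 hex digests."""
    redacted = {k: v for k, v in record.items() if k not in ("prompt", "response")}
    redacted["prompt_hash"] = hashlib.sha256(record["prompt"].encode("utf-8")).hexdigest()
    redacted["response_hash"] = hashlib.sha256(record["response"].encode("utf-8")).hexdigest()
    return redacted


raw = {
    "model_id": "example-model",  # placeholder value
    "prompt": "what is our Q3 GPU spend?",
    "response": "roughly the same as Q2",
}
stored = redact_record(raw)
```

A unit test in the spirit of the one described above would assert that neither raw string appears anywhere in `stored`.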
Customer onboarding — least-privilege IAM
The CFN onboarding stack ships with the minimum read-only IAM policy required. Modes:
| Mode | What the role can do | Where the stack deploys |
|---|---|---|
| `SingleAccount` (default) | Read-only Bedrock + CloudWatch + EC2 + Cost Explorer in one account | The single account being monitored |
| `Organization` | All of `SingleAccount` plus `sts:AssumeRole` into `OrganizationAccountAccessRole` (or `AWSControlTowerExecution`) per member account | The Organization’s management account |
- Consolidated billing with separate payer: payer-account CUR read + member-account Bedrock log read.
- SCP-restricted environments: `RolePathPrefix` parameter for Organizations that mandate a custom path.
- AWS Control Tower / Landing Zone: defaults to the `AWSControlTowerExecution` jump role.
- Cross-account CUR: separate bucket-policy snippet shipped as a distinct artifact (security teams routinely review these independently).
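A security reviewer who wants to confirm the read-only claim for themselves can scan the policy JSON for write or invoke actions. The Python sketch below is one way to do that; the `forbidden_actions` helper and the sample policy are illustrative and are not the contents of the shipped CloudFormation template.

```python
import fnmatch

# Action families this page says are never granted (compared case-insensitively).
FORBIDDEN_PATTERNS = ("bedrock:invoke*", "s3:put*", "logs:put*")


def forbidden_actions(policy: dict) -> list:
    """Return Allow-ed actions that are forbidden or service-wide wildcards.

    Partial wildcards such as "s3:P*" are not expanded; treat this as a
    first-pass check, not a full IAM policy evaluator.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            lowered = action.lower()
            too_broad = lowered == "*" or lowered.endswith(":*")
            writes = any(fnmatch.fnmatchcase(lowered, p) for p in FORBIDDEN_PATTERNS)
            if too_broad or writes:
                flagged.append(action)
    return flagged


sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "logs:GetLogEvents"], "Resource": "*"},
    ],
}
findings = forbidden_actions(sample_policy)
```

An empty result means no forbidden or service-wide wildcard action was found in any Allow statement.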
Secret rotation
Per the secret-rotation runbook, all platform secrets rotate on a documented cadence:
| Secret | Cadence | Owner | Mechanism |
|---|---|---|---|
| Agent HMAC pepper (`AGENT_HMAC_PEPPER`) | Quarterly | Platform | Secrets Manager rotation lambda; old + new accepted for 24h overlap |
| Database master credentials | Quarterly | Platform | RDS-managed rotation |
| Cognito signing keys | Annually | Platform | Cognito-managed |
| LaunchDarkly SDK keys | On suspected compromise + annually | Platform | LD console + Secrets Manager |
| Customer agent credentials | Customer-driven | Customer | POST /v1/identity/agent-credentials for new key, then revoke old |
| Customer integration credentials | 90 days suggested | Customer | POST /v1/integration/connections/:id/rotate-secret |
integration.connection_secret (RLS-enforced).
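The 24-hour overlap for the agent HMAC pepper means a verifier accepts signatures made with either the current or the previous pepper until the window closes. A minimal Python sketch, assuming the pepper is used directly as the HMAC key and that the rotation time is known to the verifier; the function names are hypothetical:

```python
import hashlib
import hmac

OVERLAP_SECONDS = 24 * 60 * 60  # 24h overlap from the rotation table


def accepted_peppers(current: bytes, previous, rotated_at: float, now: float) -> list:
    """Peppers a verifier should accept at time `now`."""
    if previous is not None and now - rotated_at < OVERLAP_SECONDS:
        return [current, previous]
    return [current]


def verify_with_rotation(message, signature, current, previous, rotated_at, now) -> bool:
    ok = False
    for pepper in accepted_peppers(current, previous, rotated_at, now):
        expected = hmac.new(pepper, message, hashlib.sha256).digest()
        # Compare against every candidate so timing does not reveal which one matched.
        ok = hmac.compare_digest(expected, signature) or ok
    return ok


old_sig = hmac.new(b"old-pepper", b"hello", hashlib.sha256).digest()
during_overlap = verify_with_rotation(b"hello", old_sig, b"new-pepper", b"old-pepper", rotated_at=0, now=3600)
after_overlap = verify_with_rotation(b"hello", old_sig, b"new-pepper", b"old-pepper", rotated_at=0, now=OVERLAP_SECONDS + 1)
```

Agents that have not yet picked up the new pepper keep working during the overlap and are rejected once it ends.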
Tenant offboarding
A documented state machine (`active → offboarding_pending → offboarding_archive → deleted`) covers every churn or GDPR Art. 17 request:
- Soft delete (30 days): tenant moves to `offboarding_pending`. Ingestion stops, dashboards become read-only, recommendations freeze. Data is retained for 30 days to allow recovery.
- Archive (7 days): tenant moves to `offboarding_archive`. A signed export bundle (CSV + JSON) is delivered to the tenant’s offboarding contact.
- Hard delete: every tenant-scoped row is purged via chunked DELETE per schema (large tables like `ai.ai_spend_events` are deleted in batches to avoid bloat). Customer-owned S3 buckets are not touched; those are the customer’s to manage.
- Audit-trail preservation: `audit.*` rows survive offboarding indefinitely; required for compliance and legal-hold preservation.
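The offboarding states and the batched hard delete can be sketched in Python as below. The state names come from this page; the transition table (including the recovery edge from `offboarding_pending` back to `active`), the `transition` and `chunked` helpers, and the batch size are assumptions for illustration.

```python
from enum import Enum


class TenantState(str, Enum):
    ACTIVE = "active"
    OFFBOARDING_PENDING = "offboarding_pending"
    OFFBOARDING_ARCHIVE = "offboarding_archive"
    DELETED = "deleted"


ALLOWED = {
    TenantState.ACTIVE: {TenantState.OFFBOARDING_PENDING},
    # Recovery back to active during the 30-day soft delete is an assumption.
    TenantState.OFFBOARDING_PENDING: {TenantState.OFFBOARDING_ARCHIVE, TenantState.ACTIVE},
    TenantState.OFFBOARDING_ARCHIVE: {TenantState.DELETED},
    TenantState.DELETED: set(),
}


def transition(current: TenantState, target: TenantState) -> TenantState:
    """Move to `target` or raise if the step skips a stage."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def chunked(ids: list, size: int):
    """Yield fixed-size batches, one per DELETE statement."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


state = transition(TenantState.ACTIVE, TenantState.OFFBOARDING_PENDING)
batches = list(chunked(list(range(5)), size=2))
```

Rejecting skipped stages in code means a tenant cannot reach hard delete without first passing through the archive step, where the export bundle is produced.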
Penetration testing
| Cadence | Scope |
|---|---|
| Annual | Full external + authenticated app + API |
| On major release | Targeted (the new surface) |
| On request | Customer-driven re-test; latest summary available |
Background checks and security training
- Background checks on every team member with production access.
- Annual security awareness training (KnowBe4) for all staff.
- Quarterly tabletop incident-response exercises.
What an enterprise prospect typically asks for
- Security questionnaire (SIG / CAIQ) — completed on request, typically 48–72 hours.
- Architecture diagrams at three levels (exec, platform-lead, security-reviewer). Available on request.
- Subprocessor list and DPA. Available on request.
- SOC 2 bridge letter — issued during the Type I observation window (Q2-Q3 2026).
- Penetration-test executive summary — under NDA.
- Sample audit-trail export — under NDA.
Reporting a vulnerability
Please report suspected vulnerabilities by emailing security@tensorcost.com. Include:
- A description of the issue and potential impact.
- Reproduction steps (PoC, logs, screenshots).
- Affected versions / commit hashes.
- Suggested mitigations if any.