Documentation Index
Fetch the complete documentation index at: https://docs.tensorcost.com/llms.txt
Use this file to discover all available pages before exploring further.
# Bedrock integration
Amazon Bedrock is TensorCost’s lead managed-inference adapter. Connect a Bedrock account and within 48 hours you’ll see model-routing, prompt-cache, provisioned-throughput, and runaway-loop recommendations with dollar-quantified impact. This page is the public version of our internal Bedrock onboarding runbook; the internal runbook adds AE / SE choreography, but the steps here are everything a customer admin needs.

## What you get
| Recommender | What it surfaces |
|---|---|
| Model routing | Prompts going to flagship models that cluster like efficient-model traffic — proposes Opus → Haiku, GPT-4o → GPT-4o-mini, etc., with sample request IDs and projected savings |
| Prompt cache | Repeated prefix patterns where Bedrock prompt caching cuts input-token cost ~90%, with the exact code change |
| Provisioned-throughput break-even | On-demand vs PT math per (model, application) — flags both under- and over-commitment |
| Runaway-loop / cost-anomaly | Per-agent rolling cost + invocation baseline; alerts on >3σ spikes, long invocation chains, and cost-per-session anomalies |
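For prompt cache, the surfaced code change generally amounts to inserting a `cachePoint` marker after the repeated prefix in a Bedrock Converse request. A hedged sketch of the request body (model ID and prompt text are illustrative; consult Bedrock’s prompt-caching documentation for per-model minimum cacheable prefix sizes):

```json
{
  "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
  "system": [
    { "text": "You are the support triage assistant. <several thousand tokens of stable instructions and examples>" },
    { "cachePoint": { "type": "default" } }
  ],
  "messages": [
    { "role": "user", "content": [{ "text": "Customer message for this request" }] }
  ]
}
```

Everything before the `cachePoint` is written to the cache on the first call and billed at the reduced cached-input rate on later calls that share the same prefix.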
## Data sources we ingest
Read-only, all four where available. The adapter degrades gracefully: connect what you have, and recommendations get richer as more sources come online.

| Source | Latency | Required IAM | What we get |
|---|---|---|---|
| Cost and Usage Reports (CUR 2.0) | 24 h | s3:GetObject on the CUR bucket, cur:DescribeReportDefinitions | Daily line items: service, model ID, operation, region, usage type, cost, tag keys |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | cloudwatch:GetMetricData, cloudwatch:ListMetrics | Invocations, InputTokenCount, OutputTokenCount, InvocationLatency, cache fields |
| Bedrock invocation logs (S3 or CloudWatch Logs) | Near real-time | s3:GetObject or logs:FilterLogEvents, plus KMS decrypt if encrypted | Per-request: model, tokens, latency, prompt-cache fields |
| Application Inference Profiles | On-demand | bedrock:ListInferenceProfiles, bedrock:GetInferenceProfile | Profile ARN → model + region routing + cost-allocation tags |
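The CloudWatch source can be sanity-checked from the customer side with the same read-only calls the adapter makes. A sketch (region, model ID, and time window are illustrative; the `date` flags are GNU coreutils):

```shell
# List the Bedrock metrics available in this region
aws cloudwatch list-metrics --namespace AWS/Bedrock --region us-east-1

# Pull one day of input-token volume for a single model, at 1-hour resolution
aws cloudwatch get-metric-data --region us-east-1 \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --metric-data-queries '[{
    "Id": "inputTokens",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-5-haiku-20241022-v1:0"}]
      },
      "Period": 3600,
      "Stat": "Sum"
    }
  }]'
```

Zero datapoints here usually means no Bedrock traffic in the window, not a permissions problem.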
## Onboarding (≤15 minutes)
### Open the wizard
Integrations → Add AWS Bedrock. The wizard auto-suggests an ExternalId; accept it, don’t pick your own.

### Choose onboarding mode

Pick once; switching modes after deploy means re-deploying the stack.
| Mode | Use when | Stack lives in |
|---|---|---|
| SingleAccount (default) | One AWS account, or you’re fine connecting accounts one at a time | The account you want monitored |
| Organization | Multi-account AWS Organization (the common enterprise shape) | The Organization’s management account |

In Organization mode, TensorCost iterates member accounts via OrganizationAccountAccessRole (or AWSControlTowerExecution if your Landing Zone provisioned that name). Day-1 ingest in a 200-account org is roughly 200× slower than SingleAccount; subsequent passes hit the AssumeRole cache.

### Verify member-account coverage (Organization mode only)
AWS creates OrganizationAccountAccessRole only when an account is created within the Organization; accounts invited and absorbed often lack it, and missing accounts will silently produce zero rows. Either deploy the role via StackSets, override OrganizationAccountAccessRoleName to a role that exists, or accept partial coverage. To check coverage, run a quick assume-role probe from the management account before deploying.
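A sketch of such a probe (assumes AWS CLI with management-account credentials; change ROLE_NAME if you override the jump-role name):

```shell
ROLE_NAME=OrganizationAccountAccessRole   # or AWSControlTowerExecution

for ACCOUNT in $(aws organizations list-accounts \
                   --query 'Accounts[?Status==`ACTIVE`].Id' --output text); do
  # If the jump role exists and trusts the management account, this succeeds
  if aws sts assume-role \
       --role-arn "arn:aws:iam::${ACCOUNT}:role/${ROLE_NAME}" \
       --role-session-name coverage-check >/dev/null 2>&1; then
    echo "OK      ${ACCOUNT}"
  else
    echo "MISSING ${ACCOUNT}"
  fi
done
```

Accounts printed as MISSING are the ones that will produce zero rows until the role is deployed.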
### Enable Bedrock model-invocation logging

AWS console → Bedrock → Settings → Model invocation logging → CloudWatch destination (or S3 destination). Note the log-group ARN.
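The same setting can be applied from the CLI. A sketch assuming a CloudWatch Logs destination, with a pre-created log group and service role (both names are illustrative):

```shell
aws bedrock put-model-invocation-logging-configuration --logging-config '{
  "cloudWatchConfig": {
    "logGroupName": "/bedrock/model-invocations",
    "roleArn": "arn:aws:iam::111111111111:role/BedrockLoggingRole"
  },
  "textDataDeliveryEnabled": true,
  "embeddingDataDeliveryEnabled": false
}'
```

Run it per region; logging is a regional setting, which is also the most common source of the "0 events found" symptom below.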
### Deploy the CloudFormation stack
The wizard provides a one-click launch link; the same template can be deployed from the CLI. The stack creates exactly one IAM role (TensorCost-BedrockReader-&lt;ExternalId&gt;) with the minimum read-only Bedrock + CloudWatch Logs permissions. In OnboardingMode=Organization it also attaches a managed policy granting organizations:* reads + sts:AssumeRole on the per-member jump role.
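A sketch of the CLI deploy (the template URL is a placeholder; use the one from the wizard. Parameter names are the ones referenced on this page):

```shell
# Template URL below is a placeholder; copy the real one from the wizard
aws cloudformation create-stack \
  --stack-name tensorcost-bedrock-reader \
  --template-url "https://example.com/tensorcost-bedrock-reader.yaml" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
    ParameterKey=ExternalId,ParameterValue="<ExternalId-from-wizard>" \
    ParameterKey=TensorCostBackendAccountId,ParameterValue="<platform-account-id-from-wizard>" \
    ParameterKey=OnboardingMode,ParameterValue=SingleAccount
```

Leaving TensorCostBackendAccountId at its default is the second failure in the day-1 table below, so pass it explicitly.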
## Multi-account variants

Most of our enterprise customers fall into one of these patterns. Walk through them before the day-1 deploy, or you spend day 2 backing out.

### Variant A — Consolidated billing with a separate payer
Most enterprise AWS Organizations have one payer (management) account holding the AWS Marketplace subscription + CUR, and N member accounts where workloads run.

- Bedrock model-invocation logging happens in each member account (where InvokeModel fires).
- The CUR lives in the payer’s S3 bucket.
- Our IAM role must be deployable in BOTH the payer (for CUR read) AND each member (for Bedrock log read + cost-tag read).

Deploy the stack with OnboardingMode=Organization into the payer account; it outputs BedrockReaderRoleArn. The CUR-read IAM policy attaches to the same role. Customer confirms the CUR S3 bucket name; we record it on the connection. Member-account onboarding: TensorCost assumes OrganizationAccountAccessRole from the payer into each member, reads Bedrock logs + cost tags. Day-1 cost-data lag is 24 h (CUR delivery delay).
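From the payer account, the CUR definition (and the bucket name we record on the connection) can be confirmed with the CUR API, which is only served from us-east-1:

```shell
aws cur describe-report-definitions --region us-east-1 \
  --query 'ReportDefinitions[].{Report:ReportName,Bucket:S3Bucket,Prefix:S3Prefix}' \
  --output table
```

No rows here means no CUR is configured, and cost-side recommendations will stay empty regardless of the Bedrock-log setup.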
### Variant B — SCPs blocking IAM creates
Many enterprises attach an SCP at the OU or root level forbidding IAM-role creation in member accounts unless a path prefix matches. Symptom: CFN fails with iam:CreateRole AccessDenied despite the user being account admin.
Mitigation: the wizard exposes a RolePathPrefix parameter (default empty). Set it to /customer-managed/ (or whatever your SCP allows). The SCP statement the customer needs to verify allows arn:aws:iam::*:role/customer-managed/TensorCost-*.
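The SCP shape that produces this symptom, with the carve-out to verify, looks roughly like the following (an illustrative sketch, not a drop-in policy; real SCPs often add conditions and more actions):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRoleCreationOutsideManagedPath",
    "Effect": "Deny",
    "Action": "iam:CreateRole",
    "NotResource": "arn:aws:iam::*:role/customer-managed/*"
  }]
}
```

With RolePathPrefix=/customer-managed/, the stack’s role ARN becomes arn:aws:iam::&lt;account&gt;:role/customer-managed/TensorCost-BedrockReader-&lt;ExternalId&gt;, which falls inside the carve-out.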
### Variant C — AWS Control Tower / Landing Zone
About 30% of enterprise AWS customers run Control Tower (or the predecessor, AWS Landing Zone).

- The standard role is AWSControlTowerExecution, not OrganizationAccountAccessRole. Same trust model, different name.
- Account Factory creates accounts via a CodePipeline / CFN flow the customer’s CCoE controls. New accounts inherit the Org’s guardrails (including any SCP restrictions).
- An “Audit” account exists, where CloudTrail and AWS Config aggregate. Customer security teams sometimes ask for our IAM role to live there. Decline: it puts our reads through a high-trust shared account that gets locked down. We need direct access to the workload accounts.
Set OrganizationAccountAccessRoleName=AWSControlTowerExecution in the wizard.
### Variant D — Cross-account CUR replication
The CUR sits in an S3 bucket in the customer’s payer account. We need bucket-policy-level read access from our backend account, granted via a statement in the CUR bucket policy.
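A sketch of that bucket-policy statement (account ID and bucket name are placeholders; scope the principal to the TensorCost platform account ID shown in the wizard):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "TensorCostCurRead",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::<TensorCostBackendAccountId>:root" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::<cur-bucket>",
      "arn:aws:s3:::<cur-bucket>/*"
    ]
  }]
}
```

Read-only actions only, consistent with the security posture below: no s3:Put grants anywhere.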
## Top day-1 failures

| Symptom | Likely cause | Remediation |
|---|---|---|
| STS AssumeRole AccessDenied mentioning ExternalId | Wizard’s ExternalId doesn’t match the CFN parameter | Re-deploy with the wizard-shown ExternalId, or update the connection row |
| STS AssumeRole AccessDenied (no ExternalId mention) | TensorCostBackendAccountId left as the default 000000000000 | Re-deploy with our actual platform account ID (shown in the wizard) |
| CloudWatch Logs ResourceNotFoundException | Bedrock model-invocation logging is OFF | Enable it; check the region |
| CloudWatch Logs: 0 events found | Logging is on but no Bedrock invocations in the lookback window | Run a single InvokeModel/Converse call; re-poll |
| cloudwatch:GetMetricData AccessDenied | Hand-rolled IAM role with too narrow a trust policy | Fall back to the published CFN |
| organizations:ListAccounts AccessDenied (Organization mode) | Stack deployed in a non-management account | Re-deploy from the management account |
| sts:AssumeRole on member-account role: NoSuchEntity (Organization mode) | Member account missing the jump role | StackSets-deploy the role, or override OrganizationAccountAccessRoleName |
| Bedrock invocation logs in a different region | Logging enabled in the wrong region | aws logs describe-log-groups --region <correct> to find the right group |
## Data model
Bedrock data is normalized into the same ai_spend_events schema as every other managed-inference adapter and the GPU-hour stream from agents. A single query returns total AI spend across sources.
Fields you can filter on via the API and recommendations feed: source (bedrock, azure_openai, …), account_id, region, model_id, model_family, model_tier, operation, token counts (input, output, cached, cache-write), latency_ms, cost_usd, inference_profile_arn, application, team, environment, user_id, agent_id, workflow_id, request_id, prompt_hash, response_hash.
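Illustratively, a single normalized Bedrock row might look like the following (all values are made up, and the exact token-count key names are an assumption; the field list above is authoritative):

```json
{
  "source": "bedrock",
  "account_id": "111111111111",
  "region": "us-east-1",
  "model_id": "anthropic.claude-3-5-haiku-20241022-v1:0",
  "model_family": "claude",
  "model_tier": "efficient",
  "operation": "Converse",
  "input_tokens": 1843,
  "output_tokens": 212,
  "cached_tokens": 1520,
  "cache_write_tokens": 0,
  "latency_ms": 940,
  "cost_usd": 0.00082,
  "inference_profile_arn": null,
  "application": "support-bot",
  "team": "cx",
  "environment": "prod",
  "user_id": null,
  "agent_id": "triage-agent",
  "workflow_id": "wf-128",
  "request_id": "req-9f3a",
  "prompt_hash": "sha256:…",
  "response_hash": "sha256:…"
}
```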
## Security posture
- Read-only IAM only. No bedrock:Invoke, no s3:Put, no logs:Put anywhere in the customer account.
- External ID + AWS account allowlist on every role assumption.
- Customer-owned S3 buckets for raw logs. We never copy raw logs to our account.
- Redaction at ingestion. Hashes only (prompt_hash, response_hash); raw prompts and responses never enter our storage. Documented as a guarantee on the security page.
- TLS 1.3 in transit, AES-256 at rest, customer-managed KMS keys optional on Enterprise tier.
- Audit log of every data pull, every recommendation generated, every action taken, surfaced to the tenant admin.
## What we don’t do (yet)
- Auto-rewrite prompts. We recommend; we don’t mutate. Risky without trust.
- Manage your prompt-cache configuration on your behalf. Requires write-access IAM most customers won’t grant. We give you the exact code change.
- Cross-provider routing (Bedrock Claude → Anthropic API direct). Politically sensitive; on the roadmap.
- Per-user chargeback granularity. Coming with agent_id + user_id rollups in a future release.
## Other managed-inference adapters
The same pattern applies to Azure OpenAI, Vertex AI, OpenAI API, and Anthropic API. Connection wizards live alongside Bedrock under Integrations. Provider-specific differences:

- Azure OpenAI — App registration; cost API permissions; per-deployment metrics.
- Vertex AI — GCP service account with Vertex + billing-export read; BigQuery billing export is the cost source.
- OpenAI API — Org-scoped API key with read-only billing scope; usage API for per-request data.
- Anthropic API — Org-scoped API key; usage API for per-request data.
Recommenders for these providers roll out behind per-provider feature flags (azure-openai-recommenders-enabled, vertex-recommenders-enabled, etc.); check feature flags for current state.