

Bedrock integration

Amazon Bedrock is TensorCost’s lead managed-inference adapter. Connect a Bedrock account and within 48 hours you’ll see model-routing, prompt-cache, provisioned-throughput, and runaway-loop recommendations with dollar-quantified impact. This page is the public version of our internal Bedrock onboarding runbook. The internal runbook adds AE / SE choreography; the steps here are everything a customer admin needs.

What you get

Each recommender and what it surfaces:
  • Model routing: prompts going to flagship models that cluster like efficient-model traffic. Proposes Opus → Haiku, GPT-4o → GPT-4o-mini, etc., with sample request IDs and projected savings.
  • Prompt cache: repeated prefix patterns where Bedrock prompt caching cuts input-token cost ~90%, with the exact code change.
  • Provisioned-throughput break-even: on-demand vs PT math per (model, application); flags both under- and over-commitment.
  • Runaway-loop / cost-anomaly: per-agent rolling cost + invocation baseline; alerts on >3σ spikes, long invocation chains, and cost-per-session anomalies.
Plus all the standard surfaces: cost attribution by model / application / team / user / agent, unit economics (cost per 1K tokens, cost per request, cache-hit rate), and cross-source totals that include any GPU / Azure OpenAI / Vertex / OpenAI / Anthropic data you’ve connected.
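For the provisioned-throughput recommender, the break-even arithmetic can be sketched like this. All prices and throughput figures below are placeholders, not real Bedrock rates; substitute your model's actual on-demand price and PT commitment terms.

```shell
# Hypothetical inputs -- not real Bedrock rates.
od_per_1k_input=0.003     # $ per 1K on-demand input tokens (placeholder)
pt_per_unit_hour=22.0     # $ per model-unit-hour under PT (placeholder)
hours_per_month=730

# Monthly input-token volume at which one PT model unit matches on-demand spend.
break_even_tokens=$(awk -v od="$od_per_1k_input" -v pt="$pt_per_unit_hour" \
                        -v h="$hours_per_month" \
  'BEGIN { printf "%.0f", (pt * h) / (od / 1000) }')
echo "Break-even: ${break_even_tokens} input tokens/month per model unit"
```

Above that monthly volume, the PT commitment is cheaper than on-demand; below it, on-demand wins. The recommender runs this per (model, application), which is how it flags both under- and over-commitment.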

Data sources we ingest

Read-only, all four where available. The adapter degrades gracefully — connect what you have, and recommendations get richer as more sources come online.
Source, latency, required IAM, and what we get:
  • Cost and Usage Reports (CUR 2.0). Latency: 24 h. IAM: s3:GetObject on the CUR bucket, cur:DescribeReportDefinitions. Yields daily line items: service, model ID, operation, region, usage type, cost, tag keys.
  • CloudWatch metrics (AWS/Bedrock). Latency: 1–5 min. IAM: cloudwatch:GetMetricData, cloudwatch:ListMetrics. Yields Invocations, InputTokenCount, OutputTokenCount, InvocationLatency, cache fields.
  • Bedrock invocation logs (S3 or CloudWatch Logs). Latency: near real-time. IAM: s3:GetObject or logs:FilterLogEvents, plus KMS decrypt if encrypted. Yields per-request model, tokens, latency, prompt-cache fields.
  • Application Inference Profiles. Latency: on-demand. IAM: bedrock:ListInferenceProfiles, bedrock:GetInferenceProfile. Yields profile ARN → model + region routing + cost-allocation tags.
Without invocation logging we can still attribute spend by model and inference profile; we just can’t surface per-prompt caching or routing recommendations.
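To make the ~90% input-token cut concrete, here is a blended-rate sketch. The base price and hit rate are made-up numbers, and cached reads are assumed to cost ~10% of the base input rate; real Bedrock pricing also adds a cache-write premium that this ignores.

```shell
base=0.003    # $ per 1K input tokens, hypothetical base rate
hit_rate=0.6  # fraction of input tokens served from cache (assumed)
# Blended rate: uncached tokens at full price, cached tokens at ~10% of it.
effective=$(awk -v b="$base" -v h="$hit_rate" \
  'BEGIN { printf "%.5f", b * ((1 - h) + h * 0.10) }')
echo "Effective rate: \$${effective} per 1K input tokens"
```

The prompt-cache recommender reports the measured hit rate per repeated-prefix pattern and plugs it into exactly this kind of blend to project savings.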

Onboarding (≤15 minutes)

1

Open the wizard

Integrations → Add AWS Bedrock. The wizard auto-suggests an ExternalId; accept it rather than picking your own (the generated value is unique per connection, which is what the confused-deputy protection relies on).
2

Choose onboarding mode

Pick once; switching modes after deploy means re-deploying the stack.
  • SingleAccount (default). Use when you have one AWS account, or you’re fine connecting accounts one at a time. The stack lives in the account you want monitored.
  • Organization. Use for a multi-account AWS Organization (the common enterprise shape). The stack lives in the Organization’s management account.
In Organization mode, TensorCost iterates member accounts via OrganizationAccountAccessRole (or AWSControlTowerExecution if your Landing Zone provisioned that name). Day-1 ingest scales with account count: a 200-account org takes roughly 200× as long as a single account on the first pass; subsequent passes hit the AssumeRole cache.
3

Verify member-account coverage (Organization mode only)

AWS creates OrganizationAccountAccessRole only when an account is created within the Organization. Accounts invited and absorbed often lack it. Run this in the management account before deploying:
aws organizations list-accounts --query 'Accounts[].Id' --output text \
  | tr '\t' '\n' \
  | while read -r acct; do
      # Try to assume the jump role directly; failure means the role is
      # missing, or its trust policy excludes the management account.
      aws sts assume-role \
        --role-arn "arn:aws:iam::${acct}:role/OrganizationAccountAccessRole" \
        --role-session-name coverage-check \
        --query 'Credentials.Expiration' --output text >/dev/null 2>&1 \
        || echo "MISSING (or not assumable) in ${acct}"
    done
Missing accounts will silently produce zero rows. Either deploy the role via StackSets, override OrganizationAccountAccessRoleName to a role that exists, or accept partial coverage.
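The StackSets route can be sketched as a function like the one below. The template URL, stack-set name, target OU, and region are assumptions, not TensorCost-published values; adapt them to your org.

```shell
# Sketch only: deploy a jump role to member accounts via StackSets.
# TEMPLATE_URL, names, OU, and region below are placeholders.
deploy_jump_role() {
  TEMPLATE_URL="https://example.com/jump-role.yml"   # hypothetical template
  aws cloudformation create-stack-set \
    --stack-set-name org-jump-role \
    --template-url "$TEMPLATE_URL" \
    --permission-model SERVICE_MANAGED \
    --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
    --capabilities CAPABILITY_NAMED_IAM
  aws cloudformation create-stack-instances \
    --stack-set-name org-jump-role \
    --deployment-targets OrganizationalUnitIds="$1" \
    --regions us-east-1
}
# Usage: deploy_jump_role ou-xxxx-xxxxxxxx
```

SERVICE_MANAGED permissions with auto-deployment enabled also covers accounts added to the OU later, which closes the "invited account lacks the role" gap going forward.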
4

Enable Bedrock model-invocation logging

AWS console → Bedrock → Settings → Model invocation logging → CloudWatch destination (or S3 destination). Note the log-group ARN.
The log group must be in the same region as your InvokeModel calls. Logging in us-west-2 while traffic runs in us-east-1 is the most common silent failure — last_sync_status='success' but events_ingested=0 forever.
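A quick way to catch the mismatch is to compare, per region, whether Bedrock is emitting metrics and whether an invocation log group exists there. The region list and the `bedrock` name filter are examples; adjust to your setup.

```shell
# Sketch: for each region you run Bedrock in, confirm both invocation
# metrics and an invocation log group exist there.
check_bedrock_logging_region() {
  for region in "$@"; do
    echo "== $region =="
    aws cloudwatch list-metrics --region "$region" \
      --namespace AWS/Bedrock --metric-name Invocations \
      --query 'Metrics[0].Namespace' --output text
    aws logs describe-log-groups --region "$region" \
      --query 'logGroups[?contains(logGroupName, `bedrock`)].logGroupName' \
      --output text
  done
}
# Usage: check_bedrock_logging_region us-east-1 us-west-2
```

A region that prints AWS/Bedrock metrics but no log group is the silent-failure case described above.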
5

Deploy the CloudFormation stack

The wizard provides a one-click launch link. Or run:
# `aws cloudformation deploy` takes a local template file, so fetch it first.
curl -fsSLO https://downloads.tensorcost.com/cfn/bedrock-stack.yml
aws cloudformation deploy \
  --template-file bedrock-stack.yml \
  --stack-name tensorcost-bedrock \
  --parameter-overrides \
      ExternalId=$EXTERNAL_ID \
      TensorCostBackendAccountId=$BACKEND_ACCOUNT_ID \
      BedrockLogGroupArn=$LOG_GROUP_ARN \
      OnboardingMode=SingleAccount \
  --capabilities CAPABILITY_NAMED_IAM
The stack creates exactly one IAM role (TensorCost-BedrockReader-<ExternalId>) with the minimum read-only Bedrock + CloudWatch Logs permissions. In OnboardingMode=Organization it also attaches a managed policy granting read-only Organizations actions plus sts:AssumeRole on the per-member jump role.
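For reference, the trust policy on that role follows the standard cross-account pattern with an ExternalId condition; the shape below is a sketch, with the backend principal and ExternalId filled in from the stack parameters:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::<TensorCostBackendAccountId>:root"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "sts:ExternalId": "<ExternalId>" }
    }
  }]
}
```

Both ExternalId-related failures in the day-1 table trace back to one of these two values being wrong.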
6

Validate the connection

Paste the BedrockReaderRoleArn output back into the wizard. Click Validate. The wizard polls STS-AssumeRole + a sample CloudWatch read every 5s for up to 2 minutes. Green = done.
Backfill of 90 days of CUR + CloudWatch data runs in the background and typically completes within 30–60 minutes. The four recommenders run nightly; first recommendations land within 48 hours.

Multi-account variants

Most of our enterprise customers fall into one of these patterns. Review the relevant one before the day-1 deploy; getting it wrong usually means spending day 2 backing the stack out.

Variant A — Consolidated billing with a separate payer

Most enterprise AWS Organizations have one payer (management) account holding the AWS Marketplace subscription + CUR, and N member accounts where workloads run.
  • Bedrock model-invocation logging happens in each member account (where InvokeModel fires).
  • The CUR lives in the payer’s S3 bucket.
  • Our IAM role must be deployable in BOTH the payer (for CUR read) AND each member (for Bedrock log read + cost-tag read).
Walk-through:
  • Deploy the onboarding CFN with OnboardingMode=Organization into the payer account; it outputs BedrockReaderRoleArn.
  • The CUR-read IAM policy attaches to that same role. Confirm the CUR S3 bucket name; we record it on the connection.
  • For member-account onboarding, TensorCost assumes OrganizationAccountAccessRole from the payer into each member and reads Bedrock logs + cost tags.
  • Expect a 24 h day-1 cost-data lag (CUR delivery delay).
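Mechanically, the payer → member hop is a standard AssumeRole chain. A sketch, using the default jump-role name; the session name and the trailing read are illustrative:

```shell
# Sketch: read Bedrock log groups in a member account starting from
# payer-account credentials. Role name matches the Organization-mode default.
read_member() {
  acct="$1"
  creds=$(aws sts assume-role \
    --role-arn "arn:aws:iam::${acct}:role/OrganizationAccountAccessRole" \
    --role-session-name tensorcost-ingest \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)
  # Tab-separated text output; run one read with the temporary credentials.
  AWS_ACCESS_KEY_ID=$(echo "$creds" | cut -f1) \
  AWS_SECRET_ACCESS_KEY=$(echo "$creds" | cut -f2) \
  AWS_SESSION_TOKEN=$(echo "$creds" | cut -f3) \
  aws logs describe-log-groups --query 'logGroups[].logGroupName' --output text
}
# Usage: read_member 123456789012
```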

Variant B — SCPs blocking IAM creates

Many enterprises attach an SCP at the OU or root level forbidding IAM-role creation in member accounts unless a path prefix matches. Symptom: CFN fails with iam:CreateRole AccessDenied even though the deploying user is an account admin. Mitigation: the wizard exposes a RolePathPrefix parameter (default empty); set it to /customer-managed/ (or whatever your SCP allows), then verify the SCP permits arn:aws:iam::*:role/customer-managed/TensorCost-*.
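An SCP of that shape typically reads like the following sketch; your Sid, action list, and path will differ:

```json
{
  "Sid": "DenyRoleCreateOutsideManagedPath",
  "Effect": "Deny",
  "Action": ["iam:CreateRole", "iam:PutRolePolicy", "iam:AttachRolePolicy"],
  "NotResource": "arn:aws:iam::*:role/customer-managed/*"
}
```

With RolePathPrefix=/customer-managed/, the stack's TensorCost-* roles land under the exempted path and the deny no longer matches.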

Variant C — AWS Control Tower / Landing Zone

About 30% of enterprise AWS customers run Control Tower (or the predecessor, AWS Landing Zone).
  • The standard role is AWSControlTowerExecution, not OrganizationAccountAccessRole. Same trust model, different name.
  • Account Factory creates accounts via a CodePipeline / CFN flow the customer’s CCoE controls. New accounts inherit the Org’s guardrails (including any SCP restrictions).
  • An “Audit” account exists, where CloudTrail and AWS Config aggregate. Security teams sometimes ask for the TensorCost IAM role to live there; we decline, because it routes our reads through a high-trust shared account that tends to get locked down. We need direct access to the workload accounts.
Set OrganizationAccountAccessRoleName=AWSControlTowerExecution in the wizard.

Variant D — Cross-account CUR replication

The CUR sits in an S3 bucket in the customer’s payer account. We need bucket-policy-level read access from our backend account. The bucket-policy snippet:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "TensorCostCURRead",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::684651436748:role/TensorCost-CURReader"
    },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::<customer-cur-bucket>",
      "arn:aws:s3:::<customer-cur-bucket>/*"
    ]
  }]
}
We ship this as a separate artifact (not part of the main CFN stack) because customer security teams routinely lock down bucket-policy edits behind a separate ticket, and a tech-lead deploying our CFN often doesn’t have bucket-policy permissions on the CUR bucket.

Top day-1 failures

Symptom, likely cause, and remediation:
  • STS AssumeRole AccessDenied mentioning ExternalId. Cause: wizard’s ExternalId doesn’t match the CFN parameter. Fix: re-deploy with the wizard-shown ExternalId, or update the connection row.
  • STS AssumeRole AccessDenied (no ExternalId mention). Cause: TensorCostBackendAccountId left as the default 000000000000. Fix: re-deploy with our actual platform account ID (shown in the wizard).
  • CloudWatch Logs ResourceNotFoundException. Cause: Bedrock model-invocation logging is off. Fix: enable it; check the region.
  • CloudWatch Logs: 0 events found. Cause: logging is on but no Bedrock invocations in the lookback window. Fix: run a single InvokeModel/Converse call; re-poll.
  • cloudwatch:GetMetricData AccessDenied. Cause: hand-rolled IAM role with a too-narrow permissions policy. Fix: fall back to the published CFN.
  • organizations:ListAccounts AccessDenied (Organization mode). Cause: stack deployed in a non-management account. Fix: re-deploy from the management account.
  • sts:AssumeRole on the member-account role fails (Organization mode). Cause: member account missing the jump role. Fix: StackSets-deploy the role, or override OrganizationAccountAccessRoleName.
  • Bedrock invocation logs in a different region. Cause: logging enabled in the wrong region. Fix: aws logs describe-log-groups --region <correct> to find the right group.
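Most of these failures can be reproduced from a terminal before opening a ticket. A minimal smoke test exercising the same calls the wizard's validator makes; the role ARN and ExternalId are the stack's values, and it must run with credentials the role's trust policy allows:

```shell
# Sketch: reproduce the validator's checks. Arguments are placeholders.
smoke_test() {
  role_arn="$1"; external_id="$2"
  aws sts assume-role \
    --role-arn "$role_arn" \
    --external-id "$external_id" \
    --role-session-name tensorcost-validate >/dev/null \
    && echo "AssumeRole: OK" || echo "AssumeRole: FAILED"
  aws cloudwatch list-metrics --namespace AWS/Bedrock \
    --max-items 1 >/dev/null \
    && echo "CloudWatch: OK" || echo "CloudWatch: FAILED"
}
# Usage: smoke_test "$BEDROCK_READER_ROLE_ARN" "$EXTERNAL_ID"
```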

Data model

Bedrock data is normalized into the same ai_spend_events schema as every other managed-inference adapter and the GPU-hour stream from agents. A single query returns total AI spend across sources. Fields you can filter on via the API and recommendations feed: source (bedrock, azure_openai, …), account_id, region, model_id, model_family, model_tier, operation, token counts (input, output, cached, cache-write), latency_ms, cost_usd, inference_profile_arn, application, team, environment, user_id, agent_id, workflow_id, request_id, prompt_hash, response_hash.
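These normalized cost_usd events are what the runaway-loop recommender baselines per agent. Its >3σ rule, in miniature; all numbers below are made up, and the real baseline is rolling rather than a fixed window:

```shell
# Sketch of the >3-sigma spike rule over per-interval cost_usd for one agent.
baseline="0.9 1.0 1.1 1.0 0.9 1.1 1.0 1.0"   # prior intervals (made up)
latest=4.8                                    # current interval (made up)
flag=$(awk -v b="$baseline" -v x="$latest" 'BEGIN {
  n = split(b, a, " ")
  for (i = 1; i <= n; i++) s += a[i]
  m = s / n                                   # baseline mean
  for (i = 1; i <= n; i++) ss += (a[i] - m) ^ 2
  sd = sqrt(ss / n)                           # population std dev
  if (x > m + 3 * sd) print "ANOMALY"; else print "ok"
}')
echo "agent cost check: $flag"
```

The same comparison runs over invocation counts and cost-per-session, which is how long invocation chains surface alongside raw cost spikes.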

Security posture

  • Read-only IAM only. No bedrock:Invoke, no s3:Put, no logs:Put anywhere in the customer account.
  • External ID + AWS account allowlist on every role assumption.
  • Customer-owned S3 buckets for raw logs. We never copy raw logs to our account.
  • Redaction at ingestion. Hashes only — prompt_hash, response_hash. Raw prompts and responses never enter our storage. Documented as a guarantee on the security page.
  • TLS 1.3 in transit, AES-256 at rest, customer-managed KMS keys optional on Enterprise tier.
  • Audit log of every data pull, every recommendation generated, every action taken — surfaced to the tenant admin.

What we don’t do (yet)

  • Auto-rewrite prompts. We recommend; we don’t mutate. Risky without trust.
  • Manage your prompt-cache configuration on your behalf. Requires write-access IAM most customers won’t grant. We give you the exact code change.
  • Cross-provider routing (Bedrock Claude → Anthropic API direct). Politically sensitive; on the roadmap.
  • Per-user chargeback granularity. Coming with agent_id + user_id rollups in a future release.

Other managed-inference adapters

The same pattern applies to Azure OpenAI, Vertex AI, OpenAI API, and Anthropic API. Connection wizards live alongside Bedrock under Integrations. Provider-specific differences:
  • Azure OpenAI — App registration; cost API permissions; per-deployment metrics.
  • Vertex AI — GCP service account with Vertex + billing-export read; BigQuery billing export is the cost source.
  • OpenAI API — Org-scoped API key with read-only billing scope; usage API for per-request data.
  • Anthropic API — Org-scoped API key; usage API for per-request data.
The recommender suite is rolling out per provider behind feature flags (azure-openai-recommenders-enabled, vertex-recommenders-enabled, etc.) — check feature flags for current state.