Documentation Index
Fetch the complete documentation index at: https://docs.tensorcost.com/llms.txt
Use this file to discover all available pages before exploring further.
# Bedrock integration
Amazon Bedrock is TensorCost’s lead managed-inference adapter. Connect a Bedrock account and within 48 hours you’ll see model-routing, prompt-cache, provisioned-throughput, and runaway-loop recommendations with dollar-quantified impact. This page is the public version of our internal Bedrock onboarding runbook; the internal runbook adds AE / SE choreography, but the steps here are everything a customer admin needs.

## What you get
| Recommender | What it surfaces |
|---|---|
| Model routing | Prompts going to flagship models that cluster like efficient-model traffic — proposes Opus → Haiku, GPT-4o → GPT-4o-mini, etc., with sample request IDs and projected savings |
| Prompt cache | Repeated prefix patterns where Bedrock prompt caching cuts input-token cost ~90%, with the exact code change |
| Provisioned-throughput break-even | On-demand vs PT math per (model, application) — flags both under- and over-commitment |
| Runaway-loop / cost-anomaly | Per-agent rolling cost + invocation baseline; alerts on >3σ spikes, long invocation chains, and cost-per-session anomalies |
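For prompt cache, the surfaced code change generally amounts to inserting a `cachePoint` marker after the repeated prefix in a Bedrock Converse request. A hedged sketch of the request body (model ID and prompt text are illustrative; consult Bedrock’s prompt-caching documentation for per-model minimum cacheable prefix sizes):

```json
{
  "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
  "system": [
    { "text": "You are the support triage assistant. <several thousand tokens of stable instructions and examples>" },
    { "cachePoint": { "type": "default" } }
  ],
  "messages": [
    { "role": "user", "content": [{ "text": "Customer message for this request" }] }
  ]
}
```

Everything before the `cachePoint` is written to the cache on the first call and billed at the reduced cached-input rate on later calls that share the same prefix.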
## Data sources we ingest
Read-only, all four where available. The adapter degrades gracefully: connect what you have, and recommendations get richer as more sources come online.

| Source | Latency | Required IAM | What we get |
|---|---|---|---|
| Cost and Usage Reports (CUR 2.0) | 24 h | s3:GetObject on the CUR bucket, cur:DescribeReportDefinitions | Daily line items: service, model ID, operation, region, usage type, cost, tag keys |
| CloudWatch metrics (AWS/Bedrock) | 1–5 min | cloudwatch:GetMetricData, cloudwatch:ListMetrics | Invocations, InputTokenCount, OutputTokenCount, InvocationLatency, cache fields |
| Bedrock invocation logs (S3 or CloudWatch Logs) | Near real-time | s3:GetObject or logs:FilterLogEvents, plus KMS decrypt if encrypted | Per-request: model, tokens, latency, prompt-cache fields |
| Application Inference Profiles | On-demand | bedrock:ListInferenceProfiles, bedrock:GetInferenceProfile | Profile ARN → model + region routing + cost-allocation tags |
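The CloudWatch source can be sanity-checked from the customer side with the same read-only calls the adapter makes. A sketch (region, model ID, and time window are illustrative; the `date` flags are GNU coreutils):

```shell
# List the Bedrock metrics available in this region
aws cloudwatch list-metrics --namespace AWS/Bedrock --region us-east-1

# Pull one day of input-token volume for a single model, at 1-hour resolution
aws cloudwatch get-metric-data --region us-east-1 \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --metric-data-queries '[{
    "Id": "inputTokens",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-5-haiku-20241022-v1:0"}]
      },
      "Period": 3600,
      "Stat": "Sum"
    }
  }]'
```

Zero datapoints here usually means no Bedrock traffic in the window, not a permissions problem.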
## Onboarding (≤15 minutes)
### Open the wizard
Integrations → Add AWS Bedrock. The wizard auto-suggests an ExternalId; accept it, don’t pick your own.

### Choose onboarding mode

Pick once; switching modes after deploy means re-deploying the stack.
| Mode | Use when | Stack lives in |
|---|---|---|
| SingleAccount (default) | One AWS account, or you’re fine connecting accounts one at a time | The account you want monitored |
| Organization | Multi-account AWS Organization (the common enterprise shape) | The Organization’s management account |

In Organization mode, TensorCost iterates member accounts via OrganizationAccountAccessRole (or AWSControlTowerExecution if your Landing Zone provisioned that name). Day-1 ingest in a 200-account org is roughly 200× slower than SingleAccount; subsequent passes hit the AssumeRole cache.

### Verify member-account coverage (Organization mode only)
AWS creates OrganizationAccountAccessRole only when an account is created within the Organization; accounts invited and absorbed often lack it, and missing accounts will silently produce zero rows. Either deploy the role via StackSets, override OrganizationAccountAccessRoleName to a role that exists, or accept partial coverage. To check coverage, run a quick assume-role probe from the management account before deploying.
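A sketch of such a probe (assumes AWS CLI with management-account credentials; change ROLE_NAME if you override the jump-role name):

```shell
ROLE_NAME=OrganizationAccountAccessRole   # or AWSControlTowerExecution

for ACCOUNT in $(aws organizations list-accounts \
                   --query 'Accounts[?Status==`ACTIVE`].Id' --output text); do
  # If the jump role exists and trusts the management account, this succeeds
  if aws sts assume-role \
       --role-arn "arn:aws:iam::${ACCOUNT}:role/${ROLE_NAME}" \
       --role-session-name coverage-check >/dev/null 2>&1; then
    echo "OK      ${ACCOUNT}"
  else
    echo "MISSING ${ACCOUNT}"
  fi
done
```

Accounts printed as MISSING are the ones that will produce zero rows until the role is deployed.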
### Enable Bedrock model-invocation logging

AWS console → Bedrock → Settings → Model invocation logging → CloudWatch destination (or S3 destination). Note the log-group ARN.
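The same setting can be applied from the CLI. A sketch assuming a CloudWatch Logs destination, with a pre-created log group and service role (both names are illustrative):

```shell
aws bedrock put-model-invocation-logging-configuration --logging-config '{
  "cloudWatchConfig": {
    "logGroupName": "/bedrock/model-invocations",
    "roleArn": "arn:aws:iam::111111111111:role/BedrockLoggingRole"
  },
  "textDataDeliveryEnabled": true,
  "embeddingDataDeliveryEnabled": false
}'
```

Run it per region; logging is a regional setting, which is also the most common source of the "0 events found" symptom below.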
### Deploy the CloudFormation stack
The wizard provides a one-click launch link; the same template can be deployed from the CLI. The stack creates exactly one IAM role (TensorCost-BedrockReader-&lt;ExternalId&gt;) with the minimum read-only Bedrock + CloudWatch Logs permissions. In OnboardingMode=Organization it also attaches a managed policy granting organizations:* reads + sts:AssumeRole on the per-member jump role.
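A sketch of the CLI deploy (the template URL is a placeholder; use the one from the wizard. Parameter names are the ones referenced on this page):

```shell
# Template URL below is a placeholder; copy the real one from the wizard
aws cloudformation create-stack \
  --stack-name tensorcost-bedrock-reader \
  --template-url "https://example.com/tensorcost-bedrock-reader.yaml" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
    ParameterKey=ExternalId,ParameterValue="<ExternalId-from-wizard>" \
    ParameterKey=TensorCostBackendAccountId,ParameterValue="<platform-account-id-from-wizard>" \
    ParameterKey=OnboardingMode,ParameterValue=SingleAccount
```

Leaving TensorCostBackendAccountId at its default is the second failure in the day-1 table below, so pass it explicitly.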
## Multi-account variants

Most of our enterprise customers fall into one of these patterns. Walk through them before the day-1 deploy, or you spend day 2 backing out.

### Variant A — Consolidated billing with a separate payer
Most enterprise AWS Organizations have one payer (management) account holding the AWS Marketplace subscription + CUR, and N member accounts where workloads run.

- Bedrock model-invocation logging happens in each member account (where InvokeModel fires).
- The CUR lives in the payer’s S3 bucket.
- Our IAM role must be deployable in BOTH the payer (for CUR read) AND each member (for Bedrock log read + cost-tag read).

Deploy the stack with OnboardingMode=Organization into the payer account; it outputs BedrockReaderRoleArn. The CUR-read IAM policy attaches to the same role. Customer confirms the CUR S3 bucket name; we record it on the connection. Member-account onboarding: TensorCost assumes OrganizationAccountAccessRole from the payer into each member, reads Bedrock logs + cost tags. Day-1 cost-data lag is 24 h (CUR delivery delay).
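From the payer account, the CUR definition (and the bucket name we record on the connection) can be confirmed with the CUR API, which is only served from us-east-1:

```shell
aws cur describe-report-definitions --region us-east-1 \
  --query 'ReportDefinitions[].{Report:ReportName,Bucket:S3Bucket,Prefix:S3Prefix}' \
  --output table
```

No rows here means no CUR is configured, and cost-side recommendations will stay empty regardless of the Bedrock-log setup.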
### Variant B — SCPs blocking IAM creates
Many enterprises attach an SCP at the OU or root level forbidding IAM-role creation in member accounts unless a path prefix matches. Symptom: CFN fails with iam:CreateRole AccessDenied despite the user being account admin.
Mitigation: the wizard exposes a RolePathPrefix parameter (default empty). Set it to /customer-managed/ (or whatever your SCP allows). The SCP statement the customer needs to verify allows arn:aws:iam::*:role/customer-managed/TensorCost-*.
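The SCP shape that produces this symptom, with the carve-out to verify, looks roughly like the following (an illustrative sketch, not a drop-in policy; real SCPs often add conditions and more actions):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRoleCreationOutsideManagedPath",
    "Effect": "Deny",
    "Action": "iam:CreateRole",
    "NotResource": "arn:aws:iam::*:role/customer-managed/*"
  }]
}
```

With RolePathPrefix=/customer-managed/, the stack’s role ARN becomes arn:aws:iam::&lt;account&gt;:role/customer-managed/TensorCost-BedrockReader-&lt;ExternalId&gt;, which falls inside the carve-out.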
### Variant C — AWS Control Tower / Landing Zone
About 30% of enterprise AWS customers run Control Tower (or the predecessor, AWS Landing Zone).

- The standard role is AWSControlTowerExecution, not OrganizationAccountAccessRole. Same trust model, different name.
- Account Factory creates accounts via a CodePipeline / CFN flow the customer’s CCoE controls. New accounts inherit the Org’s guardrails (including any SCP restrictions).
- An “Audit” account exists, where CloudTrail and AWS Config aggregate. Customer security teams sometimes ask for our IAM role to live there. Decline: it puts our reads through a high-trust shared account that gets locked down. We need direct access to the workload accounts.
Set OrganizationAccountAccessRoleName=AWSControlTowerExecution in the wizard.
### Variant D — Cross-account CUR replication
The CUR sits in an S3 bucket in the customer’s payer account. We need bucket-policy-level read access from our backend account, granted via a statement in the CUR bucket policy.
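A sketch of that bucket-policy statement (account ID and bucket name are placeholders; scope the principal to the TensorCost platform account ID shown in the wizard):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "TensorCostCurRead",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::<TensorCostBackendAccountId>:root" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::<cur-bucket>",
      "arn:aws:s3:::<cur-bucket>/*"
    ]
  }]
}
```

Read-only actions only, consistent with the security posture below: no s3:Put grants anywhere.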
## Top day-1 failures

| Symptom | Likely cause | Remediation |
|---|---|---|
| STS AssumeRole AccessDenied mentioning ExternalId | Wizard’s ExternalId doesn’t match the CFN parameter | Re-deploy with the wizard-shown ExternalId, or update the connection row |
| STS AssumeRole AccessDenied (no ExternalId mention) | TensorCostBackendAccountId left as the default 000000000000 | Re-deploy with our actual platform account ID (shown in the wizard) |
| CloudWatch Logs ResourceNotFoundException | Bedrock model-invocation logging is OFF | Enable it; check the region |
| CloudWatch Logs: 0 events found | Logging is on but no Bedrock invocations in the lookback window | Run a single InvokeModel/Converse call; re-poll |
| cloudwatch:GetMetricData AccessDenied | Hand-rolled IAM role with too narrow a trust policy | Fall back to the published CFN |
| organizations:ListAccounts AccessDenied (Organization mode) | Stack deployed in a non-management account | Re-deploy from the management account |
| sts:AssumeRole on member-account role: NoSuchEntity (Organization mode) | Member account missing the jump role | StackSets-deploy the role, or override OrganizationAccountAccessRoleName |
| Bedrock invocation logs in a different region | Logging enabled in the wrong region | aws logs describe-log-groups --region <correct> to find the right group |
## Data model
Bedrock data is normalized into the same ai_spend_events schema as every other managed-inference adapter and the GPU-hour stream from agents. A single query returns total AI spend across sources.
Fields you can filter on via the API and recommendations feed: source (bedrock, azure_openai, …), account_id, region, model_id, model_family, model_tier, operation, token counts (input, output, cached, cache-write), latency_ms, cost_usd, inference_profile_arn, application, team, environment, user_id, agent_id, workflow_id, request_id, prompt_hash, response_hash.
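Illustratively, a single normalized Bedrock row might look like the following (all values are made up, and the exact token-count key names are an assumption; the field list above is authoritative):

```json
{
  "source": "bedrock",
  "account_id": "111111111111",
  "region": "us-east-1",
  "model_id": "anthropic.claude-3-5-haiku-20241022-v1:0",
  "model_family": "claude",
  "model_tier": "efficient",
  "operation": "Converse",
  "input_tokens": 1843,
  "output_tokens": 212,
  "cached_tokens": 1520,
  "cache_write_tokens": 0,
  "latency_ms": 940,
  "cost_usd": 0.00082,
  "inference_profile_arn": null,
  "application": "support-bot",
  "team": "cx",
  "environment": "prod",
  "user_id": null,
  "agent_id": "triage-agent",
  "workflow_id": "wf-128",
  "request_id": "req-9f3a",
  "prompt_hash": "sha256:…",
  "response_hash": "sha256:…"
}
```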
## Security posture
- Read-only IAM only. No bedrock:Invoke, no s3:Put, no logs:Put anywhere in the customer account.
- External ID + AWS account allowlist on every role assumption.
- Customer-owned S3 buckets for raw logs. We never copy raw logs to our account.
- Redaction at ingestion. Hashes only (prompt_hash, response_hash); raw prompts and responses never enter our storage. Documented as a guarantee on the security page.
- TLS 1.3 in transit, AES-256 at rest, customer-managed KMS keys optional on Enterprise tier.
- Audit log of every data pull, every recommendation generated, every action taken, surfaced to the tenant admin.
## What we don’t do (yet)
- Auto-rewrite prompts. We recommend; we don’t mutate. Risky without trust.
- Manage your prompt-cache configuration on your behalf. Requires write-access IAM most customers won’t grant. We give you the exact code change.
- Cross-provider routing (Bedrock Claude → Anthropic API direct). Politically sensitive; on the roadmap.
- Per-user chargeback granularity. Coming with agent_id + user_id rollups in a future release.
## Other managed-inference adapters
The same pattern applies to Azure OpenAI, Vertex AI, OpenAI API, and Anthropic API. Connection wizards live alongside Bedrock under Integrations. Provider-specific differences:

- Azure OpenAI — App registration; cost API permissions; per-deployment metrics.
- Vertex AI — GCP service account with Vertex + billing-export read; BigQuery billing export is the cost source.
- OpenAI API — Org-scoped API key with read-only billing scope; usage API for per-request data.
- Anthropic API — Org-scoped API key; usage API for per-request data.
Recommenders for these providers roll out behind per-provider feature flags (azure-openai-recommenders-enabled, vertex-recommenders-enabled, etc.); check feature flags for current state.