

Get started

This guide takes you from zero to your first verified recommendation. Three steps, and the activation event — your first accepted recommendation — is typically reached within 48 hours of connecting your first account.
TensorCost is built around three workload classes — GPU fleets, managed inference (Bedrock and friends), and agent workloads. You don’t have to connect all three to get value; most customers start with whichever one is dominating their bill.

Prerequisites

  • A TensorCost workspace. Sign up at tensorcost.com or accept your design-partner invite.
  • One of:
    • An AWS account with Bedrock usage (or CUR 2.0 enabled), or
    • An Azure subscription with Azure OpenAI usage, or
    • A GCP project with Vertex AI usage, or
    • GPU instances on AWS / Azure / GCP / Kubernetes / bare-metal that you can install the unified agent on.
  • An admin (or someone who can deploy a CloudFormation stack / Terraform module / Helm chart in your environment).

Step 1 — Create your tenant and invite your team

1. Sign up

Visit tensorcost.com or follow the design-partner invite email. Sign-up is Cognito-backed; SSO can be enabled by your tenant admin once the tenant is provisioned.
2. Invite teammates

From the sidebar, open Settings → Members. Three roles ship by default:
Role     What they can do
------   ----------------
member   Read dashboards, accept/dismiss recommendations, see their own team’s spend.
admin    All of member, plus connect cloud accounts, manage agents, configure alert routes, set budgets.
owner    All of admin, plus billing, tenant deletion, RBAC changes.
3. Map your tags (optional but recommended)

Open Settings → Tag mapping and bind your existing AWS/Azure/GCP cost-allocation tags to the TensorCost dimensions: application, team, environment, owner. This is what powers attribution; without it, everything rolls up under “untagged.”
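As an illustration, a mapping might look like the sketch below. This is a hypothetical shape for clarity only — the field names and tag keys are assumptions, not the actual TensorCost schema:

```json
{
  "application": ["aws:app", "azure:Application", "gcp:app"],
  "team":        ["aws:team", "azure:CostCenter", "gcp:team"],
  "environment": ["aws:env", "azure:Environment", "gcp:env"],
  "owner":       ["aws:owner", "azure:Owner", "gcp:owner"]
}
```

Each TensorCost dimension on the left binds to whichever provider tag keys your organization already uses on the right.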

Step 2 — Connect your first source

Pick the path that matches what’s burning the most money first. You can layer in the others later.

Path A — Amazon Bedrock (the primary managed-inference path)

This is the fastest path to a first recommendation because it requires no agent install.
1. Open the Bedrock wizard

Integrations → Add AWS Bedrock. The wizard auto-suggests an ExternalId — accept it.
2

Choose onboarding mode

SingleAccount (default — one AWS account) or Organization (consolidated billing with payer + member-account jump roles). Most early customers run SingleAccount. See bedrock integration for the multi-account variant.
3

Enable Bedrock model-invocation logging

AWS console → Bedrock → Settings → Model invocation logging → CloudWatch destination. Note the log-group ARN.
The log group must be in the same region as your InvokeModel calls. Logging in us-west-2 while your traffic runs in us-east-1 is the most common day-1 silent failure.
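You can catch a region mismatch before validating: the region is the fourth colon-separated field of any ARN, so extract it and compare it against the region your InvokeModel traffic runs in. The ARN below is a placeholder — substitute the one you noted above:

```shell
# Placeholder ARN — replace with the log-group ARN from the previous step.
LOG_GROUP_ARN="arn:aws:logs:us-east-1:123456789012:log-group:/bedrock/invocations"

# The region is the fourth colon-separated field of an ARN.
LOG_REGION=$(echo "$LOG_GROUP_ARN" | cut -d: -f4)
echo "$LOG_REGION"   # prints: us-east-1
```

If the printed region differs from where your InvokeModel calls run, fix the logging destination before continuing.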
4. Deploy the CloudFormation stack

Click the one-click CloudFormation link in the wizard. The stack creates exactly one IAM role (TensorCost-BedrockReader-<ExternalId>) with read-only Bedrock and CloudWatch permissions and an external-ID-bound trust policy — nothing else.
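For reference, an external-ID-bound trust policy follows the standard AWS pattern sketched below. The principal account ID and condition value are placeholders — the actual values come from the wizard-generated stack:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::<TENSORCOST_ACCOUNT_ID>:root" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "<ExternalId>" } }
  }]
}
```

The sts:ExternalId condition is AWS's standard defense against the confused-deputy problem: only callers who present your ExternalId can assume the role.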
5. Validate the connection

Paste the role ARN back into the wizard and click Validate. The wizard polls sts:AssumeRole plus a sample CloudWatch read for up to two minutes.
Your dashboard backfills 90 days of CUR + CloudWatch data within 30–60 minutes. Within 48 hours, the four MVP recommenders surface routing, prompt-cache, provisioned-throughput, and runaway-loop findings with $-impact estimates.

Path B — Install the unified GPU agent

For GPU fleets running on EC2, EKS/GKE/AKS, on-prem Slurm, or Ray. See the agent installation guide for the full walkthrough.
# aws cloudformation deploy takes a local file, not a URL — fetch the template first.
curl -fsSLO https://downloads.tensorcost.com/cfn/agent-stack.yml

aws cloudformation deploy \
  --template-file agent-stack.yml \
  --stack-name tensorcost-agent \
  --parameter-overrides \
      TenantId=$TENANT_ID \
      ExternalId=$EXTERNAL_ID \
  --capabilities CAPABILITY_NAMED_IAM
The agent auto-detects EC2 metadata via IMDSv2, signs the gRPC handshake with HMAC-SHA256, and connects to the regional NLB on TCP/50051. Metrics start flowing within five minutes of a successful handshake.
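As an illustrative sketch of what HMAC-SHA256 signing looks like — the payload layout and secret below are assumptions for the example, not the agent's actual wire format — a hex-encoded signature can be produced with openssl:

```shell
# Assumed example values — not the agent's real handshake payload.
EXTERNAL_ID="demo-secret"
PAYLOAD="tenant-123:1700000000"

# HMAC-SHA256 over the payload, hex-encoded (always 64 hex characters).
SIG=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$EXTERNAL_ID" | awk '{print $NF}')
echo "$SIG"
```

Both sides computing the same HMAC over the same payload is what lets the backend verify the agent holds the shared ExternalId without it ever crossing the wire.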

Path C — Azure OpenAI / Vertex / OpenAI API / Anthropic API

Same pattern as Bedrock, with provider-specific credentials. Integrations → Add provider → pick the source. Each adapter ingests:
  • Per-request: model, input tokens, output tokens, latency, cache-hit rate
  • Daily billing: cost normalized to ai_spend_events
  • Tags / metadata: mapped to your application / team / environment / owner
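A normalized ai_spend_events record might look like the sketch below. This is a hypothetical shape for illustration — the field names are assumptions, not the documented schema:

```json
{
  "provider": "azure_openai",
  "model": "gpt-4o",
  "input_tokens": 1200,
  "output_tokens": 350,
  "latency_ms": 840,
  "cache_hit": false,
  "cost_usd": 0.0145,
  "dimensions": {
    "application": "support-bot",
    "team": "cx",
    "environment": "prod",
    "owner": "alice"
  }
}
```

Normalizing every provider into one event shape is what lets dashboards and recommenders treat Bedrock, Azure OpenAI, Vertex, and direct API traffic uniformly.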
Raw prompts and responses are never stored. Hashes only. See SOC 2 readiness for the redaction guarantee.
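Hashing rather than storing means only a fixed-length digest of each prompt is retained, as in the sketch below (shown with sha256sum for illustration; the exact hash function TensorCost uses is not specified here):

```shell
# Only a digest of the prompt is kept — the text itself is never stored.
printf '%s' "hello" | sha256sum | awk '{print $1}'
# prints: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

A digest still allows exact-duplicate detection (e.g. for runaway-loop findings) without retaining any recoverable prompt content.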

Step 3 — See your first recommendation

Within 48 hours of connecting your first source, the Recommendations feed populates. Each entry includes:
  • A specific, dollar-quantified change (“route 14% of customer-support-agent traffic from Claude Opus 4.6 to Haiku 4.5 — $4,200/month”).
  • The evidence (sample request IDs, cost breakdown, A/B plan).
  • Accept / dismiss-with-reason / snooze actions.
Acceptance is the activation event. Once you accept, the savings ledger starts tracking realized savings against the baseline. Verified savings populate after a 30-day window.

What to do next

Set up alert routes

Slack, PagerDuty, email, Microsoft Teams, custom webhook.

Define budget hierarchies

Tenant → team → application. Burn-rate alerts at 50%, 80%, 100%.
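The burn-rate check itself is simple arithmetic — month-to-date spend as a percentage of budget, compared against each threshold. A sketch with illustrative numbers (not real spend data):

```shell
# Illustrative numbers only.
BUDGET=10000   # monthly budget in dollars
SPEND=8200     # month-to-date spend in dollars

PCT=$(( SPEND * 100 / BUDGET ))
echo "${PCT}%"   # prints: 82% — past the 80% threshold, below 100%
```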

Connect your second source

Coverage compounds. Customers with all three workload classes see 2× the recommendations.

Wire up MCP

Query TensorCost from Claude Desktop or your own agents.

When you get stuck

  • Check the Sync history drawer on the connection — every error from STS, CloudWatch, or the IAM trust policy surfaces here with a remediation link.
  • Common day-1 failures (and remediations) are catalogued in our customer onboarding runbook.
  • Email support@tensorcost.com — design partners get a shared Slack Connect channel.