Configuration
This page covers the tenant-level settings that an admin or owner manages: what they configure, where, and how the configuration interacts with the rest of TensorCost.
RBAC and members
TensorCost ships three built-in roles. Custom roles are on the roadmap; today the three cover most needs.

| Role | Read | Write | Connect accounts | Manage RBAC | Billing |
|---|---|---|---|---|---|
| member | All tenant data | Accept / dismiss recommendations | No | No | No |
| admin | All tenant data | All write actions | Yes | No | No |
| owner | All tenant data | All write actions | Yes | Yes | Yes |
Role gating in the UI and API
Both the shell and the gateway check the same @tensorcost/rbac primitive. A member who hits an admin-only REST route gets a 403; the same user’s MFs hide the navigation entries they cannot use. RBAC is also enforced in the MCP server: every tool call is scope-checked at dispatch.
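
The role table and the 403 behavior described above can be sketched as a single permission lookup shared by every caller. This is a minimal illustration, not the @tensorcost/rbac implementation: the capability names (`accounts:connect`, `rbac:manage`, and so on) and the function names are hypothetical labels for the table columns.

```python
from enum import Enum


class Role(Enum):
    MEMBER = "member"
    ADMIN = "admin"
    OWNER = "owner"


# Hypothetical capability names, one per column of the role table.
MEMBER_CAPS = {"read", "recommendations:review"}
ADMIN_CAPS = MEMBER_CAPS | {"write", "accounts:connect"}
OWNER_CAPS = ADMIN_CAPS | {"rbac:manage", "billing:manage"}

PERMISSIONS = {
    Role.MEMBER: MEMBER_CAPS,
    Role.ADMIN: ADMIN_CAPS,
    Role.OWNER: OWNER_CAPS,
}


def is_allowed(role: Role, capability: str) -> bool:
    """Single check that the UI shell, gateway, and tool dispatcher would all call."""
    return capability in PERMISSIONS[role]


def http_status_for(role: Role, capability: str) -> int:
    """Gateway view of the same check: 403 when the role lacks the capability."""
    return 200 if is_allowed(role, capability) else 403
```

Keeping one lookup behind both the navigation and the API means a hidden menu entry and a 403 response can never disagree about what a role may do.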
Tag mapping
TensorCost attribution depends on mapping your existing AWS / Azure / GCP cost-allocation tags to four canonical dimensions:

| Canonical dimension | Typical source tag |
|---|---|
| application | app, service, Application |
| team | team, Team, Owner |
| environment | env, environment, Environment |
| owner | owner, cost_center, business_unit |
untagged. The savings methodology PDF (linked from the dashboard) explains how untagged is allocated when you set a “default owner” rule.
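
As a rough illustration of the mapping table, the sketch below resolves a resource's raw tags into the four canonical dimensions. Several things here are assumptions rather than documented behavior: the first matching source tag wins, matching is case-sensitive (so `Owner` feeds team while `owner` feeds owner, as the table lists them), and a dimension with no match is labeled `untagged`.

```python
# Source tags per canonical dimension, in the order listed in the table.
# Treating that order as precedence is an assumption of this sketch.
CANONICAL_SOURCES = {
    "application": ["app", "service", "Application"],
    "team": ["team", "Team", "Owner"],
    "environment": ["env", "environment", "Environment"],
    "owner": ["owner", "cost_center", "business_unit"],
}


def map_tags(resource_tags: dict) -> dict:
    """Return the canonical dimensions for one resource's cloud tags."""
    canonical = {}
    for dimension, candidates in CANONICAL_SOURCES.items():
        # Pick the first candidate tag that is present and non-empty.
        value = next(
            (resource_tags[key] for key in candidates if resource_tags.get(key)),
            "untagged",  # assumed fallback label
        )
        canonical[dimension] = value
    return canonical


example = map_tags({"app": "checkout", "Owner": "payments", "env": "prod"})
```

In this example `example["owner"]` resolves to `untagged` because none of `owner`, `cost_center`, or `business_unit` is present on the resource.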
Budgets and burn-rate alerts
Budgets are hierarchical:burn-rate-alerts-enabled flag.
Notification channels
Alerts and policy events deliver through pluggable channels. Configure under Settings → Notification channels.

| Type | Config | Notes |
|---|---|---|
| Slack | Incoming webhook URL | Severity-colored blocks; threaded acknowledgments |
| Microsoft Teams | Incoming webhook URL | MessageCard format with theme colors |
| PagerDuty | Events API v2 routing key | Severity mapping: critical→P1, high→P2, medium→P3 |
| Email | Recipient list | HTML; supports digest mode |
| Custom webhook | URL + HTTP method + headers | JSON body; HMAC-signed via X-TensorCost-Signature |
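
A receiver of the custom webhook would normally verify the X-TensorCost-Signature header before trusting the JSON body. The table does not specify the algorithm or encoding, so the sketch below assumes HMAC-SHA256 over the raw request body, hex-encoded; treat those choices and the function names as placeholders until checked against the actual signature format.

```python
import hashlib
import hmac


def compute_signature(secret: bytes, raw_body: bytes) -> str:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Compare the received header against the expected value in constant time."""
    expected = compute_signature(secret, raw_body)
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verification has to run on the exact bytes received; re-serializing the parsed JSON first can change whitespace or key order and break the comparison.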
Per-channel filtering
| Filter | Effect |
|---|---|
| Alert types | Allow-list of cost_threshold, idle_gpu, runaway_loop, security_incident, etc. Empty = all. |
| Minimum priority | Drops anything below low / medium / high / critical. |
| Digest mode | instant or batched. Batched flushes every digest_interval_minutes (default 30). |
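
The first two filters in the table can be read as a simple predicate applied to each alert before delivery. The following sketch shows one way to express that; the class and field names are hypothetical, and it assumes the priority order low < medium < high < critical.

```python
from dataclasses import dataclass, field

# Assumed ordering, lowest to highest.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class ChannelFilter:
    # Empty allow-list means every alert type passes, as in the table.
    alert_types: list = field(default_factory=list)
    minimum_priority: str = "low"

    def accepts(self, alert_type: str, priority: str) -> bool:
        """Return True when the alert should be delivered on this channel."""
        if self.alert_types and alert_type not in self.alert_types:
            return False
        return PRIORITY_ORDER.index(priority) >= PRIORITY_ORDER.index(self.minimum_priority)


pager_filter = ChannelFilter(alert_types=["runaway_loop", "security_incident"],
                             minimum_priority="high")
```

With `pager_filter`, a critical `runaway_loop` alert is delivered, while a critical `idle_gpu` alert or a medium `security_incident` alert is dropped.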
Test button
Every channel has a Test button that delivers a sample notification through the same code path as a real alert, including the signature header for webhooks.
Alert rules
Define monitoring thresholds under Alerts → Rules. Field reference:

| Field | Description |
|---|---|
| metric | gpu_utilization, cpu_utilization, memory_utilization, temperature, daily_cost, hourly_cost, inference_cost_per_request, cache_hit_rate, agent_call_count, error_rate |
| operator | gt, lt, gte, lte, eq, not_eq |
| threshold | Numeric compare value |
| duration_minutes | Sustain duration before firing (0 = immediate) |
| severity | low, medium, high, critical |
| scope | all, tagged (with scope_filter), or specific_instance |
| notification_channel_ids | Where to deliver |
| cooldown_minutes | Prevents alert storms |
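
To make the interplay of operator, threshold, and duration_minutes concrete, here is a minimal evaluation sketch. It assumes one metric sample per minute with the newest sample last, and that "sustain" means every sample in the trailing window satisfies the comparison; the real evaluator may sample and window differently. Cooldown and scope handling are left out.

```python
import operator

# Rule operators mapped to Python comparison functions.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
    "not_eq": operator.ne,
}


def should_fire(rule: dict, per_minute_values: list) -> bool:
    """Return True when the rule condition held for the whole sustain window."""
    compare = OPERATORS[rule["operator"]]
    # duration_minutes == 0 means "immediate": only the latest sample matters.
    window = max(rule["duration_minutes"], 1)
    recent = per_minute_values[-window:]
    if len(recent) < window:
        return False  # not enough history yet
    return all(compare(value, rule["threshold"]) for value in recent)


idle_gpu_rule = {
    "metric": "gpu_utilization",
    "operator": "lt",
    "threshold": 10,
    "duration_minutes": 3,
    "severity": "medium",
}
```

For `idle_gpu_rule`, the series `[50, 5, 4, 3]` fires because the last three samples are all below 10, while `[5, 50, 4, 3]` does not.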
Enforcement policies
Automated remediation rules. Three execution modes are available; start in Notify only and graduate.

| Mode | Behavior |
|---|---|
| notify_only | Fires the alert, takes no action. |
| approval_required | Queues actions for admin approval. |
| auto | Executes immediately. Use only for tested policies. |
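
The three modes differ only in what happens to the remediation action once a policy matches. A small dispatcher sketch makes that explicit; the function name, the return labels, and the list-based approval queue are illustrative assumptions, not the product's internals.

```python
from typing import Callable, List


def dispatch(mode: str, action: str, approval_queue: List[str],
             execute: Callable[[str], None]) -> str:
    """Route a matched policy's action according to its execution mode."""
    if mode == "notify_only":
        return "notified"          # alert fires, nothing else happens
    if mode == "approval_required":
        approval_queue.append(action)  # waits for an admin decision
        return "queued"
    if mode == "auto":
        execute(action)            # runs immediately
        return "executed"
    raise ValueError(f"unknown execution mode: {mode}")


queue: List[str] = []
executed: List[str] = []
results = [
    dispatch("notify_only", "stop i-123", queue, executed.append),
    dispatch("approval_required", "stop i-123", queue, executed.append),
    dispatch("auto", "stop i-123", queue, executed.append),
]
```

After these three calls, `queue` and `executed` each hold one action, which mirrors the graduation path: observe first, then approve manually, then automate.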
Templates
Pre-built policies you can clone:

| Template | What it does |
|---|---|
| Idle GPU auto-stop | Stops instances idle ≥15 minutes |
| Weekend cost saver | Scales non-prod 75% on Sat / Sun |
| Dev/test auto-shutdown | Stops dev instances at 7pm local |
| Training-job cost guard | Aborts training runs that exceed a cost threshold |
| Inference right-size | Suggests downsizing for underutilized inference endpoints |
| Spot fallback | Switches to on-demand on spot interruption |
| Runaway-loop circuit-breaker | Pauses an agent that crosses an invocation/cost spike threshold |
Composite conditions
active_hours, active_days, timezone) and maintenance-window suppression apply.
Maintenance windows
Schedule periods where alerts and enforcement are suppressed. Useful for deploys, reboots, and known-noisy events.
Branding and custom domain
Settings → Customization → Branding.

| Setting | Detail |
|---|---|
| Logo | 200×50 PNG/SVG, sidebar-rendered |
| Favicon | 16×16 / 32×32 |
| Primary / secondary colors | Override the MUI theme |
| Custom CSS | Advanced — applies after theme |
| Theme | Light / dark / auto |
| App name | Browser tab title |
| Disclaimer / footer | Legal disclaimer text |
Data retention
| Setting | Default | Range |
|---|---|---|
| Raw metric retention | 90 days | 30–365 days |
| Recommendation history | 365 days | 90–730 days |
| Audit trail | 7 years | Fixed (compliance) |
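
The ranges in the table translate directly into a validation step that a client could run before submitting a retention change. This is a sketch only: the setting keys (`raw_metric_retention_days`, `recommendation_history_days`, `audit_trail`) are hypothetical names for the table rows.

```python
# (minimum, maximum, default) in days, copied from the table above.
RETENTION_LIMITS = {
    "raw_metric_retention_days": (30, 365, 90),
    "recommendation_history_days": (90, 730, 365),
}


def validate_retention(setting: str, days: int) -> int:
    """Return days unchanged if it is within the allowed range, else raise."""
    if setting == "audit_trail":
        # The audit trail is fixed at 7 years and cannot be changed.
        raise ValueError("audit trail retention is fixed and not configurable")
    if setting not in RETENTION_LIMITS:
        raise KeyError(f"unknown retention setting: {setting}")
    low, high, _default = RETENTION_LIMITS[setting]
    if not low <= days <= high:
        raise ValueError(f"{setting} must be between {low} and {high} days")
    return days
```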
Cloud-account configuration
AWS
We use STS AssumeRole with an external ID. The CloudFormation (CFN) onboarding stack creates the role; you paste the ARN back into the wizard. Temporary credentials last 15 minutes and are never persisted. In Organization mode, the same role assumes OrganizationAccountAccessRole (or AWSControlTowerExecution if that’s what your Landing Zone provisioned) into each member account on demand. SCP-aware: the role-path prefix is configurable for OUs whose SCPs restrict IAM role creation.
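
For readers who want to test the onboarding role themselves, the AssumeRole call described above looks roughly like this with boto3. The parameters `RoleArn`, `RoleSessionName`, `ExternalId`, and `DurationSeconds` are the standard STS ones; the session name and the helper function names are placeholders, and boto3 must be installed separately.

```python
def build_assume_role_request(role_arn: str, external_id: str,
                              session_name: str = "tensorcost-check") -> dict:
    """Build the STS AssumeRole parameters for a 15-minute session."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # placeholder name for this sketch
        "ExternalId": external_id,
        "DurationSeconds": 900,  # 15 minutes, the minimum STS allows
    }


def assume_role(role_arn: str, external_id: str) -> dict:
    """Exchange the role ARN and external ID for temporary credentials."""
    import boto3  # third-party dependency, imported lazily

    sts = boto3.client("sts")
    response = sts.assume_role(**build_assume_role_request(role_arn, external_id))
    # Contains AccessKeyId, SecretAccessKey, SessionToken, and Expiration.
    return response["Credentials"]
```

If the call fails with AccessDenied, the usual causes are an external ID that does not match the role's trust policy or a trust policy that names a different principal.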
Azure
Service principal with Reader + Cost Management Reader. Use DefaultAzureCredential for local testing; in production, prefer Managed Identity on the agent host.
GCP
Service account with Compute Viewer + BigQuery Data Viewer (for billing export). Application Default Credentials are honored.
Managed-inference providers
| Provider | Auth | Notes |
|---|---|---|
| Amazon Bedrock | IAM role + external ID (same as AWS account) | See bedrock integration |
| Azure OpenAI | App registration + cost API permission | |
| Vertex AI | GCP service account with Vertex + billing export read | |
| OpenAI API | Org-scoped API key with read-only billing scope | |
| Anthropic API | Org-scoped API key | |
integration.connection_secret (RLS-enforced). Rotation is supported via POST /v1/integration/connections/:id/rotate-secret.
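
A rotation call against the endpoint above could be assembled as follows. Only the path comes from this page; the base URL, the bearer-token authentication, and the function name are assumptions for illustration, and the request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_rotate_request(base_url: str, connection_id: str,
                         token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that rotates a connection secret."""
    safe_id = urllib.parse.quote(connection_id, safe="")
    url = f"{base_url.rstrip('/')}/v1/integration/connections/{safe_id}/rotate-secret"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


# Hypothetical host and identifiers, for illustration only.
request = build_rotate_request("https://api.example.com/", "conn_123", "TOKEN")
```

Passing the result to `urllib.request.urlopen(request)` would send it; the response format is not documented here, so the sketch stops at building the request.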
Feature flags surface
Tenant-visible feature flags appear under Settings → Feature flags for admins. The full pattern (LaunchDarkly + useFeature() + the quarterly stale-flag cleanup ritual) is documented in feature flags.
Audit trail
Every config change writes to the cross-tenant audit ledger:
- Who (user ID + email)
- What (resource + before/after diff)
- When (UTC timestamp)
- Where (IP, user agent)
- Why (free-text reason for high-severity changes)
GET /v1/identity/audit?format=csv.