Feature flags

TensorCost gates every non-trivial new behavior behind a feature flag. Flags let us roll out features per tenant, per plan tier, or per environment without a code push, A/B-test recommenders, and recover quickly when a launch goes sideways.

How it works

Both the backend (@tensorcost/feature-flags) and the frontend (mf-sdk/context.tsx’s useFeature() hook) evaluate flags through LaunchDarkly, with environment-variable fallbacks for local dev and self-hosted installs. Every evaluation passes a tenant context (tenantId, plan, region, customerType) and the deployment environment, so the same flag can evaluate differently across customers.
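The evaluation context described above can be sketched as a TypeScript type. Only the four tenant fields and the deployment environment named in the text are assumed; the real type in @tensorcost/feature-flags may differ, and the example values are made up.

```typescript
// Illustrative shape of the per-evaluation context; field names come from
// the text above, value types are assumptions.
interface FlagEvaluationContext {
  tenantId: string;
  plan: 'free' | 'growth' | 'enterprise';
  region: string;
  customerType: string;
  environment: string; // deployment environment, e.g. 'production'
}

// Hypothetical example context; every value here is invented.
const exampleContext: FlagEvaluationContext = {
  tenantId: 'tnt_example',
  plan: 'growth',
  region: 'us-east-1',
  customerType: 'direct',
  environment: 'production',
};
```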

Frontend pattern

Microfrontends consume useFeature() from @tensorcost/mf-sdk:
import { useFeature } from '@tensorcost/mf-sdk/context';

export function BurnRateBanner() {
  const enabled = useFeature('burn-rate-alerts-enabled');
  if (!enabled) return null;
  return <BurnRateAlertsCard />;
}
The shell hydrates the feature-flag context at login from GET /v1/tenant/feature-flags. The context refreshes every 5 minutes; routes and sidebar entries gated by a flag appear or disappear without a page reload.
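That hydrate-at-login, refresh-every-5-minutes behavior can be sketched as a small polling helper. The shell's real implementation is not shown in this doc; loadFlags is a hypothetical fetcher that would call GET /v1/tenant/feature-flags.

```typescript
type FlagMap = Record<string, boolean>;

// Hydrate once (at login), then re-fetch on an interval. Returns a stop
// function to call on logout.
function startFlagRefresh(
  loadFlags: () => Promise<FlagMap>,
  onUpdate: (flags: FlagMap) => void,
  intervalMs = 5 * 60 * 1000, // matches the 5-minute refresh above
): () => void {
  loadFlags().then(onUpdate); // initial hydration
  const timer = setInterval(() => loadFlags().then(onUpdate), intervalMs);
  return () => clearInterval(timer);
}
```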

Backend pattern

import { isEnabled } from '@tensorcost/feature-flags';

if (await isEnabled('runaway-loop-detector', { tenantId, plan })) {
  await detectRunawayLoops(events);
}
For long-running services (job runners, gRPC handlers), evaluate at the start of the work unit so a flip propagates within seconds.
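The per-work-unit evaluation pattern can be sketched with the evaluator injected as a parameter; the injected function stands in for isEnabled from @tensorcost/feature-flags, and the batch source and handler are hypothetical.

```typescript
type Evaluator = (
  flag: string,
  ctx: { tenantId: string; plan: string },
) => Promise<boolean>;

// Re-evaluate the flag at the top of every work unit rather than once at
// process start, so a flag flip takes effect within seconds instead of
// requiring a restart. Returns how many batches were actually handled.
async function runDetectorLoop(
  batches: AsyncIterable<unknown[]>,
  isEnabled: Evaluator,
  ctx: { tenantId: string; plan: string },
  handleBatch: (events: unknown[]) => Promise<void>,
): Promise<number> {
  let handled = 0;
  for await (const events of batches) {
    if (await isEnabled('runaway-loop-detector', ctx)) {
      await handleBatch(events);
      handled += 1;
    }
  }
  return handled;
}
```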

Live flag inventory

This is the public-facing summary; admins can see the full list under Settings → Feature flags.
| Flag | Category | Default | Purpose |
| --- | --- | --- | --- |
| grpc-enabled | services | ON | Long-lived agent gRPC stream. Disabling forces HTTPS sync fallback. |
| anomaly-detection-enabled | services | ON | Statistical anomaly detection on GPU + inference metrics. |
| alert-rules-enabled | services | ON | Custom alert-rule evaluation engine. |
| escalation-enabled | services | ON | Multi-level escalation chains. |
| burn-rate-alerts-enabled | services | OFF (rolling) | 50 / 80 / 100 % burn-rate alerts on budgets. |
| runaway-loop-detector | services | ON | Per-agent loop / retry-storm detection. |
| auto-scaling-enabled | services | OFF | Auto-scaling orchestrator that acts on recommendations. |
| data-retention-enabled | jobs | ON | Nightly data cleanup within configured retention. |
| aggregation-enabled | jobs | ON | Hourly metric + cost rollups. |
| cost-forecast-enabled | jobs | ON | Daily cost forecasting + budget breach prediction. |
| tenant-cleanup-enabled | jobs | ON | Soft-deleted tenant purge after retention window. |
| bedrock-recommenders-enabled | recommenders | ON | The four MVP Bedrock recommenders (routing, cache, provisioned-throughput, runaway-loop). |
| azure-openai-recommenders-enabled | recommenders | OFF (rolling) | Azure OpenAI counterparts. |
| vertex-recommenders-enabled | recommenders | OFF | Vertex counterparts. |
| mcp-write-tools-enabled | experimental | OFF | MCP write surface for the few customers that have granted write scope. |
| ml-enabled | experimental | OFF | Second-layer ML anomaly detection. |
| tanstack-query-migration | platform | OFF | Per-MF migration off RTK Query onto TanStack Query. |
services flags toggle long-running components. jobs flags toggle scheduled tasks. recommenders flags toggle individual recommendation engines per provider. experimental flags are typically gated to specific tenants in private beta.

Reading flag state

Admins call:
GET /v1/tenant/feature-flags
{
  "success": true,
  "data": {
    "grpc-enabled":           { "enabled": true,  "category": "services",     "source": "launchdarkly" },
    "burn-rate-alerts-enabled": { "enabled": false, "category": "services",   "source": "launchdarkly" },
    "ml-enabled":             { "enabled": false, "category": "experimental", "source": "launchdarkly" }
  },
  "source": "launchdarkly",
  "tenantId": "..."
}
source is launchdarkly (live evaluation) or env (env-var fallback in self-hosted).
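The payload above can be reduced client-side to a simple list of enabled flags. The FlagState type mirrors the fields shown in the example response; the helper itself is illustrative.

```typescript
// Per-flag state as returned by GET /v1/tenant/feature-flags.
interface FlagState {
  enabled: boolean;
  category: string;
  source: 'launchdarkly' | 'env';
}

// Names of all flags currently enabled for the tenant, sorted for stable
// display in an admin UI.
function enabledFlags(data: Record<string, FlagState>): string[] {
  return Object.entries(data)
    .filter(([, state]) => state.enabled)
    .map(([name]) => name)
    .sort();
}
```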

Plan-tier gating

Plan tiers (free, growth, enterprise) are part of the LaunchDarkly context. Common patterns:
  • mcp-write-tools-enabled targeted to plan in ['enterprise'] and a specific tenant allow-list.
  • bedrock-recommenders-enabled open to all plans; azure-openai-recommenders-enabled rolling out by percentage.
  • Experimental flags targeted to specific tenant.key values for design partners.

Self-hosted deployments

Self-hosted instances skip LaunchDarkly. Each flag has a matching upper-snake-case env var:
GRPC_ENABLED=true
BURN_RATE_ALERTS_ENABLED=false
ML_ENABLED=false
Setting any to false disables the corresponding capability at startup.
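The flag-name to env-var mapping can be expressed as a pure function. The real fallback lives inside @tensorcost/feature-flags and is not shown in this doc, so treat this as a sketch; in particular, the rule that only the literal string "true" enables a flag is an assumption.

```typescript
// 'burn-rate-alerts-enabled' -> 'BURN_RATE_ALERTS_ENABLED'
function envVarFor(flag: string): string {
  return flag.toUpperCase().replace(/-/g, '_');
}

function envFallback(
  flag: string,
  env: Record<string, string | undefined>,
): boolean {
  // Assumption: only the literal string 'true' enables a flag; anything
  // else (including unset) leaves it disabled.
  return env[envVarFor(flag)] === 'true';
}
```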

Stale-flag quarterly cleanup ritual

Every flag is technical debt waiting to happen. We hold a 30–45 minute cleanup ritual on the first Monday of March, June, September, and December:
1. Pull the live flag list: pull the LaunchDarkly project flag list and join it against grep -r 'useFeature\|isEnabled' apps-new/.
2. Tag each flag: each flag gets one of Live & owned (keep), Permanent gate (rename to is-feature-x and document), Stale (no usage in 90 days), or Orphan (no code reference).
3. Delete stale and orphan flags: archive stale flags in LaunchDarkly, leave the env-var fallback in code for one quarter, then remove. Delete orphans immediately.
4. Document permanent gates: permanent gates (e.g. plan-tier or region gates that never sunset) move to a permanent-gates.md doc and out of the flag list.
Owner: Raj. Calendar reminder lives outside the repo.
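The tagging step above can be sketched as a function, assuming we already have each flag's days since last evaluation from LaunchDarkly and the set of flag keys found by the grep. The category names and the 90-day threshold come from the ritual description; the field names and the `permanent` input are illustrative.

```typescript
type FlagTag = 'live-and-owned' | 'permanent-gate' | 'stale' | 'orphan';

function tagFlag(
  flag: { key: string; daysSinceLastEval: number; permanent: boolean },
  codeRefs: Set<string>, // flag keys found at useFeature/isEnabled call sites
): FlagTag {
  if (!codeRefs.has(flag.key)) return 'orphan';     // no code reference: delete now
  if (flag.permanent) return 'permanent-gate';      // never sunsets: document instead
  if (flag.daysSinceLastEval > 90) return 'stale';  // no usage in 90 days
  return 'live-and-owned';
}
```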

When to reach for a flag vs configuration

  • Feature flag — runtime toggle, varies per tenant, safe to flip without a deploy, typically binary.
  • Configuration — static setting, set via env var or admin UI at deploy time, varies by environment not tenant.
If you need to change behavior for one customer without affecting others, it’s a flag. If you need to point at a different Redis cluster, it’s configuration.