Feature flags

TensorCost gates every non-trivial new behavior behind a feature flag. Flags let us roll out features per tenant, per plan tier, or per environment without a code push, A/B-test recommenders, and recover quickly when a launch goes sideways.

How it works

Both the backend (@tensorcost/feature-flags) and the frontend (mf-sdk/context.tsx’s useFeature() hook) evaluate flags through LaunchDarkly, with environment-variable fallbacks for local dev and self-hosted installs. Every evaluation passes a tenant context (tenantId, plan, region, customerType) and the deployment environment, so the same flag can evaluate differently across customers.
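The evaluation context described above can be sketched as a TypeScript type. Only the four tenant fields and the deployment environment named in the text are assumed; the real type in @tensorcost/feature-flags may differ, and the example values are made up.

```typescript
// Illustrative shape of the per-evaluation context; field names come from
// the text above, value types are assumptions.
interface FlagEvaluationContext {
  tenantId: string;
  plan: 'free' | 'growth' | 'enterprise';
  region: string;
  customerType: string;
  environment: string; // deployment environment, e.g. 'production'
}

// Hypothetical example context; every value here is invented.
const exampleContext: FlagEvaluationContext = {
  tenantId: 'tnt_example',
  plan: 'growth',
  region: 'us-east-1',
  customerType: 'direct',
  environment: 'production',
};
```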

Frontend pattern

Microfrontends consume useFeature() from @tensorcost/mf-sdk:
import { useFeature } from '@tensorcost/mf-sdk/context';

export function BurnRateBanner() {
  const enabled = useFeature('burn-rate-alerts-enabled');
  if (!enabled) return null;
  return <BurnRateAlertsCard />;
}
The shell hydrates the feature-flag context at login from GET /v1/tenant/feature-flags. The context refreshes every 5 minutes; routes and sidebar entries gated by a flag appear or disappear without a page reload.
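That hydrate-at-login, refresh-every-5-minutes behavior can be sketched as a small polling helper. The shell's real implementation is not shown in this doc; loadFlags is a hypothetical fetcher that would call GET /v1/tenant/feature-flags.

```typescript
type FlagMap = Record<string, boolean>;

// Hydrate once (at login), then re-fetch on an interval. Returns a stop
// function to call on logout.
function startFlagRefresh(
  loadFlags: () => Promise<FlagMap>,
  onUpdate: (flags: FlagMap) => void,
  intervalMs = 5 * 60 * 1000, // matches the 5-minute refresh above
): () => void {
  loadFlags().then(onUpdate); // initial hydration
  const timer = setInterval(() => loadFlags().then(onUpdate), intervalMs);
  return () => clearInterval(timer);
}
```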

Backend pattern

import { isEnabled } from '@tensorcost/feature-flags';

if (await isEnabled('runaway-loop-detector', { tenantId, plan })) {
  await detectRunawayLoops(events);
}
For long-running services (job runners, gRPC handlers), evaluate at the start of the work unit so a flip propagates within seconds.
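The per-work-unit evaluation pattern can be sketched with the evaluator injected as a parameter; the injected function stands in for isEnabled from @tensorcost/feature-flags, and the batch source and handler are hypothetical.

```typescript
type Evaluator = (
  flag: string,
  ctx: { tenantId: string; plan: string },
) => Promise<boolean>;

// Re-evaluate the flag at the top of every work unit rather than once at
// process start, so a flag flip takes effect within seconds instead of
// requiring a restart. Returns how many batches were actually handled.
async function runDetectorLoop(
  batches: AsyncIterable<unknown[]>,
  isEnabled: Evaluator,
  ctx: { tenantId: string; plan: string },
  handleBatch: (events: unknown[]) => Promise<void>,
): Promise<number> {
  let handled = 0;
  for await (const events of batches) {
    if (await isEnabled('runaway-loop-detector', ctx)) {
      await handleBatch(events);
      handled += 1;
    }
  }
  return handled;
}
```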

Live flag inventory

This is the public-facing summary; admins can see the full list under Settings → Feature flags.
| Flag | Category | Default | Purpose |
| --- | --- | --- | --- |
| grpc-enabled | services | ON | Long-lived agent gRPC stream. Disabling forces HTTPS sync fallback. |
| anomaly-detection-enabled | services | ON | Statistical anomaly detection on GPU + inference metrics. |
| alert-rules-enabled | services | ON | Custom alert-rule evaluation engine. |
| escalation-enabled | services | ON | Multi-level escalation chains. |
| burn-rate-alerts-enabled | services | OFF (rolling) | 50 / 80 / 100 % burn-rate alerts on budgets. |
| runaway-loop-detector | services | ON | Per-agent loop / retry-storm detection. |
| auto-scaling-enabled | services | OFF | Auto-scaling orchestrator that acts on recommendations. |
| data-retention-enabled | jobs | ON | Nightly data cleanup within configured retention. |
| aggregation-enabled | jobs | ON | Hourly metric + cost rollups. |
| cost-forecast-enabled | jobs | ON | Daily cost forecasting + budget breach prediction. |
| tenant-cleanup-enabled | jobs | ON | Soft-deleted tenant purge after retention window. |
| bedrock-recommenders-enabled | recommenders | ON | The four MVP Bedrock recommenders (routing, cache, provisioned-throughput, runaway-loop). |
| azure-openai-recommenders-enabled | recommenders | OFF (rolling) | Azure OpenAI counterparts. |
| vertex-recommenders-enabled | recommenders | OFF | Vertex counterparts. |
| mcp-write-tools-enabled | experimental | OFF | MCP write surface for the few customers that have granted write scope. |
| ml-enabled | experimental | OFF | Second-layer ML anomaly detection. |
| tanstack-query-migration | platform | OFF | Per-MF migration off RTK Query onto TanStack Query. |
services flags toggle long-running components. jobs flags toggle scheduled tasks. recommenders flags toggle individual recommendation engines per provider. experimental flags are typically gated to specific tenants in private beta.

Reading flag state

Admins call:
GET /v1/tenant/feature-flags
{
  "success": true,
  "data": {
    "grpc-enabled":           { "enabled": true,  "category": "services",     "source": "launchdarkly" },
    "burn-rate-alerts-enabled": { "enabled": false, "category": "services",   "source": "launchdarkly" },
    "ml-enabled":             { "enabled": false, "category": "experimental", "source": "launchdarkly" }
  },
  "source": "launchdarkly",
  "tenantId": "..."
}
source is launchdarkly (live evaluation) or env (env-var fallback in self-hosted).
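The payload above can be reduced client-side to a simple list of enabled flags. The FlagState type mirrors the fields shown in the example response; the helper itself is illustrative.

```typescript
// Per-flag state as returned by GET /v1/tenant/feature-flags.
interface FlagState {
  enabled: boolean;
  category: string;
  source: 'launchdarkly' | 'env';
}

// Names of all flags currently enabled for the tenant, sorted for stable
// display in an admin UI.
function enabledFlags(data: Record<string, FlagState>): string[] {
  return Object.entries(data)
    .filter(([, state]) => state.enabled)
    .map(([name]) => name)
    .sort();
}
```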

Plan-tier gating

Plan tiers (free, growth, enterprise) are part of the LaunchDarkly context. Common patterns:
  • mcp-write-tools-enabled targeted to plan in ['enterprise'] and a specific tenant allow-list.
  • bedrock-recommenders-enabled open to all plans; azure-openai-recommenders-enabled rolling out by percentage.
  • Experimental flags targeted to specific tenant.key values for design partners.

Self-hosted deployments

Self-hosted instances skip LaunchDarkly. Each flag has a matching upper-snake-case env var:
GRPC_ENABLED=true
BURN_RATE_ALERTS_ENABLED=false
ML_ENABLED=false
Setting any to false disables the corresponding capability at startup.
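The flag-name to env-var mapping can be expressed as a pure function. The real fallback lives inside @tensorcost/feature-flags and is not shown in this doc, so treat this as a sketch; in particular, the rule that only the literal string "true" enables a flag is an assumption.

```typescript
// 'burn-rate-alerts-enabled' -> 'BURN_RATE_ALERTS_ENABLED'
function envVarFor(flag: string): string {
  return flag.toUpperCase().replace(/-/g, '_');
}

function envFallback(
  flag: string,
  env: Record<string, string | undefined>,
): boolean {
  // Assumption: only the literal string 'true' enables a flag; anything
  // else (including unset) leaves it disabled.
  return env[envVarFor(flag)] === 'true';
}
```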

Stale-flag quarterly cleanup ritual

Every flag is technical debt waiting to happen. We hold a 30–45 minute cleanup ritual on the first Monday of March, June, September, and December:
1. Pull the live flag list: pull the LaunchDarkly project flag list and join it against grep -r 'useFeature\|isEnabled' apps-new/.
2. Tag each flag: each flag gets one of Live & owned (keep), Permanent gate (rename to is-feature-x and document), Stale (no usage in 90 days), or Orphan (no code reference).
3. Delete stale and orphan flags: archive stale flags in LaunchDarkly, leave the env-var fallback in code for one quarter, then remove. Delete orphans immediately.
4. Document permanent gates: permanent gates (e.g. plan-tier or region gates that never sunset) move to a permanent-gates.md doc and out of the flag list.
Owner: Raj. Calendar reminder lives outside the repo.
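The tagging step above can be sketched as a function, assuming we already have each flag's days since last evaluation from LaunchDarkly and the set of flag keys found by the grep. The category names and the 90-day threshold come from the ritual description; the field names and the `permanent` input are illustrative.

```typescript
type FlagTag = 'live-and-owned' | 'permanent-gate' | 'stale' | 'orphan';

function tagFlag(
  flag: { key: string; daysSinceLastEval: number; permanent: boolean },
  codeRefs: Set<string>, // flag keys found at useFeature/isEnabled call sites
): FlagTag {
  if (!codeRefs.has(flag.key)) return 'orphan';     // no code reference: delete now
  if (flag.permanent) return 'permanent-gate';      // never sunsets: document instead
  if (flag.daysSinceLastEval > 90) return 'stale';  // no usage in 90 days
  return 'live-and-owned';
}
```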

When to reach for a flag vs configuration

  • Feature flag — runtime toggle, varies per tenant, safe to flip without a deploy, typically binary.
  • Configuration — static setting, set via env var or admin UI at deploy time, varies by environment not tenant.
If you need to change behavior for one customer without affecting others, it’s a flag. If you need to point at a different Redis cluster, it’s configuration.