Helix Runtime

Running agents in production with lifecycle management, health checks, and scaling

Helix is Teleon’s production-grade runtime system for AI agents. It provides agent lifecycle management, process monitoring, health checking, and auto-scaling, plus LLM-specific features: token counting, usage tracking, and budget management.

For the full runtime API surface (methods, types/enums, configuration objects, and production features), see Helix API in API Reference.

What this guide covers

Enabling Helix on an agent via @client.agent(..., helix={...})
Direct runtime orchestration with AgentRuntime
Health checking primitives and custom health checks
Auto-scaling policies and evaluation
LLM runtime features: token counting, tracking, and budget management

Enable Helix on an agent

from teleon import TeleonClient
 
client = TeleonClient(api_key="tlk_live_xxx")
 
@client.agent(
    name="my-agent",
    helix={
        "min_instances": 2,
        "max_instances": 10,
        "memory_limit_mb": 512,
    },
)
async def my_agent(query: str):
    return "ok"

Direct runtime usage

Helix can also be used directly for runtime orchestration.

from teleon.helix import AgentRuntime, RuntimeConfig, ResourceConfig
 
config = RuntimeConfig(
    environment="production",
    hot_reload=False,
    max_workers=20,
)
 
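runtime = AgentRuntime(config)
 
# The calls below use await, so run them inside an async function
# (for example, one started with asyncio.run).
# my_agent_function is your own agent callable, defined elsewhere.
await runtime.register_agent(
    agent_id="my-agent",
    agent_callable=my_agent_function,
    resources=ResourceConfig(min_instances=2, max_instances=10),
)
 
# Start the runtime first, then the registered agent.
await runtime.start()
await runtime.start_agent("my-agent")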

Health checking

Helix supports liveness, readiness, and custom health checks.

from teleon.helix import HealthCheck, CheckType
 
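async def custom_check():
    # Placeholder: replace with a real probe (for example, a database ping)
    # that returns True when the dependency is healthy.
    return True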
 
health_check = HealthCheck(
    name="database-check",
    check_type=CheckType.CUSTOM,
    check_fn=custom_check,
    interval=30,
    timeout=10,
    failure_threshold=3,
    success_threshold=1,
    initial_delay=5,
)
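
This guide does not define how failure_threshold and success_threshold are applied. The sketch below assumes the convention used by Kubernetes probes: a target is marked unhealthy after failure_threshold consecutive failures and healthy again after success_threshold consecutive successes. ThresholdState is a hypothetical helper written for illustration, not part of teleon.helix, and Helix's actual behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class ThresholdState:
    """Hypothetical helper (not a Helix API): health from consecutive results."""

    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True
    _failures: int = 0   # consecutive failed checks
    _successes: int = 0  # consecutive passed checks

    def record(self, passed: bool) -> bool:
        """Record one check result and return the resulting health status."""
        if passed:
            self._failures = 0
            self._successes += 1
            # Assumed rule: recover only after enough consecutive successes.
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._failures += 1
            # Assumed rule: fail only after enough consecutive failures.
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


state = ThresholdState(failure_threshold=3, success_threshold=1)
outcomes = (False, False, True, False, False, False, True)
results = [state.record(ok) for ok in outcomes]
```

Under these assumptions, two failures followed by a success do not change the status; only the run of three consecutive failures flips it to unhealthy, and the next success restores it.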

Auto-scaling

Register a ScalingPolicy with a Scaler, then pass current metrics to evaluate_scaling to compute the desired instance count.

from teleon.helix import ScalingPolicy, Scaler, ScalingMetrics
 
policy = ScalingPolicy(
    min_instances=1,
    max_instances=10,
    target_cpu_percent=70.0,
    target_memory_percent=80.0,
    scale_up_cooldown=60,
    scale_down_cooldown=300,
    scale_up_step=1,
    scale_down_step=1,
)
 
# register_policy and evaluate_scaling are awaited, so run this inside an async function.
scaler = Scaler()
await scaler.register_policy("my-agent", policy)
 
metrics = ScalingMetrics(cpu_percent=85.0, memory_percent=60.0)
desired = await scaler.evaluate_scaling(
    target_id="my-agent",
    metrics=metrics,
    current_instances=2,
)
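
The guide does not describe how evaluate_scaling arrives at its result. As a rough mental model only, the sketch below assumes a simple step-based rule: scale up by scale_up_step when either metric exceeds its target, scale down by scale_down_step when both are below target, respect the matching cooldown, and clamp to the instance bounds. desired_instances and seconds_since_last_scale are hypothetical names introduced here; this is not Helix's implementation.

```python
def desired_instances(
    current: int,
    cpu_percent: float,
    memory_percent: float,
    seconds_since_last_scale: float,
    *,
    min_instances: int = 1,
    max_instances: int = 10,
    target_cpu_percent: float = 70.0,
    target_memory_percent: float = 80.0,
    scale_up_cooldown: float = 60,
    scale_down_cooldown: float = 300,
    scale_up_step: int = 1,
    scale_down_step: int = 1,
) -> int:
    """Hypothetical step-based scaling rule (an assumption, not the Helix algorithm)."""
    over = cpu_percent > target_cpu_percent or memory_percent > target_memory_percent
    under = cpu_percent < target_cpu_percent and memory_percent < target_memory_percent

    if over and seconds_since_last_scale >= scale_up_cooldown:
        desired = current + scale_up_step
    elif under and seconds_since_last_scale >= scale_down_cooldown:
        desired = current - scale_down_step
    else:
        # Within a cooldown, or exactly at target: hold steady.
        desired = current

    # Never leave the [min_instances, max_instances] range.
    return max(min_instances, min(max_instances, desired))


# Same inputs as the example above: CPU at 85% exceeds the 70% target.
scaled_up = desired_instances(2, 85.0, 60.0, seconds_since_last_scale=120)
held = desired_instances(2, 85.0, 60.0, seconds_since_last_scale=30)
scaled_down = desired_instances(2, 20.0, 30.0, seconds_since_last_scale=400)
```

The longer scale-down cooldown in the policy (300 seconds versus 60) fits this model: adding capacity quickly is usually cheaper than removing it too early and having to scale back up.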

LLM runtime features

Helix includes LLM metrics, token tracking, and token budget management.

Token throughput and LLM resource tracking

from teleon.helix import LLMResourceTracker
 
tracker = LLMResourceTracker(
    agent_id="chat-agent",
    model="gpt-4",
    window_size=300,
)
 
await tracker.record_request(
    input_tokens=150,
    output_tokens=200,
    latency_ms=1500.0,
    ttft_ms=250.0,
    cost=0.0045,
    wait_time_ms=100.0,
)
 
metrics = await tracker.get_metrics()
stats = await tracker.get_statistics()
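
The unit of window_size and the contents of the returned metrics are not specified in this guide. The sketch below assumes window_size is in seconds and shows one common way to compute token throughput over a sliding window. SlidingWindowThroughput is a hypothetical class for illustration, not the LLMResourceTracker implementation.

```python
from collections import deque


class SlidingWindowThroughput:
    """Hypothetical helper (not a Helix API): tokens per second over a time window."""

    def __init__(self, window_size: float = 300.0) -> None:
        self.window_size = window_size  # assumed to be seconds
        self._events = deque()  # (timestamp, total_tokens) pairs, oldest first

    def record(self, timestamp: float, input_tokens: int, output_tokens: int) -> None:
        """Store the total tokens of one request at the given timestamp."""
        self._events.append((timestamp, input_tokens + output_tokens))

    def tokens_per_second(self, now: float) -> float:
        """Drop events older than the window, then average over the window length."""
        cutoff = now - self.window_size
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        total = sum(tokens for _, tokens in self._events)
        return total / self.window_size


window = SlidingWindowThroughput(window_size=300.0)
window.record(timestamp=0.0, input_tokens=150, output_tokens=200)
window.record(timestamp=100.0, input_tokens=100, output_tokens=150)

rate_now = window.tokens_per_second(now=200.0)    # both requests are in the window
rate_later = window.tokens_per_second(now=350.0)  # the first request has expired
```

Timestamps are passed in explicitly so the example is deterministic; a real tracker would more likely read a monotonic clock.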

Token counting

from teleon.helix import TokenCounter
 
counter = TokenCounter()
tokens = counter.count_tokens("Hello", model="gpt-4")

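Token usage tracking

from teleon.helix import get_token_tracker
 
# Named token_tracker to avoid shadowing the LLMResourceTracker instance above.
token_tracker = get_token_tracker()
await token_tracker.record_tokens(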
    agent_id="chat-agent",
    model="gpt-4",
    input_tokens=100,
    output_tokens=150,
    operation="completion",
    metadata={"user_id": "123"},
)

Token budget management

from teleon.helix import get_token_budget_manager, TokenPeriod
 
budget_manager = get_token_budget_manager()
await budget_manager.set_budget(amount=1_000_000, period=TokenPeriod.DAILY)
status = await budget_manager.check_budget()
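
This guide does not show what check_budget returns or how usage resets between periods. As an illustration of the general idea, the sketch below assumes a daily budget whose usage counter resets when the calendar day changes. DailyTokenBudget and its status fields are hypothetical and are not the Helix budget manager's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DailyTokenBudget:
    """Hypothetical helper (not a Helix API): a token budget that resets each day."""

    amount: int
    used: int = 0
    day: Optional[date] = None  # the day the current usage counter belongs to

    def record(self, tokens: int, today: date) -> None:
        """Add usage, starting a fresh counter when the day has changed."""
        if self.day != today:
            self.day, self.used = today, 0
        self.used += tokens

    def status(self, today: date) -> dict:
        """Report usage for today; usage from an earlier day no longer counts."""
        used = self.used if self.day == today else 0
        return {
            "used": used,
            "remaining": max(0, self.amount - used),
            "exceeded": used > self.amount,
        }


budget = DailyTokenBudget(amount=1_000_000)
day_one = date(2024, 1, 1)
day_two = date(2024, 1, 2)

budget.record(250, today=day_one)
within = budget.status(today=day_one)

budget.record(1_000_000, today=day_one)
over = budget.status(today=day_one)
next_day = budget.status(today=day_two)
```

Here the budget is exceeded on the first day after the second request and is back to its full amount on the following day.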