Running agents in production with lifecycle management, health checks, and scaling
Helix is Teleon’s production-grade runtime system for AI agents. It provides agent lifecycle management, process monitoring, health checking, auto-scaling, and LLM-oriented runtime features.
For the full runtime API surface (methods, types and enums, configuration objects, and production features), see the Helix API page in the API Reference.
This page covers:
- Enabling Helix on an agent via @client.agent(... helix={...})
- Direct runtime orchestration with AgentRuntime
- Health checking primitives and custom health checks
- Auto-scaling policies and evaluation
- LLM runtime features: token counting, tracking, and budget management
from teleon import TeleonClient

client = TeleonClient(api_key="tlk_live_xxx")

@client.agent(
    name="my-agent",
    helix={
        "min_instances": 2,
        "max_instances": 10,
        "memory_limit_mb": 512,
    },
)
async def my_agent(query: str):
    return "ok"
Helix can also be used directly for runtime orchestration.
from teleon.helix import AgentRuntime, RuntimeConfig, ResourceConfig

config = RuntimeConfig(
    environment="production",
    hot_reload=False,
    max_workers=20,
)
runtime = AgentRuntime(config)

# The awaits below must run inside an async function (or an async REPL).
# my_agent_function is the agent callable defined elsewhere in your code.
await runtime.register_agent(
    agent_id="my-agent",
    agent_callable=my_agent_function,
    resources=ResourceConfig(min_instances=2, max_instances=10),
)
await runtime.start()
await runtime.start_agent("my-agent")
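The snippets on this page use bare `await`, which Python only accepts inside an async function. As a minimal sketch of how the register, start, and start-agent sequence could be driven from a plain script, the example below wraps the calls in `asyncio.run`. `StubRuntime` is a hypothetical stand-in that only records calls so the sketch runs without Teleon installed; it is not the real `AgentRuntime` and says nothing about its behavior.

```python
import asyncio


class StubRuntime:
    """Hypothetical stand-in for AgentRuntime; it only records the calls made."""

    def __init__(self):
        self.calls = []

    async def register_agent(self, agent_id, agent_callable):
        self.calls.append(("register_agent", agent_id))

    async def start(self):
        self.calls.append(("start", None))

    async def start_agent(self, agent_id):
        self.calls.append(("start_agent", agent_id))


async def my_agent_function(query: str) -> str:
    # Placeholder agent body, mirroring the example agent above.
    return "ok"


async def main() -> list:
    runtime = StubRuntime()
    await runtime.register_agent(
        agent_id="my-agent",
        agent_callable=my_agent_function,
    )
    await runtime.start()
    await runtime.start_agent("my-agent")
    return runtime.calls


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
calls = asyncio.run(main())
print(calls)
```

With the real runtime, the body of `main` would presumably hold the `AgentRuntime` calls shown above in the same order.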
Helix supports liveness, readiness, and custom health checks.
from teleon.helix import HealthCheck, CheckType

async def custom_check():
    return True

health_check = HealthCheck(
    name="database-check",
    check_type=CheckType.CUSTOM,
    check_fn=custom_check,
    interval=30,
    timeout=10,
    failure_threshold=3,
    success_threshold=1,
    initial_delay=5,
)
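The `failure_threshold` and `success_threshold` parameters suggest the common pattern of debouncing raw check results: a target is marked unhealthy only after several consecutive failures, and healthy again after enough consecutive successes. The sketch below illustrates that pattern under this assumption. `ThresholdTracker` is a hypothetical helper, not part of Helix, and Helix's actual semantics may differ.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTracker:
    """Hypothetical helper (not a Helix class) that debounces check results."""

    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True
    _failures: int = 0   # consecutive failed checks
    _successes: int = 0  # consecutive passed checks

    def record(self, passed: bool) -> bool:
        """Feed one raw check result and return the debounced health state."""
        if passed:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


tracker = ThresholdTracker(failure_threshold=3, success_threshold=1)
# Three failures in a row flip the state; one success restores it.
states = [tracker.record(result) for result in (False, False, False, True)]
print(states)
```

Debouncing like this keeps a single slow or dropped check from marking an otherwise healthy instance as failed.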
Use a scaling policy to compute desired instance counts.
from teleon.helix import ScalingPolicy, Scaler, ScalingMetrics

policy = ScalingPolicy(
    min_instances=1,
    max_instances=10,
    target_cpu_percent=70.0,
    target_memory_percent=80.0,
    scale_up_cooldown=60,
    scale_down_cooldown=300,
    scale_up_step=1,
    scale_down_step=1,
)

scaler = Scaler()
await scaler.register_policy("my-agent", policy)

metrics = ScalingMetrics(cpu_percent=85.0, memory_percent=60.0)
desired = await scaler.evaluate_scaling(
    target_id="my-agent",
    metrics=metrics,
    current_instances=2,
)
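To make the policy fields concrete, here is a minimal sketch of how a threshold-based scaler with cooldowns and step sizes could turn metrics into a desired instance count. It is an illustration only, not Helix's evaluation logic: `SketchPolicy`, `desired_instances`, the `seconds_since_last_scale` argument, and the assumption that cooldowns are in seconds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SketchPolicy:
    """Hypothetical stand-in that mirrors the ScalingPolicy fields above."""

    min_instances: int = 1
    max_instances: int = 10
    target_cpu_percent: float = 70.0
    target_memory_percent: float = 80.0
    scale_up_cooldown: float = 60.0     # assumed to be seconds
    scale_down_cooldown: float = 300.0  # assumed to be seconds
    scale_up_step: int = 1
    scale_down_step: int = 1


def desired_instances(policy, cpu, memory, current, seconds_since_last_scale):
    """Return the instance count this sketch would ask for."""
    # Scale up when either metric is above its target.
    over = cpu > policy.target_cpu_percent or memory > policy.target_memory_percent
    # Scale down only when both metrics are below their targets.
    under = cpu < policy.target_cpu_percent and memory < policy.target_memory_percent

    if over and seconds_since_last_scale >= policy.scale_up_cooldown:
        return min(current + policy.scale_up_step, policy.max_instances)
    if under and seconds_since_last_scale >= policy.scale_down_cooldown:
        return max(current - policy.scale_down_step, policy.min_instances)
    return current  # within targets, or still cooling down


policy = SketchPolicy()
# Same inputs as the example above: CPU at 85% exceeds the 70% target.
scaled_up = desired_instances(policy, cpu=85.0, memory=60.0, current=2,
                              seconds_since_last_scale=120)
cooling_down = desired_instances(policy, cpu=85.0, memory=60.0, current=2,
                                 seconds_since_last_scale=30)
print(scaled_up, cooling_down)
```

A longer scale-down cooldown than scale-up cooldown, as in the policy above, makes the sketch quick to add capacity and slow to remove it.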
Helix includes LLM metrics, token tracking, and token budget management.
from teleon.helix import LLMResourceTracker

tracker = LLMResourceTracker(
    agent_id="chat-agent",
    model="gpt-4",
    window_size=300,
)

await tracker.record_request(
    input_tokens=150,
    output_tokens=200,
    latency_ms=1500.0,
    ttft_ms=250.0,
    cost=0.0045,
    wait_time_ms=100.0,
)

metrics = await tracker.get_metrics()
stats = await tracker.get_statistics()
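The `window_size` argument points to metrics computed over a recent window rather than over all time. The sketch below shows one way such a sliding window could work, assuming the window is measured in seconds. `WindowedStats` and its methods are hypothetical names for illustration and do not reflect how `LLMResourceTracker` is implemented.

```python
from collections import deque
from statistics import mean


class WindowedStats:
    """Hypothetical sliding-window aggregator; not the Helix implementation."""

    def __init__(self, window_size: float = 300.0):
        self.window_size = window_size  # assumed to be seconds
        self._records = deque()  # (timestamp, total_tokens, latency_ms)

    def record(self, now, input_tokens, output_tokens, latency_ms):
        self._records.append((now, input_tokens + output_tokens, latency_ms))

    def summary(self, now):
        # Drop records older than the window before aggregating.
        cutoff = now - self.window_size
        while self._records and self._records[0][0] < cutoff:
            self._records.popleft()
        if not self._records:
            return {"requests": 0, "total_tokens": 0, "avg_latency_ms": 0.0}
        return {
            "requests": len(self._records),
            "total_tokens": sum(r[1] for r in self._records),
            "avg_latency_ms": mean(r[2] for r in self._records),
        }


window = WindowedStats(window_size=300)
window.record(now=0.0, input_tokens=150, output_tokens=200, latency_ms=1500.0)
window.record(now=200.0, input_tokens=100, output_tokens=150, latency_ms=500.0)

inside = window.summary(now=250.0)  # both requests are within the window
later = window.summary(now=400.0)   # the request recorded at t=0 has expired
print(inside, later)
```

Timestamps are passed in explicitly here so the behavior is easy to test; a real tracker would more likely read a monotonic clock itself.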
# Count the tokens in a string for a given model.
from teleon.helix import TokenCounter

counter = TokenCounter()
tokens = counter.count_tokens("Hello", model="gpt-4")
# Record token usage for an agent.
from teleon.helix import get_token_tracker

tracker = get_token_tracker()
await tracker.record_tokens(
    agent_id="chat-agent",
    model="gpt-4",
    input_tokens=100,
    output_tokens=150,
    operation="completion",
    metadata={"user_id": "123"},
)
# Set a daily token budget, then check its status.
from teleon.helix import get_token_budget_manager, TokenPeriod

budget_manager = get_token_budget_manager()
await budget_manager.set_budget(amount=1_000_000, period=TokenPeriod.DAILY)
status = await budget_manager.check_budget()
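As a rough illustration of what a daily token budget involves, the sketch below tracks usage per calendar day, resets the counter when the day changes, and reports what remains. `DailyTokenBudget`, its methods, and the shape of the returned status are hypothetical; the value returned by `check_budget()` in Helix is not documented here and may look different.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyTokenBudget:
    """Hypothetical sketch of a per-day budget; not the Helix budget manager."""

    amount: int            # tokens allowed per day
    used: int = 0          # tokens consumed on the current day
    day: date = date.min   # the day the 'used' counter belongs to

    def record(self, tokens: int, today: date) -> None:
        # Start a fresh counter when the calendar day changes.
        if today != self.day:
            self.day, self.used = today, 0
        self.used += tokens

    def status(self) -> dict:
        return {
            "used": self.used,
            "remaining": max(self.amount - self.used, 0),
            "exceeded": self.used > self.amount,
        }


budget = DailyTokenBudget(amount=1_000_000)
budget.record(250, today=date(2024, 1, 1))
budget.record(750_000, today=date(2024, 1, 1))
same_day = budget.status()

budget.record(100, today=date(2024, 1, 2))  # a new day resets usage
next_day = budget.status()
print(same_day, next_day)
```

An agent could consult a status like this before each LLM call and refuse or defer the request once the budget is exceeded.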