Helix API

Helix is Teleon’s production runtime system for AI agents.

Role

Helix manages the agent lifecycle in production: process management, health checking, auto-scaling, and runtime configuration (CPU/memory limits, timeouts). It also includes LLM-oriented runtime features like token tracking and cost / budget management.

When to use

Run agents reliably in production with health checks and controlled resource limits.
Scale up and down with min_instances / max_instances.
Track LLM usage (tokens, costs) and apply budgets for long-running workloads.

Quick start

from teleon import TeleonClient
 
client = TeleonClient(api_key="tlk_live_xxx")
 
@client.agent(
    name="support",
    helix={
        "min_instances": 1,
        "max_instances": 10,
        "memory_limit_mb": 512,
    },
)
async def support_agent(query: str):
    return "ok"

Direct runtime usage

from teleon.helix import AgentRuntime, RuntimeConfig, ResourceConfig
 
config = RuntimeConfig(
    environment="production",
    hot_reload=False,
    max_workers=20,
)
runtime = AgentRuntime(config)
 
await runtime.register_agent(
    agent_id="my-agent",
    agent_callable=my_agent_function,
    resources=ResourceConfig(min_instances=2, max_instances=10),
)
 
await runtime.start()
await runtime.start_agent("my-agent")

Core components

Component	Purpose
`AgentRuntime`	Main runtime for agent lifecycle management
`ProcessManager`	Process spawning and monitoring
`HealthChecker`	Health and readiness checks
`Scaler`	Automatic scaling based on metrics
`FileWatcher`	Hot reload for development

Configuration

`RuntimeConfig`

from teleon.helix import RuntimeConfig, ResourceConfig
 
config = RuntimeConfig(
    environment="development",
    debug=False,
    hot_reload=True,
    watch_paths=["teleon", "agents"],
    max_workers=10,
    log_level="INFO",
    default_resources=ResourceConfig(min_instances=1, max_instances=5),
)

`ResourceConfig`

from teleon.helix import ResourceConfig
 
resources = ResourceConfig(
    cpu_limit=2.0,
    memory_limit_mb=512,
    min_instances=1,
    max_instances=10,
    startup_timeout=30,
    shutdown_timeout=30,
    health_check_enabled=True,
    health_check_interval=30,
)

Configuration options

Option	Type	Default	Description
`environment`	str	`"development"`	Runtime environment
`debug`	bool	`False`	Enable debug mode
`hot_reload`	bool	`True`	Enable hot reload (dev only)
`watch_paths`	list	`["teleon", "agents"]`	Paths to watch for changes
`max_workers`	int	`10`	Maximum worker processes
`log_level`	str	`"INFO"`	Log level

API reference

`AgentRuntime`

Main runtime for managing agent processes.

`register_agent()`

await runtime.register_agent(
    agent_id="support-agent",
    agent_callable=support_handler,
    resources=ResourceConfig(min_instances=2),
    health_check=HealthCheck(name="support-health"),
)

Parameters:

agent_id (str): Unique agent identifier
agent_callable (Callable): Agent function
resources (ResourceConfig, optional): Resource configuration
health_check (HealthCheck, optional): Health check configuration

`register_llm_agent()`

await runtime.register_llm_agent(
    agent_id="chat-agent",
    agent_callable=chat_handler,
    model="gpt-4",
    max_tokens=2000,
    cost_budget=10.0,
    resources=ResourceConfig(min_instances=3),
)

Parameters:

agent_id (str): Unique agent identifier
agent_callable (Callable): Agent function
model (str): LLM model name
max_tokens (int): Max tokens per request (default: 2000)
cost_budget (float, optional): Cost budget per hour
resources (ResourceConfig, optional): Resource configuration

`start_agent()`

process_ids = await runtime.start_agent("my-agent")
print(f"Started {len(process_ids)} instances")

Returns: list of process IDs

`stop_agent()`

await runtime.stop_agent("my-agent")
await runtime.stop_agent("my-agent", force=True)

Parameters:

agent_id (str): Agent identifier
force (bool): Force kill if True (default: False)

`scale_agent()`

await runtime.scale_agent("my-agent", instances=5)

`get_agent_status()`

status = await runtime.get_agent_status("my-agent")
 
print(f"Status: {status['status']}")
print(f"Instances: {status['instances']}")
print(f"Health: {status['health']}")

Returns: dict with keys status, instances, resources, processes, health

`list_agents()`

agents = await runtime.list_agents()
for agent in agents:
    print(f"{agent['agent_id']}: {agent['status']}")

`ProcessManager`

`ProcessStatus`

from teleon.helix import ProcessStatus
 
ProcessStatus.STARTING
ProcessStatus.RUNNING
ProcessStatus.STOPPING
ProcessStatus.STOPPED
ProcessStatus.FAILED
ProcessStatus.CRASHED

`ProcessInfo`

from teleon.helix import ProcessInfo
 
process = await manager.get_process(process_id)
print(f"PID: {process.pid}")
print(f"Status: {process.status}")
print(f"CPU: {process.cpu_percent}%")
print(f"Memory: {process.memory_mb} MB")
print(f"Restarts: {process.restart_count}")

Health checking

`HealthStatus`

from teleon.helix import HealthStatus
 
HealthStatus.HEALTHY
HealthStatus.UNHEALTHY
HealthStatus.DEGRADED
HealthStatus.UNKNOWN

`CheckType`

from teleon.helix import CheckType
 
CheckType.READINESS
CheckType.LIVENESS
CheckType.CUSTOM

`HealthCheck`

from teleon.helix import HealthCheck, CheckType
 
async def custom_check():
    return await database.ping()
 
health_check = HealthCheck(
    name="database-check",
    check_type=CheckType.CUSTOM,
    check_fn=custom_check,
    interval=30,
    timeout=10,
    failure_threshold=3,
    success_threshold=1,
    initial_delay=5,
)

`HealthChecker`

from teleon.helix import HealthChecker
 
checker = HealthChecker()
await checker.register_check("my-agent", health_check)
 
result = await checker.check_health("my-agent")
overall = await checker.get_overall_health()

LLM health checks

result = await checker.check_llm_health(
    target_id="chat-agent",
    llm_metrics=metrics,
    cost_budget=10.0,
)

Auto-scaling

`ScalingPolicy`

from teleon.helix import ScalingPolicy
 
policy = ScalingPolicy(
    min_instances=1,
    max_instances=10,
    target_cpu_percent=70.0,
    target_memory_percent=80.0,
    scale_up_cooldown=60,
    scale_down_cooldown=300,
    scale_up_step=1,
    scale_down_step=1,
)

`Scaler`

from teleon.helix import Scaler, ScalingMetrics
 
scaler = Scaler()
await scaler.register_policy("my-agent", policy)
 
metrics = ScalingMetrics(cpu_percent=85.0, memory_percent=60.0)
desired = await scaler.evaluate_scaling(
    target_id="my-agent",
    metrics=metrics,
    current_instances=2,
)

Hot reload

`WatcherConfig`

from teleon.helix import WatcherConfig
 
config = WatcherConfig(
    watch_paths=["teleon", "agents", "handlers"],
    ignore_patterns=["__pycache__", "*.pyc", ".git", "venv"],
    debounce_seconds=1.0,
)

`FileWatcher`

from teleon.helix import FileWatcher
 
watcher = FileWatcher(config, runtime)
await watcher.start()

LLM features

LLM metrics

from teleon.helix import LLMMetrics
 
metrics = await tracker.get_metrics()
print(f"Tokens/sec: {metrics.tokens_per_second}")
print(f"Requests/min: {metrics.requests_per_minute}")
print(f"P95 latency: {metrics.p95_latency_ms}ms")
print(f"Cost/hour: ${metrics.cost_per_hour:.2f}")
print(f"Utilization: {metrics.get_utilization()*100:.0f}%")

Token counting

from teleon.helix import TokenCounter
 
counter = TokenCounter()
tokens = counter.count_tokens("Hello", model="gpt-4")
tokens = counter.count_messages_tokens(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather?"},
    ],
    model="gpt-4",
)

`LLMResourceTracker`

from teleon.helix import LLMResourceTracker
 
tracker = LLMResourceTracker(agent_id="chat-agent", model="gpt-4", window_size=300)
 
await tracker.record_request(
    input_tokens=150,
    output_tokens=200,
    latency_ms=1500.0,
    ttft_ms=250.0,
    cost=0.0045,
    wait_time_ms=100.0,
)
 
await tracker.increment_queue()
await tracker.decrement_queue()
await tracker.increment_concurrent()
await tracker.decrement_concurrent()
 
metrics = await tracker.get_metrics()
stats = await tracker.get_statistics()

Throughput monitor

from teleon.helix import get_throughput_monitor
 
monitor = get_throughput_monitor()
await monitor.register_tracker("agent-1", tracker1)
await monitor.register_tracker("agent-2", tracker2)
aggregate = await monitor.get_aggregate_metrics()

Token tracking

from teleon.helix import get_token_tracker, TokenPeriod
 
tracker = get_token_tracker()
await tracker.record_tokens(
    agent_id="chat-agent",
    model="gpt-4",
    input_tokens=100,
    output_tokens=150,
    operation="completion",
    metadata={"user_id": "123"},
)
 
agent_tokens = await tracker.get_tokens(agent_id="chat-agent")
hourly = await tracker.get_tokens(period=TokenPeriod.HOURLY, period_key="2024-01-15-14")

Token budgets

from teleon.helix import get_token_budget_manager, TokenPeriod
 
budget_manager = get_token_budget_manager()
 
await budget_manager.set_budget(amount=1_000_000, period=TokenPeriod.DAILY)
await budget_manager.set_budget(
    amount=100_000,
    period=TokenPeriod.HOURLY,
    agent_id="chat-agent",
)
 
status = await budget_manager.check_budget()
 
async def on_budget_alert(alert):
    if alert.level == "critical":
        await notify_admin(alert.message)
 
budget_manager.register_alert_callback(on_budget_alert)
alerts = await budget_manager.get_alerts(level="critical", limit=10)

LLM scaling policies

from teleon.helix import TokenAwareScalingPolicy
 
policy = TokenAwareScalingPolicy(
    min_instances=1,
    max_instances=20,
    target_tokens_per_second=1000.0,
    tokens_per_second_buffer=0.2,
    max_queue_depth=10,
    queue_depth_threshold=5,
    target_p95_latency_ms=2000.0,
    target_p99_latency_ms=5000.0,
    scale_up_cooldown=60,
    scale_down_cooldown=300,
)

from teleon.helix import CostAwareScalingPolicy
 
policy = CostAwareScalingPolicy(
    min_instances=1,
    max_instances=10,
    max_cost_per_hour=50.0,
    max_cost_per_day=500.0,
    cost_warning_threshold=0.8,
    cost_critical_threshold=0.95,
    base_scaling_policy=token_policy,
)

from teleon.helix import create_llm_scaler, get_llm_scaler
 
scaler = create_llm_scaler(
    target_tokens_per_second=1000.0,
    max_cost_per_hour=50.0,
    min_instances=2,
    max_instances=20,
)
 
scaler = get_llm_scaler()
desired = await scaler.evaluate_scaling(
    agent_id="chat-agent",
    llm_metrics=metrics,
    current_instances=5,
)

Context-aware routing

from teleon.helix import ModelVariant
 
ModelVariant.SMALL
ModelVariant.MEDIUM
ModelVariant.LARGE
ModelVariant.XLARGE

from teleon.helix import ContextWindowManager
 
manager = ContextWindowManager()
await manager.register_instance(
    instance_id="inst-1",
    agent_id="chat-agent",
    model="gpt-4",
    max_context_tokens=8192,
)
instance = await manager.get_best_instance(
    required_context_tokens=50000,
    prefer_model="gpt-4",
    prefer_cost=True,
)

from teleon.helix import get_request_router
 
router = get_request_router()
decision = await router.route(
    messages=[{"role": "user", "content": "Summarize this document..."}],
    model="gpt-4",
    max_tokens=500,
    prefer_cost=True,
    prefer_speed=False,
)

Batch processing

from teleon.helix import BatchConfig, BatchProcessor
 
async def process_batch(requests):
    results = []
    for req in requests:
        result = await llm.complete(req.messages, model=req.model)
        results.append(result)
    return results
 
processor = BatchProcessor(config=BatchConfig(max_batch_size=10), llm_caller=process_batch)
await processor.start()
result = await processor.process_request(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4",
    priority=0,
)
await processor.stop()

Response caching

from teleon.helix import create_cache, get_response_cache, CacheStrategy
 
cache = create_cache(
    max_size=1000,
    strategy=CacheStrategy.EXACT,
    eviction_policy=CacheEvictionPolicy.LRU,
    default_ttl=3600,
)
 
cache = get_response_cache()
 
messages = [{"role": "user", "content": "What is 2+2?"}]
cached = await cache.get(messages, model="gpt-4")

Production features

Agent metrics reporter

Environment variables

Variable	Default	Description
`TELEON_DEPLOYMENT_ID`	-	Deployment identifier
`TELEON_API_KEY`	-	API key for authentication
`TELEON_PLATFORM_URL`	`https://api.teleon.ai`	Platform API URL
`TELEON_METRICS_INTERVAL`	`10`	Reporting interval (seconds)
`TELEON_METRICS_BATCH_SIZE`	`100`	Max batch size

`AgentMetricsReporter`

from teleon.helix import (
    init_agent_reporter,
    get_agent_reporter,
    shutdown_agent_reporter,
)
 
reporter = await init_agent_reporter(
    deployment_id="deploy-123",
    api_key="tlk_xxx",
    flush_interval=10.0,
    batch_size=100,
)
 
reporter = get_agent_reporter()
await reporter.start()
 
await reporter.report_request(
    input_tokens=100,
    output_tokens=50,
    latency_ms=250.5,
    model="gpt-4",
    success=True,
    cost=0.002,
    error_type=None,
)
 
await reporter.report_health(
    status="healthy",
    active_requests=5,
    queue_depth=2,
)
 
await reporter.flush()
stats = reporter.get_local_stats()
await shutdown_agent_reporter()

Health endpoints

from fastapi import FastAPI
from teleon.helix import setup_health_endpoints, get_health_manager
 
app = FastAPI()
 
health_manager = setup_health_endpoints(
    app,
    service_name="my-agent",
    version="1.0.0",
    prefix="",
)
 
health_manager = get_health_manager()
 
async def check_database():
    try:
        await db.ping()
        return True, "Database connected"
    except Exception as e:
        return False, str(e)
 
health_manager.add_check("database", check_database, critical=True)
health_manager.set_ready(True, reason="Initialization complete")
health_manager.record_request(success=True)
health_manager.mount(app, prefix="/api")

Endpoints

Endpoint	Purpose	Response
`/health`	Overall health	200 if healthy, 503 if unhealthy
`/ready`	Readiness probe	200 if ready, 503 if not
`/live`	Liveness probe	200 if alive
`/metrics`	Prometheus metrics	Text format metrics

Integration

Decorator integration

from teleon import TeleonClient
 
client = TeleonClient(api_key="tlk_xxx")
 
@client.agent(
    name="support-agent",
    helix={
        "min_instances": 2,
        "max_instances": 10,
        "cpu_limit": 2.0,
        "memory_limit_mb": 512,
        "health_check_enabled": True,
        "health_check_interval": 30,
        "startup_timeout": 30,
        "shutdown_timeout": 30,
        "auto_start": False,
    },
)
async def support_agent(query: str, customer_id: str):
    return await process_query(query, customer_id)

Shorthand configuration

@client.agent(
    name="my-agent",
    helix={
        "min": 2,
        "max": 10,
        "cpu": 2.0,
        "memory": 512,
        "health_interval": 30,
    },
)

Manual integration

from teleon.helix import (
    register_agent_with_helix,
    scale_agent,
    get_agent_status,
    restart_agent,
    stop_agent,
)
 
wrapper = await register_agent_with_helix(
    agent_id="my-agent",
    agent_func=my_function,
    helix_config={"min_instances": 2, "max_instances": 10},
)
 
result = await wrapper(input_data)
await scale_agent("my-agent", instances=5)
status = await get_agent_status("my-agent")
await restart_agent("my-agent")
await stop_agent("my-agent", force=False)

Best practices

Set appropriate resource limits (especially memory_limit_mb)
Use health checks for production deployments
Configure scaling (min_instances for baseline, max_instances for budget)
Use token budgets to prevent cost overruns
Enable caching for repeated queries
Batch when possible for throughput
Monitor metrics via the metrics reporter
Set cooldowns to prevent scaling oscillation

Troubleshooting

Agent not scaling

Check max_instances > min_instances
Verify scaling policy is registered
Check cooldown periods
Review scaling history

Health checks failing

Verify check function returns (bool, str)
Check timeout configuration
Review failure_threshold
Check for dependency/network issues

High token costs

Enable response caching
Set token budgets
Use cost-aware scaling policy
Review context sizes with routing

Slow performance

Check queue depth metrics
Enable batch processing
Review context routing for model selection
Consider increasing max_instances

See the full Helix documentation in helix_user_docs.md for runtime APIs and production features.

Helix API

On this page