Troubleshooting Guide

Solutions for common AgentHub issues and problems

This guide covers solutions for the most common issues you might encounter when using AgentHub. Issues are organized by category with step-by-step solutions.

Build Issues

Build Failed: Dependencies Not Found

Symptoms:

  • Build fails during dependency installation
  • Error messages like "ModuleNotFoundError" or "Package not found"
  • Build logs show pip/npm/yarn failures

Solutions:

For Python projects:

# 1. Verify requirements.txt exists and includes all dependencies
pip freeze > requirements.txt

# 2. Test locally first
pip install -r requirements.txt
python main.py

# 3. Pin specific versions in requirements.txt to avoid conflicts
requests==2.31.0
flask==2.3.3
openai>=1.0.0
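
To confirm that the local environment actually matches the pinned versions before pushing a build, a small check like the following may help. This is a minimal sketch, not part of AgentHub: the helper names `parse_pins` and `find_mismatches` are hypothetical, it only understands plain `name==version` lines, and it compares version strings literally.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: pinned_version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]          # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in line:
            continue  # skip blank lines, options, and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Compare pins with installed packages; return {name: (wanted, found)}."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


sample = "requests==2.31.0\nflask==2.3.3\nopenai>=1.0.0  # unpinned, ignored\n"
pins = parse_pins(sample)
for name, (wanted, found) in find_mismatches(pins).items():
    print(f"{name}: requirements.txt pins {wanted}, environment has {found}")
```

In a real project you would pass the contents of your own requirements.txt instead of `sample`. An empty result suggests the local environment matches the pins, so a failing build is more likely caused by something else.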

For Node.js projects:

# 1. Ensure package.json has all dependencies
npm install  # Test locally

# 2. Lock versions with package-lock.json
npm ci  # Uses exact versions from lock file

# 3. Check for peer dependency warnings
npm ls  # Shows dependency tree and conflicts

For Go projects:

# Ensure go.mod exists with proper module path
module github.com/yourusername/your-agent
go 1.21

Build Failed: Dockerfile Errors

Symptoms:

  • Custom Dockerfile fails to build
  • "COPY failed" or "RUN command failed" errors
  • Wrong base image or missing files

Solutions:

Common Dockerfile issues:

# ❌ Wrong: Copies the contents of src/ directly into /app, losing the src/ directory
COPY ./src /app

# ✅ Correct: Keep the directory layout (source paths are always relative to the build context)
COPY src/ /app/src/

# ❌ Wrong: Missing WORKDIR
COPY requirements.txt .
RUN pip install -r requirements.txt

# ✅ Correct: Set working directory first
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

# ❌ Wrong: Running as root (security risk)
USER root

# ✅ Correct: Use non-root user
USER 1000:1000

Optimization tips:

# Multi-stage builds for smaller images
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]

Build Timeout

Symptoms:

  • Build stops after 30 minutes
  • "Build timeout exceeded" error
  • Long-running installations hang

Solutions:

  1. Optimize build steps:
# Use a package index or mirror close to your build environment
# (https://pypi.org/simple/ is pip's default index)
RUN pip install -r requirements.txt -i https://pypi.org/simple/

# Install system packages efficiently
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
  2. Use build cache:
# Copy requirements first (changes less often)
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy source code last (changes frequently)
COPY . .
  3. Pre-built base images:
# Use images with pre-installed dependencies
# (Dockerfile comments must be on their own line, not after an instruction)
# Python: instead of ubuntu + python install
FROM python:3.11-slim
# Node.js: instead of alpine + node install
FROM node:18-alpine

Instance Issues

Instance Won't Start

Symptoms:

  • Instance status stuck on "Starting"
  • Container exits immediately after starting
  • Health checks failing

Solutions:

1. Check environment variables:

# Common issues:
❌ Missing required variables
❌ Typos in variable names
❌ Wrong variable formats

✅ Verify all required env vars are set:
API_KEY=your-actual-key-here
DATABASE_URL=postgresql://...
PORT=8080

2. Verify port configuration:

# ❌ Wrong: Hardcoded port
app.run(port=5000)

# ✅ Correct: Use PORT environment variable
import os
port = int(os.environ.get('PORT', 8080))
app.run(host='0.0.0.0', port=port)
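
A malformed PORT value (for example a URL or an empty string) makes `int(...)` fail at startup with a message that is easy to miss. The sketch below validates the value first. The helper name `get_port` is hypothetical, and the default of 8080 simply mirrors the example above.

```python
import os


def get_port(environ=os.environ, default=8080):
    """Read PORT from the environment and validate it as a TCP port number."""
    raw = environ.get("PORT")
    if raw is None or raw.strip() == "":
        return default  # unset or blank: fall back to the default
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port
```

With this in place the start call becomes `app.run(host='0.0.0.0', port=get_port())`, and a bad value shows up in the logs as a clear configuration error.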

3. Check startup command:

# ❌ Wrong: Command that exits immediately
CMD ["echo", "Hello World"]

# ✅ Correct: Long-running process
CMD ["python", "main.py"]

# ✅ Correct: Web server
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]

Instance Crashes or Restarts Frequently

Symptoms:

  • Instance status changes from "Running" to "Error" repeatedly
  • High restart count in instance details
  • Container exits with non-zero code

Solutions:

1. Memory issues:

# Increase memory limits
resources:
  memory: "1Gi"        # From 512Mi
  memory_limit: "2Gi"  # Set appropriate limit

2. Add error handling:

import logging
import time
import traceback

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    # Your main application logic
    app.run()
except Exception as e:
    logger.error(f"Application crashed: {e}")
    logger.error(traceback.format_exc())
    # Don't exit immediately, allow restart
    time.sleep(10)

3. Implement health checks:

from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health():
    try:
        # Check database connection
        # Check external API access
        # Check critical services
        return {'status': 'healthy'}, 200
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 500

Instance Performance Issues

Symptoms:

  • High CPU or memory usage
  • Slow response times
  • Resource limit warnings

Solutions:

1. Monitor resource usage:

import psutil
import logging

def log_resource_usage():
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    
    logging.info(f"CPU: {cpu_percent}%, Memory: {memory.percent}%")
    
    if cpu_percent > 80:
        logging.warning("High CPU usage detected")
    if memory.percent > 80:
        logging.warning("High memory usage detected")

2. Optimize code performance:

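# Use connection pooling for databases
import os
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

database_url = os.environ["DATABASE_URL"]
engine = create_engine(
    database_url,
    poolclass=QueuePool,
    pool_size=5,
    max_overflow=10
)

# Cache expensive operations
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_function(param):
    # Heavy computation here (placeholder); arguments must be hashable
    result = sum(i * i for i in range(param))
    return result

# Use async for I/O operations
import asyncio
import aiohttp

async def fetch_one(session, url):
    # Read the body while the session is still open
    async with session.get(url) as response:
        return await response.text()

async def fetch_data(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)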
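
Launching every request at once can itself cause memory spikes or trigger rate limits on the remote service. One common way to keep the benefit of async I/O while capping the load is an `asyncio.Semaphore`. The following is a minimal sketch using only the standard library: `gather_limited` is a hypothetical helper, and `fake_fetch` stands in for a real network call.

```python
import asyncio


async def gather_limited(coroutine_factories, limit=5):
    """Run zero-argument coroutine factories, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:  # wait here while `limit` tasks are running
            return await factory()

    # gather returns results in the same order as the input
    return await asyncio.gather(*(run_one(f) for f in coroutine_factories))


async def demo():
    running = 0
    peak = 0

    async def fake_fetch(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for a network call
        running -= 1
        return i * 2

    factories = [lambda i=i: fake_fetch(i) for i in range(10)]
    results = await gather_limited(factories, limit=3)
    return results, peak


results, peak = asyncio.run(demo())
print(f"{len(results)} results, peak concurrency {peak}")
```

The factories are passed as callables rather than coroutine objects so that each coroutine is created only after it has acquired the semaphore.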

3. Scale resources appropriately:

# For CPU-intensive tasks
resources:
  cpu: "2.0"
  cpu_limit: "4.0"

# For memory-intensive tasks  
resources:
  memory: "4Gi"
  memory_limit: "8Gi"

# For high-traffic applications
replicas: 3

Authentication and Access Issues

Can't Sign In

Symptoms:

  • "Authentication failed" errors
  • Stuck on GitHub OAuth flow
  • "Access denied" messages

Solutions:

1. Clear browser data:

# Clear cache and cookies for:
# - https://www.useagenthub.com
# - https://github.com
# - https://prod-agent-hosting-api.useagenthub.com

2. Check GitHub permissions:

  • Go to GitHub → Settings → Applications
  • Find AgentHub in "Authorized OAuth Apps"
  • Verify permissions are granted
  • Revoke and re-authorize if needed

3. Organization membership:

  • Ensure you're a member of the organization
  • Check organization visibility settings
  • Verify your GitHub email is verified

API Key Issues

Symptoms:

  • "Invalid API key" errors
  • "Insufficient permissions" responses
  • API calls returning 401/403 errors

Solutions:

1. Verify API key format:

# Correct format
ah_prod_1234567890abcdef...

# Check key hasn't expired
curl -H "Authorization: Bearer your-api-key" \
  https://prod-agent-hosting-api.useagenthub.com/auth/status

2. Check key permissions:

// API response shows scopes
{
  "authenticated": true,
  "scopes": ["agents:read", "instances:write", "builds:read"]
}
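
Once you have the status response, comparing the granted scopes with what your application needs is a set difference. The sketch below assumes the response shape shown above. The helper `missing_scopes` and the scope name `builds:write` are hypothetical and used only for illustration.

```python
import json


def missing_scopes(auth_status, required):
    """Return the required scopes that the auth status response does not grant."""
    if not auth_status.get("authenticated"):
        return sorted(required)  # nothing is usable without a valid key
    granted = set(auth_status.get("scopes", []))
    return sorted(set(required) - granted)


response_body = (
    '{"authenticated": true, '
    '"scopes": ["agents:read", "instances:write", "builds:read"]}'
)
status = json.loads(response_body)

# "builds:write" is a made-up scope name for this example
print(missing_scopes(status, ["agents:read", "builds:write"]))
```

A non-empty result points to an "Insufficient permissions" problem rather than an invalid key, which is fixed by generating a key with the additional scopes.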

3. Generate new key:

  • Go to Dashboard → Settings → API Keys
  • Click "Generate New Key"
  • Set appropriate scopes
  • Update your applications with new key

Repository and GitHub Issues

Repository Access Denied

Symptoms:

  • "Cannot access repository" errors
  • Build fails to clone repository
  • "Repository not found" messages

Solutions:

1. Check repository permissions:

# Verify you own the repository or have access
# For organization repos, ensure proper team membership

2. Grant AgentHub access:

  • Go to GitHub → Settings → Applications
  • Find AgentHub OAuth app
  • Grant access to specific repositories or all repositories

3. Private repository issues:

# Ensure GitHub token has repo scope
# Check organization third-party access settings

Webhook Issues

Symptoms:

  • Auto-builds not triggering on push
  • "Webhook delivery failed" in GitHub
  • Stale builds despite code changes

Solutions:

1. Check webhook configuration:

  • Go to GitHub repository → Settings → Webhooks
  • Verify AgentHub webhook exists and is active
  • Check webhook URL and events

2. Re-establish webhooks:

  • Disconnect repository in AgentHub
  • Reconnect with proper permissions
  • Verify webhook creation

Environment Variable Issues

Environment Variables Not Working

Symptoms:

  • Agent can't access configuration values
  • "Environment variable not found" errors
  • Default values being used instead of configured ones

Solutions:

1. Verify variable names:

import os

# ❌ Wrong: Case-sensitive mismatch
api_key = os.environ.get('api_key')  # Looking for lowercase
# But configured as: API_KEY=value

# ✅ Correct: Exact case match
api_key = os.environ.get('API_KEY')
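
When a lookup unexpectedly returns nothing, it can help to search for variables that differ from the requested name only by letter case. This is a minimal sketch with a hypothetical helper, `find_case_mismatches`. Note that on Windows `os.environ` is already case-insensitive, so the check is mainly useful inside Linux containers.

```python
import os


def find_case_mismatches(name, environ=os.environ):
    """Return variables whose names match `name` except for letter case."""
    if name in environ:
        return []  # exact match exists, nothing to report
    return sorted(key for key in environ if key.lower() == name.lower())


# Example with a plain dict standing in for the real environment
configured = {"API_KEY": "value", "PORT": "8080"}
print(find_case_mismatches("api_key", configured))
```

Logging the result at startup turns a silent `None` into a hint such as "did you mean API_KEY?".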

2. Check variable scope:

# Instance-level variables (user sets these)
env_vars:
  API_KEY: ""         # Empty in agent template
  USER_CONFIG: ""     # User provides actual values

# Build-time variables (set in agent config)
env_vars:
  NODE_ENV: "production"  # Set by developer
  LOG_LEVEL: "INFO"       # Default value

3. Handle missing variables gracefully:

import os

# ❌ Wrong: Will crash if missing
api_key = os.environ['API_KEY']

# ✅ Correct: Validate required variables and fail with a clear message
api_key = os.environ.get('API_KEY')
if not api_key:
    raise ValueError("API_KEY environment variable is required")

# ✅ Best: Use default values for optional settings
log_level = os.environ.get('LOG_LEVEL', 'INFO')
timeout = int(os.environ.get('TIMEOUT_SECONDS', '30'))
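
As the number of variables grows, it is often easier to load them in one place and report every problem in a single error instead of failing on the first one. The sketch below is one possible layout, not an AgentHub API: the `Settings` class and `load_settings` function are hypothetical, and the variable names are the ones used in the examples above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    log_level: str
    timeout_seconds: int


def load_settings(environ=os.environ):
    """Build Settings from the environment, reporting every problem at once."""
    errors = []

    api_key = environ.get("API_KEY", "")
    if not api_key:
        errors.append("API_KEY is required")

    log_level = environ.get("LOG_LEVEL", "INFO").upper()

    raw_timeout = environ.get("TIMEOUT_SECONDS", "30")
    try:
        timeout_seconds = int(raw_timeout)
    except ValueError:
        errors.append(f"TIMEOUT_SECONDS must be an integer, got {raw_timeout!r}")
        timeout_seconds = 0  # placeholder, never returned because errors is non-empty

    if errors:
        raise ValueError("Invalid configuration: " + "; ".join(errors))
    return Settings(api_key, log_level, timeout_seconds)
```

Calling `load_settings()` once at startup means a misconfigured instance fails immediately with a complete list of what to fix, rather than crashing later on first use of a missing value.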

Network and Connectivity Issues

Agent Can't Access External APIs

Symptoms:

  • Timeout errors when calling external services
  • "Connection refused" or "DNS resolution failed"
  • Works locally but fails in AgentHub

Solutions:

1. Add retry logic:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    session = requests.Session()
    
    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    
    return session

# Use session for all requests
session = create_session()
response = session.get('https://api.example.com/data')

2. Handle rate limits:

import time
import requests

def api_call_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:  # Rate limited
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
        else:
            response.raise_for_status()
    
    raise Exception(f"Failed after {max_retries} attempts")
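
Many APIs send a `Retry-After` header with a 429 response, and adding random jitter keeps multiple instances from retrying in lockstep. The helper below is a sketch of how the wait time could be computed. `backoff_delay` is a hypothetical function, the 60-second cap is an arbitrary assumption, and only the numeric form of `Retry-After` is handled (the HTTP-date form falls back to exponential backoff).

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's hint, clamped to the range [0, cap]
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # header was an HTTP date or malformed; fall back to backoff
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # "full jitter": uniform between 0 and ceiling
```

In the retry loop above, `wait_time = 2 ** attempt` could then be replaced with `wait_time = backoff_delay(attempt, response.headers.get("Retry-After"))`.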

3. Configure timeouts:

# Set reasonable timeouts
response = requests.get(
    'https://api.example.com/data',
    timeout=(5, 30)  # (connection_timeout, read_timeout)
)

Monitoring and Logging Issues

Missing or Incomplete Logs

Symptoms:

  • Empty log viewer in dashboard
  • Missing application logs
  • Only system logs visible

Solutions:

1. Use proper logging:

import logging
import sys

# Configure logging for containerized environment
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stdout  # Important: output to stdout
)

logger = logging.getLogger(__name__)

# Use structured logging (fields passed via `extra` appear in the output
# only if the formatter includes them)
logger.info("Processing started", extra={
    'user_id': user_id,
    'action': 'process_data',
    'input_size': len(data)
})
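
With the plain text format string shown above, fields passed through `extra` are attached to the log record but never printed. One way to surface them is a formatter that writes each record as a JSON object. This is a minimal standard-library sketch: the `JsonFormatter` class and the logger name `agent.demo` are illustrative, and whether the AgentHub log viewer parses JSON lines is not something this example assumes.

```python
import json
import logging
import sys

# Attributes every LogRecord has; anything else was passed through `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, including extra fields."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


handler = logging.StreamHandler(sys.stdout)  # stdout, as recommended above
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("agent.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines from the root logger

logger.info("Processing started", extra={"action": "process_data", "input_size": 3})
```

Each log call now produces a single line that contains both the message and the context fields, which makes the output easier to search and filter.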

2. Flush output buffers:

import sys

# Ensure immediate output
print("Status update", flush=True)

# For Python apps, set unbuffered output
# In Dockerfile: ENV PYTHONUNBUFFERED=1

3. Log to the right streams:

import sys

# Application logs → stdout
print("INFO: Application started", file=sys.stdout)

# Error logs → stderr  
print("ERROR: Something failed", file=sys.stderr)

Performance Monitoring Issues

Symptoms:

  • Missing metrics in dashboard
  • Inaccurate resource usage data
  • Health check failures

Solutions:

1. Implement proper health checks:

from datetime import datetime, timezone

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
    # check_database_connection, check_external_api, and check_memory_usage
    # are application-specific helpers that each return True or False
    checks = {
        'database': check_database_connection(),
        'external_api': check_external_api(),
        'memory_usage': check_memory_usage()
    }
    
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503
    
    return jsonify({
        'status': 'healthy' if all_healthy else 'unhealthy',
        'checks': checks,
        'timestamp': datetime.now(timezone.utc).isoformat()
    }), status_code
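
If one of the individual checks raises an exception, the health endpoint itself returns an error page instead of a structured "unhealthy" response. A small aggregator that treats exceptions as failures avoids this. The sketch below is framework-independent. `run_checks` and `tcp_reachable` are hypothetical helpers, and the host and port in the usage note are placeholders.

```python
import socket
from datetime import datetime, timezone


def run_checks(checks):
    """Run each named check; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "checks": results,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return body, (200 if healthy else 503)


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


body, status_code = run_checks({
    "always_ok": lambda: True,
    "raises": lambda: 1 / 0,  # simulated failing dependency
})
print(status_code, body["checks"])
```

Inside a Flask route this could be used as `body, code = run_checks({"database": lambda: tcp_reachable("db.internal", 5432)})` followed by `return jsonify(body), code`.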

2. Expose custom metrics:

from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)  # Or reuse your existing Flask app

REQUEST_COUNT = Counter('requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('request_duration_seconds', 'Request duration')

@app.route('/metrics')
def metrics():
    # Serve metrics with the content type Prometheus scrapers expect
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)

Getting Help

Before Contacting Support

1. Collect diagnostic information:

  • Instance ID and organization name
  • Error messages (full text)
  • Screenshots of issues
  • Steps to reproduce the problem
  • Timeline of when issue started

2. Check status page:

  • Visit AgentHub Status for service outages
  • Check if issue affects multiple users

3. Search documentation:

  • Use search in documentation
  • Check relevant API reference sections
  • Review best practices guides

Contact Options

Dashboard Support:

  • Use support chat in dashboard (fastest response)
  • Include instance/agent IDs for faster resolution
  • Describe exact steps that led to the issue

Community Resources:

  • GitHub Discussions
  • Share non-sensitive debugging information
  • Learn from other developers' experiences

Emergency Issues:

  • For production outages affecting business operations
  • Use priority support channels if available
  • Include impact assessment and urgency level

Need more help? Visit our comprehensive documentation or use the support chat in your dashboard.