Troubleshooting Guide
This guide covers solutions for the most common issues you might encounter when using AgentHub. Issues are organized by category with step-by-step solutions.
Build Issues
Build Failed: Dependencies Not Found
Symptoms:
- Build fails during dependency installation
 - Error messages like "ModuleNotFoundError" or "Package not found"
 - Build logs show pip/npm/yarn failures
 
Solutions:
For Python projects:
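# 1. Make sure requirements.txt exists and lists every dependency
#    (this overwrites the file with what is installed in the current environment)
pip freeze > requirements.txt
# 2. Test locally first, ideally in a clean virtual environment
pip install -r requirements.txt
python main.py
# 3. Use specific versions in requirements.txt to avoid conflicts, for example:
#      requests==2.31.0
#      flask==2.3.3
#      openai>=1.0.0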
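To spot loosely specified dependencies before a build, a small script can scan requirements.txt for lines that do not pin an exact version. This is a minimal sketch, not an AgentHub feature: the function name `find_unpinned` is hypothetical, and the pattern deliberately ignores environment markers and URL requirements.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "flask==2.3.3".
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+==[^=\s]+$")


def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = "requests==2.31.0\nflask==2.3.3\nopenai>=1.0.0\n"
print(find_unpinned(sample))
```

A range such as `openai>=1.0.0` is reported because it can resolve to a different version on each build, which is a common source of "works locally, fails in the build" errors.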
For Node.js projects:
# 1. Ensure package.json has all dependencies
npm install  # Test locally
# 2. Lock versions with package-lock.json
npm ci  # Uses exact versions from lock file
# 3. Check for peer dependency warnings
npm ls  # Shows dependency tree and conflicts
For Go projects:
// go.mod: ensure the file exists and declares the proper module path
module github.com/yourusername/your-agent
go 1.21
Build Failed: Dockerfile Errors
Symptoms:
- Custom Dockerfile fails to build
 - "COPY failed" or "RUN command failed" errors
 - Wrong base image or missing files
 
Solutions:
Common Dockerfile issues:
# ❌ Wrong (if you expect /app/src): Copies the contents of src/ straight into /app
COPY ./src /app
# ✅ Correct: Source paths are relative to the build context; name the target directory explicitly
COPY src/ /app/src/
# ❌ Wrong: Missing WORKDIR
COPY requirements.txt .
RUN pip install -r requirements.txt
# ✅ Correct: Set working directory first
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# ❌ Wrong: Running as root (security risk)
USER root
# ✅ Correct: Use non-root user
USER 1000:1000
Optimization tips:
# Multi-stage builds for smaller images
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
Build Timeout
Symptoms:
- Build stops after 30 minutes
 - "Build timeout exceeded" error
 - Long-running installations hang
 
Solutions:
- Optimize build steps:
 
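# Optionally install from a package index mirror close to your build region
# (the URL below is the default PyPI index; replace it with your mirror)
RUN pip install -r requirements.txt -i https://pypi.org/simple/
# Install system packages efficiently
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*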
- Use build cache:
 
# Copy requirements first (changes less often)
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy source code last (changes frequently)
COPY . .
- Pre-built base images:
 
# Use images with pre-installed dependencies
# (Dockerfile comments must be on their own line)
# Instead of ubuntu + a manual Python install:
FROM python:3.11-slim
# Instead of alpine + a manual Node.js install:
FROM node:18-alpine
Instance Issues
Instance Won't Start
Symptoms:
- Instance status stuck on "Starting"
 - Container exits immediately after starting
 - Health checks failing
 
Solutions:
1. Check environment variables:
# Common issues:
❌ Missing required variables
❌ Typos in variable names
❌ Wrong variable formats
✅ Verify all required env vars are set:
API_KEY=your-actual-key-here
DATABASE_URL=postgresql://...
PORT=8080
2. Verify port configuration:
# ❌ Wrong: Hardcoded port
app.run(port=5000)
# ✅ Correct: Use PORT environment variable
import os
port = int(os.environ.get('PORT', 8080))
app.run(host='0.0.0.0', port=port)
3. Check startup command:
# ❌ Wrong: Command that exits immediately
CMD ["echo", "Hello World"]
# ✅ Correct: Long-running process
CMD ["python", "main.py"]
# ✅ Correct: Web server
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]
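The environment-variable check in step 1 can be automated at startup so that a missing or misspelled name shows up in the logs before the container exits. The sketch below is illustrative only: `check_env` and `REQUIRED_VARS` are hypothetical names, and the list of required variables is assumed from the example above.

```python
import difflib
import os

# Hypothetical list of required variables; replace with your agent's own.
REQUIRED_VARS = ["API_KEY", "DATABASE_URL", "PORT"]


def check_env(required, environ=None):
    """Return a list of problems; an empty list means every variable is set."""
    environ = os.environ if environ is None else environ
    # Map upper-cased names to the names actually present, for typo hints.
    by_upper = {key.upper(): key for key in environ}
    problems = []
    for name in required:
        if environ.get(name):
            continue
        if name in environ:
            problems.append(f"{name} is set but empty")
            continue
        close = difflib.get_close_matches(name.upper(), list(by_upper), n=1, cutoff=0.8)
        if close:
            problems.append(f"{name} is not set (did you mean {by_upper[close[0]]}?)")
        else:
            problems.append(f"{name} is not set")
    return problems


for problem in check_env(REQUIRED_VARS):
    print(f"CONFIG ERROR: {problem}", flush=True)
```

Calling this before the server starts, and exiting with a non-zero code when the list is not empty, turns a silent "stuck on Starting" state into an explicit log line.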
Instance Crashes or Restarts Frequently
Symptoms:
- Instance status changes from "Running" to "Error" repeatedly
 - High restart count in instance details
 - Container exits with non-zero code
 
Solutions:
1. Memory issues:
# Increase memory limits
resources:
  memory: "1Gi"        # From 512Mi
  memory_limit: "2Gi"  # Set appropriate limit
2. Add error handling:
import logging
import sys
import time
# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
try:
    # Your main application logic
    app.run()
except Exception:
    # logger.exception records the message together with the full traceback
    logger.exception("Application crashed")
    # Pause before exiting so the instance does not restart in a tight loop
    time.sleep(10)
    sys.exit(1)
3. Implement health checks:
from flask import Flask
app = Flask(__name__)
@app.route('/health')
def health():
    try:
        # Check database connection
        # Check external API access
        # Check critical services
        return {'status': 'healthy'}, 200
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 500
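For transient failures, the error handling in step 2 can be extended to retry the main function a bounded number of times before giving up. The following is a sketch under assumptions: `run_with_restarts` is a hypothetical helper, and the retry count and delays are placeholders to tune for your agent.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_with_restarts(run, max_restarts=3, base_delay=1.0, sleep=time.sleep):
    """Call run(); on an exception, log it and retry with exponential backoff.

    Re-raises the last exception once max_restarts retries are used up, so the
    process still exits with an error and the failure stays visible.
    """
    for attempt in range(max_restarts + 1):
        try:
            return run()
        except Exception:
            logger.exception("Application crashed (attempt %d)", attempt + 1)
            if attempt == max_restarts:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use, `run_with_restarts(main)` is enough, where `main` is your own entry point.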
Instance Performance Issues
Symptoms:
- High CPU or memory usage
 - Slow response times
 - Resource limit warnings
 
Solutions:
1. Monitor resource usage:
import psutil
import logging
def log_resource_usage():
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    
    logging.info(f"CPU: {cpu_percent}%, Memory: {memory.percent}%")
    
    if cpu_percent > 80:
        logging.warning("High CPU usage detected")
    if memory.percent > 80:
        logging.warning("High memory usage detected")
2. Optimize code performance:
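# Use connection pooling for databases
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
    database_url,  # e.g. os.environ['DATABASE_URL']
    poolclass=QueuePool,
    pool_size=5,
    max_overflow=10
)
# Cache expensive operations
from functools import lru_cache
@lru_cache(maxsize=1000)
def expensive_function(param):
    # Heavy computation here; arguments must be hashable to be cached
    result = sum(i * i for i in range(param))  # placeholder computation
    return result
# Use async for I/O operations
import asyncio
import aiohttp
async def fetch_one(session, url):
    # Read the body while the connection is still open
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()
async def fetch_data(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)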
3. Scale resources appropriately:
# For CPU-intensive tasks
resources:
  cpu: "2.0"
  cpu_limit: "4.0"
# For memory-intensive tasks  
resources:
  memory: "4Gi"
  memory_limit: "8Gi"
# For high-traffic applications
replicas: 3
Authentication and Access Issues
Can't Sign In
Symptoms:
- "Authentication failed" errors
 - Stuck on GitHub OAuth flow
 - "Access denied" messages
 
Solutions:
1. Clear browser data:
# Clear cache and cookies for:
# - https://www.useagenthub.com
# - https://github.com
# - https://prod-agent-hosting-api.useagenthub.com
2. Check GitHub permissions:
- Go to GitHub → Settings → Applications
 - Find AgentHub in "Authorized OAuth Apps"
 - Verify permissions are granted
 - Revoke and re-authorize if needed
 
3. Organization membership:
- Ensure you're a member of the organization
 - Check organization visibility settings
 - Verify your GitHub email is verified
 
API Key Issues
Symptoms:
- "Invalid API key" errors
 - "Insufficient permissions" responses
 - API calls returning 401/403 errors
 
Solutions:
1. Verify API key format:
# Correct format
ah_prod_1234567890abcdef...
# Check key hasn't expired
curl -H "Authorization: Bearer your-api-key" \
  https://prod-agent-hosting-api.useagenthub.com/auth/status
2. Check key permissions:
// API response shows scopes
{
  "authenticated": true,
  "scopes": ["agents:read", "instances:write", "builds:read"]
}
3. Generate new key:
- Go to Dashboard → Settings → API Keys
 - Click "Generate New Key"
 - Set appropriate scopes
 - Update your applications with new key
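The scope check in step 2 can be scripted once the status response has been parsed. This sketch assumes the response has the shape shown above; `missing_scopes` is a hypothetical helper and `builds:write` is an invented scope used only for illustration.

```python
def missing_scopes(auth_status, required):
    """Return the required scopes absent from a parsed status response."""
    if not auth_status.get("authenticated"):
        return sorted(required)
    granted = set(auth_status.get("scopes", []))
    return sorted(set(required) - granted)


status = {
    "authenticated": True,
    "scopes": ["agents:read", "instances:write", "builds:read"],
}
# "builds:write" is a made-up scope for this example.
print(missing_scopes(status, ["agents:read", "builds:write"]))
```

An empty result means the key carries every scope the call needs; anything else names the scopes to add when generating a new key.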
 
Repository and GitHub Issues
Repository Access Denied
Symptoms:
- "Cannot access repository" errors
 - Build fails to clone repository
 - "Repository not found" messages
 
Solutions:
1. Check repository permissions:
# Verify you own the repository or have access
# For organization repos, ensure proper team membership
2. Grant AgentHub access:
- Go to GitHub → Settings → Applications
 - Find AgentHub OAuth app
 - Grant access to specific repositories or all repositories
 
3. Private repository issues:
# Ensure GitHub token has repo scope
# Check organization third-party access settings
Webhook Issues
Symptoms:
- Auto-builds not triggering on push
 - "Webhook delivery failed" in GitHub
 - Stale builds despite code changes
 
Solutions:
1. Check webhook configuration:
- Go to GitHub repository → Settings → Webhooks
 - Verify AgentHub webhook exists and is active
 - Check webhook URL and events
 
2. Re-establish webhooks:
- Disconnect repository in AgentHub
 - Reconnect with proper permissions
 - Verify webhook creation
 
Environment Variable Issues
Environment Variables Not Working
Symptoms:
- Agent can't access configuration values
 - "Environment variable not found" errors
 - Default values being used instead of configured ones
 
Solutions:
1. Verify variable names:
# ❌ Wrong: Case sensitive mismatch
api_key = os.environ.get('api_key')  # Looking for lowercase
# But configured as: API_KEY=value
# ✅ Correct: Exact case match
api_key = os.environ.get('API_KEY')
2. Check variable scope:
# Instance-level variables (user sets these)
env_vars:
  API_KEY: ""         # Empty in agent template
  USER_CONFIG: ""     # User provides actual values
# Build-time variables (set in agent config)
env_vars:
  NODE_ENV: "production"  # Set by developer
  LOG_LEVEL: "INFO"       # Default value
3. Handle missing variables gracefully:
import os
# ❌ Wrong: Raises a bare KeyError if the variable is missing
api_key = os.environ['API_KEY']
# ✅ Correct: Validate required variables and fail with a clear message
api_key = os.environ.get('API_KEY')
if not api_key:
    raise ValueError("API_KEY environment variable is required")
# ✅ Correct: Give optional variables sensible defaults
log_level = os.environ.get('LOG_LEVEL', 'INFO')
timeout = int(os.environ.get('TIMEOUT_SECONDS', '30'))
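A bare `int(...)` around an environment value fails with an unhelpful message when the value is malformed. A small parsing helper can name the offending variable instead. This is a sketch only; `env_int` is a hypothetical function, not part of any AgentHub SDK.

```python
import os


def env_int(name, default, minimum=None, environ=None):
    """Read an integer variable, falling back to default when unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if minimum is not None and value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


timeout = env_int("TIMEOUT_SECONDS", 30, minimum=1)
```

The `environ` parameter defaults to the real environment and is there mainly so the function can be exercised with a plain dictionary.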
Network and Connectivity Issues
Agent Can't Access External APIs
Symptoms:
- Timeout errors when calling external services
 - "Connection refused" or "DNS resolution failed"
 - Works locally but fails in AgentHub
 
Solutions:
1. Add retry logic:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_session():
    session = requests.Session()
    
    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    
    return session
# Use session for all requests
session = create_session()
response = session.get('https://api.example.com/data')
2. Handle rate limits:
import time
import requests
def api_call_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, timeout=(5, 30))
        
        if response.status_code == 429:  # Rate limited
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(wait_time)
            continue
        
        # Raise for other 4xx/5xx responses; return the body on success
        response.raise_for_status()
        return response.json()
    
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
3. Configure timeouts:
# Set reasonable timeouts
response = requests.get(
    'https://api.example.com/data',
    timeout=(5, 30)  # (connection_timeout, read_timeout)
)
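The rate-limit handling in step 2 waits a fixed power of two between attempts. Two common refinements are to honor the server's `Retry-After` header when it is numeric and to add random jitter so that many clients do not retry in lockstep. The helper below is a sketch with hypothetical names and default values; it only computes the delay and leaves the HTTP call to your own code.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses a numeric Retry-After value when one is given; otherwise applies
    capped exponential backoff with full jitter.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling
```

Inside a retry loop this would replace the fixed wait, for example `time.sleep(backoff_delay(attempt, response.headers.get("Retry-After")))`.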
Monitoring and Logging Issues
Missing or Incomplete Logs
Symptoms:
- Empty log viewer in dashboard
 - Missing application logs
 - Only system logs visible
 
Solutions:
1. Use proper logging:
import logging
import sys
# Configure logging for containerized environment
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stdout  # Important: output to stdout
)
logger = logging.getLogger(__name__)
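# Use structured logging: attach context with `extra`. These fields are stored
# on the log record but only appear in the output if the formatter references
# them. user_id and data are placeholders for your own values.
logger.info("Processing started", extra={
    'user_id': user_id,
    'action': 'process_data',
    'input_size': len(data)
})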
2. Flush output buffers:
import sys
# Ensure immediate output
print("Status update", flush=True)
# For Python apps, set unbuffered output
# In Dockerfile: ENV PYTHONUNBUFFERED=1
3. Log to the right streams:
import sys
# Application logs → stdout
print("INFO: Application started", file=sys.stdout)
# Error logs → stderr  
print("ERROR: Something failed", file=sys.stderr)
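Fields passed through `extra` in step 1 are not printed by the plain text format string. One way to make them visible is a formatter that writes each record as a single JSON line to stdout. The sketch below uses only the standard library; `JsonFormatter` and the logger name `agent` are hypothetical, and whether the AgentHub log viewer parses JSON is not something this guide states.

```python
import json
import logging
import sys

# Attributes every LogRecord carries; anything else was passed via `extra`.
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__)
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including `extra` fields."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("agent")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Processing started", extra={"action": "process_data", "input_size": 3})
```

Each call then produces one self-contained line, which keeps multi-field context together even when log lines from several instances are interleaved.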
Performance Monitoring Issues
Symptoms:
- Missing metrics in dashboard
 - Inaccurate resource usage data
 - Health check failures
 
Solutions:
1. Implement proper health checks:
from datetime import datetime, timezone
from flask import Flask, jsonify
app = Flask(__name__)
# check_database_connection(), check_external_api() and check_memory_usage()
# are helpers you implement; each should return True when healthy
@app.route('/health')
def health_check():
    checks = {
        'database': check_database_connection(),
        'external_api': check_external_api(),
        'memory_usage': check_memory_usage()
    }
    
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503
    
    return jsonify({
        'status': 'healthy' if all_healthy else 'unhealthy',
        'checks': checks,
        'timestamp': datetime.now(timezone.utc).isoformat()
    }), status_code
2. Expose custom metrics:
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
from flask import Response
REQUEST_COUNT = Counter('requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('request_duration_seconds', 'Request duration')
# Assumes `app` is the Flask application created in the health check example
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)
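In the health check from step 1, a helper that raises an exception would turn the whole endpoint into an unhandled error instead of a clean "unhealthy" response. A small runner can treat a raising check as a failed one. This is a sketch: `run_checks` is a hypothetical helper, and the checks are passed as callables rather than already-computed values.

```python
def run_checks(checks):
    """Run each named check; a check that raises counts as failed.

    `checks` maps a name to a zero-argument callable returning True or False.
    Returns (results, http_status).
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return results, status
```

In the Flask handler, the dictionary would then hold function references, for example `{'database': check_database_connection}`, and the returned pair feeds directly into the JSON body and status code.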
Getting Help
Before Contacting Support
1. Collect diagnostic information:
- Instance ID and organization name
 - Error messages (full text)
 - Screenshots of issues
 - Steps to reproduce the problem
 - Timeline of when issue started
 
2. Check status page:
- Visit the AgentHub Status page to check for service outages
 - Check if issue affects multiple users
 
3. Search documentation:
- Use search in documentation
 - Check relevant API reference sections
 - Review best practices guides
 
Contact Options
Dashboard Support:
- Use support chat in dashboard (fastest response)
 - Include instance/agent IDs for faster resolution
 - Describe exact steps that led to the issue
 
Community Resources:
- GitHub Discussions
 - Share non-sensitive debugging information
 - Learn from other developers' experiences
 
Emergency Issues:
- For production outages affecting business operations
 - Use priority support channels if available
 - Include impact assessment and urgency level
 
Need more help? Visit our comprehensive documentation or use the support chat in your dashboard.