Troubleshooting Guide
This guide covers solutions for the most common issues you might encounter when using AgentHub. Issues are organized by category with step-by-step solutions.
Build Issues
Build Failed: Dependencies Not Found
Symptoms:
- Build fails during dependency installation
- Error messages like "ModuleNotFoundError" or "Package not found"
- Build logs show pip/npm/yarn failures
Solutions:
For Python projects:
# 1. Verify requirements.txt exists and includes all dependencies
pip freeze > requirements.txt
# 2. Test locally first
pip install -r requirements.txt
python main.py
# 3. Pin versions in requirements.txt to avoid conflicts (== pins exactly; >= only sets a minimum)
requests==2.31.0
flask==2.3.3
openai>=1.0.0
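To spot entries that are not pinned to an exact version, a small script along these lines may help. It is a minimal sketch: the function name find_unpinned is hypothetical, and the pattern only recognizes simple "name==version" lines, so URL or VCS requirements would be reported as unpinned.

```python
import re

# Matches "name==version", with optional extras such as "pkg[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")


def find_unpinned(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = "requests==2.31.0\nflask==2.3.3\nopenai>=1.0.0\n"
print(find_unpinned(sample))
```

With the three example requirements above, only the openai line is reported, because >= allows any newer release.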
For Node.js projects:
# 1. Ensure package.json has all dependencies
npm install # Test locally
# 2. Lock versions with package-lock.json
npm ci # Uses exact versions from lock file
# 3. Check for peer dependency warnings
npm ls # Shows dependency tree and conflicts
For Go projects:
# Ensure go.mod exists with proper module path
module github.com/yourusername/your-agent
go 1.21
Build Failed: Dockerfile Errors
Symptoms:
- Custom Dockerfile fails to build
- "COPY failed" or "RUN command failed" errors
- Wrong base image or missing files
Solutions:
Common Dockerfile issues:
# ❌ Wrong: Copies the contents of src/ directly into /app, flattening the layout
COPY ./src /app
# ✅ Correct: Source paths are relative to the build context; keep the destination explicit
COPY src/ /app/src/
# ❌ Wrong: Missing WORKDIR
COPY requirements.txt .
RUN pip install -r requirements.txt
# ✅ Correct: Set working directory first
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# ❌ Wrong: Running as root (security risk)
USER root
# ✅ Correct: Use non-root user
USER 1000:1000
Optimization tips:
# Multi-stage builds for smaller images
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
Build Timeout
Symptoms:
- Build stops after 30 minutes
- "Build timeout exceeded" error
- Long-running installations hang
Solutions:
- Optimize build steps:
# Use a faster package mirror if one is available to you
# (the URL below is the default PyPI index; replace it with your mirror)
RUN pip install -r requirements.txt -i https://pypi.org/simple/
# Install system packages efficiently
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
- Use build cache:
# Copy requirements first (changes less often)
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy source code last (changes frequently)
COPY . .
- Pre-built base images:
# Use images with pre-installed dependencies
# Instead of ubuntu + python install
FROM python:3.11-slim
# Instead of alpine + node install
FROM node:18-alpine
Instance Issues
Instance Won't Start
Symptoms:
- Instance status stuck on "Starting"
- Container exits immediately after starting
- Health checks failing
Solutions:
1. Check environment variables:
# Common issues:
❌ Missing required variables
❌ Typos in variable names
❌ Wrong variable formats
✅ Verify all required env vars are set:
API_KEY=your-actual-key-here
DATABASE_URL=postgresql://...
PORT=8080
2. Verify port configuration:
# ❌ Wrong: Hardcoded port
app.run(port=5000)
# ✅ Correct: Use PORT environment variable
import os
port = int(os.environ.get('PORT', 8080))
app.run(host='0.0.0.0', port=port)
3. Check startup command:
# ❌ Wrong: Command that exits immediately
CMD ["echo", "Hello World"]
# ✅ Correct: Long-running process
CMD ["python", "main.py"]
# ✅ Correct: Web server
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]
Instance Crashes or Restarts Frequently
Symptoms:
- Instance status changes from "Running" to "Error" repeatedly
- High restart count in instance details
- Container exits with non-zero code
Solutions:
1. Memory issues:
# Increase memory limits
resources:
  memory: "1Gi" # From 512Mi
  memory_limit: "2Gi" # Set appropriate limit
2. Add error handling:
import logging
import time
import traceback
# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
try:
    # Your main application logic
    app.run()
except Exception as e:
    logger.error(f"Application crashed: {e}")
    logger.error(traceback.format_exc())
    # Don't exit immediately, allow restart
    time.sleep(10)
3. Implement health checks:
from flask import Flask
app = Flask(__name__)
@app.route('/health')
def health():
    try:
        # Check database connection
        # Check external API access
        # Check critical services
        return {'status': 'healthy'}, 200
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 500
Instance Performance Issues
Symptoms:
- High CPU or memory usage
- Slow response times
- Resource limit warnings
Solutions:
1. Monitor resource usage:
import psutil
import logging
def log_resource_usage():
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    logging.info(f"CPU: {cpu_percent}%, Memory: {memory.percent}%")
    if cpu_percent > 80:
        logging.warning("High CPU usage detected")
    if memory.percent > 80:
        logging.warning("High memory usage detected")
2. Optimize code performance:
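# Use connection pooling for databases
from sqlalchemy import create_engine
import sqlalchemy.pool as pool
engine = create_engine(
    database_url,  # your database connection string
    poolclass=pool.QueuePool,
    pool_size=5,
    max_overflow=10
)
# Cache expensive operations
from functools import lru_cache
@lru_cache(maxsize=1000)
def expensive_function(param):
    # Heavy computation here (placeholder calculation)
    result = sum(i * i for i in range(param))
    return result
# Use async for I/O operations
import asyncio
import aiohttp
async def fetch_one(session, url):
    # Read the body while the connection is still open
    async with session.get(url) as response:
        return await response.text()
async def fetch_data(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)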
3. Scale resources appropriately:
# For CPU-intensive tasks
resources:
  cpu: "2.0"
  cpu_limit: "4.0"
# For memory-intensive tasks
resources:
  memory: "4Gi"
  memory_limit: "8Gi"
# For high-traffic applications
replicas: 3
Authentication and Access Issues
Can't Sign In
Symptoms:
- "Authentication failed" errors
- Stuck on GitHub OAuth flow
- "Access denied" messages
Solutions:
1. Clear browser data:
# Clear cache and cookies for:
# - https://www.useagenthub.com
# - https://github.com
# - https://prod-agent-hosting-api.useagenthub.com
2. Check GitHub permissions:
- Go to GitHub → Settings → Applications
- Find AgentHub in "Authorized OAuth Apps"
- Verify permissions are granted
- Revoke and re-authorize if needed
3. Organization membership:
- Ensure you're a member of the organization
- Check organization visibility settings
- Verify your GitHub email is verified
API Key Issues
Symptoms:
- "Invalid API key" errors
- "Insufficient permissions" responses
- API calls returning 401/403 errors
Solutions:
1. Verify API key format:
# Correct format
ah_prod_1234567890abcdef...
# Check key hasn't expired
curl -H "Authorization: Bearer your-api-key" \
https://prod-agent-hosting-api.useagenthub.com/auth/status
2. Check key permissions:
// API response shows scopes
{
  "authenticated": true,
  "scopes": ["agents:read", "instances:write", "builds:read"]
}
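To compare the scopes in that response against what your application needs, a helper like the one below could be used. This is a sketch that assumes the response has the "authenticated" and "scopes" fields shown above; the function name missing_scopes and the scope "builds:write" are hypothetical and used only for illustration.

```python
def missing_scopes(auth_status, required):
    """Return the required scopes that the key does not have, sorted."""
    # An unauthenticated key has no usable scopes at all.
    if not auth_status.get("authenticated"):
        return sorted(set(required))
    granted = set(auth_status.get("scopes", []))
    return sorted(set(required) - granted)


# Response shape taken from the example above.
status = {
    "authenticated": True,
    "scopes": ["agents:read", "instances:write", "builds:read"],
}
# "builds:write" is a made-up scope name for this example.
print(missing_scopes(status, ["agents:read", "builds:write"]))
```

An empty list means the key already covers everything you asked for; anything else is a scope to add when generating a new key.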
3. Generate new key:
- Go to Dashboard → Settings → API Keys
- Click "Generate New Key"
- Set appropriate scopes
- Update your applications with new key
Repository and GitHub Issues
Repository Access Denied
Symptoms:
- "Cannot access repository" errors
- Build fails to clone repository
- "Repository not found" messages
Solutions:
1. Check repository permissions:
# Verify you own the repository or have access
# For organization repos, ensure proper team membership
2. Grant AgentHub access:
- Go to GitHub → Settings → Applications
- Find AgentHub OAuth app
- Grant access to specific repositories or all repositories
3. Private repository issues:
# Ensure GitHub token has repo scope
# Check organization third-party access settings
Webhook Issues
Symptoms:
- Auto-builds not triggering on push
- "Webhook delivery failed" in GitHub
- Stale builds despite code changes
Solutions:
1. Check webhook configuration:
- Go to GitHub repository → Settings → Webhooks
- Verify AgentHub webhook exists and is active
- Check webhook URL and events
2. Re-establish webhooks:
- Disconnect repository in AgentHub
- Reconnect with proper permissions
- Verify webhook creation
Environment Variable Issues
Environment Variables Not Working
Symptoms:
- Agent can't access configuration values
- "Environment variable not found" errors
- Default values being used instead of configured ones
Solutions:
1. Verify variable names:
# ❌ Wrong: Case sensitive mismatch
api_key = os.environ.get('api_key') # Looking for lowercase
# But configured as: API_KEY=value
# ✅ Correct: Exact case match
api_key = os.environ.get('API_KEY')
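If you suspect a case mismatch, the following sketch lists variables that exist under a different capitalization than your code expects. The function name find_case_mismatches is hypothetical. Note that on Windows environment variable names are case-insensitive, so this check is mainly useful in Linux containers.

```python
import os


def find_case_mismatches(expected_names, environ=None):
    """Map each missing expected name to variables that differ only by case."""
    env = os.environ if environ is None else environ
    # Index the configured names by their lowercase form.
    by_lower = {}
    for name in env:
        by_lower.setdefault(name.lower(), []).append(name)
    mismatches = {}
    for expected in expected_names:
        if expected in env:
            continue  # exact match exists, nothing to report
        candidates = by_lower.get(expected.lower(), [])
        if candidates:
            mismatches[expected] = candidates
    return mismatches


# Example: the code expects API_KEY but the variable was set as api_key.
print(find_case_mismatches(["API_KEY"], {"api_key": "value"}))
```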
2. Check variable scope:
# Instance-level variables (user sets these)
env_vars:
  API_KEY: "" # Empty in agent template
  USER_CONFIG: "" # User provides actual values
# Build-time variables (set in agent config)
env_vars:
  NODE_ENV: "production" # Set by developer
  LOG_LEVEL: "INFO" # Default value
3. Handle missing variables gracefully:
import os
# ❌ Wrong: Will crash if missing
api_key = os.environ['API_KEY']
# ✅ Correct: Validate required variables and fail with a clear message
api_key = os.environ.get('API_KEY')
if not api_key:
    raise ValueError("API_KEY environment variable is required")
# ✅ Best: Use default values for optional settings
log_level = os.environ.get('LOG_LEVEL', 'INFO')
timeout = int(os.environ.get('TIMEOUT_SECONDS', '30'))
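When several variables are involved, it can be easier to validate them all at startup and report every problem in one message. The sketch below shows one way to do that; load_config and ConfigError are hypothetical names, and the variable names are the ones used in the examples above.

```python
import os


class ConfigError(RuntimeError):
    """Raised when environment variables are missing or malformed."""


def load_config(environ=None, required=("API_KEY",), int_defaults=None):
    """Collect required strings and optional integers from the environment."""
    env = os.environ if environ is None else environ
    if int_defaults is None:
        int_defaults = {"TIMEOUT_SECONDS": 30}
    problems = []
    config = {}
    for name in required:
        value = env.get(name, "").strip()
        if value:
            config[name] = value
        else:
            problems.append(f"{name} is required but not set")
    for name, default in int_defaults.items():
        raw = env.get(name)
        if raw is None or raw == "":
            config[name] = default  # fall back to the default
            continue
        try:
            config[name] = int(raw)
        except ValueError:
            problems.append(f"{name} must be an integer, got {raw!r}")
    # Report every problem at once instead of failing on the first one.
    if problems:
        raise ConfigError("; ".join(problems))
    return config
```

Calling load_config() once at startup makes a misconfigured instance fail immediately with a message that names each offending variable, which is easier to find in the logs than a crash deep inside request handling.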
Network and Connectivity Issues
Agent Can't Access External APIs
Symptoms:
- Timeout errors when calling external services
- "Connection refused" or "DNS resolution failed"
- Works locally but fails in AgentHub
Solutions:
1. Add retry logic:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_session():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
# Use session for all requests
session = create_session()
response = session.get('https://api.example.com/data')
2. Handle rate limits:
import time
import requests
def api_call_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429: # Rate limited
            wait_time = 2 ** attempt # Exponential backoff
            time.sleep(wait_time)
        else:
            response.raise_for_status()
    raise Exception(f"Failed after {max_retries} attempts")
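The wait time in the backoff loop above can be refined in two common ways: honoring the standard HTTP Retry-After header when the server sends one, and adding random jitter so that many clients do not retry at the same moment. The function below is a sketch of that calculation only; the name backoff_delay and its parameters are hypothetical, and whether a given API sends Retry-After is an assumption you would need to check.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=True):
    """Return how many seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given in seconds: trust the server, within the cap.
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            # Retry-After can also be an HTTP date; this sketch ignores
            # that form and falls back to exponential backoff.
            pass
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": pick a random delay between 0 and the ceiling.
    return random.uniform(0, ceiling) if jitter else ceiling


# Without jitter the delays double until they reach the cap.
print([backoff_delay(n, jitter=False) for n in range(8)])
```

In the loop above, wait_time could then be computed as backoff_delay(attempt, response.headers.get("Retry-After")).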
3. Configure timeouts:
# Set reasonable timeouts
response = requests.get(
    'https://api.example.com/data',
    timeout=(5, 30) # (connection_timeout, read_timeout)
)
Monitoring and Logging Issues
Missing or Incomplete Logs
Symptoms:
- Empty log viewer in dashboard
- Missing application logs
- Only system logs visible
Solutions:
1. Use proper logging:
import logging
import sys
# Configure logging for containerized environment
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stdout # Important: output to stdout
)
logger = logging.getLogger(__name__)
# Use structured logging (extra fields are attached to the log record;
# they appear in the output only if the formatter or handler includes them)
logger.info("Processing started", extra={
    'user_id': user_id,
    'action': 'process_data',
    'input_size': len(data)
})
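With a plain text format string, fields passed through extra are stored on the log record but never printed. One way to make them visible is a formatter that writes each record as a JSON object. The sketch below uses only the standard library; JsonFormatter and build_json_logger are hypothetical names, and exception tracebacks are left out to keep it short.

```python
import json
import logging
import sys

# Attribute names every LogRecord has; anything else came from `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over the fields supplied through `extra`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_json_logger(name, stream=None):
    """Return a logger that writes JSON lines to stdout (or `stream`)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A call such as build_json_logger("agent").info("Processing started", extra={"action": "process_data"}) then produces a single JSON line that includes the action field.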
2. Flush output buffers:
import sys
# Ensure immediate output
print("Status update", flush=True)
# For Python apps, set unbuffered output
# In Dockerfile: ENV PYTHONUNBUFFERED=1
3. Log to the right streams:
import sys
# Application logs → stdout
print("INFO: Application started", file=sys.stdout)
# Error logs → stderr
print("ERROR: Something failed", file=sys.stderr)
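The same stdout/stderr split can be applied to the logging module instead of print. The following is a sketch using two standard-library handlers; MaxLevelFilter and configure_logging are hypothetical names, and the choice to send warnings to stdout rather than stderr is an assumption you may want to change.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records at or below a given level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level


def configure_logging(level=logging.INFO, out=None, err=None):
    """Send INFO and WARNING to stdout, ERROR and above to stderr."""
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    stdout_handler = logging.StreamHandler(out if out is not None else sys.stdout)
    stdout_handler.addFilter(MaxLevelFilter(logging.WARNING))
    stdout_handler.setFormatter(formatter)

    stderr_handler = logging.StreamHandler(err if err is not None else sys.stderr)
    stderr_handler.setLevel(logging.ERROR)
    stderr_handler.setFormatter(formatter)

    root = logging.getLogger()
    root.handlers.clear()  # replace any earlier configuration
    root.setLevel(level)
    root.addHandler(stdout_handler)
    root.addHandler(stderr_handler)
    return root
```

Call configure_logging() once at startup, before any other module logs, so that every logger inherits the two handlers from the root logger.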
Performance Monitoring Issues
Symptoms:
- Missing metrics in dashboard
- Inaccurate resource usage data
- Health check failures
Solutions:
1. Implement proper health checks:
from datetime import datetime, timezone
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/health')
def health_check():
    # The check_* helpers are placeholders for your own checks; each returns True or False
    checks = {
        'database': check_database_connection(),
        'external_api': check_external_api(),
        'memory_usage': check_memory_usage()
    }
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503
    return jsonify({
        'status': 'healthy' if all_healthy else 'unhealthy',
        'checks': checks,
        'timestamp': datetime.now(timezone.utc).isoformat()
    }), status_code
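If one of the individual checks raises an exception, a health endpoint written this way returns a 500 error rather than a clear "unhealthy" response. The framework-independent sketch below treats a raised exception as a failed check; run_checks is a hypothetical helper, and the check functions passed to it are your own.

```python
from datetime import datetime, timezone


def run_checks(checks):
    """Run named check callables and return (response_body, status_code)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that crashes counts as unhealthy, not as a server error.
            results[name] = False
    healthy = all(results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "checks": results,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return body, (200 if healthy else 503)


# Example with one passing check and one that raises.
body, code = run_checks({"ok": lambda: True, "broken": lambda: 1 / 0})
print(code, body["checks"])
```

Inside a Flask route, the returned body can be passed to jsonify and returned together with the status code.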
2. Expose custom metrics:
from prometheus_client import Counter, Histogram, generate_latest
from flask import Response
REQUEST_COUNT = Counter('requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('request_duration_seconds', 'Request duration')
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype='text/plain')
Getting Help
Before Contacting Support
1. Collect diagnostic information:
- Instance ID and organization name
- Error messages (full text)
- Screenshots of issues
- Steps to reproduce the problem
- Timeline of when issue started
2. Check status page:
- Visit AgentHub Status for service outages
- Check if issue affects multiple users
3. Search documentation:
- Use search in documentation
- Check relevant API reference sections
- Review best practices guides
Contact Options
Dashboard Support:
- Use support chat in dashboard (fastest response)
- Include instance/agent IDs for faster resolution
- Describe exact steps that led to the issue
Community Resources:
- GitHub Discussions
- Share non-sensitive debugging information
- Learn from other developers' experiences
Emergency Issues:
- For production outages affecting business operations
- Use priority support channels if available
- Include impact assessment and urgency level
Need more help? Visit our comprehensive documentation or use the support chat in your dashboard.