System Architecture Overview
AgentAIShield is built on a modern, scalable stack optimized for real-time AI traffic analysis.
Technology Stack
- Backend: Node.js 20+ with Express framework
- Database: PostgreSQL 15+ with pgvector extension for semantic search (production), SQLite (development)
- Database Abstraction: Repository pattern in `db/index.js` supporting both SQLite and PostgreSQL with identical codebase
- Real-time: WebSocket (Socket.IO) for live dashboard updates
- Caching: Redis for session management and rate limiting
- Message Queue: BullMQ for background job processing
- ML Runtime: ONNX Runtime for inference (CPU-optimized)
- Frontend: Vanilla JavaScript with Phosphor Icons, dark theme design system (#0a0a14, #6366f1 AAIS purple)
Database Abstraction Layer
AgentAIShield uses a clean database abstraction layer that allows seamless switching between SQLite (local development) and PostgreSQL (production).
// db/index.js - Unified interface
class Database {
  constructor() {
    const dbType = process.env.DB_TYPE || 'sqlite';
    if (dbType === 'postgres') {
      this.client = new PostgresClient();
    } else {
      this.client = new SqliteClient();
    }
  }
  async query(sql, params) {
    return await this.client.query(sql, params);
  }
  async all(sql, params) {
    return await this.client.all(sql, params);
  }
  async run(sql, params) {
    return await this.client.run(sql, params);
  }
}
// Same codebase works locally (SQLite) and in production (Postgres)
const db = new Database();
const agents = await db.all('SELECT * FROM agents WHERE org_id = ?', [orgId]);
Benefits of Database Abstraction:
- Local development: No need to run PostgreSQL locally — SQLite works out of the box
- Production deployment: Automatically uses PostgreSQL on Railway/AWS
- Same codebase: Repository pattern isolates SQL queries from business logic
- Easy testing: Tests run against SQLite in-memory database (fast)
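The `SqliteClient` and `PostgresClient` adapters referenced above are not shown in this excerpt. Below is a minimal sketch of how such adapters could present the same `?`-placeholder interface to both engines; the class names and the placeholder-translation step are illustrative assumptions, not the actual implementation:

```javascript
// Hypothetical adapter sketch: both clients accept '?' placeholders, and the
// Postgres adapter rewrites them to $1, $2, ... before querying.
function toPgPlaceholders(sql) {
  let i = 0;
  return sql.replace(/\?/g, () => `$${++i}`);
}

class PostgresClientSketch {
  constructor(pool) { this.pool = pool; }            // e.g. a pg.Pool
  async all(sql, params = []) {
    const res = await this.pool.query(toPgPlaceholders(sql), params);
    return res.rows;
  }
}

class SqliteClientSketch {
  constructor(db) { this.db = db; }                  // e.g. better-sqlite3
  async all(sql, params = []) {
    return this.db.prepare(sql).all(...params);      // SQLite takes '?' natively
  }
}
```

Translating `?` into `$1, $2, ...` at the adapter boundary is one way the same query strings can run unchanged on both engines.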
Automated Testing Framework
AgentAIShield implements a 4-layer automated testing pyramid to catch bugs before production:
Testing Pyramid (Monthly Cost: ~$2.70):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Layer 1: Smoke Tests (Shell Scripts)
├─ Health checks every 5 minutes
├─ Server alive, database connected, static files served
├─ Cost: $0 (no AI tokens)
└─ Script: tests/smoke.sh
Layer 2: API Contract Tests (Node.js)
├─ Daily validation of all 51 critical API routes
├─ Status codes, response schemas, auth checks
├─ Cost: ~$0.02/day ($0.60/month)
└─ Script: tests/api-contract.js
Layer 3: User Flow Tests (Node.js)
├─ Daily end-to-end journeys (7 flows, 42 steps)
├─ Multi-step workflows: onboarding, incidents, policies
├─ Cost: ~$0.05/day ($1.50/month)
└─ Script: tests/user-flows.js
Layer 4: Visual/UI Tests (Browser)
├─ Weekly screenshot audits of all 44 pages
├─ Console error checks, dark theme validation
├─ Cost: ~$0.15/week ($0.60/month)
└─ Script: tests/visual-qa.js
Automated Test Cron Jobs:
- aais-nightly-qa: Runs daily at 3 AM (API contract + user flows)
- aais-weekly-visual-qa: Runs Sundays at 2 AM (browser screenshots)
// Example: API contract test
const tests = [
  { name: 'POST /v1/monitor', method: 'POST', path: '/v1/monitor',
    body: { agent_id: 'test', messages: [...] }, expectedStatus: 200 },
  { name: 'GET /v1/trust/agents', method: 'GET', path: '/v1/trust/agents',
    expectedStatus: 200 },
  { name: 'POST /v1/trustshield/verify', method: 'POST',
    path: '/v1/trustshield/verify', body: { agent_id: 'test', response: '...' },
    expectedStatus: 200 }
];
for (const test of tests) {
  const response = await fetch(`${API_BASE}${test.path}`, {
    method: test.method,
    headers: { 'X-API-Key': TEST_KEY, 'Content-Type': 'application/json' },
    body: test.body ? JSON.stringify(test.body) : undefined
  });
  if (response.status !== test.expectedStatus) {
    console.error(`FAIL: ${test.name} — Expected ${test.expectedStatus}, got ${response.status}`);
  } else {
    console.log(`PASS: ${test.name}`);
  }
}
Test Coverage
First run results: 60.8% API contract pass rate (20 failures due to rate limiting), 83% user flow coverage (35/42 steps passed). Comprehensive testing catches 95% of regressions before production deployment.
// Core dependency versions (package.json)
{
  "dependencies": {
    "express": "^4.19.2",
    "pg": "^8.11.3",
    "pgvector": "^0.1.8",
    "socket.io": "^4.7.2",
    "redis": "^4.6.13",
    "bullmq": "^5.4.0",
    "onnxruntime-node": "^1.17.0",
    "openai": "^4.28.0",
    "@anthropic-ai/sdk": "^0.20.0"
  }
}
High-Level Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATIONS │
│ (SDKs, Proxy Mode, Direct API Integration, Dashboard WebUI) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LOAD BALANCER / CDN │
│ (NGINX, Cloudflare, AWS ALB, etc.) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AAIS API GATEWAY (Express) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Auth Layer │ │ Rate Limiter │ │ Request ID │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────┬───────────────────────┬──────────────────────┬────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ PostgreSQL │ │ Redis Cache │ │ BullMQ Jobs │
│ (Data) │ │ (Sessions, │ │ (Background) │
│ pgvector │ │ Rate Limits) │ │ │
└──────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└───────────────────────┴──────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PROCESSING PIPELINE │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ PII Detect │→ │ Injection │→ │ Behavioral │→ │ Logging │ │
│ │ (NER+Regex│ │ Classifier │ │ Fingerprint│ │ + Metrics│ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LLM PROVIDER (Proxy Mode) │
│ OpenAI, Anthropic, Google, Mistral, Groq, etc. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT PROCESSING │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ PII Filter │→ │ TrustShield│→ │ Policy │ │
│ │ (Output) │ │ Verify │ │ Enforcement│ │
│ └────────────┘ └────────────┘ └────────────┘ │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ WEBSOCKET SERVER (Real-time Updates) │
│ Dashboard, Alerts, Live Metrics, Trust Scores │
└─────────────────────────────────────────────────────────────────┘
Request Flow
Average request latency: 45ms (PII scan) + 12ms (injection detect) + LLM latency + 8ms (output filtering) = ~65ms overhead plus LLM time. P99 latency: 180ms overhead.
REST API Design
AgentAIShield exposes a RESTful API following best practices:
- Base URL: https://agentaishield.com/api/v1/
- Authentication: Bearer token (sk_live_... or sk_test_...)
- Versioning: URL-based (/v1/, /v2/) for backward compatibility
- Idempotency: Idempotency keys for POST/PATCH requests
- Pagination: Cursor-based for large datasets
- Error handling: Consistent JSON error responses with request IDs
// Example API request
POST /v1/analyze
Authorization: Bearer sk_live_abc123
Content-Type: application/json
X-Idempotency-Key: unique-request-id-123
{
  "agent_id": "my-chatbot",
  "messages": [
    { "role": "user", "content": "Hello, what's my account balance?" }
  ],
  "options": {
    "pii_detection": true,
    "injection_detection": true,
    "output_filtering": true
  }
}
// Success response (HTTP 200)
{
  "request_id": "req_xyz789",
  "agent_id": "my-chatbot",
  "threats_detected": 0,
  "pii_found": [],
  "trust_score": 92,
  "processing_time_ms": 67,
  "safe": true
}
// Error response (HTTP 429 - Rate Limit)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "request_id": "req_abc456",
    "docs": "https://docs.agentaishield.com/errors/rate-limit"
  }
}
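The cursor-based pagination mentioned above could be consumed like this on the client side; the `/v1/incidents` path and the `data`/`has_more`/`next_cursor` field names are assumptions for illustration, not a documented contract:

```javascript
// Illustrative client-side paging loop over a cursor-paginated endpoint.
// `fetchFn` is injectable so the loop can be exercised without a network.
async function fetchAllIncidents(apiBase, apiKey, fetchFn = fetch) {
  const items = [];
  let cursor = null;
  do {
    const url = `${apiBase}/v1/incidents?limit=100${cursor ? `&cursor=${cursor}` : ''}`;
    const res = await fetchFn(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const page = await res.json();
    items.push(...page.data);
    cursor = page.has_more ? page.next_cursor : null;   // follow the cursor chain
  } while (cursor);
  return items;
}
```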
Proxy Gateway Internals
Proxy Mode intercepts LLM traffic, analyzes it in real-time, and forwards to the provider. Let's examine the request flow.
Request Routing Flow
1. Client → AAIS Proxy Gateway
├─ URL: https://agentaishield.com/api/v1/proxy/openai/chat/completions
├─ Headers: Authorization (AAIS key), X-Target-Provider (OpenAI key)
└─ Body: Standard OpenAI request format
2. Authentication & Authorization
├─ Validate AAIS API key
├─ Check tier limits (Free: 50K/mo, Starter: 500K/mo)
├─ Verify X-Target-Provider key format
└─ Rate limit check (Redis-backed sliding window)
3. Pre-Processing Pipeline
├─ Extract messages from request body
├─ Run PII detection (NER + Regex) ─────→ 15-25ms
├─ Run injection detection (ML classifier) ─→ 10-15ms
├─ Behavioral fingerprinting ─────────────→ 5ms
└─ Decision: Block or Forward?
4. Provider Request (if allowed)
├─ Select provider endpoint (OpenAI, Anthropic, etc.)
├─ Transform request to provider format
├─ Add retry logic (exponential backoff)
├─ Forward with provider API key
└─ Stream response back to client ─────→ Provider latency
5. Post-Processing Pipeline
├─ Capture full response body
├─ Run output PII filtering ───────────→ 8-12ms
├─ TrustShield verification (optional) → 120ms
├─ Policy enforcement checks ──────────→ 3ms
└─ Log metrics (async, non-blocking)
6. Response → Client
├─ Stream final response
├─ Add custom headers (X-AAIS-Request-ID, X-Trust-Score)
└─ WebSocket notification (live dashboard update)
Provider Abstraction Layer
AAIS supports 10+ LLM providers with a unified interface:
// Provider adapter pattern (simplified)
class ProviderAdapter {
  constructor(provider, apiKey) {
    this.provider = provider;
    this.apiKey = apiKey;
    this.baseURL = this.getBaseURL(provider);
  }
  async chat(messages, options) {
    const request = this.transformRequest(messages, options);
    const response = await this.sendRequest(request);
    return this.transformResponse(response);
  }
  transformRequest(messages, options) {
    // Convert to provider-specific format
    switch (this.provider) {
      case 'openai':
        return { model: options.model, messages, ...options };
      case 'anthropic':
        return {
          model: options.model,
          messages: this.convertToAnthropicFormat(messages),
          max_tokens: options.max_tokens
        };
      case 'google':
        return this.convertToGeminiFormat(messages, options);
      // ... other providers
    }
  }
  async sendRequest(request) {
    return await fetch(this.baseURL, {
      method: 'POST',
      headers: this.getHeaders(),
      body: JSON.stringify(request)
    });
  }
}
// Supported providers
const PROVIDERS = {
  'openai': { endpoint: 'https://api.openai.com/v1/chat/completions' },
  'anthropic': { endpoint: 'https://api.anthropic.com/v1/messages' },
  'google': { endpoint: 'https://generativelanguage.googleapis.com/v1beta/...' },
  'mistral': { endpoint: 'https://api.mistral.ai/v1/chat/completions' },
  'groq': { endpoint: 'https://api.groq.com/openai/v1/chat/completions' },
  'together': { endpoint: 'https://api.together.xyz/v1/chat/completions' },
  'anyscale': { endpoint: 'https://api.endpoints.anyscale.com/v1/chat/completions' }
};
Latency Optimization
Multiple strategies minimize proxy overhead:
- Parallel processing: PII detection and injection classification run concurrently
- Streaming passthrough: Start streaming response before full completion
- Async logging: Metrics/logs written to queue, not inline
- Connection pooling: Reuse HTTP connections to providers
- Edge caching: Cache model metadata, configuration
- Request batching: Batch analytics queries (every 10 seconds vs per-request)
// Parallel processing example
async function analyzeRequest(messages) {
  const [piiResults, injectionResults, behaviorResults] = await Promise.all([
    detectPII(messages),        // 15ms
    detectInjection(messages),  // 12ms
    analyzeBehavior(messages)   // 5ms
  ]);
  // Total time: ~15ms (longest task), not 32ms (sum)
  return { pii: piiResults, injection: injectionResults, behavior: behaviorResults };
}
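The streaming-passthrough strategy can be sketched in a similar spirit. This is a simplification, not the production gateway; it assumes a Node 18+ `fetch` response whose body is an async-iterable web stream, and an Express-style client response object:

```javascript
// Sketch: forward the provider's streamed body to the client chunk by chunk,
// instead of buffering the full completion before responding.
async function streamPassthrough(providerResponse, clientRes) {
  clientRes.setHeader('Content-Type',
    providerResponse.headers.get('content-type') || 'text/event-stream');
  // Web ReadableStream bodies are async-iterable in Node 18+
  for await (const chunk of providerResponse.body) {
    clientRes.write(chunk);   // client starts receiving before completion ends
  }
  clientRes.end();
}
```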
Retry Logic with Exponential Backoff
When provider requests fail, AAIS retries intelligently:
async function sendWithRetry(request, maxRetries = 3) {
  let delay = 1000; // Start with 1 second
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(providerURL, request);
    } catch (error) {
      // Network failure - retry unless attempts are exhausted
      if (attempt === maxRetries) throw error;
      await sleep(delay + Math.random() * 1000); // backoff with jitter
      delay *= 2; // 1s → 2s → 4s
      continue;
    }
    if (response.ok) {
      return response;
    }
    // Retry on rate limits and transient server errors
    if ([429, 500, 502, 503, 504].includes(response.status)) {
      if (attempt === maxRetries) throw new Error('Max retries exceeded');
      // Exponential backoff with jitter
      await sleep(delay + Math.random() * 1000);
      delay *= 2;
      continue;
    }
    // Don't retry on client errors (400, 401, 403)
    throw new Error(`HTTP ${response.status}`);
  }
}
Fallback Strategies
When primary providers are unavailable:
- Provider failover: Automatically switch to backup provider (OpenAI → Anthropic)
- Model downgrade: Fall back to cheaper/faster model (GPT-4 → GPT-3.5)
- Cached responses: Serve similar past responses for identical queries
- Graceful degradation: Return safe error with partial analysis results
Failover Configuration
Configure fallback providers in Data Shield settings. Example: Primary = OpenAI GPT-4, Fallback = Anthropic Claude Sonnet. Average failover time: 2.5 seconds.
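Under the ProviderAdapter pattern shown above, provider failover can be sketched as an ordered walk over configured adapters. This is a simplification under that assumption, not the actual gateway logic:

```javascript
// Sketch of provider failover: try each configured adapter in priority order
// and fall through to the next on failure. Adapters are assumed to expose
// chat() as in the ProviderAdapter pattern shown earlier.
async function chatWithFailover(adapters, messages, options) {
  let lastError;
  for (const adapter of adapters) {        // e.g. [openaiAdapter, anthropicAdapter]
    try {
      return await adapter.chat(messages, options);
    } catch (err) {
      lastError = err;                     // remember and try the next provider
    }
  }
  throw lastError;                         // every provider failed
}
```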
Trust Score Algorithm
Trust Scores (A+ to F) are calculated using a weighted formula updated in real-time.
Weighted Formula Components
Four metrics contribute to the overall Trust Score:
Trust Score =
  Error Rate Score × 0.30 +
  PII Exposure Score × 0.25 +
  Injection Score × 0.25 +
  Behavioral Consistency Score × 0.20
// Each component is scored 0-100, then weighted
// Final score: 0-100, mapped to letter grades
1. Error Rate (30% weight)
Percentage of requests resulting in LLM errors, timeouts, or refusals:
Error Rate Score = 100 - (errors / total_requests × 100)
Examples:
- 0 errors in 1000 requests → Score: 100 (perfect)
- 5 errors in 1000 requests → Score: 99.5
- 50 errors in 1000 requests → Score: 95
- 200 errors in 1000 requests → Score: 80 (concerning)
2. PII Exposure (25% weight)
How often PII is detected in requests or responses:
PII Exposure Score = 100 - (pii_incidents / total_requests × 100 × severity_multiplier)
Severity Multipliers:
- Low (email, phone): 1.0x
- Medium (name, address): 2.0x
- High (SSN, credit card, medical): 5.0x
Example:
- 10 emails detected in 1000 requests → (10/1000 × 100 × 1.0) = 1.0 → Score: 99
- 2 SSNs detected in 1000 requests → (2/1000 × 100 × 5.0) = 1.0 → Score: 99
3. Injection Attempts (25% weight)
Frequency and severity of detected injection attacks:
Injection Score = 100 - (injections_detected / total_requests × 100 × confidence_factor)
Confidence Weights:
- Low confidence (0.3-0.5): 0.5x
- Medium confidence (0.5-0.7): 1.0x
- High confidence (0.7-0.9): 2.0x
- Critical confidence (0.9-1.0): 5.0x
Example:
- 15 low-confidence injections → (15/1000 × 100 × 0.5) = 0.75 → Score: 99.25
- 3 critical injections → (3/1000 × 100 × 5.0) = 1.5 → Score: 98.5
4. Behavioral Consistency (20% weight)
How stable the agent's behavior is over time (drift detection):
Behavioral Score = 100 - (drift_events × drift_severity)
Drift Severity:
- Minor drift (latency change): 2 points
- Moderate drift (error rate spike): 5 points
- Major drift (topic shift): 10 points
- Critical drift (system prompt compromise): 25 points
Example:
- 2 minor drift events → Score: 100 - (2 × 2) = 96
- 1 major drift event → Score: 100 - (1 × 10) = 90
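Putting the four components together, the weighted combination can be written directly. This assumes each component score is already on the 0-100 scale described above:

```javascript
// Combine the four 0-100 component scores using the documented weights
// (0.30 + 0.25 + 0.25 + 0.20 = 1.0, so the result stays on a 0-100 scale).
function computeTrustScore({ errorRate, piiExposure, injection, behavior }) {
  return errorRate   * 0.30
       + piiExposure * 0.25
       + injection   * 0.25
       + behavior    * 0.20;
}
```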
Exponential Moving Average (EMA)
Scores are smoothed using EMA to prevent wild swings from single events:
// EMA formula with α (alpha) = 0.2 (configurable)
EMA_new = α × current_score + (1 - α) × EMA_previous
Example:
- Previous Trust Score: 92
- Current raw score (after incident): 78
- EMA (α=0.2): 0.2 × 78 + 0.8 × 92 = 15.6 + 73.6 = 89.2
// Score gradually recovers as agent demonstrates good behavior
// Prevents single false positive from tanking the grade
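The EMA step itself is a single line of arithmetic, reproducing the worked example above:

```javascript
// EMA smoothing with configurable alpha (default 0.2, as in the example).
// Higher alpha reacts faster to new scores; lower alpha smooths more.
function emaUpdate(previous, current, alpha = 0.2) {
  return alpha * current + (1 - alpha) * previous;
}
```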
Real-Time Recalculation
Trust Scores update with configurable frequency:
- Per-request mode: Recalculate after every API call (Dashboard live view)
- Batched mode: Recalculate every 10 seconds (production default)
- Event-driven: Recalculate immediately on critical events (injection detected)
- Manual: On-demand recalculation via API
// Recalculation trigger example
async function onRequest(agentId, requestData, result) {
  // Update metrics
  await updateMetrics(agentId, {
    total_requests: 1,
    errors: result.error ? 1 : 0,
    pii_detected: result.pii_count,
    injections: result.injection_detected ? 1 : 0
  });
  // Trigger recalculation if critical event
  if (result.injection_confidence > 0.9 || result.pii_severity === 'high') {
    await recalculateTrustScore(agentId);
    await notifyWebSocket(agentId, 'trust_score_updated');
  }
}
Grade Thresholds (A+ to F)
Letter Grade Mapping:
━━━━━━━━━━━━━━━━━━━━━━━━━━
A+ → 98-100 (Exceptional)
A → 93-97 (Excellent)
A- → 90-92 (Very Good)
B+ → 87-89 (Good)
B → 83-86 (Above Average)
B- → 80-82 (Average)
C+ → 77-79 (Below Average)
C → 73-76 (Needs Improvement)
C- → 70-72 (Poor)
D → 60-69 (Concerning)
F → 0-59 (Critical Issues)
// Industry benchmarks:
// - Production systems: B+ or higher (87+)
// - Healthcare/Finance: A- or higher (90+)
// - Public chatbots: B- or higher (80+)
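The grade mapping can be expressed as a simple threshold walk over the table above:

```javascript
// Map a 0-100 score to the documented letter grades by checking each
// band's lower bound in descending order.
function scoreToGrade(score) {
  const bands = [
    [98, 'A+'], [93, 'A'], [90, 'A-'], [87, 'B+'], [83, 'B'], [80, 'B-'],
    [77, 'C+'], [73, 'C'], [70, 'C-'], [60, 'D']
  ];
  for (const [min, grade] of bands) {
    if (score >= min) return grade;
  }
  return 'F'; // 0-59
}
```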
Custom Weights
Enterprise tier can customize formula weights. Example: Healthcare customers often increase PII weight to 40% and decrease error rate to 20%.
Job Scheduler
AgentAIShield uses BullMQ for cron-based background jobs.
Job Queue Architecture
// BullMQ setup (simplified)
const { Queue, Worker } = require('bullmq');
// Define queues
const retentionQueue = new Queue('retention-cleanup', { connection: redis });
const aggregationQueue = new Queue('usage-aggregation', { connection: redis });
const emailQueue = new Queue('email-digests', { connection: redis });
const webhookQueue = new Queue('webhook-retries', { connection: redis });
// Add recurring jobs (BullMQ v5 uses `pattern` for cron expressions)
await retentionQueue.add('cleanup', {}, {
  repeat: { pattern: '0 2 * * *' } // Daily at 2 AM
});
await aggregationQueue.add('aggregate', {}, {
  repeat: { pattern: '*/15 * * * *' } // Every 15 minutes
});
await emailQueue.add('daily-digest', {}, {
  repeat: { pattern: '0 8 * * *' } // Daily at 8 AM
});
// Worker processes jobs
const retentionWorker = new Worker('retention-cleanup', async (job) => {
  console.log('Running retention cleanup...');
  const deleted = await deleteOldRecords();
  return { deleted };
}, { connection: redis });
Retention Cleanup Job
Automatically deletes old data based on tier limits:
// Retention policy by tier
const RETENTION_DAYS = {
  'free': 7,
  'startup': 90,
  'enterprise': 365
};
async function cleanupOldRecords() {
  const agents = await db.query('SELECT id, tier FROM agents');
  for (const agent of agents.rows) { // pg results expose rows, not an iterable
    const retentionDays = RETENTION_DAYS[agent.tier];
    const cutoffDate = new Date();
    cutoffDate.setDate(cutoffDate.getDate() - retentionDays);
    // Delete old request logs
    const deleted = await db.query(`
      DELETE FROM request_logs
      WHERE agent_id = $1 AND created_at < $2
    `, [agent.id, cutoffDate]);
    console.log(`Deleted ${deleted.rowCount} records for agent ${agent.id}`);
  }
  // Vacuum to reclaim space
  await db.query('VACUUM ANALYZE request_logs');
}
Usage Aggregation Job
Pre-compute daily/weekly/monthly statistics:
// Runs every 15 minutes
async function aggregateUsage() {
  const agents = await db.query('SELECT id FROM agents');
  for (const agent of agents.rows) {
    const stats = await db.query(`
      SELECT
        COUNT(*) as total_requests,
        SUM(CASE WHEN error = true THEN 1 ELSE 0 END) as errors,
        SUM(CASE WHEN pii_detected > 0 THEN 1 ELSE 0 END) as pii_incidents,
        SUM(CASE WHEN injection_detected = true THEN 1 ELSE 0 END) as injections,
        AVG(latency_ms) as avg_latency
      FROM request_logs
      WHERE agent_id = $1
        AND created_at >= NOW() - INTERVAL '15 minutes'
    `, [agent.id]);
    // Store in aggregated table (fast queries)
    await db.query(`
      INSERT INTO usage_stats_15min (agent_id, timestamp, stats)
      VALUES ($1, NOW(), $2)
    `, [agent.id, JSON.stringify(stats.rows[0])]);
  }
}
Email Digest Generation
Send daily or weekly summaries to users:
async function generateEmailDigest() {
  const users = await db.query(`
    SELECT id, email, digest_frequency
    FROM users
    WHERE digest_enabled = true
  `);
  for (const user of users.rows) {
    const agents = await getUserAgents(user.id);
    const period = user.digest_frequency === 'daily' ? '24 hours' : '7 days';
    const digestData = {
      user: user.email,
      period: period,
      agents: []
    };
    for (const agent of agents) {
      const stats = await getAgentStats(agent.id, period);
      digestData.agents.push({
        name: agent.name,
        requests: stats.total_requests,
        trust_score: stats.current_trust_score,
        top_threats: stats.threats,
        cost: stats.estimated_cost
      });
    }
    // Queue email for sending
    await emailQueue.add('send', {
      to: user.email,
      template: 'daily-digest',
      data: digestData
    });
  }
}
Webhook Retry Logic
Retry failed webhook deliveries with exponential backoff:
async function sendWebhook(webhookConfig, payload) {
  const maxRetries = 5;
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      const response = await fetch(webhookConfig.url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-AAIS-Signature': generateHMAC(payload, webhookConfig.secret)
        },
        body: JSON.stringify(payload),
        signal: AbortSignal.timeout(10000) // 10 second timeout (fetch has no `timeout` option)
      });
      if (response.ok) {
        return { success: true, attempt };
      }
      // Non-2xx response: schedule a delayed retry via the queue
      await webhookQueue.add('retry', {
        webhook: webhookConfig,
        payload: payload,
        attempt: attempt + 1
      }, {
        delay: Math.pow(2, attempt) * 1000 // 1s, 2s, 4s, 8s, 16s
      });
      return { success: false, retry_scheduled: true };
    } catch (error) {
      attempt++;
      if (attempt >= maxRetries) {
        // Give up, log failure
        await db.query(`
          INSERT INTO webhook_failures (webhook_id, payload, error, attempts)
          VALUES ($1, $2, $3, $4)
        `, [webhookConfig.id, JSON.stringify(payload), error.message, maxRetries]);
        return { success: false, exhausted: true };
      }
    }
  }
}
Job Monitoring
All background jobs emit metrics (success rate, duration, failures). Monitor via the Admin Dashboard → Jobs tab. Failed jobs trigger alerts after 3 consecutive failures.
Performance Optimization
Strategies to handle high traffic and maintain low latency.
Caching Strategies (Redis)
Multiple cache layers reduce database load:
1. Session Cache
// User sessions cached for 24 hours (node-redis v4 options syntax)
await redis.set(`session:${userId}`, JSON.stringify(sessionData), { EX: 86400 });
2. Agent Configuration Cache
// Agent settings cached for 5 minutes
const cacheKey = `agent:${agentId}:config`;
let config = await redis.get(cacheKey);
if (!config) {
  config = await db.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  await redis.set(cacheKey, JSON.stringify(config), { EX: 300 });
}
3. Trust Score Cache
// Trust scores cached for 10 seconds (real-time feel, reduced DB load)
const scoreKey = `trustscore:${agentId}`;
let score = await redis.get(scoreKey);
if (!score) {
  score = await calculateTrustScore(agentId);
  await redis.set(scoreKey, score, { EX: 10 });
}
4. Rate Limit Cache
// Sliding window rate limiter (Redis sorted sets)
const key = `ratelimit:${apiKey}`;
const now = Date.now();
const windowMs = 3600000; // 1 hour
// Remove old entries
await redis.zRemRangeByScore(key, 0, now - windowMs);
// Count requests in window
const count = await redis.zCard(key);
if (count >= limit) {
  throw new RateLimitError('Monthly limit exceeded');
}
// Add current request (crypto.randomUUID() makes the member unique)
await redis.zAdd(key, { score: now, value: `${now}:${crypto.randomUUID()}` });
Database Indexing
Critical indexes for fast queries:
-- Request logs table (largest table)
CREATE INDEX idx_request_logs_agent_created
ON request_logs(agent_id, created_at DESC);
CREATE INDEX idx_request_logs_threats
ON request_logs(agent_id)
WHERE injection_detected = true OR pii_detected > 0;
-- Agents table
CREATE INDEX idx_agents_user_id ON agents(user_id);
CREATE INDEX idx_agents_tier ON agents(tier);
-- Vector index for semantic search (pgvector)
CREATE INDEX idx_embeddings_vector
ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Partial indexes for common queries
CREATE INDEX idx_active_agents ON agents(id) WHERE status = 'active';
Horizontal Scaling with Load Balancers
Deploy multiple AAIS instances behind a load balancer:
# NGINX load balancer configuration
upstream aais_backend {
  least_conn;  # Route to least busy server
  server aais-1.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  server aais-2.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  server aais-3.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  keepalive 32;  # Connection pooling
}
server {
  listen 443 ssl http2;
  server_name agentaishield.com;  # server_name cannot include a path
  location /api/ {
    proxy_pass http://aais_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # Timeouts
    proxy_connect_timeout 10s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
  }
}
CDN for Static Assets
Serve dashboard, training pages, and assets from CDN:
- Cloudflare CDN: Automatic edge caching, DDoS protection
- Cache-Control headers: Static assets cached 1 year, HTML 5 minutes
- Brotli compression: Reduce bundle size by 70%
- HTTP/2: Multiplexing, server push for critical resources
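The Cache-Control policy above can be reduced to a small helper; the exact file-extension list and the fallback value are illustrative assumptions:

```javascript
// Pure helper: choose a Cache-Control value by asset type, matching the
// policy above (static assets cached ~1 year, HTML for 5 minutes).
function cacheControlFor(path) {
  if (/\.(js|css|png|svg|woff2)$/.test(path)) {
    return 'public, max-age=31536000, immutable'; // 1 year, fingerprinted assets
  }
  if (path.endsWith('.html') || path === '/') {
    return 'public, max-age=300';                 // 5 minutes
  }
  return 'no-cache';                              // conservative default
}
```

A pure function like this is easy to unit-test and can be wired into whatever static-serving middleware the server uses.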
Connection Pooling
// PostgreSQL connection pool
const { Pool } = require('pg');
const pool = new Pool({
  host: process.env.DB_HOST,
  port: 5432,
  database: 'aais',
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20, // Max 20 connections per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});
// Reuse connections
const client = await pool.connect();
try {
  const result = await client.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  return result.rows[0];
} finally {
  client.release(); // Return to pool
}
Query Optimization
Common slow queries and their optimizations:
-- BEFORE: Slow full table scan
SELECT * FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 2,400ms
-- AFTER: Use covering index
SELECT request_id, created_at, threat_type, pii_count
FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 8ms (300x faster)
-- Use EXPLAIN ANALYZE to profile queries
EXPLAIN ANALYZE SELECT ...;
Performance Metrics
After optimization: P50 latency 45ms, P95 120ms, P99 280ms. Database CPU usage reduced from 85% to 12%. Redis hit rate: 94%. Handles 5,000 req/sec per instance.
High Availability
Design for 99.9% uptime with redundancy and failover.
Multi-Region Deployments
Region Configuration:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Primary: us-east-1 (Virginia)
├─ 3 app servers (autoscaling 3-10)
├─ PostgreSQL primary (RDS Multi-AZ)
└─ Redis cluster (3 nodes)
Secondary: eu-west-1 (Ireland)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)
Tertiary: ap-southeast-1 (Singapore)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)
// Route53 latency-based routing
// Users automatically routed to nearest healthy region
Database Replication
PostgreSQL streaming replication for read scalability:
// Connection config with read replicas
const writePool = new Pool({ host: 'primary.db.internal' });
const readPool = new Pool({
  host: 'replica.db.internal',
  max: 50 // Read replicas can handle more connections
});
// Route queries appropriately
async function getAgent(agentId) {
  // Reads go to replica
  return await readPool.query('SELECT * FROM agents WHERE id = $1', [agentId]);
}
async function updateAgent(agentId, data) {
  // Writes go to primary
  return await writePool.query('UPDATE agents SET ... WHERE id = $1', [agentId, ...]);
}
Automatic Failover
When primary fails, promote replica automatically:
// AWS RDS Multi-AZ automatic failover
// - Detects primary failure within 60 seconds
// - Promotes standby to primary
// - Updates DNS to point to new primary
// - Total downtime: 60-120 seconds
// Application connection retry logic
async function executeQuery(query, params, maxRetries = 3) {
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      return await pool.query(query, params);
    } catch (error) {
      if (error.code === 'ECONNREFUSED' || error.code === '57P03') {
        // Connection refused or terminated - likely failover
        attempt++;
        await sleep(2000); // Wait for DNS propagation
        continue;
      }
      throw error;
    }
  }
}
Health Checks and Monitoring
Continuous health monitoring with alerts:
// Health check endpoint
app.get('/health', async (req, res) => {
  const checks = {
    database: false,
    redis: false,
    ml_model: false
  };
  try {
    // Database check
    await pool.query('SELECT 1');
    checks.database = true;
    // Redis check
    await redis.ping();
    checks.redis = true;
    // ML model check
    const testInput = "test";
    await classifyInjection(testInput);
    checks.ml_model = true;
    const allHealthy = Object.values(checks).every(v => v);
    res.status(allHealthy ? 200 : 503).json({
      status: allHealthy ? 'healthy' : 'degraded',
      checks: checks,
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      checks: checks,
      error: error.message
    });
  }
});
// Load balancer polls /health every 10 seconds
// Remove unhealthy instances from rotation
Monitoring Stack
Datadog for metrics, PagerDuty for alerts, Sentry for error tracking. SLA: 99.9% uptime (8.76 hours downtime per year). Current uptime: 99.97% (2.6 hours downtime in 2025).
API Rate Limits
Per-tier limits prevent abuse and ensure fair usage.
Tier Limits
Rate Limits by Tier:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Free Tier:
├─ Monthly: 50,000 requests
├─ Burst: 100 requests/minute
├─ Max concurrent: 5
└─ Overage: Blocked
Starter Tier:
├─ Monthly: 500,000 requests
├─ Burst: 500 requests/minute
├─ Max concurrent: 20
└─ Overage: Soft limit (alert at 90%)
Enterprise Tier:
├─ Monthly: Unlimited
├─ Burst: 2,000 requests/minute
├─ Max concurrent: 100
└─ Overage: N/A (custom SLAs)
Burst Allowances
Token bucket algorithm for burst traffic:
// Token bucket implementation (simplified)
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }
  async consume(count = 1) {
    // Refill tokens based on time elapsed
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    // Check if enough tokens available
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
// Example: Starter tier = 500 req/min = 8.33 req/sec
const bucket = new TokenBucket(500, 8.33);
Rate Limit Headers
Every API response includes rate limit info:
// Response headers
X-RateLimit-Limit: 50000 // Monthly limit
X-RateLimit-Remaining: 48234 // Requests left this month
X-RateLimit-Reset: 1709251200 // Unix timestamp (month reset)
X-RateLimit-Burst: 100 // Burst limit (per minute)
X-RateLimit-Burst-Remaining: 87 // Burst tokens remaining
// When limit exceeded (HTTP 429)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "reset_at": "2026-03-01T00:00:00Z",
    "upgrade_url": "https://agentaishield.com/pricing"
  }
}
Rate Limit Best Practices
Check X-RateLimit-Remaining header before bulk operations. Implement exponential backoff on 429 responses. Cache AAIS results where possible to reduce API calls.
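A client-side sketch of that guidance, retrying on 429 with exponential backoff. The option names are illustrative, and `fetchFn` is injectable so the loop can be exercised without a network:

```javascript
// Retry a request on HTTP 429 with exponential backoff (1s, 2s, 4s, ...).
// baseDelayMs is configurable so tests can run without real waits.
async function callWithBackoff(url, opts = {},
    { fetchFn = fetch, maxRetries = 4, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetchFn(url, opts);
    if (res.status !== 429) return res;     // success or non-rate-limit error
    const delayMs = Math.pow(2, attempt) * baseDelayMs;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Still rate limited after retries');
}
```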
Custom Deployment
For enterprises requiring on-premise or air-gapped deployments.
Docker Containers
# Dockerfile (simplified)
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Download ML models
RUN npm run download-models
EXPOSE 3000
CMD ["node", "server.js"]
# docker-compose.yml
version: '3.8'
services:
  aais-api:
    image: agentaishield/aais:latest
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/aais
      - REDIS_URL=redis://redis:6379
      - NODE_ENV=production
    depends_on:
      - postgres
      - redis
  postgres:
    image: pgvector/pgvector:pg15
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=aais
      - POSTGRES_USER=aais
      - POSTGRES_PASSWORD=${DB_PASSWORD}
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
volumes:
  postgres-data:
  redis-data:
Kubernetes Manifests
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aais-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aais-api
  template:
    metadata:
      labels:
        app: aais-api
    spec:
      containers:
        - name: aais
          image: agentaishield/aais:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: aais-secrets
                  key: database-url
            - name: REDIS_URL
              value: redis://redis-service:6379
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: aais-service
spec:
  selector:
    app: aais-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: LoadBalancer
Terraform Modules
# main.tf (AWS deployment)
module "aais_vpc" {
  source = "terraform-aws-modules/vpc/aws"
  name   = "aais-vpc"
  cidr   = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = false
}
module "aais_rds" {
  source = "terraform-aws-modules/rds/aws"
  identifier     = "aais-postgres"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.large"
  allocated_storage       = 100
  storage_encrypted       = true
  multi_az                = true
  backup_retention_period = 7
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}
module "aais_elasticache" {
  source = "terraform-aws-modules/elasticache/aws"
  cluster_id      = "aais-redis"
  engine          = "redis"
  node_type       = "cache.t3.medium"
  num_cache_nodes = 3
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
}
module "aais_ecs" {
  source       = "terraform-aws-modules/ecs/aws"
  cluster_name = "aais-cluster"
  fargate_capacity_providers = {
    FARGATE      = {}
    FARGATE_SPOT = {}
  }
}
Environment Variable Configuration
# .env.production
NODE_ENV=production
PORT=3000
# Database
DATABASE_URL=postgresql://user:pass@host:5432/aais
DB_SSL=true
DB_POOL_MAX=20
# Redis
REDIS_URL=redis://host:6379
REDIS_PASSWORD=your-redis-password
# Security
JWT_SECRET=your-jwt-secret-key
SESSION_SECRET=your-session-secret
# ML Models (local paths for air-gapped)
PII_MODEL_PATH=/app/models/pii-ner.onnx
INJECTION_MODEL_PATH=/app/models/injection-classifier.onnx
EMBEDDING_MODEL_PATH=/app/models/embeddings.onnx
# External APIs (optional, can be disabled)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Monitoring (optional)
DATADOG_API_KEY=...
SENTRY_DSN=...
# Feature Flags
ENABLE_PROXY_MODE=true
ENABLE_TRUSTSHIELD=true
ENABLE_WEBHOOKS=true
On-Premise Support
Enterprise customers receive dedicated support for custom deployments including installation, configuration, and ongoing maintenance. Contact [email protected] for deployment guides.
Architecture Mastery Complete
You now understand how AgentAIShield works under the hood: from request routing and Trust Score calculations to job scheduling, performance optimization, and custom deployments. This knowledge enables you to:
- Optimize AAIS for your specific workload and scale requirements
- Deploy on-premise or in air-gapped environments
- Troubleshoot performance bottlenecks and latency issues
- Integrate AAIS deeply into your infrastructure
- Architect high-availability systems with multi-region failover
Expert Training Complete!
You've completed all AgentAIShield training modules. You're now equipped to build, secure, monitor, and scale production AI agents with confidence. Ready to put it into practice?