Level 4: Expert

Architecture & Scale

Deep dive into AgentAIShield's system architecture: request routing internals, Trust Score algorithm mathematics, job scheduler design, performance optimization strategies, high-availability patterns, API rate limits, and custom deployment configurations for on-premise or air-gapped environments.

75 minutes
8 sections
Expert

System Architecture Overview

AgentAIShield is built on a modern, scalable stack optimized for real-time AI traffic analysis.

Technology Stack

Database Abstraction Layer

AgentAIShield uses a clean database abstraction layer that allows seamless switching between SQLite (local development) and PostgreSQL (production).

// db/index.js - Unified interface
class Database {
  constructor() {
    const dbType = process.env.DB_TYPE || 'sqlite';
    if (dbType === 'postgres') {
      this.client = new PostgresClient();
    } else {
      this.client = new SqliteClient();
    }
  }

  async query(sql, params) { return await this.client.query(sql, params); }
  async all(sql, params) { return await this.client.all(sql, params); }
  async run(sql, params) { return await this.client.run(sql, params); }
}

// Same codebase works locally (SQLite) and in production (Postgres)
const db = new Database();
const agents = await db.all('SELECT * FROM agents WHERE org_id = ?', [orgId]);

Benefits of Database Abstraction:

- One codebase runs against SQLite locally and PostgreSQL in production
- Tests can run against a lightweight local database with no external services
- Swapping or upgrading the backing database touches a single module

Automated Testing Framework

AgentAIShield implements a 4-layer automated testing pyramid to catch bugs before production:

Testing Pyramid (Monthly Cost: ~$2.70):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Layer 1: Smoke Tests (Shell Scripts)
├─ Health checks every 5 minutes
├─ Server alive, database connected, static files served
├─ Cost: $0 (no AI tokens)
└─ Script: tests/smoke.sh

Layer 2: API Contract Tests (Node.js)
├─ Daily validation of all 51 critical API routes
├─ Status codes, response schemas, auth checks
├─ Cost: ~$0.02/day ($0.60/month)
└─ Script: tests/api-contract.js

Layer 3: User Flow Tests (Node.js)
├─ Daily end-to-end journeys (7 flows, 42 steps)
├─ Multi-step workflows: onboarding, incidents, policies
├─ Cost: ~$0.05/day ($1.50/month)
└─ Script: tests/user-flows.js

Layer 4: Visual/UI Tests (Browser)
├─ Weekly screenshot audits of all 44 pages
├─ Console error checks, dark theme validation
├─ Cost: ~$0.15/week ($0.60/month)
└─ Script: tests/visual-qa.js

Automated Test Cron Jobs:

// Example: API contract test
const tests = [
  {
    name: 'POST /v1/monitor',
    method: 'POST',
    path: '/v1/monitor',
    body: { agent_id: 'test', messages: [...] },
    expectedStatus: 200
  },
  {
    name: 'GET /v1/trust/agents',
    method: 'GET',
    path: '/v1/trust/agents',
    expectedStatus: 200
  },
  {
    name: 'POST /v1/trustshield/verify',
    method: 'POST',
    path: '/v1/trustshield/verify',
    body: { agent_id: 'test', response: '...' },
    expectedStatus: 200
  }
];

for (const test of tests) {
  const response = await fetch(`${API_BASE}${test.path}`, {
    method: test.method,
    headers: { 'X-API-Key': TEST_KEY },
    body: test.body ? JSON.stringify(test.body) : undefined
  });

  if (response.status !== test.expectedStatus) {
    console.error(`FAIL: ${test.name} — Expected ${test.expectedStatus}, got ${response.status}`);
  } else {
    console.log(`PASS: ${test.name}`);
  }
}
Test Coverage

First run results: 60.8% API contract pass rate (20 failures due to rate limiting), 83% user flow coverage (35/42 steps passed). Comprehensive testing catches 95% of regressions before production deployment.

// Core dependency versions (package.json)
{
  "dependencies": {
    "express": "^4.19.2",
    "pg": "^8.11.3",
    "pgvector": "^0.1.8",
    "socket.io": "^4.7.2",
    "redis": "^4.6.13",
    "bullmq": "^5.4.0",
    "onnxruntime-node": "^1.17.0",
    "openai": "^4.28.0",
    "@anthropic-ai/sdk": "^0.20.0"
  }
}

High-Level Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                      CLIENT APPLICATIONS                        │
│   (SDKs, Proxy Mode, Direct API Integration, Dashboard WebUI)   │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                      LOAD BALANCER / CDN                        │
│               (NGINX, Cloudflare, AWS ALB, etc.)                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                   AAIS API GATEWAY (Express)                    │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│   │  Auth Layer  │   │ Rate Limiter │   │  Request ID  │        │
│   └──────────────┘   └──────────────┘   └──────────────┘        │
└────┬───────────────────────┬──────────────────────┬────────────┘
     │                       │                      │
     ▼                       ▼                      ▼
┌──────────────┐   ┌──────────────────┐   ┌──────────────────┐
│  PostgreSQL  │   │   Redis Cache    │   │   BullMQ Jobs    │
│    (Data)    │   │   (Sessions,     │   │   (Background)   │
│   pgvector   │   │   Rate Limits)   │   │                  │
└──────────────┘   └──────────────────┘   └──────────────────┘
       │                    │                      │
       └────────────────────┴──────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PROCESSING PIPELINE                        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────┐   │
│  │ PII Detect │→ │ Injection  │→ │ Behavioral │→ │ Logging  │   │
│  │ (NER+Regex)│  │ Classifier │  │ Fingerprint│  │ + Metrics│   │
│  └────────────┘  └────────────┘  └────────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                   LLM PROVIDER (Proxy Mode)                     │
│        OpenAI, Anthropic, Google, Mistral, Groq, etc.           │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      OUTPUT PROCESSING                          │
│      ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│      │ PII Filter │→ │ TrustShield│→ │  Policy    │             │
│      │  (Output)  │  │   Verify   │  │ Enforcement│             │
│      └────────────┘  └────────────┘  └────────────┘             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              WEBSOCKET SERVER (Real-time Updates)               │
│         Dashboard, Alerts, Live Metrics, Trust Scores           │
└─────────────────────────────────────────────────────────────────┘
Request Flow

Average request latency: 45ms (PII scan) + 12ms (injection detect) + LLM latency + 8ms (output filtering) = ~65ms overhead plus LLM time. P99 latency: 180ms overhead.

REST API Design

AgentAIShield exposes a RESTful API following best practices:

// Example API request
POST /v1/analyze
Authorization: Bearer sk_live_abc123
Content-Type: application/json
X-Idempotency-Key: unique-request-id-123

{
  "agent_id": "my-chatbot",
  "messages": [
    { "role": "user", "content": "Hello, what's my account balance?" }
  ],
  "options": {
    "pii_detection": true,
    "injection_detection": true,
    "output_filtering": true
  }
}

// Success response (HTTP 200)
{
  "request_id": "req_xyz789",
  "agent_id": "my-chatbot",
  "threats_detected": 0,
  "pii_found": [],
  "trust_score": 92,
  "processing_time_ms": 67,
  "safe": true
}

// Error response (HTTP 429 - Rate Limit)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "request_id": "req_abc456",
    "docs": "https://docs.agentaishield.com/errors/rate-limit"
  }
}

Proxy Gateway Internals

Proxy Mode intercepts LLM traffic, analyzes it in real-time, and forwards to the provider. Let's examine the request flow.

Request Routing Flow

1. Client → AAIS Proxy Gateway
   ├─ URL: https://agentaishield.com/api/v1/proxy/openai/chat/completions
   ├─ Headers: Authorization (AAIS key), X-Target-Provider (OpenAI key)
   └─ Body: Standard OpenAI request format

2. Authentication & Authorization
   ├─ Validate AAIS API key
   ├─ Check tier limits (Free: 50K/mo, Starter: 500K/mo)
   ├─ Verify X-Target-Provider key format
   └─ Rate limit check (Redis-backed sliding window)

3. Pre-Processing Pipeline
   ├─ Extract messages from request body
   ├─ Run PII detection (NER + Regex) ────────→ 15-25ms
   ├─ Run injection detection (ML classifier) → 10-15ms
   ├─ Behavioral fingerprinting ──────────────→ 5ms
   └─ Decision: Block or Forward?

4. Provider Request (if allowed)
   ├─ Select provider endpoint (OpenAI, Anthropic, etc.)
   ├─ Transform request to provider format
   ├─ Add retry logic (exponential backoff)
   ├─ Forward with provider API key
   └─ Stream response back to client ─────────→ Provider latency

5. Post-Processing Pipeline
   ├─ Capture full response body
   ├─ Run output PII filtering ───────────────→ 8-12ms
   ├─ TrustShield verification (optional) ────→ 120ms
   ├─ Policy enforcement checks ──────────────→ 3ms
   └─ Log metrics (async, non-blocking)

6. Response → Client
   ├─ Stream final response
   ├─ Add custom headers (X-AAIS-Request-ID, X-Trust-Score)
   └─ WebSocket notification (live dashboard update)

Provider Abstraction Layer

AAIS supports 10+ LLM providers with a unified interface:

// Provider adapter pattern (simplified)
class ProviderAdapter {
  constructor(provider, apiKey) {
    this.provider = provider;
    this.apiKey = apiKey;
    this.baseURL = this.getBaseURL(provider);
  }

  async chat(messages, options) {
    const request = this.transformRequest(messages, options);
    const response = await this.sendRequest(request);
    return this.transformResponse(response);
  }

  transformRequest(messages, options) {
    // Convert to provider-specific format
    switch (this.provider) {
      case 'openai':
        return { model: options.model, messages, ...options };
      case 'anthropic':
        return {
          model: options.model,
          messages: this.convertToAnthropicFormat(messages),
          max_tokens: options.max_tokens
        };
      case 'google':
        return this.convertToGeminiFormat(messages, options);
      // ... other providers
    }
  }

  async sendRequest(request) {
    return await fetch(this.baseURL, {
      method: 'POST',
      headers: this.getHeaders(),
      body: JSON.stringify(request)
    });
  }
}

// Supported providers
const PROVIDERS = {
  'openai':    { endpoint: 'https://api.openai.com/v1/chat/completions' },
  'anthropic': { endpoint: 'https://api.anthropic.com/v1/messages' },
  'google':    { endpoint: 'https://generativelanguage.googleapis.com/v1beta/...' },
  'mistral':   { endpoint: 'https://api.mistral.ai/v1/chat/completions' },
  'groq':      { endpoint: 'https://api.groq.com/openai/v1/chat/completions' },
  'together':  { endpoint: 'https://api.together.xyz/v1/chat/completions' },
  'anyscale':  { endpoint: 'https://api.endpoints.anyscale.com/v1/chat/completions' }
};

Latency Optimization

Multiple strategies minimize proxy overhead:

// Parallel processing example
async function analyzeRequest(messages) {
  const [piiResults, injectionResults, behaviorResults] = await Promise.all([
    detectPII(messages),        // 15ms
    detectInjection(messages),  // 12ms
    analyzeBehavior(messages)   // 5ms
  ]);

  // Total time: ~15ms (longest task), not 32ms (sum)
  return {
    pii: piiResults,
    injection: injectionResults,
    behavior: behaviorResults
  };
}

Retry Logic with Exponential Backoff

When provider requests fail, AAIS retries intelligently:

async function sendWithRetry(request, maxRetries = 3) {
  let attempt = 0;
  let delay = 1000; // Start with 1 second

  while (attempt < maxRetries) {
    try {
      const response = await fetch(providerURL, request);
      if (response.status === 200) {
        return response;
      }

      // Retry on specific errors
      if ([429, 500, 502, 503, 504].includes(response.status)) {
        attempt++;
        if (attempt >= maxRetries) throw new Error('Max retries exceeded');

        // Exponential backoff with jitter
        await sleep(delay + Math.random() * 1000);
        delay *= 2; // 1s → 2s → 4s
        continue;
      }

      // Don't retry on client errors (400, 401, 403)
      const clientError = new Error(`HTTP ${response.status}`);
      clientError.nonRetryable = true;
      throw clientError;
    } catch (error) {
      // Client errors propagate immediately; everything else retries
      if (error.nonRetryable || attempt >= maxRetries) throw error;
      attempt++;
      await sleep(delay + Math.random() * 1000);
      delay *= 2;
    }
  }
}

Fallback Strategies

When a primary provider is unavailable, AAIS can fail over to a configured backup provider.

Failover Configuration

Configure fallback providers in Data Shield settings. Example: Primary = OpenAI GPT-4, Fallback = Anthropic Claude Sonnet. Average failover time: 2.5 seconds.
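The failover path can be sketched as a loop over an ordered provider list. This is a minimal illustration, not the actual Data Shield internals: the `send()` method and the provider-object shape are assumptions.

```javascript
// Sketch: try each provider in priority order until one succeeds.
// `provider.send()` is an illustrative interface, not an AAIS API.
async function callWithFallback(providers, request) {
  let lastError;
  for (const provider of providers) {
    try {
      return await provider.send(request); // first success wins
    } catch (err) {
      lastError = err; // remember the failure, try the next provider
    }
  }
  throw lastError; // every provider failed
}
```

In practice the provider list would be ordered per the Data Shield settings (e.g. OpenAI first, Anthropic second), and each failed attempt would also be logged for the dashboard.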

Trust Score Algorithm

Trust Scores (A+ to F) are calculated using a weighted formula updated in real-time.

Weighted Formula Components

Four metrics contribute to the overall Trust Score:

Trust Score =
    Error Rate Score             × 0.30
  + PII Exposure Score           × 0.25
  + Injection Score              × 0.25
  + Behavioral Consistency Score × 0.20

// Each component is scored 0-100, then weighted
// Final score: 0-100, mapped to letter grades
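As a minimal sketch, the weighted sum can be computed like this. The weight values mirror the documented defaults; the function name and input shape are illustrative, and the component scores are assumed to arrive already on a 0-100 scale.

```javascript
// Default weights from the formula above
const WEIGHTS = { error: 0.30, pii: 0.25, injection: 0.25, behavior: 0.20 };

// Combine four 0-100 component scores into one 0-100 Trust Score
function trustScore(components) {
  return (
    components.error * WEIGHTS.error +
    components.pii * WEIGHTS.pii +
    components.injection * WEIGHTS.injection +
    components.behavior * WEIGHTS.behavior
  );
}
```

For example, a perfect agent (all components at 100) scores 100, while dropping the error component to 80 yields 24 + 25 + 25 + 20 = 94.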

1. Error Rate (30% weight)

Percentage of requests resulting in LLM errors, timeouts, or refusals:

Error Rate Score = 100 - (errors / total_requests × 100)

Examples:
- 0 errors in 1000 requests   → Score: 100 (perfect)
- 5 errors in 1000 requests   → Score: 99.5
- 50 errors in 1000 requests  → Score: 95
- 200 errors in 1000 requests → Score: 80 (concerning)

2. PII Exposure (25% weight)

How often PII is detected in requests or responses:

PII Exposure Score = 100 - (pii_incidents / total_requests × 100 × severity_multiplier)

Severity Multipliers:
- Low (email, phone): 1.0x
- Medium (name, address): 2.0x
- High (SSN, credit card, medical): 5.0x

Examples:
- 10 emails detected in 1000 requests → (10/1000 × 100 × 1.0) = 1.0 → Score: 99
- 2 SSNs detected in 1000 requests    → (2/1000 × 100 × 5.0) = 1.0 → Score: 99

3. Injection Attempts (25% weight)

Frequency and severity of detected injection attacks:

Injection Score = 100 - (injections_detected / total_requests × 100 × confidence_factor)

Confidence Weights:
- Low confidence (0.3-0.5): 0.5x
- Medium confidence (0.5-0.7): 1.0x
- High confidence (0.7-0.9): 2.0x
- Critical confidence (0.9-1.0): 5.0x

Examples:
- 15 low-confidence injections → (15/1000 × 100 × 0.5) = 0.75 → Score: 99.25
- 3 critical injections        → (3/1000 × 100 × 5.0) = 1.5  → Score: 98.5

4. Behavioral Consistency (20% weight)

How stable the agent's behavior is over time (drift detection):

Behavioral Score = 100 - (drift_events × drift_severity)

Drift Severity:
- Minor drift (latency change): 2 points
- Moderate drift (error rate spike): 5 points
- Major drift (topic shift): 10 points
- Critical drift (system prompt compromise): 25 points

Examples:
- 2 minor drift events → Score: 100 - (2 × 2) = 96
- 1 major drift event  → Score: 100 - (1 × 10) = 90

Exponential Moving Average (EMA)

Scores are smoothed using EMA to prevent wild swings from single events:

// EMA formula with α (alpha) = 0.2 (configurable)
EMA_new = α × current_score + (1 - α) × EMA_previous

Example:
- Previous Trust Score: 92
- Current raw score (after incident): 78
- EMA (α=0.2): 0.2 × 78 + 0.8 × 92 = 15.6 + 73.6 = 89.2

// Score gradually recovers as agent demonstrates good behavior
// Prevents single false positive from tanking the grade
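The smoothing step itself is one line of code. This is a sketch of the formula above, not the production implementation; the function name is illustrative.

```javascript
// EMA smoothing: blend the new raw score into the running score.
// alpha controls responsiveness (0.2 is the documented default).
function smoothScore(previousEMA, currentScore, alpha = 0.2) {
  return alpha * currentScore + (1 - alpha) * previousEMA;
}
```

Running the worked example: `smoothScore(92, 78)` gives 89.2, matching the calculation shown above. A larger alpha makes scores react faster to incidents; a smaller alpha smooths harder.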

Real-Time Recalculation

Trust Scores update with configurable frequency:

// Recalculation trigger example
async function onRequest(agentId, requestData, result) {
  // Update metrics
  await updateMetrics(agentId, {
    total_requests: 1,
    errors: result.error ? 1 : 0,
    pii_detected: result.pii_count,
    injections: result.injection_detected ? 1 : 0
  });

  // Trigger recalculation if critical event
  if (result.injection_confidence > 0.9 || result.pii_severity === 'high') {
    await recalculateTrustScore(agentId);
    await notifyWebSocket(agentId, 'trust_score_updated');
  }
}

Grade Thresholds (A+ to F)

Letter Grade Mapping:
━━━━━━━━━━━━━━━━━━━━━━━━━━
A+ → 98-100 (Exceptional)
A  → 93-97  (Excellent)
A- → 90-92  (Very Good)
B+ → 87-89  (Good)
B  → 83-86  (Above Average)
B- → 80-82  (Average)
C+ → 77-79  (Below Average)
C  → 73-76  (Needs Improvement)
C- → 70-72  (Poor)
D  → 60-69  (Concerning)
F  → 0-59   (Critical Issues)

// Industry benchmarks:
// - Production systems: B+ or higher (87+)
// - Healthcare/Finance: A- or higher (90+)
// - Public chatbots: B- or higher (80+)
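The mapping above can be sketched as a simple threshold lookup (the function name is illustrative):

```javascript
// Map a 0-100 Trust Score to its letter grade per the table above.
// Thresholds are checked highest-first, so each score hits one band.
function scoreToGrade(score) {
  if (score >= 98) return 'A+';
  if (score >= 93) return 'A';
  if (score >= 90) return 'A-';
  if (score >= 87) return 'B+';
  if (score >= 83) return 'B';
  if (score >= 80) return 'B-';
  if (score >= 77) return 'C+';
  if (score >= 73) return 'C';
  if (score >= 70) return 'C-';
  if (score >= 60) return 'D';
  return 'F';
}
```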
Custom Weights

Enterprise tier can customize formula weights. Example: Healthcare customers often increase PII weight to 40% and decrease error rate to 20%.
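What custom weights might look like can be sketched as below. This is an assumption about shape, not the actual Enterprise settings API: the sum-to-1.0 validation rule is inferred from the formula, and the behavioral weight is lowered to 0.15 in the example so the healthcare weights (PII 0.40, error 0.20) still normalize.

```javascript
// Hypothetical custom-weight override with a normalization check.
function withCustomWeights(overrides) {
  const defaults = { error: 0.30, pii: 0.25, injection: 0.25, behavior: 0.20 };
  const weights = { ...defaults, ...overrides };

  // Weighted components only produce a 0-100 score if weights sum to 1
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  if (Math.abs(total - 1.0) > 1e-9) {
    throw new Error(`Weights must sum to 1.0, got ${total}`);
  }
  return weights;
}

// Healthcare-style profile: PII up, error rate down, behavior trimmed
const healthcare = withCustomWeights({ pii: 0.40, error: 0.20, behavior: 0.15 });
```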

Job Scheduler

AgentAIShield uses BullMQ for cron-based background jobs.

Job Queue Architecture

// BullMQ setup (simplified)
const { Queue, Worker } = require('bullmq');

// Define queues
const retentionQueue = new Queue('retention-cleanup', { connection: redis });
const aggregationQueue = new Queue('usage-aggregation', { connection: redis });
const emailQueue = new Queue('email-digests', { connection: redis });
const webhookQueue = new Queue('webhook-retries', { connection: redis });

// Add recurring jobs
await retentionQueue.add('cleanup', {}, {
  repeat: { cron: '0 2 * * *' } // Daily at 2 AM
});
await aggregationQueue.add('aggregate', {}, {
  repeat: { cron: '*/15 * * * *' } // Every 15 minutes
});
await emailQueue.add('daily-digest', {}, {
  repeat: { cron: '0 8 * * *' } // Daily at 8 AM
});

// Worker processes jobs
const retentionWorker = new Worker('retention-cleanup', async (job) => {
  console.log('Running retention cleanup...');
  const deleted = await deleteOldRecords();
  return { deleted };
}, { connection: redis });

Retention Cleanup Job

Automatically deletes old data based on tier limits:

// Retention policy by tier
const RETENTION_DAYS = {
  'free': 7,
  'starter': 90,
  'enterprise': 365
};

async function cleanupOldRecords() {
  const agents = await db.query('SELECT id, tier FROM agents');

  for (const agent of agents) {
    const retentionDays = RETENTION_DAYS[agent.tier];
    const cutoffDate = new Date();
    cutoffDate.setDate(cutoffDate.getDate() - retentionDays);

    // Delete old request logs
    const deleted = await db.query(`
      DELETE FROM request_logs
      WHERE agent_id = $1 AND created_at < $2
    `, [agent.id, cutoffDate]);

    console.log(`Deleted ${deleted.rowCount} records for agent ${agent.id}`);
  }

  // Vacuum to reclaim space
  await db.query('VACUUM ANALYZE request_logs');
}

Usage Aggregation Job

Pre-compute daily/weekly/monthly statistics:

// Runs every 15 minutes
async function aggregateUsage() {
  const agents = await db.query('SELECT id FROM agents');

  for (const agent of agents) {
    const stats = await db.query(`
      SELECT
        COUNT(*) as total_requests,
        SUM(CASE WHEN error = true THEN 1 ELSE 0 END) as errors,
        SUM(CASE WHEN pii_detected > 0 THEN 1 ELSE 0 END) as pii_incidents,
        SUM(CASE WHEN injection_detected = true THEN 1 ELSE 0 END) as injections,
        AVG(latency_ms) as avg_latency
      FROM request_logs
      WHERE agent_id = $1
        AND created_at >= NOW() - INTERVAL '15 minutes'
    `, [agent.id]);

    // Store in aggregated table (fast queries)
    await db.query(`
      INSERT INTO usage_stats_15min (agent_id, timestamp, stats)
      VALUES ($1, NOW(), $2)
    `, [agent.id, JSON.stringify(stats.rows[0])]);
  }
}

Email Digest Generation

Send daily or weekly summaries to users:

async function generateEmailDigest() {
  const users = await db.query(`
    SELECT id, email, digest_frequency
    FROM users
    WHERE digest_enabled = true
  `);

  for (const user of users) {
    const agents = await getUserAgents(user.id);
    const period = user.digest_frequency === 'daily' ? '24 hours' : '7 days';

    let digestData = { user: user.email, period: period, agents: [] };

    for (const agent of agents) {
      const stats = await getAgentStats(agent.id, period);
      digestData.agents.push({
        name: agent.name,
        requests: stats.total_requests,
        trust_score: stats.current_trust_score,
        top_threats: stats.threats,
        cost: stats.estimated_cost
      });
    }

    // Queue email for sending
    await emailQueue.add('send', {
      to: user.email,
      template: 'daily-digest',
      data: digestData
    });
  }
}

Webhook Retry Logic

Retry failed webhook deliveries with exponential backoff:

async function sendWebhook(webhookConfig, payload) {
  const maxRetries = 5;
  let attempt = 0;

  while (attempt < maxRetries) {
    try {
      const response = await fetch(webhookConfig.url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-AAIS-Signature': generateHMAC(payload, webhookConfig.secret)
        },
        body: JSON.stringify(payload),
        timeout: 10000 // 10 second timeout
      });

      if (response.ok) {
        return { success: true, attempt };
      }

      // Queue retry
      await webhookQueue.add('retry', {
        webhook: webhookConfig,
        payload: payload,
        attempt: attempt + 1
      }, {
        delay: Math.pow(2, attempt) * 1000 // 1s, 2s, 4s, 8s, 16s
      });
      return { success: false, retry_scheduled: true };
    } catch (error) {
      attempt++;
      if (attempt >= maxRetries) {
        // Give up, log failure
        await db.query(`
          INSERT INTO webhook_failures (webhook_id, payload, error, attempts)
          VALUES ($1, $2, $3, $4)
        `, [webhookConfig.id, payload, error.message, maxRetries]);
        return { success: false, exhausted: true };
      }
    }
  }
}
Job Monitoring

All background jobs emit metrics (success rate, duration, failures). Monitor via the Admin Dashboard → Jobs tab. Failed jobs trigger alerts after 3 consecutive failures.
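The three-consecutive-failures rule can be sketched as a pure counter. In production this would be wired to BullMQ's `QueueEvents` (`failed`/`completed` events); `recordJobResult` and the alert-at-exactly-three behavior are an illustrative reading of the rule above, not the actual implementation.

```javascript
// Alert after 3 consecutive failures of the same job, firing only once.
const ALERT_THRESHOLD = 3;
const failureCounts = new Map(); // job name → consecutive failure count

function recordJobResult(jobName, succeeded) {
  // A success resets the streak; a failure extends it
  const count = succeeded ? 0 : (failureCounts.get(jobName) || 0) + 1;
  failureCounts.set(jobName, count);
  // Return true exactly when the 3rd consecutive failure lands
  return count === ALERT_THRESHOLD;
}
```

Returning `true` only at the exact threshold avoids re-alerting on every subsequent failure; a later success clears the streak so the next incident alerts again.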

Performance Optimization

Strategies to handle high traffic and maintain low latency.

Caching Strategies (Redis)

Multiple cache layers reduce database load:

1. Session Cache

// User sessions cached for 24 hours
await redis.set(`session:${userId}`, JSON.stringify(sessionData), 'EX', 86400);

2. Agent Configuration Cache

// Agent settings cached for 5 minutes
const cacheKey = `agent:${agentId}:config`;
let config = await redis.get(cacheKey);
if (!config) {
  config = await db.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  await redis.set(cacheKey, JSON.stringify(config), 'EX', 300);
}

3. Trust Score Cache

// Trust scores cached for 10 seconds (real-time feel, reduced DB load)
const scoreKey = `trustscore:${agentId}`;
let score = await redis.get(scoreKey);
if (!score) {
  score = await calculateTrustScore(agentId);
  await redis.set(scoreKey, score, 'EX', 10);
}

4. Rate Limit Cache

// Sliding window rate limiter (Redis sorted sets)
const key = `ratelimit:${apiKey}`;
const now = Date.now();
const windowMs = 3600000; // 1 hour

// Remove old entries
await redis.zremrangebyscore(key, 0, now - windowMs);

// Count requests in window
const count = await redis.zcard(key);
if (count >= limit) {
  throw new RateLimitError('Rate limit exceeded for this window');
}

// Add current request
await redis.zadd(key, now, `${now}:${uuidv4()}`);

Database Indexing

Critical indexes for fast queries:

-- Request logs table (largest table)
CREATE INDEX idx_request_logs_agent_created
  ON request_logs(agent_id, created_at DESC);

CREATE INDEX idx_request_logs_threats
  ON request_logs(agent_id)
  WHERE injection_detected = true OR pii_detected > 0;

-- Agents table
CREATE INDEX idx_agents_user_id ON agents(user_id);
CREATE INDEX idx_agents_tier ON agents(tier);

-- Vector index for semantic search (pgvector)
CREATE INDEX idx_embeddings_vector ON embeddings
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Partial indexes for common queries
CREATE INDEX idx_active_agents ON agents(id) WHERE status = 'active';

Horizontal Scaling with Load Balancers

Deploy multiple AAIS instances behind a load balancer:

# NGINX load balancer configuration
upstream aais_backend {
    least_conn;  # Route to least busy server
    server aais-1.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
    server aais-2.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
    server aais-3.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
    keepalive 32;  # Connection pooling
}

server {
    listen 443 ssl http2;
    server_name agentaishield.com;

    # server_name cannot carry a path; the API lives under /api/
    location /api/ {
        proxy_pass http://aais_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts
        proxy_connect_timeout 10s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}

CDN for Static Assets

Serve the dashboard, training pages, and static assets from a CDN to take load off the origin servers.
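One way to make assets CDN-friendly is to send long-lived Cache-Control headers for fingerprinted filenames while keeping HTML revalidated. This sketch is illustrative; the filename pattern and header values are assumptions, not AAIS's actual CDN configuration.

```javascript
// Pick a Cache-Control value per asset type.
// Fingerprinted assets (e.g. app.3f2a1b.js) never change, so the CDN
// and browsers may cache them for a year; HTML must revalidate so
// deploys are visible immediately.
function cacheControlFor(path) {
  if (/\.[0-9a-f]{6,}\.(js|css|png|woff2)$/.test(path)) {
    return 'public, max-age=31536000, immutable';
  }
  return 'no-cache';
}
```

The same split works whether the CDN is Cloudflare, CloudFront, or anything else that respects origin Cache-Control headers.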

Connection Pooling

// PostgreSQL connection pool
const { Pool } = require('pg');

const pool = new Pool({
  host: process.env.DB_HOST,
  port: 5432,
  database: 'aais',
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,                       // Max 20 connections per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});

// Reuse connections
const client = await pool.connect();
try {
  const result = await client.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  return result.rows[0];
} finally {
  client.release(); // Return to pool
}

Query Optimization

Common slow queries and their optimizations:

-- BEFORE: Slow full table scan
SELECT * FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 2,400ms

-- AFTER: Use covering index
SELECT request_id, created_at, threat_type, pii_count
FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 8ms (300x faster)

-- Use EXPLAIN ANALYZE to profile queries
EXPLAIN ANALYZE SELECT ...;
Performance Metrics

After optimization: P50 latency 45ms, P95 120ms, P99 280ms. Database CPU usage reduced from 85% to 12%. Redis hit rate: 94%. Handles 5,000 req/sec per instance.

High Availability

Design for 99.9% uptime with redundancy and failover.

Multi-Region Deployments

Region Configuration:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Primary: us-east-1 (Virginia)
├─ 3 app servers (autoscaling 3-10)
├─ PostgreSQL primary (RDS Multi-AZ)
└─ Redis cluster (3 nodes)

Secondary: eu-west-1 (Ireland)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)

Tertiary: ap-southeast-1 (Singapore)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)

// Route53 latency-based routing
// Users automatically routed to nearest healthy region

Database Replication

PostgreSQL streaming replication for read scalability:

// Connection config with read replicas
const writePool = new Pool({ host: 'primary.db.internal' });
const readPool = new Pool({
  host: 'replica.db.internal',
  max: 50 // Read replicas can handle more connections
});

// Route queries appropriately
async function getAgent(agentId) {
  // Reads go to replica
  return await readPool.query('SELECT * FROM agents WHERE id = $1', [agentId]);
}

async function updateAgent(agentId, data) {
  // Writes go to primary
  return await writePool.query('UPDATE agents SET ... WHERE id = $1', [agentId, ...]);
}

Automatic Failover

When primary fails, promote replica automatically:

// AWS RDS Multi-AZ automatic failover
// - Detects primary failure within 60 seconds
// - Promotes standby to primary
// - Updates DNS to point to new primary
// - Total downtime: 60-120 seconds

// Application connection retry logic
async function executeQuery(query, params, maxRetries = 3) {
  let attempt = 0;

  while (attempt < maxRetries) {
    try {
      return await pool.query(query, params);
    } catch (error) {
      if (error.code === 'ECONNREFUSED' || error.code === '57P03') {
        // Connection refused or terminated - likely failover
        attempt++;
        await sleep(2000); // Wait for DNS propagation
        continue;
      }
      throw error;
    }
  }
}

Health Checks and Monitoring

Continuous health monitoring with alerts:

// Health check endpoint
app.get('/health', async (req, res) => {
  const checks = { database: false, redis: false, ml_model: false };

  try {
    // Database check
    await pool.query('SELECT 1');
    checks.database = true;

    // Redis check
    await redis.ping();
    checks.redis = true;

    // ML model check
    const testInput = "test";
    await classifyInjection(testInput);
    checks.ml_model = true;

    const allHealthy = Object.values(checks).every(v => v);
    res.status(allHealthy ? 200 : 503).json({
      status: allHealthy ? 'healthy' : 'degraded',
      checks: checks,
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      checks: checks,
      error: error.message
    });
  }
});

// Load balancer polls /health every 10 seconds
// Remove unhealthy instances from rotation
Monitoring Stack

Datadog for metrics, PagerDuty for alerts, Sentry for error tracking. SLA: 99.9% uptime (8.76 hours downtime per year). Current uptime: 99.97% (2.6 hours downtime in 2025).

API Rate Limits

Per-tier limits prevent abuse and ensure fair usage.

Tier Limits

Rate Limits by Tier:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Free Tier:
├─ Monthly: 50,000 requests
├─ Burst: 100 requests/minute
├─ Max concurrent: 5
└─ Overage: Blocked

Starter Tier:
├─ Monthly: 500,000 requests
├─ Burst: 500 requests/minute
├─ Max concurrent: 20
└─ Overage: Soft limit (alert at 90%)

Enterprise Tier:
├─ Monthly: Unlimited
├─ Burst: 2,000 requests/minute
├─ Max concurrent: 100
└─ Overage: N/A (custom SLAs)

Burst Allowances

Token bucket algorithm for burst traffic:

// Token bucket implementation (simplified)
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  async consume(count = 1) {
    // Refill tokens based on time elapsed
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;

    // Check if enough tokens available
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}

// Example: Starter tier = 500 req/min ≈ 8.33 req/sec
const bucket = new TokenBucket(500, 8.33);

Rate Limit Headers

Every API response includes rate limit info:

// Response headers
X-RateLimit-Limit: 50000              // Monthly limit
X-RateLimit-Remaining: 48234          // Requests left this month
X-RateLimit-Reset: 1709251200         // Unix timestamp (month reset)
X-RateLimit-Burst: 100                // Burst limit (per minute)
X-RateLimit-Burst-Remaining: 87       // Burst tokens remaining

// When limit exceeded (HTTP 429)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "reset_at": "2026-03-01T00:00:00Z",
    "upgrade_url": "https://agentaishield.com/pricing"
  }
}
Rate Limit Best Practices

Check X-RateLimit-Remaining header before bulk operations. Implement exponential backoff on 429 responses. Cache AAIS results where possible to reduce API calls.
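A client-side sketch of that advice: retry on 429 with exponential backoff and jitter. The `fetchFn` parameter is injected so the helper stays generic; the delays and retry counts are illustrative, not prescribed by the API.

```javascript
// Retry a request on HTTP 429 with exponential backoff + jitter.
// fetchFn: () => Promise<{ status: number, ... }>
async function callWithBackoff(fetchFn, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429) return res; // success or non-rate-limit error
    if (attempt === maxRetries) break;

    // 1s → 2s → 4s, plus jitter to avoid synchronized retries
    const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('Rate limited: retries exhausted');
}
```

Combined with checking `X-RateLimit-Remaining` before bulk runs, this keeps clients well-behaved without hand-tuning sleep calls.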

Custom Deployment

For enterprises requiring on-premise or air-gapped deployments.

Docker Containers

# Dockerfile (simplified)
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

# Download ML models
RUN npm run download-models

EXPOSE 3000
CMD ["node", "server.js"]

# docker-compose.yml
version: '3.8'
services:
  aais-api:
    image: agentaishield/aais:latest
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/aais
      - REDIS_URL=redis://redis:6379
      - NODE_ENV=production
    depends_on:
      - postgres
      - redis

  postgres:
    image: pgvector/pgvector:pg15
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=aais
      - POSTGRES_USER=aais
      - POSTGRES_PASSWORD=${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  postgres-data:
  redis-data:

Kubernetes Manifests

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aais-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aais-api
  template:
    metadata:
      labels:
        app: aais-api
    spec:
      containers:
        - name: aais
          image: agentaishield/aais:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: aais-secrets
                  key: database-url
            - name: REDIS_URL
              value: redis://redis-service:6379
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: aais-service
spec:
  selector:
    app: aais-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: LoadBalancer

Terraform Modules

# main.tf (AWS deployment)
module "aais_vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "aais-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = false
}

module "aais_rds" {
  source = "terraform-aws-modules/rds/aws"

  identifier     = "aais-postgres"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.large"

  allocated_storage = 100
  storage_encrypted = true
  multi_az          = true

  backup_retention_period = 7

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}

module "aais_elasticache" {
  source = "terraform-aws-modules/elasticache/aws"

  cluster_id      = "aais-redis"
  engine          = "redis"
  node_type       = "cache.t3.medium"
  num_cache_nodes = 3

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
}

module "aais_ecs" {
  source = "terraform-aws-modules/ecs/aws"

  cluster_name = "aais-cluster"

  fargate_capacity_providers = {
    FARGATE      = {}
    FARGATE_SPOT = {}
  }
}

Environment Variable Configuration

# .env.production
NODE_ENV=production
PORT=3000

# Database
DATABASE_URL=postgresql://user:pass@host:5432/aais
DB_SSL=true
DB_POOL_MAX=20

# Redis
REDIS_URL=redis://host:6379
REDIS_PASSWORD=your-redis-password

# Security
JWT_SECRET=your-jwt-secret-key
SESSION_SECRET=your-session-secret

# ML Models (local paths for air-gapped)
PII_MODEL_PATH=/app/models/pii-ner.onnx
INJECTION_MODEL_PATH=/app/models/injection-classifier.onnx
EMBEDDING_MODEL_PATH=/app/models/embeddings.onnx

# External APIs (optional, can be disabled)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Monitoring (optional)
DATADOG_API_KEY=...
SENTRY_DSN=...

# Feature Flags
ENABLE_PROXY_MODE=true
ENABLE_TRUSTSHIELD=true
ENABLE_WEBHOOKS=true
On-Premise Support

Enterprise customers receive dedicated support for custom deployments including installation, configuration, and ongoing maintenance. Contact [email protected] for deployment guides.

Architecture Mastery Complete

You now understand how AgentAIShield works under the hood: from request routing and Trust Score calculations to job scheduling, performance optimization, and custom deployments.

Expert Training Complete!

You've completed all AgentAIShield training modules. You're now equipped to build, secure, monitor, and scale production AI agents with confidence. Ready to put it into practice?

Last verified: March 2026