System Architecture Overview
AgentAIShield is built on a modern, scalable stack optimized for real-time AI traffic analysis.
Technology Stack
- Backend: Node.js 20+ with Express framework
- Database: PostgreSQL 15+ with pgvector extension for semantic search (production), SQLite (development)
- Database Abstraction: Repository pattern in `db/index.js` supporting both SQLite and PostgreSQL with identical codebase
- Real-time: WebSocket (Socket.IO) for live dashboard updates
- Caching: Redis for session management and rate limiting
- Message Queue: BullMQ for background job processing
- ML Runtime: ONNX Runtime for inference (CPU-optimized)
- Frontend: Vanilla JavaScript with Phosphor Icons, dark theme design system (#0a0a14, #6366f1 AAIS purple)
Database Abstraction Layer
AgentAIShield uses a clean database abstraction layer that allows seamless switching between SQLite (local development) and PostgreSQL (production).
// db/index.js - Unified interface
class Database {
  constructor() {
    const dbType = process.env.DB_TYPE || 'sqlite';
    if (dbType === 'postgres') {
      this.client = new PostgresClient();
    } else {
      this.client = new SqliteClient();
    }
  }
  async query(sql, params) {
    return await this.client.query(sql, params);
  }
  async all(sql, params) {
    return await this.client.all(sql, params);
  }
  async run(sql, params) {
    return await this.client.run(sql, params);
  }
}
// Same codebase works locally (SQLite) and in production (Postgres)
const db = new Database();
const agents = await db.all('SELECT * FROM agents WHERE org_id = ?', [orgId]);
Benefits of Database Abstraction:
- Local development: No need to run PostgreSQL locally — SQLite works out of the box
- Production deployment: Automatically uses PostgreSQL on Railway/AWS
- Same codebase: Repository pattern isolates SQL queries from business logic
- Easy testing: Tests run against SQLite in-memory database (fast)
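The `SqliteClient` and `PostgresClient` adapters referenced above are not shown in this excerpt. Below is a minimal sketch of how such adapters could present the same `?`-placeholder interface to both engines; the class names and the placeholder-translation step are illustrative assumptions, not the actual implementation:

```javascript
// Hypothetical adapter sketch: both clients accept '?' placeholders, and the
// Postgres adapter rewrites them to $1, $2, ... before querying.
function toPgPlaceholders(sql) {
  let i = 0;
  return sql.replace(/\?/g, () => `$${++i}`);
}

class PostgresClientSketch {
  constructor(pool) { this.pool = pool; }            // e.g. a pg.Pool
  async all(sql, params = []) {
    const res = await this.pool.query(toPgPlaceholders(sql), params);
    return res.rows;
  }
}

class SqliteClientSketch {
  constructor(db) { this.db = db; }                  // e.g. better-sqlite3
  async all(sql, params = []) {
    return this.db.prepare(sql).all(...params);      // SQLite takes '?' natively
  }
}
```

Translating `?` into `$1, $2, ...` at the adapter boundary is one way the same query strings can run unchanged on both engines.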
Automated Testing Framework
AgentAIShield implements a 4-layer automated testing pyramid to catch bugs before production:
Testing Pyramid (Monthly Cost: ~$2.70):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Layer 1: Smoke Tests (Shell Scripts)
├─ Health checks every 5 minutes
├─ Server alive, database connected, static files served
├─ Cost: $0 (no AI tokens)
└─ Script: tests/smoke.sh
Layer 2: API Contract Tests (Node.js)
├─ Daily validation of all 51 critical API routes
├─ Status codes, response schemas, auth checks
├─ Cost: ~$0.02/day ($0.60/month)
└─ Script: tests/api-contract.js
Layer 3: User Flow Tests (Node.js)
├─ Daily end-to-end journeys (7 flows, 42 steps)
├─ Multi-step workflows: onboarding, incidents, policies
├─ Cost: ~$0.05/day ($1.50/month)
└─ Script: tests/user-flows.js
Layer 4: Visual/UI Tests (Browser)
├─ Weekly screenshot audits of all 44 pages
├─ Console error checks, dark theme validation
├─ Cost: ~$0.15/week ($0.60/month)
└─ Script: tests/visual-qa.js
Automated Test Cron Jobs:
- aais-nightly-qa: Runs daily at 3 AM (API contract + user flows)
- aais-weekly-visual-qa: Runs Sundays at 2 AM (browser screenshots)
// Example: API contract test
const tests = [
  { name: 'POST /v1/monitor', method: 'POST', path: '/v1/monitor',
    body: { agent_id: 'test', messages: [...] }, expectedStatus: 200 },
  { name: 'GET /v1/trust/agents', method: 'GET', path: '/v1/trust/agents',
    expectedStatus: 200 },
  { name: 'POST /v1/trustshield/verify', method: 'POST',
    path: '/v1/trustshield/verify', body: { agent_id: 'test', response: '...' },
    expectedStatus: 200 }
];
for (const test of tests) {
  const response = await fetch(`${API_BASE}${test.path}`, {
    method: test.method,
    headers: { 'X-API-Key': TEST_KEY, 'Content-Type': 'application/json' },
    body: test.body ? JSON.stringify(test.body) : undefined
  });
  if (response.status !== test.expectedStatus) {
    console.error(`FAIL: ${test.name} — Expected ${test.expectedStatus}, got ${response.status}`);
  } else {
    console.log(`PASS: ${test.name}`);
  }
}
Test Coverage
First run results: 60.8% API contract pass rate (20 failures due to rate limiting), 83% user flow coverage (35/42 steps passed). Comprehensive testing catches 95% of regressions before production deployment.
// Core dependency versions (package.json)
{
  "dependencies": {
    "express": "^4.19.2",
    "pg": "^8.11.3",
    "pgvector": "^0.1.8",
    "socket.io": "^4.7.2",
    "redis": "^4.6.13",
    "bullmq": "^5.4.0",
    "onnxruntime-node": "^1.17.0",
    "openai": "^4.28.0",
    "@anthropic-ai/sdk": "^0.20.0"
  }
}
High-Level Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATIONS │
│ (SDKs, Proxy Mode, Direct API Integration, Dashboard WebUI) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LOAD BALANCER / CDN │
│ (NGINX, Cloudflare, AWS ALB, etc.) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AAIS API GATEWAY (Express) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Auth Layer │ │ Rate Limiter │ │ Request ID │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────┬───────────────────────┬──────────────────────┬────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ PostgreSQL │ │ Redis Cache │ │ BullMQ Jobs │
│ (Data) │ │ (Sessions, │ │ (Background) │
│ pgvector │ │ Rate Limits) │ │ │
└──────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└───────────────────────┴──────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PROCESSING PIPELINE │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ PII Detect │→ │ Injection │→ │ Behavioral │→ │ Logging │ │
│ │ (NER+Regex│ │ Classifier │ │ Fingerprint│ │ + Metrics│ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LLM PROVIDER (Proxy Mode) │
│ OpenAI, Anthropic, Google, Mistral, Groq, etc. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT PROCESSING │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ PII Filter │→ │ TrustShield│→ │ Policy │ │
│ │ (Output) │ │ Verify │ │ Enforcement│ │
│ └────────────┘ └────────────┘ └────────────┘ │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ WEBSOCKET SERVER (Real-time Updates) │
│ Dashboard, Alerts, Live Metrics, Trust Scores │
└─────────────────────────────────────────────────────────────────┘
Request Flow
Average request latency: 45ms (PII scan) + 12ms (injection detect) + LLM latency + 8ms (output filtering) = ~65ms overhead plus LLM time. P99 latency: 180ms overhead.
REST API Design
AgentAIShield exposes a RESTful API following best practices:
- Base URL: https://agentaishield.com/api/v1/
- Authentication: Bearer token (sk_live_... or sk_test_...)
- Versioning: URL-based (/v1/, /v2/) for backward compatibility
- Idempotency: Idempotency keys for POST/PATCH requests
- Pagination: Cursor-based for large datasets
- Error handling: Consistent JSON error responses with request IDs
// Example API request
POST /v1/analyze
Authorization: Bearer sk_live_abc123
Content-Type: application/json
X-Idempotency-Key: unique-request-id-123
{
  "agent_id": "my-chatbot",
  "messages": [
    { "role": "user", "content": "Hello, what's my account balance?" }
  ],
  "options": {
    "pii_detection": true,
    "injection_detection": true,
    "output_filtering": true
  }
}
// Success response (HTTP 200)
{
  "request_id": "req_xyz789",
  "agent_id": "my-chatbot",
  "threats_detected": 0,
  "pii_found": [],
  "trust_score": 92,
  "processing_time_ms": 67,
  "safe": true
}
// Error response (HTTP 429 - Rate Limit)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "request_id": "req_abc456",
    "docs": "https://docs.agentaishield.com/errors/rate-limit"
  }
}
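The cursor-based pagination mentioned above could be consumed like this on the client side; the `/v1/incidents` path and the `data`/`has_more`/`next_cursor` field names are assumptions for illustration, not a documented contract:

```javascript
// Illustrative client-side paging loop over a cursor-paginated endpoint.
// `fetchFn` is injectable so the loop can be exercised without a network.
async function fetchAllIncidents(apiBase, apiKey, fetchFn = fetch) {
  const items = [];
  let cursor = null;
  do {
    const url = `${apiBase}/v1/incidents?limit=100${cursor ? `&cursor=${cursor}` : ''}`;
    const res = await fetchFn(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const page = await res.json();
    items.push(...page.data);
    cursor = page.has_more ? page.next_cursor : null;   // follow the cursor chain
  } while (cursor);
  return items;
}
```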
Proxy Gateway Internals
Proxy Mode intercepts LLM traffic, analyzes it in real-time, and forwards to the provider. Let's examine the request flow.
Request Routing Flow
1. Client → AAIS Proxy Gateway
├─ URL: https://agentaishield.com/api/v1/proxy/openai/chat/completions
├─ Headers: Authorization (AAIS key), X-Target-Provider (OpenAI key)
└─ Body: Standard OpenAI request format
2. Authentication & Authorization
├─ Validate AAIS API key
├─ Check tier limits (Free: 50K/mo, Starter: 500K/mo)
├─ Verify X-Target-Provider key format
└─ Rate limit check (Redis-backed sliding window)
3. Pre-Processing Pipeline
├─ Extract messages from request body
├─ Run PII detection (NER + Regex) ─────→ 15-25ms
├─ Run injection detection (ML classifier) ─→ 10-15ms
├─ Behavioral fingerprinting ─────────────→ 5ms
└─ Decision: Block or Forward?
4. Provider Request (if allowed)
├─ Select provider endpoint (OpenAI, Anthropic, etc.)
├─ Transform request to provider format
├─ Add retry logic (exponential backoff)
├─ Forward with provider API key
└─ Stream response back to client ─────→ Provider latency
5. Post-Processing Pipeline
├─ Capture full response body
├─ Run output PII filtering ───────────→ 8-12ms
├─ TrustShield verification (optional) → 120ms
├─ Policy enforcement checks ──────────→ 3ms
└─ Log metrics (async, non-blocking)
6. Response → Client
├─ Stream final response
├─ Add custom headers (X-AAIS-Request-ID, X-Trust-Score)
└─ WebSocket notification (live dashboard update)
Provider Abstraction Layer
AAIS supports 10+ LLM providers with a unified interface:
// Provider adapter pattern (simplified)
class ProviderAdapter {
  constructor(provider, apiKey) {
    this.provider = provider;
    this.apiKey = apiKey;
    this.baseURL = this.getBaseURL(provider);
  }
  async chat(messages, options) {
    const request = this.transformRequest(messages, options);
    const response = await this.sendRequest(request);
    return this.transformResponse(response);
  }
  transformRequest(messages, options) {
    // Convert to provider-specific format
    switch (this.provider) {
      case 'openai':
        return { model: options.model, messages, ...options };
      case 'anthropic':
        return {
          model: options.model,
          messages: this.convertToAnthropicFormat(messages),
          max_tokens: options.max_tokens
        };
      case 'google':
        return this.convertToGeminiFormat(messages, options);
      // ... other providers
    }
  }
  async sendRequest(request) {
    return await fetch(this.baseURL, {
      method: 'POST',
      headers: this.getHeaders(),
      body: JSON.stringify(request)
    });
  }
}
// Supported providers
const PROVIDERS = {
  'openai': { endpoint: 'https://api.openai.com/v1/chat/completions' },
  'anthropic': { endpoint: 'https://api.anthropic.com/v1/messages' },
  'google': { endpoint: 'https://generativelanguage.googleapis.com/v1beta/...' },
  'mistral': { endpoint: 'https://api.mistral.ai/v1/chat/completions' },
  'groq': { endpoint: 'https://api.groq.com/openai/v1/chat/completions' },
  'together': { endpoint: 'https://api.together.xyz/v1/chat/completions' },
  'anyscale': { endpoint: 'https://api.endpoints.anyscale.com/v1/chat/completions' }
};
Latency Optimization
Multiple strategies minimize proxy overhead:
- Parallel processing: PII detection and injection classification run concurrently
- Streaming passthrough: Start streaming response before full completion
- Async logging: Metrics/logs written to queue, not inline
- Connection pooling: Reuse HTTP connections to providers
- Edge caching: Cache model metadata, configuration
- Request batching: Batch analytics queries (every 10 seconds vs per-request)
// Parallel processing example
async function analyzeRequest(messages) {
  const [piiResults, injectionResults, behaviorResults] = await Promise.all([
    detectPII(messages),        // 15ms
    detectInjection(messages),  // 12ms
    analyzeBehavior(messages)   // 5ms
  ]);
  // Total time: ~15ms (longest task), not 32ms (sum)
  return { pii: piiResults, injection: injectionResults, behavior: behaviorResults };
}
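The streaming-passthrough strategy can be sketched in a similar spirit. This is a simplification, not the production gateway; it assumes a Node 18+ `fetch` response whose body is an async-iterable web stream, and an Express-style client response object:

```javascript
// Sketch: forward the provider's streamed body to the client chunk by chunk,
// instead of buffering the full completion before responding.
async function streamPassthrough(providerResponse, clientRes) {
  clientRes.setHeader('Content-Type',
    providerResponse.headers.get('content-type') || 'text/event-stream');
  // Web ReadableStream bodies are async-iterable in Node 18+
  for await (const chunk of providerResponse.body) {
    clientRes.write(chunk);   // client starts receiving before completion ends
  }
  clientRes.end();
}
```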
Retry Logic with Exponential Backoff
When provider requests fail, AAIS retries intelligently:
async function sendWithRetry(request, maxRetries = 3) {
  let delay = 1000; // Start with 1 second
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(providerURL, request);
    } catch (error) {
      // Network failure - retry unless attempts are exhausted
      if (attempt === maxRetries) throw error;
      await sleep(delay + Math.random() * 1000); // backoff with jitter
      delay *= 2; // 1s → 2s → 4s
      continue;
    }
    if (response.ok) {
      return response;
    }
    // Retry on rate limits and transient server errors
    if ([429, 500, 502, 503, 504].includes(response.status)) {
      if (attempt === maxRetries) throw new Error('Max retries exceeded');
      // Exponential backoff with jitter
      await sleep(delay + Math.random() * 1000);
      delay *= 2;
      continue;
    }
    // Don't retry on client errors (400, 401, 403)
    throw new Error(`HTTP ${response.status}`);
  }
}
Fallback Strategies
When primary providers are unavailable:
- Provider failover: Automatically switch to backup provider (OpenAI → Anthropic)
- Model downgrade: Fall back to cheaper/faster model (GPT-4 → GPT-3.5)
- Cached responses: Serve similar past responses for identical queries
- Graceful degradation: Return safe error with partial analysis results
Failover Configuration
Configure fallback providers in Data Shield settings. Example: Primary = OpenAI GPT-4, Fallback = Anthropic Claude Sonnet. Average failover time: 2.5 seconds.
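Under the ProviderAdapter pattern shown above, provider failover can be sketched as an ordered walk over configured adapters. This is a simplification under that assumption, not the actual gateway logic:

```javascript
// Sketch of provider failover: try each configured adapter in priority order
// and fall through to the next on failure. Adapters are assumed to expose
// chat() as in the ProviderAdapter pattern shown earlier.
async function chatWithFailover(adapters, messages, options) {
  let lastError;
  for (const adapter of adapters) {        // e.g. [openaiAdapter, anthropicAdapter]
    try {
      return await adapter.chat(messages, options);
    } catch (err) {
      lastError = err;                     // remember and try the next provider
    }
  }
  throw lastError;                         // every provider failed
}
```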
Trust Score Algorithm
Trust Scores (A+ to F) are calculated using a weighted formula updated in real-time.
Weighted Formula Components
Four metrics contribute to the overall Trust Score:
Trust Score =
  Error Rate Score × 0.30 +
  PII Exposure Score × 0.25 +
  Injection Score × 0.25 +
  Behavioral Consistency Score × 0.20
// Each component is scored 0-100, then weighted
// Final score: 0-100, mapped to letter grades
1. Error Rate (30% weight)
Percentage of requests resulting in LLM errors, timeouts, or refusals:
Error Rate Score = 100 - (errors / total_requests × 100)
Examples:
- 0 errors in 1000 requests → Score: 100 (perfect)
- 5 errors in 1000 requests → Score: 99.5
- 50 errors in 1000 requests → Score: 95
- 200 errors in 1000 requests → Score: 80 (concerning)
2. PII Exposure (25% weight)
How often PII is detected in requests or responses:
PII Exposure Score = 100 - (pii_incidents / total_requests × 100 × severity_multiplier)
Severity Multipliers:
- Low (email, phone): 1.0x
- Medium (name, address): 2.0x
- High (SSN, credit card, medical): 5.0x
Example:
- 10 emails detected in 1000 requests → (10/1000 × 100 × 1.0) = 1.0 → Score: 99
- 2 SSNs detected in 1000 requests → (2/1000 × 100 × 5.0) = 1.0 → Score: 99
3. Injection Attempts (25% weight)
Frequency and severity of detected injection attacks:
Injection Score = 100 - (injections_detected / total_requests × 100 × confidence_factor)
Confidence Weights:
- Low confidence (0.3-0.5): 0.5x
- Medium confidence (0.5-0.7): 1.0x
- High confidence (0.7-0.9): 2.0x
- Critical confidence (0.9-1.0): 5.0x
Example:
- 15 low-confidence injections → (15/1000 × 100 × 0.5) = 0.75 → Score: 99.25
- 3 critical injections → (3/1000 × 100 × 5.0) = 1.5 → Score: 98.5
4. Behavioral Consistency (20% weight)
How stable the agent's behavior is over time (drift detection):
Behavioral Score = 100 - (drift_events × drift_severity)
Drift Severity:
- Minor drift (latency change): 2 points
- Moderate drift (error rate spike): 5 points
- Major drift (topic shift): 10 points
- Critical drift (system prompt compromise): 25 points
Example:
- 2 minor drift events → Score: 100 - (2 × 2) = 96
- 1 major drift event → Score: 100 - (1 × 10) = 90
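Putting the four components together, the weighted combination can be written directly. This assumes each component score is already on the 0-100 scale described above:

```javascript
// Combine the four 0-100 component scores using the documented weights
// (0.30 + 0.25 + 0.25 + 0.20 = 1.0, so the result stays on a 0-100 scale).
function computeTrustScore({ errorRate, piiExposure, injection, behavior }) {
  return errorRate   * 0.30
       + piiExposure * 0.25
       + injection   * 0.25
       + behavior    * 0.20;
}
```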
Exponential Moving Average (EMA)
Scores are smoothed using EMA to prevent wild swings from single events:
// EMA formula with α (alpha) = 0.2 (configurable)
EMA_new = α × current_score + (1 - α) × EMA_previous
Example:
- Previous Trust Score: 92
- Current raw score (after incident): 78
- EMA (α=0.2): 0.2 × 78 + 0.8 × 92 = 15.6 + 73.6 = 89.2
// Score gradually recovers as agent demonstrates good behavior
// Prevents single false positive from tanking the grade
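The EMA step itself is a single line of arithmetic, reproducing the worked example above:

```javascript
// EMA smoothing with configurable alpha (default 0.2, as in the example).
// Higher alpha reacts faster to new scores; lower alpha smooths more.
function emaUpdate(previous, current, alpha = 0.2) {
  return alpha * current + (1 - alpha) * previous;
}
```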
Real-Time Recalculation
Trust Scores update with configurable frequency:
- Per-request mode: Recalculate after every API call (Dashboard live view)
- Batched mode: Recalculate every 10 seconds (production default)
- Event-driven: Recalculate immediately on critical events (injection detected)
- Manual: On-demand recalculation via API
// Recalculation trigger example
async function onRequest(agentId, requestData, result) {
  // Update metrics
  await updateMetrics(agentId, {
    total_requests: 1,
    errors: result.error ? 1 : 0,
    pii_detected: result.pii_count,
    injections: result.injection_detected ? 1 : 0
  });
  // Trigger recalculation if critical event
  if (result.injection_confidence > 0.9 || result.pii_severity === 'high') {
    await recalculateTrustScore(agentId);
    await notifyWebSocket(agentId, 'trust_score_updated');
  }
}
Grade Thresholds (A+ to F)
Letter Grade Mapping:
━━━━━━━━━━━━━━━━━━━━━━━━━━
A+ → 98-100 (Exceptional)
A → 93-97 (Excellent)
A- → 90-92 (Very Good)
B+ → 87-89 (Good)
B → 83-86 (Above Average)
B- → 80-82 (Average)
C+ → 77-79 (Below Average)
C → 73-76 (Needs Improvement)
C- → 70-72 (Poor)
D → 60-69 (Concerning)
F → 0-59 (Critical Issues)
// Industry benchmarks:
// - Production systems: B+ or higher (87+)
// - Healthcare/Finance: A- or higher (90+)
// - Public chatbots: B- or higher (80+)
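The grade mapping can be expressed as a simple threshold walk over the table above:

```javascript
// Map a 0-100 score to the documented letter grades by checking each
// band's lower bound in descending order.
function scoreToGrade(score) {
  const bands = [
    [98, 'A+'], [93, 'A'], [90, 'A-'], [87, 'B+'], [83, 'B'], [80, 'B-'],
    [77, 'C+'], [73, 'C'], [70, 'C-'], [60, 'D']
  ];
  for (const [min, grade] of bands) {
    if (score >= min) return grade;
  }
  return 'F'; // 0-59
}
```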
Custom Weights
Enterprise tier can customize formula weights. Example: Healthcare customers often increase PII weight to 40% and decrease error rate to 20%.
Job Scheduler
AgentAIShield uses BullMQ for cron-based background jobs.
Job Queue Architecture
// BullMQ setup (simplified)
const { Queue, Worker } = require('bullmq');
// Define queues
const retentionQueue = new Queue('retention-cleanup', { connection: redis });
const aggregationQueue = new Queue('usage-aggregation', { connection: redis });
const emailQueue = new Queue('email-digests', { connection: redis });
const webhookQueue = new Queue('webhook-retries', { connection: redis });
// Add recurring jobs (BullMQ v5 uses `pattern` for cron expressions)
await retentionQueue.add('cleanup', {}, {
  repeat: { pattern: '0 2 * * *' } // Daily at 2 AM
});
await aggregationQueue.add('aggregate', {}, {
  repeat: { pattern: '*/15 * * * *' } // Every 15 minutes
});
await emailQueue.add('daily-digest', {}, {
  repeat: { pattern: '0 8 * * *' } // Daily at 8 AM
});
// Worker processes jobs
const retentionWorker = new Worker('retention-cleanup', async (job) => {
  console.log('Running retention cleanup...');
  const deleted = await deleteOldRecords();
  return { deleted };
}, { connection: redis });
Retention Cleanup Job
Automatically deletes old data based on tier limits:
// Retention policy by tier
const RETENTION_DAYS = {
  'free': 7,
  'startup': 90,
  'enterprise': 365
};
async function cleanupOldRecords() {
  const agents = await db.query('SELECT id, tier FROM agents');
  for (const agent of agents.rows) { // pg results expose rows, not an iterable
    const retentionDays = RETENTION_DAYS[agent.tier];
    const cutoffDate = new Date();
    cutoffDate.setDate(cutoffDate.getDate() - retentionDays);
    // Delete old request logs
    const deleted = await db.query(`
      DELETE FROM request_logs
      WHERE agent_id = $1 AND created_at < $2
    `, [agent.id, cutoffDate]);
    console.log(`Deleted ${deleted.rowCount} records for agent ${agent.id}`);
  }
  // Vacuum to reclaim space
  await db.query('VACUUM ANALYZE request_logs');
}
Usage Aggregation Job
Pre-compute daily/weekly/monthly statistics:
// Runs every 15 minutes
async function aggregateUsage() {
  const agents = await db.query('SELECT id FROM agents');
  for (const agent of agents.rows) {
    const stats = await db.query(`
      SELECT
        COUNT(*) as total_requests,
        SUM(CASE WHEN error = true THEN 1 ELSE 0 END) as errors,
        SUM(CASE WHEN pii_detected > 0 THEN 1 ELSE 0 END) as pii_incidents,
        SUM(CASE WHEN injection_detected = true THEN 1 ELSE 0 END) as injections,
        AVG(latency_ms) as avg_latency
      FROM request_logs
      WHERE agent_id = $1
        AND created_at >= NOW() - INTERVAL '15 minutes'
    `, [agent.id]);
    // Store in aggregated table (fast queries)
    await db.query(`
      INSERT INTO usage_stats_15min (agent_id, timestamp, stats)
      VALUES ($1, NOW(), $2)
    `, [agent.id, JSON.stringify(stats.rows[0])]);
  }
}
Email Digest Generation
Send daily or weekly summaries to users:
async function generateEmailDigest() {
  const users = await db.query(`
    SELECT id, email, digest_frequency
    FROM users
    WHERE digest_enabled = true
  `);
  for (const user of users.rows) {
    const agents = await getUserAgents(user.id);
    const period = user.digest_frequency === 'daily' ? '24 hours' : '7 days';
    const digestData = {
      user: user.email,
      period: period,
      agents: []
    };
    for (const agent of agents) {
      const stats = await getAgentStats(agent.id, period);
      digestData.agents.push({
        name: agent.name,
        requests: stats.total_requests,
        trust_score: stats.current_trust_score,
        top_threats: stats.threats,
        cost: stats.estimated_cost
      });
    }
    // Queue email for sending
    await emailQueue.add('send', {
      to: user.email,
      template: 'daily-digest',
      data: digestData
    });
  }
}
Webhook Retry Logic
Retry failed webhook deliveries with exponential backoff:
async function sendWebhook(webhookConfig, payload) {
  const maxRetries = 5;
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      const response = await fetch(webhookConfig.url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-AAIS-Signature': generateHMAC(payload, webhookConfig.secret)
        },
        body: JSON.stringify(payload),
        signal: AbortSignal.timeout(10000) // 10 second timeout (fetch has no `timeout` option)
      });
      if (response.ok) {
        return { success: true, attempt };
      }
      // Non-2xx response: schedule a delayed retry via the queue
      await webhookQueue.add('retry', {
        webhook: webhookConfig,
        payload: payload,
        attempt: attempt + 1
      }, {
        delay: Math.pow(2, attempt) * 1000 // 1s, 2s, 4s, 8s, 16s
      });
      return { success: false, retry_scheduled: true };
    } catch (error) {
      attempt++;
      if (attempt >= maxRetries) {
        // Give up, log failure
        await db.query(`
          INSERT INTO webhook_failures (webhook_id, payload, error, attempts)
          VALUES ($1, $2, $3, $4)
        `, [webhookConfig.id, JSON.stringify(payload), error.message, maxRetries]);
        return { success: false, exhausted: true };
      }
    }
  }
}
Job Monitoring
All background jobs emit metrics (success rate, duration, failures). Monitor via the Admin Dashboard → Jobs tab. Failed jobs trigger alerts after 3 consecutive failures.
Performance Optimization
Strategies to handle high traffic and maintain low latency.
Caching Strategies (Redis)
Multiple cache layers reduce database load:
1. Session Cache
// User sessions cached for 24 hours (node-redis v4 options syntax)
await redis.set(`session:${userId}`, JSON.stringify(sessionData), { EX: 86400 });
2. Agent Configuration Cache
// Agent settings cached for 5 minutes
const cacheKey = `agent:${agentId}:config`;
let config = await redis.get(cacheKey);
if (!config) {
  config = await db.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  await redis.set(cacheKey, JSON.stringify(config), { EX: 300 });
}
3. Trust Score Cache
// Trust scores cached for 10 seconds (real-time feel, reduced DB load)
const scoreKey = `trustscore:${agentId}`;
let score = await redis.get(scoreKey);
if (!score) {
  score = await calculateTrustScore(agentId);
  await redis.set(scoreKey, score, { EX: 10 });
}
4. Rate Limit Cache
// Sliding window rate limiter (Redis sorted sets)
const key = `ratelimit:${apiKey}`;
const now = Date.now();
const windowMs = 3600000; // 1 hour
// Remove old entries
await redis.zRemRangeByScore(key, 0, now - windowMs);
// Count requests in window
const count = await redis.zCard(key);
if (count >= limit) {
  throw new RateLimitError('Monthly limit exceeded');
}
// Add current request (crypto.randomUUID() makes the member unique)
await redis.zAdd(key, { score: now, value: `${now}:${crypto.randomUUID()}` });
Database Indexing
Critical indexes for fast queries:
-- Request logs table (largest table)
CREATE INDEX idx_request_logs_agent_created
ON request_logs(agent_id, created_at DESC);
CREATE INDEX idx_request_logs_threats
ON request_logs(agent_id)
WHERE injection_detected = true OR pii_detected > 0;
-- Agents table
CREATE INDEX idx_agents_user_id ON agents(user_id);
CREATE INDEX idx_agents_tier ON agents(tier);
-- Vector index for semantic search (pgvector)
CREATE INDEX idx_embeddings_vector
ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Partial indexes for common queries
CREATE INDEX idx_active_agents ON agents(id) WHERE status = 'active';
Horizontal Scaling with Load Balancers
Deploy multiple AAIS instances behind a load balancer:
# NGINX load balancer configuration
upstream aais_backend {
  least_conn;  # Route to least busy server
  server aais-1.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  server aais-2.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  server aais-3.internal:3000 weight=1 max_fails=3 fail_timeout=30s;
  keepalive 32;  # Connection pooling
}
server {
  listen 443 ssl http2;
  server_name agentaishield.com;  # server_name cannot include a path
  location /api/ {
    proxy_pass http://aais_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # Timeouts
    proxy_connect_timeout 10s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
  }
}
CDN for Static Assets
Serve dashboard, training pages, and assets from CDN:
- Cloudflare CDN: Automatic edge caching, DDoS protection
- Cache-Control headers: Static assets cached 1 year, HTML 5 minutes
- Brotli compression: Reduce bundle size by 70%
- HTTP/2: Multiplexing, server push for critical resources
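The Cache-Control policy above can be reduced to a small helper; the exact file-extension list and the fallback value are illustrative assumptions:

```javascript
// Pure helper: choose a Cache-Control value by asset type, matching the
// policy above (static assets cached ~1 year, HTML for 5 minutes).
function cacheControlFor(path) {
  if (/\.(js|css|png|svg|woff2)$/.test(path)) {
    return 'public, max-age=31536000, immutable'; // 1 year, fingerprinted assets
  }
  if (path.endsWith('.html') || path === '/') {
    return 'public, max-age=300';                 // 5 minutes
  }
  return 'no-cache';                              // conservative default
}
```

A pure function like this is easy to unit-test and can be wired into whatever static-serving middleware the server uses.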
Connection Pooling
// PostgreSQL connection pool
const { Pool } = require('pg');
const pool = new Pool({
  host: process.env.DB_HOST,
  port: 5432,
  database: 'aais',
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20, // Max 20 connections per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});
// Reuse connections
const client = await pool.connect();
try {
  const result = await client.query('SELECT * FROM agents WHERE id = $1', [agentId]);
  return result.rows[0];
} finally {
  client.release(); // Return to pool
}
Query Optimization
Common slow queries and their optimizations:
-- BEFORE: Slow full table scan
SELECT * FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 2,400ms
-- AFTER: Use covering index
SELECT request_id, created_at, threat_type, pii_count
FROM request_logs
WHERE agent_id = 'abc123'
ORDER BY created_at DESC
LIMIT 100;
-- Execution time: 8ms (300x faster)
-- Use EXPLAIN ANALYZE to profile queries
EXPLAIN ANALYZE SELECT ...;
Performance Metrics
After optimization: P50 latency 45ms, P95 120ms, P99 280ms. Database CPU usage reduced from 85% to 12%. Redis hit rate: 94%. Handles 5,000 req/sec per instance.
High Availability
Design for 99.9% uptime with redundancy and failover.
Multi-Region Deployments
Region Configuration:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Primary: us-east-1 (Virginia)
├─ 3 app servers (autoscaling 3-10)
├─ PostgreSQL primary (RDS Multi-AZ)
└─ Redis cluster (3 nodes)
Secondary: eu-west-1 (Ireland)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)
Tertiary: ap-southeast-1 (Singapore)
├─ 2 app servers (autoscaling 2-6)
├─ PostgreSQL read replica
└─ Redis cluster (3 nodes)
// Route53 latency-based routing
// Users automatically routed to nearest healthy region
Database Replication
PostgreSQL streaming replication for read scalability:
// Connection config with read replicas
const writePool = new Pool({ host: 'primary.db.internal' });
const readPool = new Pool({
  host: 'replica.db.internal',
  max: 50 // Read replicas can handle more connections
});
// Route queries appropriately
async function getAgent(agentId) {
  // Reads go to replica
  return await readPool.query('SELECT * FROM agents WHERE id = $1', [agentId]);
}
async function updateAgent(agentId, data) {
  // Writes go to primary
  return await writePool.query('UPDATE agents SET ... WHERE id = $1', [agentId, ...]);
}
Automatic Failover
When primary fails, promote replica automatically:
// AWS RDS Multi-AZ automatic failover
// - Detects primary failure within 60 seconds
// - Promotes standby to primary
// - Updates DNS to point to new primary
// - Total downtime: 60-120 seconds
// Application connection retry logic
async function executeQuery(query, params, maxRetries = 3) {
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      return await pool.query(query, params);
    } catch (error) {
      if (error.code === 'ECONNREFUSED' || error.code === '57P03') {
        // Connection refused or terminated - likely failover
        attempt++;
        await sleep(2000); // Wait for DNS propagation
        continue;
      }
      throw error;
    }
  }
}
Health Checks and Monitoring
Continuous health monitoring with alerts:
// Health check endpoint
app.get('/health', async (req, res) => {
  const checks = {
    database: false,
    redis: false,
    ml_model: false
  };
  try {
    // Database check
    await pool.query('SELECT 1');
    checks.database = true;
    // Redis check
    await redis.ping();
    checks.redis = true;
    // ML model check
    const testInput = "test";
    await classifyInjection(testInput);
    checks.ml_model = true;
    const allHealthy = Object.values(checks).every(v => v);
    res.status(allHealthy ? 200 : 503).json({
      status: allHealthy ? 'healthy' : 'degraded',
      checks: checks,
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      checks: checks,
      error: error.message
    });
  }
});
// Load balancer polls /health every 10 seconds
// Remove unhealthy instances from rotation
Monitoring Stack
Datadog for metrics, PagerDuty for alerts, Sentry for error tracking. SLA: 99.9% uptime (8.76 hours downtime per year). Current uptime: 99.97% (2.6 hours downtime in 2025).
API Rate Limits
Per-tier limits prevent abuse and ensure fair usage.
Tier Limits
Rate Limits by Tier:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Free Tier:
├─ Monthly: 50,000 requests
├─ Burst: 100 requests/minute
├─ Max concurrent: 5
└─ Overage: Blocked
Starter Tier:
├─ Monthly: 500,000 requests
├─ Burst: 500 requests/minute
├─ Max concurrent: 20
└─ Overage: Soft limit (alert at 90%)
Enterprise Tier:
├─ Monthly: Unlimited
├─ Burst: 2,000 requests/minute
├─ Max concurrent: 100
└─ Overage: N/A (custom SLAs)
Burst Allowances
Token bucket algorithm for burst traffic:
// Token bucket implementation (simplified)
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }
  async consume(count = 1) {
    // Refill tokens based on time elapsed
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    // Check if enough tokens available
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
// Example: Starter tier = 500 req/min = 8.33 req/sec
const bucket = new TokenBucket(500, 8.33);
Rate Limit Headers
Every API response includes rate limit info:
// Response headers
X-RateLimit-Limit: 50000 // Monthly limit
X-RateLimit-Remaining: 48234 // Requests left this month
X-RateLimit-Reset: 1709251200 // Unix timestamp (month reset)
X-RateLimit-Burst: 100 // Burst limit (per minute)
X-RateLimit-Burst-Remaining: 87 // Burst tokens remaining
// When limit exceeded (HTTP 429)
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Monthly request limit reached (50,000)",
    "reset_at": "2026-03-01T00:00:00Z",
    "upgrade_url": "https://agentaishield.com/pricing"
  }
}
Rate Limit Best Practices
Check X-RateLimit-Remaining header before bulk operations. Implement exponential backoff on 429 responses. Cache AAIS results where possible to reduce API calls.
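A client-side sketch of that guidance, retrying on 429 with exponential backoff. The option names are illustrative, and `fetchFn` is injectable so the loop can be exercised without a network:

```javascript
// Retry a request on HTTP 429 with exponential backoff (1s, 2s, 4s, ...).
// baseDelayMs is configurable so tests can run without real waits.
async function callWithBackoff(url, opts = {},
    { fetchFn = fetch, maxRetries = 4, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetchFn(url, opts);
    if (res.status !== 429) return res;     // success or non-rate-limit error
    const delayMs = Math.pow(2, attempt) * baseDelayMs;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Still rate limited after retries');
}
```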
Custom Deployment
For enterprises requiring on-premise or air-gapped deployments.
Docker Containers
# Dockerfile (simplified)
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Download ML models
RUN npm run download-models
EXPOSE 3000
CMD ["node", "server.js"]
# docker-compose.yml
version: '3.8'
services:
  aais-api:
    image: agentaishield/aais:latest
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/aais
      - REDIS_URL=redis://redis:6379
      - NODE_ENV=production
    depends_on:
      - postgres
      - redis
  postgres:
    image: pgvector/pgvector:pg15
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=aais
      - POSTGRES_USER=aais
      - POSTGRES_PASSWORD=${DB_PASSWORD}
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
volumes:
  postgres-data:
  redis-data:
Kubernetes Manifests
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aais-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aais-api
  template:
    metadata:
      labels:
        app: aais-api
    spec:
      containers:
        - name: aais
          image: agentaishield/aais:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: aais-secrets
                  key: database-url
            - name: REDIS_URL
              value: redis://redis-service:6379
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: aais-service
spec:
  selector:
    app: aais-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: LoadBalancer
Terraform Modules
# main.tf (AWS deployment)
module "aais_vpc" {
  source = "terraform-aws-modules/vpc/aws"
  name   = "aais-vpc"
  cidr   = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = false
}
module "aais_rds" {
  source = "terraform-aws-modules/rds/aws"
  identifier     = "aais-postgres"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.large"
  allocated_storage       = 100
  storage_encrypted       = true
  multi_az                = true
  backup_retention_period = 7
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}
module "aais_elasticache" {
  source = "terraform-aws-modules/elasticache/aws"
  cluster_id      = "aais-redis"
  engine          = "redis"
  node_type       = "cache.t3.medium"
  num_cache_nodes = 3
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
}
module "aais_ecs" {
  source       = "terraform-aws-modules/ecs/aws"
  cluster_name = "aais-cluster"
  fargate_capacity_providers = {
    FARGATE      = {}
    FARGATE_SPOT = {}
  }
}
Environment Variable Configuration
# .env.production
NODE_ENV=production
PORT=3000
# Database
DATABASE_URL=postgresql://user:pass@host:5432/aais
DB_SSL=true
DB_POOL_MAX=20
# Redis
REDIS_URL=redis://host:6379
REDIS_PASSWORD=your-redis-password
# Security
JWT_SECRET=your-jwt-secret-key
SESSION_SECRET=your-session-secret
# ML Models (local paths for air-gapped)
PII_MODEL_PATH=/app/models/pii-ner.onnx
INJECTION_MODEL_PATH=/app/models/injection-classifier.onnx
EMBEDDING_MODEL_PATH=/app/models/embeddings.onnx
# External APIs (optional, can be disabled)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Monitoring (optional)
DATADOG_API_KEY=...
SENTRY_DSN=...
# Feature Flags
ENABLE_PROXY_MODE=true
ENABLE_TRUSTSHIELD=true
ENABLE_WEBHOOKS=true
On-Premise Support
Enterprise customers receive dedicated support for custom deployments including installation, configuration, and ongoing maintenance. Contact [email protected] for deployment guides.
Architecture Mastery Complete
You now understand how AgentAIShield works under the hood: from request routing and Trust Score calculations to job scheduling, performance optimization, and custom deployments. This knowledge enables you to:
- Optimize AAIS for your specific workload and scale requirements
- Deploy on-premise or in air-gapped environments
- Troubleshoot performance bottlenecks and latency issues
- Integrate AAIS deeply into your infrastructure
- Architect high-availability systems with multi-region failover
Expert Training Complete!
You've completed all AgentAIShield training modules. You're now equipped to build, secure, monitor, and scale production AI agents with confidence. Ready to put it into practice?