Everything you need to know about AgentAIShield. Can't find an answer? Check our full docs.
AgentAIShield detects three categories of AI security risks:
Detection happens on both the prompt (input) and response (output) side, giving you full coverage of your AI pipeline.
AAIS detects the following PII types with severity ratings:
Detection uses pattern matching combined with contextual analysis to minimize false positives.
AAIS detects these prompt injection categories:
Each detection has a confidence score (0.0–1.0). High confidence (≥0.85) is flagged as severity high.
Detection accuracy varies by PII type:
For injection detection, confidence scores help you tune your response. In Monitor Mode, all detections are logged — you can review and adjust policies. In Proxy Mode, you can set a minimum confidence threshold before blocking.
Low-confidence detections (below 0.5) are logged but typically don't trigger blocks.
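As a sketch, the thresholds described above can be expressed in a few lines. Names and the intermediate "medium" label are illustrative, not AAIS's actual internals; the 0.85 and 0.5 cutoffs are the documented ones.

```javascript
// Illustrative mapping of a detection confidence score (0.0–1.0) to the
// severity and action described above. Every detection is logged; blocking
// applies only in Proxy Mode, gated by a configurable minimum confidence.
function classifyDetection(confidence, minBlockConfidence = 0.5) {
  const severity =
    confidence >= 0.85 ? 'high' :
    confidence >= 0.5  ? 'medium' :
                         'low';
  return {
    severity,
    logged: true,                             // Monitor Mode logs everything
    block: confidence >= minBlockConfidence,  // Proxy Mode only
  };
}
```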
Every API key in AAIS gets an Agent Trust Score™ — a number from 0 to 100 with a letter grade (A+ to F). Think of it as a behavioral credit score for your AI agents.
The score is updated after every request and considers:
Grade thresholds: A+(95-100), A(90-94), B+(85-89), B(75-84), C+(65-74), C(50-64), D(35-49), F(0-34).
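The grade bands above map directly to a lookup. The helper name is illustrative; the thresholds are the documented ones.

```javascript
// Maps a trust score (0–100) to the letter grades listed above.
function gradeFor(score) {
  if (score >= 95) return 'A+';
  if (score >= 90) return 'A';
  if (score >= 85) return 'B+';
  if (score >= 75) return 'B';
  if (score >= 65) return 'C+';
  if (score >= 50) return 'C';
  if (score >= 35) return 'D';
  return 'F';
}
```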
Score confidence (0-1) indicates how statistically reliable the trust score is. New agents start with low confidence because there isn't enough data to form a stable picture.
You can see score_confidence in the GET /api/trust/agents response. This prevents a single PII detection from tanking a new agent's score unfairly.
Badges are earned for sustained good behavior:
Badges are visible in the dashboard and readable via the API (GET /api/trust/agents/:id).
Start with Monitor Mode. It's zero-risk, zero-latency, and gives you immediate visibility into your AI traffic.
Proxy Mode routes your LLM calls through AAIS: you point your client's base_url at your AAIS instance. AAIS intercepts calls inline and can block based on policies, adding ~50-150ms latency per call.
Use Proxy Mode when you need blocking (e.g., preventing PII from reaching the LLM, blocking injection attempts before they execute). Use Monitor Mode for visibility without risk.
You can run both simultaneously — some agents in proxy mode, others in monitor mode.
No. Monitor Mode is designed to be completely non-blocking.
You call POST /api/monitor/ingest after your AI call completes, and the endpoint responds with {"ok": true} immediately — all PII scanning, injection detection, and trust score updates happen asynchronously.
Best practice: don't await the ingest call in your hot path. Fire it and forget it. Even if AAIS is unreachable, your application should continue without any impact.
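The fire-and-forget pattern above can be sketched as follows. The URL and payload fields are placeholders; substitute your own instance and the fields your ingest call uses.

```javascript
// Fire-and-forget reporting: send the ingest request without awaiting it in
// the hot path, and swallow any network error so an unreachable AAIS never
// affects the application. URL, key, and payload shape are illustrative.
function reportToAAIS(payload, url = 'https://your-aais.example/api/monitor/ingest') {
  fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer aais_YOUR_KEY',
    },
    body: JSON.stringify(payload),
  }).catch(() => { /* deliberately ignored: monitoring must never block the app */ });
  // No await and no return value: the caller continues immediately.
}
```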
AAIS supports alerts via webhooks (configure in Dashboard → Settings → Webhooks). You can trigger on:
Alternatively, poll GET /api/dashboard/threats and GET /api/dashboard/stats from your own alerting system to build custom notifications.
Monitor Mode supports any provider — just pass the provider name and model as strings. It's completely provider-agnostic.
Proxy Mode supports:
/v1/chat/completions, /v1/completions, /v1/embeddings, and /v1/messages.
Yes — in Proxy Mode, streaming (Server-Sent Events) is fully supported for /v1/chat/completions. Pass "stream": true in your request as usual.
In Monitor Mode, you report after the full stream completes — just collect the full response text before sending the ingest request.
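Buffering a streamed response before reporting can look like this. The sketch assumes the stream is exposed as an async iterable of text deltas (as most SDK streaming interfaces can be mapped to); names are illustrative.

```javascript
// Monitor Mode with streaming: accumulate the streamed chunks first, then
// report the complete text once via the ingest endpoint. `stream` is any
// async iterable of text deltas.
async function collectStream(stream) {
  let fullText = '';
  for await (const chunk of stream) {
    fullText += chunk;  // forward each chunk to your user here as well
  }
  return fullText;      // now safe to include in the ingest payload
}
```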
AAIS stores metadata about AI calls (model, provider, token counts, latency, status) and detection results (PII types found, injection confidence, violations). It does not store the full prompt or response text.
Specifically, for prompts, AAIS stores a SHA-256 hash (one-way, cannot be reversed) for deduplication. The actual text is processed in memory during scanning and immediately discarded.
If you're self-hosting AAIS, you have full control over what's stored by configuring the database schema.
Yes. AAIS is a Node.js + PostgreSQL application that runs anywhere — your own server, Railway, Render, Fly.io, AWS, GCP, or locally.
Self-hosting gives you:
Required environment variables: DATABASE_URL (PostgreSQL), JWT_SECRET, and provider API keys for proxy mode.
AAIS is designed to help you maintain compliance by detecting when PII appears in AI pipelines where it shouldn't. Since AAIS stores only hashes (not actual prompt text), the self-hosted version significantly reduces your data controller obligations.
For HIPAA compliance (PHI detection): AAIS detects medical identifiers (SSN, DOB, account numbers). Self-hosting on your own HIPAA-compliant infrastructure is the recommended approach for healthcare applications.
Consult your legal team for specific compliance requirements. AAIS is a tool to assist, not a certification.
Default rate limits for a self-hosted instance:
For high-volume agents (>100 RPM), you have two options: create multiple API keys (each gets 100 RPM), or adjust the max value in the rate limiter config in api/monitor.js for your self-hosted instance.
AgentAIShield is self-hostable. You run it on your own infrastructure, so the cost is just your hosting fees (typically $5-20/month on Railway or Render for small-to-medium workloads).
There's no per-request fee. No seat limits. No usage caps (beyond your hardware).
For managed hosting or enterprise support, contact us.
Unlimited. There's no hard limit on API keys or agents in the self-hosted version. Each API key creates one agent profile with its own trust score.
Recommended: create one API key per agent/application for clean trust scoring and audit trails. Don't share keys across different applications.
The Prompt Sanitizer (POST /api/sanitize) is a pre-flight API that cleans prompts before they reach your LLM. It detects and removes 12 types of PII and neutralizes 8 injection attack patterns.
Three modes: redact (replace with label like [SSN_REDACTED]), mask (replace with *****), or remove (delete entirely). Authenticate with your aais_ API key. The endpoint returns the sanitized text, a list of all modifications made, and a risk score (0–1). Processing adds under 5ms — negligible for production use.
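Locally, the effect of each mode on one detected span looks like this. The real work happens server-side at POST /api/sanitize; this sketch (with illustrative names) only shows what each mode does to a match.

```javascript
// Applies one of the three sanitizer modes to a single detected PII span.
// `match` is the detected text, `piiLabel` the PII type (e.g. 'SSN').
function applyMode(text, match, piiLabel, mode) {
  const replacements = {
    redact: `[${piiLabel}_REDACTED]`, // e.g. [SSN_REDACTED]
    mask: '*****',
    remove: '',
  };
  return text.replace(match, replacements[mode]);
}
```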
Quarantine is AAIS's auto-kill switch for misbehaving agents. You set a trust score threshold (e.g., 40) per agent via PUT /api/quarantine/settings. When an agent's trust score drops below that threshold — due to repeated PII violations, injection attempts, or policy failures — AAIS automatically quarantines it.
Quarantined agents are blocked in Proxy Mode (requests return a 403). Monitor Mode continues to log their traffic for forensics. You can also manually quarantine (POST /api/quarantine/:agentId) and manually lift quarantine (DELETE /api/quarantine/:agentId) at any time.
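The decision logic above can be sketched as follows; the object shape and names are illustrative, not AAIS internals.

```javascript
// Quarantine behavior sketch: an agent is quarantined when manually flagged
// or when its trust score drops below its configured threshold. Proxy Mode
// blocks quarantined agents with a 403; Monitor Mode keeps logging them.
function handleRequest(agent, mode) {
  const quarantined =
    agent.manualQuarantine || agent.trustScore < agent.quarantineThreshold;
  if (quarantined && mode === 'proxy') {
    return { status: 403, logged: true };  // blocked, but still logged for forensics
  }
  return { status: 200, logged: true };    // Monitor Mode never blocks
}
```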
Red Team Mode runs automated adversarial attacks against your own agents to surface vulnerabilities before real attackers find them. Start a test with POST /api/redteam/run.
5 attack categories: PII extraction, jailbreak attempts, prompt injection, role override ("ignore all previous instructions"), and data exfiltration. 3 intensity levels: light (5 attacks/category), standard (15), thorough (30+). Results include an overall security grade (A–F), per-category scores, and a detailed vulnerability list with evidence and recommendations.
AAIS generates audit-ready compliance reports from your agent activity data with one API call: POST /api/compliance/report. Supported frameworks: SOC2 (security, availability, confidentiality), HIPAA (PHI protection, access controls, audit logging), and GDPR (data minimization, purpose limitation, breach detection). Use "framework": "all" to generate all three at once.
Each report includes an overall compliance score (0–100), a per-control breakdown (pass/fail/partial), and specific findings with remediation guidance. Reports are stored and retrievable at any time via GET /api/compliance/reports/:id.
MCP (Model Context Protocol) is an open standard that lets AI agents discover and call external tools. AAIS exposes a full MCP server at /api/mcp, meaning any MCP-compatible agent — Claude, GPT, Gemini, LangChain, CrewAI — can use AAIS as a native tool set with zero custom integration code.
Add AAIS to your MCP config: {"mcpServers": {"agentaishield": {"url": "https://your-aais.com/api/mcp", "headers": {"Authorization": "Bearer aais_YOUR_KEY"}}}}. The agent will automatically discover 5 tools: scan_prompt, report_interaction, get_trust_score, check_budget, and verify_agent.
AAIS builds a behavioral baseline from each agent's first 500 requests, tracking: average tokens in/out, average latency, model distribution, and error rate. After the baseline is established, every subsequent request is compared using Z-score analysis.
A Z-score above 2.0 triggers a warning anomaly; above 3.5 triggers a critical anomaly. Example: if your agent normally uses ~500 tokens/request and suddenly sends 50,000 tokens (a 100× spike), that's a critical anomaly — potentially indicating prompt injection or a compromised agent. View anomalies via GET /api/fingerprint/:id/anomalies. Reset the baseline after intentional behavior changes with POST /api/fingerprint/:id/reset.
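The check above reduces to a small function. The function name and return labels are illustrative; the 2.0 and 3.5 thresholds are the documented ones, with `mean` and `stdDev` coming from the agent's baseline.

```javascript
// Z-score anomaly check: how many baseline standard deviations a metric
// (e.g. tokens per request) sits from the agent's established mean.
function anomalyLevel(value, mean, stdDev) {
  if (stdDev === 0) return 'none';  // degenerate baseline; nothing to compare
  const z = Math.abs(value - mean) / stdDev;
  if (z > 3.5) return 'critical';
  if (z > 2.0) return 'warning';
  return 'none';
}
```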
Browse our full documentation or check the machine-readable integration guide for AI agents.