Trust Score Explained - AgentAIShield Training

What is a Trust Score?

The Trust Score is AgentAIShield's credit-bureau-style reputation score for AI agents. It's a composite metric (0-100) that evaluates your agent's security posture, data hygiene, and behavioral reliability over a rolling 30-day window.

Every agent starts at a neutral score (C / 70) and moves up or down based on real-world behavior. Scores of 95 and above earn an A+ grade, while scores below 60 result in an F (failing).

Why Trust Scores Matter

Trust Scores give you instant visibility into which agents are safe to deploy in production and which need attention. They're designed to be simple (A-F grade) yet precise (0-100 score) — just like a credit score for your AI infrastructure.

How Trust Scores Are Calculated

Trust Scores are computed using a 6-factor weighted formula. Each factor contributes a percentage to the overall score, reflecting its relative importance to agent security and reliability:

PII Leaks

Weight: 30%

Measures how often Personally Identifiable Information (emails, phone numbers, SSNs, credit cards) is detected in agent requests or responses. Fewer detections are better.

How to Improve: Sanitize user inputs before sending to LLMs. Use AAIS redaction features to strip PII from prompts. Review PII Detection logs to identify data leak sources.

Injection Resistance

Weight: 20%

Tracks detected prompt injection attempts — adversarial inputs trying to override system instructions, extract sensitive data, or hijack agent behavior.

How to Improve: Strengthen system prompts with clear boundaries. Use AAIS's injection scanner to block malicious inputs. Educate users on safe prompt patterns.
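
As a rough sketch of input screening, the check below flags a few well-known injection phrasings before a prompt reaches the model. It is a naive keyword heuristic for illustration, not the AAIS injection scanner; `looksLikeInjection` and the phrase list are assumptions, and real attacks are far more varied.

```javascript
// A few phrasings commonly seen in prompt-injection attempts (illustrative only).
const SUSPICIOUS_PATTERNS = [
  /ignore (?:all |any )?(?:previous|prior|above) instructions/i,
  /disregard (?:the |your )?system prompt/i,
  /reveal (?:the |your )?(?:system prompt|hidden instructions)/i,
];

// Hypothetical helper: true when the input matches any suspicious pattern.
function looksLikeInjection(userInput) {
  return SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(userInput));
}
```

A match is a reason to block or review the input, not proof of an attack, and the absence of a match does not mean the input is safe.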

Policy Compliance

Weight: 20%

Evaluates adherence to custom content policies (e.g., no offensive language, no medical advice, no legal counsel). Violations reduce this score.

How to Improve: Define clear content policies in AAIS Settings. Use pre/post filters to enforce rules. Monitor policy violation logs and tighten constraints as needed.

Behavioral Consistency

Weight: 15%

Measures request volume variance and traffic patterns. Sudden spikes or erratic behavior may indicate bot activity or account compromise.

How to Improve: Maintain steady traffic patterns. Implement rate limiting to prevent abuse. Investigate anomalies flagged in AAIS Analytics.
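
Request volume variance can be quantified in several ways. The sketch below uses the coefficient of variation (standard deviation divided by mean) over hourly request counts; this metric and the `volumeVariation` name are assumptions for illustration, since the exact measure AAIS uses is not described here.

```javascript
// Coefficient of variation of request counts: 0 means perfectly steady traffic.
function volumeVariation(counts) {
  if (counts.length === 0) throw new RangeError("counts must not be empty");
  const mean = counts.reduce((sum, c) => sum + c, 0) / counts.length;
  if (mean === 0) return 0; // no traffic at all, so nothing to vary
  const variance =
    counts.reduce((sum, c) => sum + (c - mean) ** 2, 0) / counts.length;
  return Math.sqrt(variance) / mean;
}

const steady = volumeVariation([100, 100, 100, 100]); // flat traffic
const spiky = volumeVariation([100, 100, 100, 700]);  // one sudden spike
```

Here `steady` is 0, while the single spike pushes `spiky` above 1, which is the kind of pattern worth investigating.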

Error Rate

Weight: 10%

Percentage of requests that fail, time out, or return errors. High error rates suggest integration issues or unreliable LLM provider connections.

How to Improve: Fix API integration bugs. Add retry logic for transient failures. Monitor uptime and latency. Switch to more reliable LLM providers if needed.
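
Retry logic for transient failures is often implemented with exponential backoff. The following is a generic sketch; `withRetry`, its parameters, and the default values are illustrative and not tied to any AAIS or LLM provider SDK.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `operation` until it succeeds or `maxAttempts` is reached,
// doubling the wait between attempts (baseDelayMs, then 2x, then 4x, ...).
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

In practice, it is usually better to retry only errors that are likely to be transient, such as timeouts or HTTP 429 and 5xx responses, and to let other errors surface immediately.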

Track Record

Weight: 10%

Rewards agent tenure and sustained clean streaks. New agents start neutral; older agents with consistent behavior earn bonus points.

How to Improve: Time + consistency. Maintain clean records across all other factors for 7+ days to earn "Clean Streak" badges. Longevity builds trust.

The Formula

Trust Score = (PII Leaks × 0.30) + (Injection Resistance × 0.20) + (Policy Compliance × 0.20) + (Behavioral Consistency × 0.15) + (Error Rate × 0.10) + (Track Record × 0.10)
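
The formula can be sketched in code as follows. The factor weights come from the list above; the function name `computeTrustScore`, the input shape, and the assumption that every factor is a 0-100 sub-score where higher is better are illustrative, not an AAIS API. Note that the weights as listed add up to 1.05 rather than 1.00, so this sketch clamps the result to 100; how AAIS itself handles this is not stated here.

```javascript
// Weights as listed in this article. They sum to 1.05, not 1.00.
const WEIGHTS = {
  piiLeaks: 0.30,
  injectionResistance: 0.20,
  policyCompliance: 0.20,
  behavioralConsistency: 0.15,
  errorRate: 0.10,
  trackRecord: 0.10,
};

// Hypothetical helper: each factor is a 0-100 sub-score where higher is better.
function computeTrustScore(factors) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = factors[name];
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`Factor "${name}" must be a number from 0 to 100`);
    }
    total += value * weight;
  }
  // Assumption: clamp to 100 because the listed weights exceed 1.00 in total.
  // The result is rounded to one decimal place.
  return Math.min(100, Math.round(total * 10) / 10);
}
```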

Understanding Letter Grades

Scores are converted to familiar A-F letter grades for quick interpretation. Here's what each grade means:

Grade Scale

A+ (95-100) Exceptional: Elite security posture. Zero violations, consistent behavior, production-ready.
A (90-94) Excellent: Very safe for production. Minor room for optimization.
B+ (85-89) Good: Acceptable for production with monitoring. Some policy violations or PII leaks.
B (80-84) Fair: Safe for staging environments. Address violations before production deployment.
C+ (75-79) Neutral: Starting baseline for new agents. Needs improvement before scaling.
C (70-74) Below Average: Frequent violations or errors. Investigate and remediate.
D (60-69) Poor: Not safe for production. High error rate, PII leaks, or injection attempts.
F (0-59) Failing: Critical issues. Immediate remediation required. Do not deploy.
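
As a rough illustration, the scale can be expressed as a lookup. The function name `toLetterGrade` is hypothetical, and the plain A, B, C, D, and F labels are inferred from the A-F scale described above.

```javascript
// Lower bound of each grade band, highest first, per the grade scale above.
const GRADE_BANDS = [
  [95, "A+"],
  [90, "A"],
  [85, "B+"],
  [80, "B"],
  [75, "C+"],
  [70, "C"],
  [60, "D"],
  [0, "F"],
];

// Hypothetical helper: map a 0-100 Trust Score to its letter grade.
// Fractional scores fall into the highest band whose lower bound they reach.
function toLetterGrade(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("Score must be a number from 0 to 100");
  }
  return GRADE_BANDS.find(([min]) => score >= min)[1];
}
```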

How to Improve Your Score

Trust Scores reflect how your agent handles real requests, so the most reliable way to raise a score is to fix the underlying issues. Here's a step-by-step improvement strategy:

1. Fix High-Impact Issues First

Focus on the highest-weighted factors (PII Leaks 30%, Injection Resistance 20%, Policy Compliance 20%). A single fix in these areas yields bigger score gains than optimizing lower-weight factors.
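
To see why, compare the effect of the same 10-point improvement in two different factors. This sketch uses the weights listed earlier and assumes each factor is a 0-100 sub-score; `scoreGain` is an illustrative helper, not part of AAIS.

```javascript
// Points added to the overall score when one factor's sub-score rises by `delta`.
function scoreGain(weight, delta) {
  return weight * delta;
}

const piiGain = scoreGain(0.30, 10);   // PII Leaks, weight 30%
const errorGain = scoreGain(0.10, 10); // Error Rate, weight 10%

console.log(piiGain.toFixed(1), errorGain.toFixed(1)); // prints "3.0 1.0"
```

The same effort is worth three times as much when it is spent on the PII Leaks factor.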

2. Review Your PII Detection Logs

Go to Dashboard → PII Detections and identify which data types are leaking most frequently. Common culprits:

User emails in prompts ("Send a summary to <email address>")
Phone numbers in customer service logs
Credit cards in transaction debugging outputs

Use AAIS's built-in PII Redaction to automatically strip sensitive data before LLM processing.
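
For illustration only, the idea behind redaction can be sketched with a few regular expressions. This is not the AAIS redaction feature or its API; `redactPii` and the patterns below are simplified assumptions that will miss many real-world formats.

```javascript
// Simplified, illustrative patterns. Real PII detection needs far more care.
const PII_PATTERNS = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "CARD", regex: /\b(?:\d{4}[ -]?){3}\d{4}\b/g },
  { label: "PHONE", regex: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

// Hypothetical helper: replace each match with a placeholder such as [EMAIL].
function redactPii(text) {
  return PII_PATTERNS.reduce(
    (result, { label, regex }) => result.replace(regex, `[${label}]`),
    text
  );
}

console.log(redactPii("Send a summary to jane@example.com or call 555-123-4567"));
// prints "Send a summary to [EMAIL] or call [PHONE]"
```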

3. Enable Injection Blocking

Navigate to Settings → Security and toggle on:

Auto-block detected injections: Prevents malicious prompts from reaching your LLM
Strict mode: Blocks borderline-suspicious inputs (may increase false positives)

4. Define Content Policies

Set up custom rules in Settings → Policies. Examples:

Block outputs containing offensive language
Reject medical/legal advice requests
Flag politically sensitive content for review
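
One way to think about such rules is as a list of checks applied to each output. The rule shape, the `evaluatePolicies` helper, and the keyword patterns here are hypothetical and do not reflect how AAIS stores or enforces policies.

```javascript
// Each rule names an action ("block" or "flag") and a test over the text.
const POLICIES = [
  {
    name: "no-medical-advice",
    action: "block",
    test: (text) => /\b(?:diagnos\w*|dosage|prescri\w*)\b/i.test(text),
  },
  {
    name: "political-content",
    action: "flag",
    test: (text) => /\b(?:election|ballot)\b/i.test(text),
  },
];

// Hypothetical helper: return the name and action of every rule the text violates.
function evaluatePolicies(text, policies = POLICIES) {
  return policies
    .filter((policy) => policy.test(text))
    .map(({ name, action }) => ({ name, action }));
}
```

An empty result means no rule matched. Keyword matching like this is crude and prone to false positives, so it is only a starting point for thinking about rule design.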

5. Maintain Consistent Traffic

Sudden request spikes hurt your Behavioral Consistency score. Implement rate limiting:

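// Example: fixed-window rate limit, 10 requests per user per minute
const counts = new Map(); // "userId:minute" -> count (prune old keys in production)
function checkRateLimit(userId, now = Date.now()) {
  const key = `${userId}:${Math.floor(now / 60000)}`;
  const n = (counts.get(key) ?? 0) + 1;
  counts.set(key, n);
  return n > 10 ? { error: "Rate limit exceeded" } : { ok: true };
}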

6. Build a Clean Streak

Going 7 consecutive days without PII leaks or policy violations earns a "7-Day Clean Streak" badge, and 30 consecutive days earns a "30-Day Clean Streak" badge. Going 30 days without any PII detections earns "Zero PII 30d." These badges boost your Track Record score and signal trustworthiness to users.
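
A clean streak can be pictured as the number of most recent consecutive days with no violations. The sketch below assumes a hypothetical array of daily violation counts, oldest first; `currentCleanStreak` is an illustrative helper, and how AAIS actually tracks streaks is not specified here.

```javascript
// Count consecutive zero-violation days, walking back from the most recent day.
function currentCleanStreak(dailyViolations) {
  let streak = 0;
  for (let i = dailyViolations.length - 1; i >= 0; i--) {
    if (dailyViolations[i] !== 0) break;
    streak += 1;
  }
  return streak;
}

// Two violations eight days ago, then seven clean days.
const streak = currentCleanStreak([2, 0, 0, 0, 0, 0, 0, 0]);
const earnsSevenDayBadge = streak >= 7;
```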

Scores Update Daily

Trust Scores are recalculated once per day at 2 AM CT. Improvements are not reflected immediately, so allow up to 24 hours after implementing fixes to see score changes.

Where to View Trust Scores

Access Trust Scores from multiple places in the AAIS dashboard:

Dashboard Home: The "Agent Trust Scores" widget shows fleet-wide grade distribution (how many A+, A, B+, etc.)
Agent Registry: Click any agent to see its individual score, grade, and factor breakdown
Trust Scores Page: Dedicated view with historical trends, top performers, worst offenders, and drill-down analytics
Request Log: Every request shows a per-event trust impact (did this request improve or hurt the score?)

Trust Score Badges

High-performing agents earn visual badges that you can display on your website or product:

7-Day Clean Streak: 7 days without violations
30-Day Clean Streak: 30 days without violations
Zero PII 30d: No PII detections in 30 days
A+ Elite: Score ≥95 sustained for 7+ days

Embed badges with auto-updating trust scores to build user confidence. Go to Trust Scores → Badges to generate embeddable HTML.

Pro Tip: Aim for B+ Minimum

A score of 85+ (B+ or higher) is the recommended threshold for production deployment. Anything below signals security/reliability issues that should be addressed before scaling.

Next Steps

Now that you understand how Trust Scores work, explore these related topics:

Understanding PII Detection Basics — Learn how AAIS identifies sensitive data
Basic Threat Monitoring — Detect prompt injections and jailbreaks
Advanced Security Features — Deep dive into policy engines, red team testing, and compliance
Trust Score Algorithm Details — Mathematical breakdown for advanced users