The Trust & Safety Decision System Map
A vendor-neutral reference model for any system that makes automated allow/deny, risk, and exposure decisions. It separates the problem into seven layers (decisions, signals, policy logic, enforcement, human operations, change control, and auditability) so you can audit, design, or qualify a Trust & Safety or decisioning system against it. The patterns it catalogs are common across fintech, marketplaces, AI platforms, and regulated SaaS.
Use this map if:
- You run automated allow/deny, ranking, or exposure decisions.
- You need replay or audit evidence for incidents, customers, or regulators.
- You ship ML or LLMs in production and need control over their influence.
The seven layers
- Surfaces & Verdicts - what the system decides.
- Signals & Evidence - what decisions use.
- Policy Logic - how evidence becomes verdicts.
- Enforcement Runtime - where, when, and how outcomes are applied.
- Human Ops & Governance - authority and workflow.
- Change Control - how it evolves safely.
- Audit, Replay & Privacy - prove, explain, reconstruct.
1. Surfaces & Verdicts
what the system decides
| Category | Mechanism | Examples |
|---|---|---|
| Access & eligibility | allow / deny action | deny API call by policy; block LLM tool call; prevent seller from posting |
| | suspend / reinstate subject | freeze wallet; suspend merchant; reinstate account after appeal |
| Risk assessment | score / tier assignment | transaction risk score; user trust tier; API key risk tier |
| | abuse / fraud classification | AML flag; account-takeover suspicion; prompt-injection detected |
| Exposure & distribution | visibility control | suppress scam listing; hide unsafe AI output; block ad delivery |
| | ranking adjustment | downrank borderline content; demote low-trust sellers; reduce reach |
| Flow decisions | auto-resolve vs review | auto-approve low-risk payment; hold withdrawal; quarantine AI output |
| | routing to handling path | route to AML vs fraud ops; AI safety vs legal; enterprise escalation queue |
| Volume & velocity | rate limits / quotas | throttle withdrawals; cap model calls per tenant; limit posting frequency |
| | temporary restrictions | 24h cash-out freeze; cooldown after suspicious behavior; DM ban for new users |
| Data access & flow | data access constraints | block retrieval from HR docs; deny export to external connector; restrict tool scopes |
| | data transformation constraints | redact PII in outputs; block secrets leakage; enforce a no-code-execution zone |
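
The verdict types in the table above can share one record shape. The following is a minimal sketch, not a prescribed schema: the `Action` values, the `Verdict` fields, the surface name `"withdrawal"`, and the reason code are all hypothetical names chosen for illustration. It shows how a temporary restriction such as a 24h cash-out freeze can carry its own expiry, so that enforcement does not need a separate "unfreeze" job to be correct.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    """Hypothetical verdict actions covering the categories in the table."""
    ALLOW = "allow"
    DENY = "deny"
    REVIEW = "review"      # route to a human queue instead of auto-resolving
    THROTTLE = "throttle"  # rate limit or quota
    RESTRICT = "restrict"  # temporary restriction such as a cash-out freeze


@dataclass(frozen=True)
class Verdict:
    surface: str                            # hypothetical surface name
    action: Action
    reason_codes: Tuple[str, ...]
    issued_at: datetime
    expires_at: Optional[datetime] = None   # None means no automatic expiry

    def is_active(self, now: datetime) -> bool:
        """A verdict without an expiry stays active until it is replaced."""
        return self.expires_at is None or now < self.expires_at


# Example: a 24h cash-out freeze issued at noon UTC.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
freeze = Verdict(
    surface="withdrawal",
    action=Action.RESTRICT,
    reason_codes=("SUSPICIOUS_PAYOUT_CHANGE",),
    issued_at=issued,
    expires_at=issued + timedelta(hours=24),
)
```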
2. Signals & Evidence
what decisions use
| Category | Mechanism | Examples |
|---|---|---|
| Entity state | identity / verification attributes | KYC tier; MFA enabled; verified business; device trust state |
| | enforcement history | prior chargebacks; past strikes; previous holds or overrides |
| Event context | action / object metadata | amount and currency; tool name and arguments; listing category and price |
| | session / device metadata | device fingerprint; IP reputation; auth method; session age |
| Behavior signals | sequence / velocity features | burst withdrawals; rapid API calls; repeated denied tool attempts |
| | pattern anomalies | payout change then withdraw; login then key creation; prompt spam then tool calls |
| Relationship signals | linkage indicators | shared wallets; shared devices; shared IP ranges |
| | coordination indicators | seller rings; coordinated postings; clustered agent behavior |
| Model outputs | ML scores / labels | fraud probability; anomaly score; toxicity label |
| | LLM classifications (with rationale and confidence) | intent detection; policy label for a prompt; sensitive-data presence tag |
| Human & external | human labels / outcomes | confirmed fraud; false positive; appeal upheld or overturned |
| | external intelligence | sanctions hit; high-risk jurisdiction list; consortium fraud score |
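
A velocity feature such as "burst withdrawals" is usually a count of events inside a sliding time window. The sketch below is one simple way to compute it, assuming timestamps arrive in non-decreasing order and are expressed in seconds. `VelocityCounter` and the 60-second window are hypothetical; a production system would typically keep this state per subject in a shared store rather than in process memory.

```python
from collections import deque
from typing import Deque


class VelocityCounter:
    """Counts events for one subject inside a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, ts: float) -> int:
        """Record an event at time ts and return the count inside the window.

        Assumes ts never decreases between calls.
        """
        self._timestamps.append(ts)
        cutoff = ts - self.window_seconds
        # Drop events at or before the cutoff; they are outside the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Example: five withdrawals with a 60-second window.
counter = VelocityCounter(window_seconds=60)
counts = [counter.record(ts) for ts in (0, 10, 20, 65, 200)]
```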
3. Policy Logic
how evidence becomes verdicts
| Category | Mechanism | Examples |
|---|---|---|
| Rules | conditions & thresholds | block if score over X; deny if jurisdiction restricted; allow if KYC at least 2 |
| | exceptions / allowlists | regulated-cohort exception; enterprise allowlist; internal test accounts |
| Statistical decisioning | banding / cutoffs | approve below X; review X to Y; block above Y |
| | ensembles / fusion | combine fraud + AML + behavior; blend anomaly + linkage + score |
| Composition & precedence | rules constrain models | a sanctions rule overrides a model allow; policy blocks a tool regardless of LLM judgment |
| | models inform rules | dynamic thresholds from drift; score drives routing and severity |
| Externalized decisions | vendor verdict integration | third-party fraud verdict; device-reputation vendor; SaaS moderation API |
| | consistency / fallback | compare vendor vs internal; fall back on vendor outage; confidence gating |
| Control contracts | scope | EU-only policy; per-product policy; per-tenant overrides |
| | determinism contract | same event + state + policy version gives the same verdict; version-pinned feature snapshot |
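
Three rows of this table combine naturally: banding, "rules constrain models", and the determinism contract. The sketch below illustrates that combination under stated assumptions. The thresholds `REVIEW_AT` and `BLOCK_AT` stand in for X and Y, and `POLICY_VERSION`, `Evidence`, `Decision`, and the reason codes are hypothetical names. The function is pure: it reads only its input and constants, so the same evidence under the same policy version always yields the same decision.

```python
from dataclasses import dataclass

POLICY_VERSION = "example-ruleset-1"  # hypothetical version label
REVIEW_AT = 0.5                       # hypothetical band edge X
BLOCK_AT = 0.8                        # hypothetical band edge Y


@dataclass(frozen=True)
class Evidence:
    sanctions_hit: bool
    fraud_score: float  # model output in [0, 1]; advisory, never authoritative


@dataclass(frozen=True)
class Decision:
    verdict: str
    reason_code: str
    policy_version: str


def decide(evidence: Evidence) -> Decision:
    # Precedence: the hard rule runs first, so no model score can override it.
    if evidence.sanctions_hit:
        return Decision("block", "SANCTIONS_HIT", POLICY_VERSION)
    # Banding: approve below X, review from X up to Y, block at Y and above.
    if evidence.fraud_score < REVIEW_AT:
        return Decision("approve", "SCORE_BELOW_REVIEW_BAND", POLICY_VERSION)
    if evidence.fraud_score < BLOCK_AT:
        return Decision("review", "SCORE_IN_REVIEW_BAND", POLICY_VERSION)
    return Decision("block", "SCORE_IN_BLOCK_BAND", POLICY_VERSION)
```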
4. Enforcement Runtime
where, when, and how outcomes are applied
| Category | Mechanism | Examples |
|---|---|---|
| Action semantics | hard enforcement | decline payment; block prompt or tool call; revoke session or token |
| | step-up / friction | MFA challenge; re-KYC; CAPTCHA or re-auth |
| Conditional / deferred | allow-with-monitoring | approve with enhanced monitoring; allow a tool call with strict logging |
| | holds / quarantines | pending withdrawal review; content hidden until review; output quarantine |
| Timing model | synchronous | checkout decision under 50ms; tool-call admission inline |
| | asynchronous | hold then review; batch suspension overnight |
| Enforcement points | edge / gateway | API gateway deny; LLM proxy blocks a tool call |
| | service / worker | payment service declines; worker freezes accounts |
| Propagation | cross-system effects | disable in IAM + payments + support; open a case in the case system |
| | notifications | notify the user of a restriction; page on-call for a critical event |
| Failure posture | fail-closed / fail-open | fail-closed for withdrawals; fail-open for low-risk reads with caps |
| | degraded mode | cached policy snapshot; disable the LLM classifier but keep the rules |
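
Failure posture is easiest to reason about when it is declared per surface rather than scattered across error handlers. The sketch below assumes a hypothetical `PolicyUnavailable` exception raised by the policy client on timeout or outage, and a hypothetical `POSTURE_BY_SURFACE` table. Unknown surfaces default to fail-closed. The caps that the table attaches to fail-open reads are not modeled here.

```python
from typing import Callable, Tuple


class PolicyUnavailable(Exception):
    """Raised by the (hypothetical) policy client on timeout or outage."""


FAIL_CLOSED = "fail_closed"
FAIL_OPEN = "fail_open"

# Hypothetical posture table: one declared posture per decision surface.
POSTURE_BY_SURFACE = {
    "withdrawal": FAIL_CLOSED,
    "profile_read": FAIL_OPEN,
}


def enforce(surface: str, evaluate: Callable[[], str]) -> Tuple[str, bool]:
    """Return (verdict, degraded); degraded is True when the fallback was used."""
    try:
        return evaluate(), False
    except PolicyUnavailable:
        # Surfaces without a declared posture are treated as fail-closed.
        posture = POSTURE_BY_SURFACE.get(surface, FAIL_CLOSED)
        verdict = "deny" if posture == FAIL_CLOSED else "allow"
        return verdict, True


def broken_policy() -> str:
    """Stand-in for a policy call that fails during an outage."""
    raise PolicyUnavailable("policy service timed out")
```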
5. Human Ops & Governance
authority and workflow
| Category | Mechanism | Examples |
|---|---|---|
| Review | triage | route large withdrawals to a senior queue; route AI safety to a specialist queue |
| | adjudication | confirm fraud and freeze; mark false positive and restore capability |
| Approvals | operational approvals | dual approval for a large withdrawal; approval for a payout-address change |
| | policy-change approvals | compliance sign-off for an AML rule; security sign-off for a tool allowlist |
| Appeals & escalations | user appeals | seller reinstatement; wallet-unfreeze request; takedown appeal |
| | enterprise / regulator escalations | customer security escalation; regulator inquiry packet |
| Overrides | override authority | senior-ops override; incident-commander emergency action |
| | override safeguards | reason required; time-boxed override; mandatory ticket link |
| Quality controls | calibration | disagreement-review sessions; policy-interpretation alignment |
| | reviewer metrics | overturn rate; false-positive rate; time-to-decision by queue |
| Separation of duties | role boundaries | author cannot deploy; deployer cannot approve; reviewer cannot edit policies |
| | accountability | named approver recorded; signed change record; immutable override log |
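
The override safeguards in the table (reason required, time-boxed, mandatory ticket link, named approver) can be enforced at the moment an override is created instead of being checked afterwards. This is a minimal sketch under assumptions: `MAX_OVERRIDE_TTL`, the `Override` fields, and the example ticket URL are hypothetical, and a real system would also write the record to an append-only log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_OVERRIDE_TTL = timedelta(hours=72)  # hypothetical ceiling for any override


@dataclass(frozen=True)
class Override:
    subject_id: str
    approver: str
    reason: str
    ticket_url: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def create_override(subject_id: str, approver: str, reason: str,
                    ticket_url: str, now: datetime, ttl: timedelta) -> Override:
    """Create an override only if every safeguard is satisfied."""
    if not approver.strip():
        raise ValueError("a named approver is required")
    if not reason.strip():
        raise ValueError("a reason is required")
    if not ticket_url.strip():
        raise ValueError("a ticket link is required")
    if ttl <= timedelta(0) or ttl > MAX_OVERRIDE_TTL:
        raise ValueError("an override must be time-boxed within the maximum")
    return Override(subject_id, approver, reason, ticket_url, now + ttl)


# Example: a four-hour override with a placeholder ticket link.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
override = create_override(
    "acct-1", "alice", "confirmed false positive",
    "https://tickets.example.com/OPS-123", created, timedelta(hours=4),
)
```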
6. Change Control
how it evolves safely
| Category | Mechanism | Examples |
|---|---|---|
| Versioning | policy versions | ruleset v12; rule hash; reason-code taxonomy version |
| | model versions | fraud model v3.2; classifier prompt version; feature schema version |
| Progressive rollout | canary / percent rollout | 5% to 25% to 100%; per-tenant rollout; per-region rollout |
| | shadow mode | run a new model without enforcement; log diffs vs baseline |
| Evaluation | offline replay | replay the last 30 days; measure precision and recall on labeled cases |
| | online monitoring | drift detection; queue impact; false-positive trend |
| Experimentation | A/B tests | threshold tuning; friction-variant testing; ranking-demotion strength |
| | guardrails | blast-radius cap; auto-rollback trigger; restricted cohorts only |
| Emergency controls | kill switches | disable auto-block; force review-only; disable one policy group |
| | rollback | revert in minutes; rollback by tenant, product, or region |
| Governance workflow | change workflow | proposal to review to approval to deploy; mandatory peer review |
| | post-change validation | watch-window after deploy; incident review if metrics spike |
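
A percent rollout (5% to 25% to 100%) is commonly implemented by hashing a stable subject identifier into a fixed bucket. The sketch below shows one such approach using SHA-256; the function names and the idea of salting with a rollout name are assumptions for illustration. Because each subject's bucket never changes for a given salt, anyone included at 5% stays included at 25% and 100%, and the assignment can be reproduced later during replay.

```python
import hashlib


def rollout_bucket(subject_id: str, salt: str) -> int:
    """Map a subject to a stable bucket in the range 0..99.

    The salt (for example, a rollout name) keeps bucket assignments
    independent between different rollouts.
    """
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(subject_id: str, salt: str, percent: int) -> bool:
    """Return True if the subject falls inside the first `percent` buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(subject_id, salt) < percent
```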
7. Audit, Replay & Privacy
prove, explain, reconstruct
| Category | Mechanism | Examples |
|---|---|---|
| Decision ledger | core record | verdict + reason codes + timestamps + actor; subject IDs recorded |
| | correlation | trace ID across services; case ID; request ID |
| Traceability | input snapshot | feature snapshot ID; model output ID; external list version |
| | content snapshot | prompt hash + redacted text; output hash + redacted text |
| Attribution | policy lineage | policy version; rule IDs hit; exception path taken |
| | model lineage | model version; threshold-set ID; calibration-set ID |
| Replay | reproduce | reproduce a disputed decline; reproduce a tool-call denial; reproduce a suspension |
| | what-if simulation | replay under a new threshold; replay under a new model; tenant-specific replay |
| Reporting | effectiveness | abuse catch rate; fraud prevented; appeal-overturn trend |
| | operations | SLA adherence; backlog by queue; latency distribution |
| Privacy & retention | minimization | store hashes not raw; redact PII; store derived features only |
| | retention | 30-day raw retention; 1-year decision ledger; tenant-specific retention |
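
Replay depends on the ledger storing enough to re-run a decision: the input snapshot, the policy version, and the verdict. The sketch below is illustrative only. The `POLICIES` registry, its two toy versions, and the record fields are hypothetical, and the stored inputs are assumed to be derived features rather than raw data. `replay` reproduces a decision under its original version, or under another version for what-if simulation, and refuses to run if the snapshot no longer matches its hash.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional


def snapshot_hash(inputs: Mapping[str, object]) -> str:
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    trace_id: str
    policy_version: str
    inputs: Mapping[str, object]  # derived features only, no raw PII
    input_hash: str
    verdict: str


# Hypothetical registry: every deployed policy version stays callable.
POLICIES: Dict[str, Callable[[Mapping[str, object]], str]] = {
    "v1": lambda f: "block" if f["score"] >= 0.8 else "approve",
    "v2": lambda f: "block" if f["score"] >= 0.7 else "approve",
}


def decide_and_record(trace_id: str, policy_version: str,
                      inputs: Mapping[str, object]) -> DecisionRecord:
    verdict = POLICIES[policy_version](inputs)
    return DecisionRecord(trace_id, policy_version, dict(inputs),
                          snapshot_hash(inputs), verdict)


def replay(record: DecisionRecord, policy_version: Optional[str] = None) -> str:
    """Re-run a recorded decision, optionally under a different version."""
    if snapshot_hash(record.inputs) != record.input_hash:
        raise ValueError("input snapshot does not match its recorded hash")
    version = policy_version or record.policy_version
    return POLICIES[version](record.inputs)


# Example: a decision made under v1, then replayed under v1 and under v2.
record = decide_and_record("trace-001", "v1", {"score": 0.75, "kyc_tier": 2})
```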
How to use this map
- Audit an existing Trust and Safety or decisioning system
- Find the control layers you are missing
- Separate policy decisions from enforcement mechanics
- Define safe boundaries for ML and LLM influence
- Structure a conversation with compliance, security, and regulators
- Use it as an ownership map: policy vs enforcement vs ops
Where Swiftward fits
Swiftward is one way to build the Policy Logic and Audit layers of this map: deterministic rules that constrain non-authoritative model signals, versioned and replayable decision traces, and human-in-the-loop review where it is needed, all on infrastructure you run yourself. The map describes the problem; the engine is one implementation of part of it.