Call log
Every inbound call, scored across six dimensions and linked to the downstream outcome. Use the Review queue tab for your daily 15-minute sweep.
2 min 18 sec · May 22, 6:14 pm · Grover Beach · Returning tenant B-204 · 3 flags across 5 turns.
Rewrite what the AI should have said. This becomes a training pair (rejected → preferred) and optionally a new KB article. Anything you save here is reviewed before it enters the dataset.
You are the AI receptionist for SuperStorage. Be concise and warm. Confirm tenant identity before discussing balance. NEVER promise refunds, waivers, or charges without policy citation. Use the lookup_facility_policies tool for anything about prorate, late fees, lien, hours, access, climate, or insurance.
lookup_facility_policies("move_out", "grover-beach"): tool was not called.
Every transfer (was it warranted?) and every missed transfer (caller asked / threatened / repeated · AI kept going). This is where you tune the model's escalation policy.
| Trigger | Type | Volume | Approved | Disagreed | Precision |
|---|---|---|---|---|---|
| Explicit "speak to human" | phrase match | 81 | 79 | 2 | 97.5% |
| Threat keywords (lawyer, sue, BBB) | phrase match | 12 | 12 | 0 | 100% |
| Sentiment threshold (≤ −0.5) | classifier | 34 | 28 | 6 | 82.4% |
| Intent: complaint | classifier | 22 | 19 | 3 | 86.4% |
| 3+ unresolved clarification turns | heuristic | 28 | 22 | 6 | 78.6% |
| Intent: auction / lien / legal | policy | 19 | 19 | 0 | 100% |
| Intent: payment dispute | policy | 11 | 10 | 1 | 90.9% |
| Caller silent > 20s | heuristic | 7 | 4 | 3 | 57.1% |
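
The Precision column above appears to be Approved divided by Volume, where Volume is Approved plus Disagreed. That reading is an inference from the numbers in the table, not a documented formula. A minimal sketch under that assumption, with `TriggerStats` and `precision_pct` as hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerStats:
    name: str
    approved: int   # transfers a reviewer agreed with
    disagreed: int  # transfers a reviewer marked as unwarranted

    @property
    def volume(self) -> int:
        # Assumption: every transfer in the volume received a verdict.
        return self.approved + self.disagreed

    @property
    def precision(self) -> float:
        return self.approved / self.volume if self.volume else 0.0


def precision_pct(stats: TriggerStats) -> float:
    """Precision as a percentage, rounded to one decimal like the table."""
    return round(100 * stats.precision, 1)


# Three rows copied from the table above.
rows = [
    TriggerStats('Explicit "speak to human"', approved=79, disagreed=2),
    TriggerStats("Sentiment threshold", approved=28, disagreed=6),
    TriggerStats("Caller silent > 20s", approved=4, disagreed=3),
]

for row in rows:
    print(f"{row.name}: {precision_pct(row)}% of {row.volume}")
```

Sorting triggers by this value is one way to decide which rule to tune first; the silence heuristic stands out as the least precise.
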
| Call | Trigger | To | Caller waited | Resolved by staff? | Verdict |
|---|---|---|---|---|---|
| CALL-2026-05-22-0614 prorate question · Smith B-204 | explicit_human_request | Andre · 7s | 7s | Yes · "no prorate" | |
| CALL-2026-05-22-1453 auction · Carmen L-12 | intent_lien_policy | Lien team · 12s | 14s | Yes | |
| CALL-2026-05-22-1601 gate code reset | 3+_clarifications | Andre · 22s | 38s | Yes | |
| CALL-2026-05-22-1822 "I need to speak to someone" | explicit_human_request | Voicemail (after hours) | 2:11 | No · VM only | |
Linda Park · B-77 · Grover Beach. Three independent rules should have triggered a transfer; none did. Caller hung up after 5:09.
| Rule | Threshold | Observed | Should have fired? | Fired? | Action |
|---|---|---|---|---|---|
| Threat keywords | contains(lawyer\|sue\|attorney\|BBB\|fraud) | 2× "lawyer" @ 1:42, 2:18 | Yes · HIGH | ✗ No | Inspect why → |
| Sentiment threshold | avg(last 3 turns) ≤ −0.5 | −0.7 @ 1:15 | Yes | ✗ No | Inspect → |
| Repeat / clarification loop | ≥ 3 unresolved clarifications | 5 by 3:30 | Yes | ✗ No | Inspect → |
| Intent: complaint | classifier ≥ 0.7 | 0.62 (sub-threshold) | Borderline | — No | Inspect → |
| Explicit human request | phrase_match(human\|person\|manager…) | none | No | — No | — |
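
The thresholds in the table above can be read as simple predicates over the call so far. The sketch below is one possible encoding, not the product's actual rule engine: the `Turn` type, the per-turn `sentiment` field, and the choice to average over the last three turns regardless of speaker are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    role: str         # "caller" or "ai"
    text: str
    sentiment: float  # hypothetical per-turn score in [-1, 1]


# Word boundaries keep "sue" from matching inside words such as "issue".
THREAT_RE = re.compile(r"\b(lawyer|sue|attorney|BBB|fraud)\b", re.IGNORECASE)


def threat_keywords(turns):
    return any(t.role == "caller" and THREAT_RE.search(t.text) for t in turns)


def sentiment_below(turns, limit=-0.5, window=3):
    recent = [t.sentiment for t in turns[-window:]]
    if len(recent) < window:
        return False  # not enough turns to average yet
    return sum(recent) / window <= limit


def evaluate_rules(turns, unresolved_clarifications, complaint_score):
    """Return which transfer rules should fire, keyed by a hypothetical rule name."""
    return {
        "threat_keywords": threat_keywords(turns),
        "sentiment_threshold": sentiment_below(turns),
        "clarification_loop": unresolved_clarifications >= 3,
        "intent_complaint": complaint_score >= 0.7,
    }


turns = [
    Turn("caller", "I have an issue with my bill.", -0.6),
    Turn("caller", "My lawyer told me to call.", -0.7),
    Turn("caller", "Nobody is answering my question.", -0.8),
]
fired = evaluate_rules(turns, unresolved_clarifications=5, complaint_score=0.62)
print(fired)
```

With the observed values from the table (5 unresolved clarifications, complaint score 0.62), the first three rules fire and the complaint classifier stays below its threshold.
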
turn.role == 'caller' AND turn.text contains a keyword. Caller said "my lawyer told me to call", so the keyword matched. The rule fired upstream but was suppressed by the already_resolved flag, which was set incorrectly when get_tenant succeeded earlier in the call.
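
To make the failure mode concrete, here is a minimal sketch of how a suppression flag can mask a rule that has already fired, and how an exemption list avoids that. Only `already_resolved` and the threat keyword rule come from the text above; `CallState`, `NON_SUPPRESSIBLE`, and `should_transfer` are hypothetical names, and the real escalation logic may be structured differently.

```python
from dataclasses import dataclass, field

# Assumption: rules listed here escalate even when the call is marked resolved.
NON_SUPPRESSIBLE = frozenset({"threat_keywords"})


@dataclass
class CallState:
    already_resolved: bool = False
    fired_rules: set = field(default_factory=set)


def should_transfer(state, honor_exemptions=True):
    """Decide whether to transfer, given the fired rules and the suppression flag."""
    active = set(state.fired_rules)
    if state.already_resolved:
        # Suppression drops every fired rule except the exempt ones.
        exempt = NON_SUPPRESSIBLE if honor_exemptions else frozenset()
        active &= exempt
    return bool(active)


state = CallState(already_resolved=True, fired_rules={"threat_keywords"})
print(should_transfer(state, honor_exemptions=False))  # suppression masks the threat
print(should_transfer(state, honor_exemptions=True))   # threat rule bypasses suppression
```

Keeping the exemption in one constant makes it easy to audit which rules can never be silenced.
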
Remove the already_resolved suppression from the threat_keywords rule. Threats should always escalate.
Questions the AI couldn't answer well: either no article was retrieved, or the answer contradicted reality. Each row groups all calls with the same underlying gap so you author once and fix n.
The training pipeline reads each call plus the thing that happened next. Linking outcomes turns a transcript into a measurable signal — booked, paid, abandoned, returned.
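
One way to picture the linking step is a join between a call and the first downstream event for the same tenant. The sketch below is an illustration only: the `Call` and `Outcome` types, the tenant-based join key, and the 7-day attribution window are assumptions, not details taken from the pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Call:
    call_id: str
    tenant_id: str
    ended_at: datetime


@dataclass(frozen=True)
class Outcome:
    tenant_id: str
    kind: str      # e.g. "booked", "paid", "abandoned", "returned"
    at: datetime


def link_outcome(call, outcomes, window=timedelta(days=7)) -> Optional[Outcome]:
    """Return the earliest outcome for the same tenant inside the window, if any."""
    deadline = call.ended_at + window
    candidates = [
        o for o in outcomes
        if o.tenant_id == call.tenant_id and call.ended_at <= o.at <= deadline
    ]
    return min(candidates, key=lambda o: o.at, default=None)


# Placeholder data for illustration only.
call = Call("CALL-EXAMPLE", "tenant-1", datetime(2026, 1, 1, 18, 0))
outcomes = [
    Outcome("tenant-1", "paid", datetime(2026, 1, 3, 9, 0)),
    Outcome("tenant-1", "returned", datetime(2026, 1, 2, 9, 0)),
    Outcome("tenant-2", "booked", datetime(2026, 1, 1, 19, 0)),
]
linked = link_outcome(call, outcomes)
print(linked.kind if linked else "no outcome in window")
```

A call with no event inside the window returns `None`, which a pipeline could treat as its own category rather than guessing an outcome.
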
Every metric that matters for the AI receptionist, in one view. Roll up by facility, drill into any number to see the underlying calls.
| Intent | Volume | Avg score | Hallucination | Containment | Conversion | FCR | Sentiment Δ |
|---|---|---|---|---|---|---|---|
| Move-in inquiry | 187 | 86 | 0.5% | 94% | 41% booked | 88% | +0.31 |
| Payment | 142 | 81 | 0.7% | 89% | 82% paid | 79% | +0.12 |
| Hours / access | 98 | 71 | 3.1% | 97% | — | 82% | +0.04 |
| Move-out / prorate | 38 | 48 | 15.8% | 52% | — | 61% | −0.22 |
| Lockout / overlock | 52 | 76 | 1.9% | 71% | — | 76% | +0.08 |
| Lien / auction | 27 | 84 | 0.0% | 11% (by design) | — | 91% | +0.03 |
| Complaint | 31 | 62 | 0.0% | 19% (should be 0%) | — | 54% | −0.41 |
| Tour booking | 37 | 89 | 0.0% | 97% | 73% scheduled | 92% | +0.28 |
| Version | Deployed | Volume | Avg score | Hallucination | Δ vs. prev |
|---|---|---|---|---|---|
| v1.0 baseline | Feb 14 | 1,847 | 61 | 5.2% | — |
| v1.1 + prorate KB | Apr 03 | 1,612 | 68 | 3.8% | +7 |
| v1.2 + escalation tune | Apr 28 | 1,288 | 74 | 2.4% | +6 |
| v1.3 DPO round 1 | May 12 | 782 | 79 | 2.0% | +5 |
| v1.4 current | May 19 | 412 | 83 | 1.9% | +4 |
| v1.5 candidate | staged | 94 (shadow) | 87 | 1.4% | +4 (shadow) |
Where Call Review's harvest goes. Three loops, ordered by leverage: KB authoring (instant, no training) · Prompt / playbook (1-day cycle, A/B-able) · DPO fine-tune (weekly, gated).
| Pair ID | Source call | Intent | Rejected (AI said) | Preferred (mgr rewrote) | Severity | Author | Action |
|---|---|---|---|---|---|---|---|
| tp_8h2k9 | CALL-…0614 | prorate | | | HIGH ×3 | Andre | |
| tp_4j1m7 | CALL-…1547 | complaint | | | HIGH ×3 | Andre | |
| tp_2k9p4 | CALL-…1822 | hours | | | MED ×1 | Sarah | |
| tp_9x7s2 | CALL-…2014 | lockout | | | HIGH ×3 | Andre | |
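
A rejected → preferred pair like the rows above could be exported for preference tuning roughly as follows. This is a sketch under assumptions: the `TrainingPair` fields, the `reviewed` gate, and the `prompt` / `chosen` / `rejected` record keys (a common convention for DPO datasets) are illustrative and not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    pair_id: str
    prompt: str      # conversation context up to the AI turn being corrected
    rejected: str    # what the AI said
    preferred: str   # what the manager rewrote
    reviewed: bool = False  # assumption: set once a reviewer approves the pair


def to_dpo_records(pairs):
    """Keep only reviewed, complete pairs and map them to preference records."""
    records = []
    for p in pairs:
        if not p.reviewed:
            continue  # unreviewed pairs never enter the dataset
        if not p.rejected.strip() or not p.preferred.strip():
            continue  # a pair needs both sides to be usable
        records.append(
            {"prompt": p.prompt, "chosen": p.preferred, "rejected": p.rejected}
        )
    return records


# Placeholder text only; real pairs come from reviewed calls.
pairs = [
    TrainingPair("tp_example_1", "<context>", "<AI reply>", "<manager rewrite>", reviewed=True),
    TrainingPair("tp_example_2", "<context>", "<AI reply>", "<manager rewrite>"),
    TrainingPair("tp_example_3", "<context>", "", "<manager rewrite>", reviewed=True),
]
records = to_dpo_records(pairs)
print(len(records), "of", len(pairs), "pairs exported")
```

Filtering at export time means the weekly fine-tune only ever sees pairs that passed review.
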