All Posts
AI Agents 8 min read

We taught an AI to read messy handwritten timesheets

#ai#healthcare

A healthcare staffing company came to us with a problem that sounds simple until you actually look at it: they needed to process hundreds of timesheets every week. Some were scanned PDFs. Some were photos taken on someone's phone at a weird angle. Some were handwritten on forms that varied by facility. Their operations team was spending hours every day just reading these things and manually typing shift data into their payroll system.

Why traditional OCR failed

Their first instinct — and ours, honestly — was to throw OCR at it. Tesseract, Google Vision, the usual suspects. The problem is that OCR gives you text. It doesn't give you understanding. A timesheet might say "7a-3p" or "0700–1500" or just have a checkmark in a column. The layout changes between facilities. Some nurses write their hours in the margins. OCR would extract characters, but it had no idea what those characters meant in context.

We needed something that could look at a document the way a human does — understanding that this blob of ink in the top-right is a date, those rows are shifts, and that scribble is someone's signature, not a time entry.

Building the agent

We built an AI agent using Claude's vision capabilities. The agent receives a document image and does something simple but powerful: it looks at the whole page first, figures out the layout structure, then extracts data field by field.

The pipeline works in stages:

1. Document intake — accepts PDFs, photos, scans in any orientation 2. Layout analysis — the model identifies where headers, rows, dates, and signatures are 3. Field extraction — pulls out employee name, date, shift start/end, break duration, facility 4. Validation — cross-references against known employee placements and flags anomalies 5. Output — clean structured JSON ready for payroll import

The key insight was treating this as a reasoning problem, not a text extraction problem. We prompt the model to explain what it sees before extracting data. This "think before you extract" approach caught edge cases that pure extraction missed — like when a nurse wrote "called in" instead of logging hours, or when a supervisor's notes overlapped with the time columns.

Handling the weird stuff

Real-world documents are messy. We encountered:

- Timesheets photographed on a car dashboard with sun glare - Forms where someone used white-out and wrote over it - Pages that were scanned upside down - Two different employees' timesheets on the same page - Handwriting that, frankly, we couldn't read either

For each of these, we built specific handling. Orientation detection runs first. The confidence scoring system flags anything the model isn't sure about rather than guessing. When confidence drops below our threshold, the document gets routed to a human reviewer with the AI's best guess pre-filled — so even the fallback path saves time.

The results

After two months in production:

- 95%+ extraction accuracy across all document formats - 100% hands-free processing for clean, standard-format timesheets - Processing time dropped from hours per day to minutes - Payroll errors from manual data entry effectively eliminated

The operations team went from dreading Monday morning timesheet processing to barely thinking about it. The system handles the bulk automatically, and they only step in for the handful of documents that get flagged for review each week.

What we'd do differently

If we were starting this project today, we'd invest more in the feedback loop earlier. We built the review interface after launch, but having it from day one would have let us fine-tune the prompts faster based on real correction data.

We'd also build the confidence scoring tighter from the start. Our initial threshold was too generous — it let some marginal extractions through that should have been flagged. We tightened it after the first week based on the operations team's feedback.

The lesson: AI agents in production aren't about getting 100% accuracy. They're about knowing exactly when they're not accurate and handling those cases gracefully.

Have a similar challenge?

We build production-grade software for companies that need it done right.

Let's Talk