The Call Before the Transfer

When a caller asks our voice agent to put them through to a specific person, the easy thing is to dial that person and bridge the two calls. In telephony this is a "blind transfer", and it is the rude version. You hand a live caller to someone who did not ask for the call, has no idea who is on the line, and may be mid-task. Often the receiving person picks up, hears a stranger, and the caller explains themselves from scratch. The rest of the time it rings out and the caller lands back where they started, having waited for nothing.

A "warm transfer" is the courteous alternative: an operator calls ahead, explains who is waiting and why, and connects the two only once the receiving party has agreed to take it. We wanted exactly that behaviour, with no human operator in the middle and a caller already live on the line. So we built a tool the agents on Hoycall, our voice agent platform, can call mid-conversation: check_person_availability.

The shape of it

The flow is simple to describe. The caller asks to be transferred. Before doing anything to the caller's line, the agent fires a side-call to the target person, delivers one sentence of context about who wants them and why, and asks a single question: are you free to take a call right now? The person answers. The agent comes back to the original caller and either puts them through, or tells them the person cannot take a call at the moment.

The interesting part is that the side-call is itself an AI agent, not a recording. When the tool is enabled for a business we provision a second, purpose-built agent for that business and cache its ID. This inner agent has one job and a prompt that does nothing else: deliver the greeting, ask the availability question, listen, report a decision, and hang up. It runs on a small fast model, is capped at well under a minute, and is told in plain terms not to hold a conversation. Its greeting is generated from the caller_description the outer agent passes in, so the person being called hears something like "this is a quick call regarding a customer asking about a refund on order 4471, are you free to take a call right now?" rather than a cold ring from an unknown number.
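
To make the inner agent's setup concrete, here is a minimal sketch in Python of how a greeting could be built from caller_description. Only caller_description and report_availability come from the description above. InnerAgentConfig, build_greeting, the model placeholder, and the 45-second cap are hypothetical names and values chosen for illustration, not the production configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerAgentConfig:
    """Hypothetical settings for the single-purpose inner agent."""
    model: str = "small-fast-model"          # placeholder, not a real model name
    max_call_seconds: int = 45               # assumed value for "well under a minute"
    tools: tuple = ("report_availability",)  # the only tool the inner agent needs


def build_greeting(caller_description: str) -> str:
    """Turn the outer agent's caller_description into the one-sentence opener."""
    # Collapse stray whitespace and drop trailing punctuation so the
    # description slots cleanly into the middle of the sentence.
    context = " ".join(caller_description.split()).rstrip(".?! ")
    return (
        f"This is a quick call regarding {context}, "
        "are you free to take a call right now?"
    )


config = InnerAgentConfig()
greeting = build_greeting("a customer asking about a refund on order 4471.")
print(greeting)
```

Keeping the greeting to one templated sentence is what lets the inner agent get to the availability question within the first few seconds of the person picking up.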

The constraint is the clock

The hard part is not the side-call. It is that the original caller is sitting on the line the entire time it happens. Every second the inner agent spends dialling, greeting, and waiting for an answer is a second of dead air for someone who is still connected and still waiting. That sets a budget. Resolve the availability question fast enough that the held caller's wait stays tolerable, or the warm transfer is worse than the blind one it replaced.

So the design problem became: get a reliable yes or no out of a real phone call, end to end, in the time a person will tolerate silence. We gave the whole operation a hard ceiling of about 35 seconds and then spent the engineering effort on almost never reaching it.

Getting the answer out early

The naive approach is to place the call, wait for it to finish, fetch the finished transcript, and read the answer off it. That is far too slow. Post-call processing alone can add many seconds after the person has already said "sure, put them through". We needed the decision the moment it existed, not after the platform finished tidying up.

The result is a layered poller that races several sources and takes whichever resolves first:

The inner agent reports directly: the moment it has a clear answer, the inner agent calls a report_availability tool with available, not_available, or unclear, before it even says thank you. That writes a small record our poller is watching for. This is the fast path and the common one, and it lands the decision in roughly twelve to fifteen seconds, well before the call has formally ended.
The live transcript pattern: if the inner agent does not report cleanly, we watch the live transcript for the minimum complete exchange: agent greeting, then a human reply, then any agent follow-up. That pattern means a real answer was given, and the platform usually publishes it before it flips the call's status. We analyse it and exit early.
Non-answer signals: voicemail detection, no-answer, busy, and rejected come straight from call metadata. There is no point waiting on a transcript that will never arrive, so these short-circuit immediately to "not available".

Only if every fast path stays silent does the poller run out its budget and report that the person could not be reached in time. When a transcript is ambiguous rather than absent, a small model classifies it at zero temperature into the same three buckets, so a mumbled "uh, yeah, I guess" still resolves to a decision rather than a hang.
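
The racing behaviour can be sketched with Python's asyncio. This is an illustration under stated assumptions, not the production poller: resolve_availability and the three stand-in sources are hypothetical, the sleeps are shortened stand-ins for real call timings, and mapping a timeout to "not_available" is our assumption about how "could not be reached in time" is reported.

```python
import asyncio
from typing import Awaitable, Callable, Optional

BUDGET_SECONDS = 35.0  # the hard ceiling of about 35 seconds

# A source resolves to "available", "not_available", "unclear",
# or None when it has nothing to say.
Source = Callable[[], Awaitable[Optional[str]]]


async def resolve_availability(sources: list[Source],
                               budget: float = BUDGET_SECONDS) -> str:
    """Race all sources and return the first non-empty decision."""
    tasks = [asyncio.ensure_future(source()) for source in sources]
    try:
        for next_done in asyncio.as_completed(tasks, timeout=budget):
            result = await next_done
            if result is not None:
                return result
        return "not_available"  # every source finished without a decision
    except asyncio.TimeoutError:
        return "not_available"  # budget exhausted (assumed mapping)
    finally:
        # Stop the slower sources once a decision exists or time runs out.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


# Hypothetical stand-ins for the three layers described above.
async def inner_agent_report() -> Optional[str]:
    await asyncio.sleep(0.02)   # fast path: the agent reports directly
    return "available"


async def live_transcript_pattern() -> Optional[str]:
    await asyncio.sleep(0.05)   # slower fallback, silent in this demo
    return None


async def call_metadata() -> Optional[str]:
    await asyncio.sleep(10)     # voicemail/busy signal that never fires here
    return "not_available"


decision = asyncio.run(resolve_availability(
    [inner_agent_report, live_transcript_pattern, call_metadata], budget=1.0))
print(decision)
```

The sketch leaves out the zero-temperature classification of ambiguous transcripts. That step would sit inside the transcript source, so the race itself only ever sees one of the three labels or nothing.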

The outcomes we accept

The tool returns one of three things, and the outer agent's behaviour follows from it. If the person said yes, the agent goes back to the caller and completes the transfer to someone who already knows who is waiting and why. If the person said no, was on voicemail, did not pick up, or the line was busy, the caller is told plainly that the person is not available right now, without being put on a long hold to discover it. If the answer was genuinely unclear, we treat it as a no rather than guess, because connecting a caller to someone who did not clearly agree is the exact failure the tool exists to prevent.
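
The rule that only an explicit yes completes the transfer is small enough to state as code. A minimal sketch, assuming the three result labels used by report_availability. The Availability enum, next_action, and the action names are hypothetical and only illustrate the decision rule.

```python
from enum import Enum


class Availability(Enum):
    AVAILABLE = "available"
    NOT_AVAILABLE = "not_available"
    UNCLEAR = "unclear"


def next_action(result: Availability) -> str:
    """Map the tool result to what the outer agent does next."""
    # Only a clear yes completes the transfer. Unclear is treated as a no
    # rather than a guess, so it shares the "not available" branch.
    if result is Availability.AVAILABLE:
        return "transfer"
    return "tell_caller_unavailable"


for outcome in Availability:
    print(outcome.value, "->", next_action(outcome))
```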

The trade-off we accepted is visible and deliberate. A warm transfer costs a real side-call and a short wait that a blind transfer does not. In return the receiving person is never ambushed, the caller is never dumped onto someone who cannot talk, and nobody spends thirty seconds on hold to find out the person was in a meeting. For the businesses running this, the person on the other end of the transfer is usually the owner or a senior member of staff whose time and attention are the scarce resource. Asking first, in seconds, is the whole point.