# The Call Before the Transfer

When a caller asks our voice agent to put them through to a specific person, the easy thing
is to dial that person and bridge the two calls. In telephony this is a "blind transfer",
and it is the rude version. You hand a live caller to someone who did not ask for the call,
has no idea who is on the line, and may be mid-task. Often the receiving person picks up,
hears a stranger, and the caller explains themselves from scratch. The rest of the time it
rings out and the caller lands back where they started, having waited for nothing.

A "warm transfer" is the courteous alternative: an operator calls ahead, explains who is
waiting and why, and connects the two only once the receiving party has agreed to take it.
We wanted exactly that behaviour, with no human operator in the middle and a caller already
live on the line. So we built a tool the agents on Hoycall, our voice agent platform, can
call mid-conversation: `check_person_availability`.

## The shape of it

The flow is simple to describe. The caller asks to be transferred. Before doing anything to
the caller's line, the agent fires a side-call to the target person, delivers one sentence
of context about who wants them and why, and asks a single question: are you free to take a
call right now? The person answers. The agent comes back to the original caller and either
puts them through, or tells them the person cannot take a call at the moment.

The interesting part is that the side-call is run by an AI agent, not a recording. When the
tool is enabled for a business we provision a second, purpose-built agent for that business
and cache its ID. This inner agent has one job and a prompt that does nothing else: deliver
the greeting, ask the availability question, listen, report a decision, and hang up. It runs
on a small fast model, is capped at well under a minute, and is told in plain terms not to
hold a conversation. Its greeting is generated from the `caller_description` the outer agent
passes in, so the person being called hears something like "this is a quick call regarding a
customer asking about a refund on order 4471, are you free to take a call right now?" rather
than a cold ring from an unknown number.
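As a rough illustration of how such a greeting might be assembled from `caller_description`, here is a minimal Python sketch. The names `build_greeting` and `InnerAgentLimits`, the model label, and the 45-second cap are assumptions made for the example, not the platform's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerAgentLimits:
    # Hypothetical settings; the post only says "small fast model" and
    # "well under a minute", so these concrete values are placeholders.
    model: str = "small-fast-model"
    max_call_seconds: int = 45


def build_greeting(caller_description: str) -> str:
    """Build the one-sentence opener the called person hears.

    `caller_description` is the context string the outer agent passes in.
    """
    # Trim whitespace and a trailing full stop so the sentence joins cleanly.
    context = caller_description.strip().rstrip(".")
    return (
        f"This is a quick call regarding {context}, "
        "are you free to take a call right now?"
    )


if __name__ == "__main__":
    print(build_greeting("a customer asking about a refund on order 4471"))
```

Keeping the greeting to a single templated sentence is one way to hold the inner agent to its brief: context, one question, nothing else.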

## The constraint is the clock

The hard part is not the side-call. It is that the original caller is sitting on the line
the entire time it happens. Every second the inner agent spends dialling, greeting, and
waiting for an answer is a second of dead air for someone who is still connected and still
waiting. That sets a budget. Resolve the availability question fast enough that the held
caller's wait stays tolerable, or the warm transfer is worse than the blind one it replaced.

So the design problem became: get a reliable yes or no out of a real phone call, end to end,
in the time a person will tolerate silence. We gave the whole operation a hard ceiling of
about 35 seconds and then spent the engineering effort on almost never reaching it.

## Getting the answer out early

The naive approach is to place the call, wait for it to finish, fetch the finished
transcript, and read the answer off it. That is far too slow. Post-call processing alone can
add many seconds after the person has already said "sure, put them through". We needed the
decision the moment it existed, not after the platform finished tidying up.

The result is a layered poller that races several sources and takes whichever resolves
first:

- **The inner agent reports directly.** The moment it has a clear answer, the inner agent
  calls a `report_availability` tool with `available`, `not_available`, or `unclear`, before
  it even says thank you. That writes a small record our poller is watching for. This is the
  fast path and the common one, and it lands the decision in roughly twelve to fifteen
  seconds, well before the call has formally ended.
- **The live transcript pattern.** If the inner agent does not report cleanly, we watch the
  live transcript for the minimum complete exchange: agent greeting, then a human reply, then
  any agent follow-up. That pattern means a real answer was given, and the platform usually
  publishes it before it flips the call's status. We analyse it and exit early.
- **Non-answer signals.** Voicemail detection, no-answer, busy, and rejected come straight
  from call metadata. There is no point waiting on a transcript that will never arrive, so
  these short-circuit immediately to "not available".

Only if every fast path stays silent does the poller run out its budget and report that the
person could not be reached in time. When a transcript is ambiguous rather than absent, a
small model classifies it at zero temperature into the same three buckets (`available`,
`not_available`, `unclear`), so a mumbled "uh, yeah, I guess" still resolves to a decision
instead of stalling the poller.
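The layering described above could be sketched roughly as follows. This is a simplified illustration, not the production poller: the status strings, the function names, the half-second polling interval, and the idea of modelling each source as an async callable returning a decision or `None` are all assumptions. Only the three decision values and the roughly 35-second ceiling come from the text.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Sequence

Decision = str  # "available" | "not_available" | "unclear"
Source = Callable[[], Awaitable[Optional[Decision]]]

# Hypothetical metadata statuses that mean nobody will answer.
NON_ANSWER_STATUSES = frozenset({"voicemail", "no_answer", "busy", "rejected"})


def decision_from_metadata(status: str) -> Optional[Decision]:
    """Short-circuit to "not_available" on a non-answer signal."""
    return "not_available" if status in NON_ANSWER_STATUSES else None


def has_complete_exchange(roles: Sequence[str]) -> bool:
    """True once the turns contain agent, then human, then agent, in order."""
    pattern = ("agent", "human", "agent")
    matched = 0
    for role in roles:
        if role == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                return True
    return False


async def poll_for_decision(
    sources: Sequence[tuple[str, Source]],
    budget_s: float = 35.0,
    interval_s: float = 0.5,
) -> tuple[Decision, str]:
    """Check every source each tick; return the first decision and its origin."""
    deadline = time.monotonic() + budget_s
    while True:
        for name, check in sources:
            decision = await check()
            if decision is not None:
                return decision, name
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Every fast path stayed silent for the whole budget.
            return "not_available", "timeout"
        await asyncio.sleep(min(interval_s, remaining))
```

In this sketch the sources would be listed in priority order (direct report, transcript pattern, call metadata), and each check is assumed to return quickly; a real implementation would also need to bound how long any single check may take so that one slow source cannot overrun the budget.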

## The outcomes we accept

The tool returns one of three things, and the outer agent's behaviour follows from it. If the
person said yes, the agent goes back to the caller and completes the transfer to someone who
already knows who is waiting and why. If the person said no, was on voicemail, did not pick
up, or the line was busy, the caller is told plainly that the person is not available right
now, without being put on a long hold to discover it. If the answer was genuinely unclear, we
treat it as a no rather than guess, because connecting a caller to someone who did not clearly
agree is the exact failure the tool exists to prevent.
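The fail-closed rule in this section is small enough to show directly. The sketch below assumes the tool hands the outer agent one of the three decision strings; the function name `resolve_transfer` and the caller-facing messages are hypothetical.

```python
def resolve_transfer(decision: str) -> tuple[bool, str]:
    """Map the tool's decision to (should_transfer, what to tell the caller)."""
    if decision == "available":
        return True, "They can take your call, putting you through now."
    # "not_available", "unclear", and any unexpected value all fail closed:
    # the caller is only connected after an explicit yes.
    return False, "They are not available right now."


if __name__ == "__main__":
    for d in ("available", "not_available", "unclear"):
        print(d, resolve_transfer(d))
```

Treating everything other than an explicit `available` as a no keeps the default safe even if a new or malformed value ever reaches the outer agent.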

The trade-off we accepted is visible and deliberate. A warm transfer costs a real side-call
and a short wait that a blind transfer does not. In return the receiving person is never
ambushed, the caller is never dumped onto someone who cannot talk, and nobody spends thirty
seconds on hold to find out the person was in a meeting. For the businesses running this, the
person on the other end of the transfer is usually the owner or a senior member of staff
whose time and attention are the scarce resource. Asking first, in seconds, is the whole
point.
