What are the essential elements of a human conversation that an ideal VoIP system would capture and convey?
To consider this, let us imagine a conversation between three or more people.
What does each of these people know?
The list would certainly include the following elements:
- physical presence, including turning away and leaving
- focus, as when a person turns to one, a few or all of the others to speak
- visual cueing including pointing, nods of agreement, objections, interest, and lack of interest
- displays of valuable emotion
- content of words spoken
What else might be added?
Let’s say the conversation continues and the subject of authority comes up. The list might then extend to:
- identity beyond physical presence, voice and appearance
- authority, whether asserted by voice or proved by other factors
- policy, set for example by custom or by rule for the type of meeting
Again let’s ask: what else might be added?
After considering further, let’s now imagine that the conversation ends and that you have been invited to diagram it.
Maybe you choose to show it as a storyboard of transaction diagrams. Maybe you see a better way to draw it.
Is there a sensible way of classifying the quality of a conversation as it departs from the ideal?
Now let’s turn this on its head and ask what happens if we augment human conversation and improve on what we have been calling the ideal.
The point is that a VoIP system, or at least a VoIP client, can be classified according to the complexity of the expression that it conveys, and this is either less than, equal to or better than that of face-to-face conversation.
So parity with the PSTN still undershoots what people expect when they meet, and it is certainly less than what is possible if you have faith that computing can improve conversation beyond human vision and speech.
There is no one VoIP performance target. It’s a diagram with curves.
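One rough way to make the three-way classification concrete is to treat the lists above as a set and compare what a client conveys against it. The sketch below is only an illustration: the element names, the `Parity` labels and the `classify` function are all made up for this post, and the example clients are hypothetical guesses rather than measurements of any real system. It also flattens everything into a single label, so a client that adds something new while missing something basic still counts as "less", which is exactly why one label is too coarse.

```python
from enum import Enum

# Elements of a face-to-face conversation, named after the lists above.
# These labels are illustrative assumptions, not a standard vocabulary.
FACE_TO_FACE = frozenset({
    "presence", "focus", "visual_cues", "emotion", "words",
    "identity", "authority", "policy",
})


class Parity(Enum):
    LESS = "less than face-to-face"
    EQUAL = "equal to face-to-face"
    BETTER = "better than face-to-face"


def classify(conveyed):
    """Compare the elements a client conveys with the face-to-face set."""
    conveyed = frozenset(conveyed)
    if not FACE_TO_FACE <= conveyed:
        # At least one face-to-face element is missing.
        return Parity.LESS
    if conveyed == FACE_TO_FACE:
        return Parity.EQUAL
    # Everything face-to-face offers, plus something extra.
    return Parity.BETTER


if __name__ == "__main__":
    # Hypothetical clients, for illustration only.
    voice_only = {"words", "emotion"}
    augmented = FACE_TO_FACE | {"live_transcript"}
    print(classify(voice_only).value)
    print(classify(FACE_TO_FACE).value)
    print(classify(augmented).value)
```

A voice-only client of the kind sketched here lands in the first class, which is the sense in which parity with the PSTN undershoots.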