A Tour Through Zfone

This review of Zfone is intended for readers who would like to take a look at Zfone but are too busy to test it at the moment. The visual aids used in this article may also help readers to grasp some of the concepts (such as key continuity) behind Zfone. The complete specification of ZRTP, the key exchange protocol used by Zfone, can be found in this Internet draft. Zfone is available for Windows XP, Mac OS X and Linux, and can be downloaded for free from its official homepage.


Installing Zfone should take only a couple of minutes. You may encounter a number of warning messages from Windows XP along the way, but they can be ignored. Despite its name, Zfone is not a stand-alone softphone but rather a “bump in the cord” (as described on its homepage) that encrypts RTP packets generated by a softphone. There is not much you need to do to get Zfone up and running, assuming that your softphone works properly before Zfone is installed and that Zfone is launched before the softphone.

After installation, Zfone launches itself automatically and sits in the system tray. It also installs a ZRTP driver, which can be verified by opening the properties dialog of any LAN card installed in the system. Zfone also checks a designated server automatically for available updates. Since I used an isolated network as my test bed, Zfone complained “Can’t connect to libzrtp server” at the bottom of the GUI.


Figure 1: Zfone control panel

The Zfone GUI looks very clean (Figure 1). In fact, what’s conspicuous about the GUI is its lack of any configuration menu. There are only three things you can do when the system is idle: check for a new version, read the help, and exit.

Experiment Setup

To experiment with Zfone on my own, I set up a very simple test bed, shown in Figure 2. I used a SIP server implemented by Brekeke, and ran X-Lite alongside Zfone on two hosts dubbed Alice and Bob. I also set up a host dubbed Eve to sniff packets using Ethereal. All four machines are connected through a non-switching hub.


Figure 2: Test bed

Zfone in Action

One revolutionary feature of Zfone is that its key exchange is performed over RTP packet streams without any reliance on signaling and PKI. This approach allows users to encrypt their phone conversation in a pure peer-to-peer manner without any support from their VoIP service providers. It also allows Zfone to easily work with existing softphones and solves interoperability issues one may encounter with cross-carrier authentication.

When Alice calls Bob, the Zfone control panel pops up automatically on both ends. It seems that Zfone detects the SIP signaling packets used to set up a call. Initially, all RTP packets are exchanged in plaintext and the Zfone GUI displays “not secure” in red. Zfone uses ephemeral Diffie-Hellman as its key exchange algorithm. The time it takes to complete a ZRTP handshake, in which DH values are exchanged, seems to vary (from 1 to 20 seconds when I tested it with different softphones). After a successful handshake, all RTP packets are encrypted as SRTP and the GUI displays “secure” in green. Figure 3 provides an overview.


Figure 3: ZRTP Overview

Figure 4 shows packets captured by Eve with Ethereal. To extract ZRTP packets within the RTP stream, I looked for packets with “Payload type = comfort noise”. The packets are exchanged as described in the Internet draft.


Figure 4: Packet capture
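The filtering step described above can be sketched in a few lines of Python. This is only an illustration of picking packets out of a capture by their RTP payload type, not a description of how Zfone or Ethereal does it. It assumes the packets are available as raw RTP bytes (UDP payloads); the value 13 is the static RTP payload type for comfort noise, and `make_rtp` is a hypothetical helper that only builds sample data.

```python
import struct

COMFORT_NOISE_PT = 13  # static RTP payload type assigned to comfort noise


def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit payload type field of a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    first, second = struct.unpack("!BB", packet[:2])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return second & 0x7F  # the top bit of this byte is the marker bit


def candidate_zrtp_packets(packets):
    """Keep only the packets whose payload type is comfort noise."""
    return [p for p in packets if rtp_payload_type(p) == COMFORT_NOISE_PT]


def make_rtp(payload_type: int, seq: int, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a minimal RTP packet for the demo."""
    header = struct.pack("!BBHII", 0x80, payload_type & 0x7F, seq, 0, 0x1234)
    return header + payload


if __name__ == "__main__":
    capture = [make_rtp(0, 1), make_rtp(13, 2, b"handshake?"), make_rtp(0, 3)]
    for pkt in candidate_zrtp_packets(capture):
        print(pkt.hex())
```

In a real capture the same test would be applied to each UDP payload on the call's RTP ports.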

Once Zfone is activated, there are a few more things you can do with the GUI. You can turn off encryption by pressing “Go Clear” and turn it on again by pressing “Go Secure”. You can also edit the name of the other party. There is also a little check box labeled “verified”, which will be explained later. I captured an RTP stream on the host Eve and tried to decode it into an audio file using Ethereal, which confirmed that the conversation between Alice and Bob was encrypted (nothing but loud static noise).

Man-in-the-middle Attack

Another important feature of Zfone is its ability to detect man-in-the-middle (MITM) attacks, even though it does not use a PKI. First, let’s have a brief review of the Diffie-Hellman algorithm in Figure 5.


Figure 5: Diffie-Hellman

In short, Alice and Bob generate A and B respectively and exchange them in order to derive a shared secret SAB, which is used to create a session key for encryption. A MITM attack is possible (Figure 6) if Eve can somehow intercept the packet that carries the value of B from Bob, replace it with a value E that she generated herself, and send it to Alice. If Alice is not aware that the packet from Bob has been tampered with, she accepts E and creates a secret value SAE, believing that it is shared with Bob when it is in fact shared with Eve. If Eve succeeds in deceiving Bob in the same manner, she can relay all packets exchanged between Alice and Bob and wiretap the conversation. The conversation appears encrypted to both Alice and Bob, but is in fact decrypted (and encrypted again) by Eve in the middle.


Figure 6: A MITM Attack
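The exchange and the attack can be reproduced with a minimal Python sketch. The prime and generator below are toy parameters chosen for brevity (an assumption for illustration, not the group Zfone uses), and the function names are hypothetical.

```python
import secrets

# Toy parameters, for illustration only: 2**127 - 1 is prime, but real
# deployments use much larger standardized groups.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) where public = G**private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(my_private, their_public):
    """Combine our private value with the public value we received."""
    return pow(their_public, my_private, P)


def honest_exchange():
    """Alice and Bob swap A and B untouched and derive the same secret."""
    a, A = keypair()
    b, B = keypair()
    return shared_secret(a, B), shared_secret(b, A)


def mitm_exchange():
    """Eve swaps in her own value E in both directions."""
    a, A = keypair()
    b, B = keypair()
    e, E = keypair()
    return {
        "alice": shared_secret(a, E),           # Alice believes this is S_AB
        "eve_with_alice": shared_secret(e, A),  # Eve holds the same value
        "bob": shared_secret(b, E),             # Bob believes this is S_AB
        "eve_with_bob": shared_secret(e, B),    # Eve holds this one as well
    }


if __name__ == "__main__":
    s_alice, s_bob = honest_exchange()
    print("honest secrets match:", s_alice == s_bob)
    r = mitm_exchange()
    print("Eve shares Alice's secret:", r["alice"] == r["eve_with_alice"])
    print("Eve shares Bob's secret:", r["bob"] == r["eve_with_bob"])
```

Nothing in the arithmetic itself tells Alice whether the value she received came from Bob or from Eve, which is why an out-of-band check is needed.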

To prevent this from happening, Alice has to confirm with Bob that the value she received was in fact generated by Bob, and vice versa. Fortunately, the nature of VoIP provides an easy way to do it. Since they are already on the phone, Alice can simply read aloud a hash (short authentication string, or SAS) of the value that she thinks she got from Bob and verify it with him. Bob then does the same. If there is a mismatch, Alice and Bob can be pretty sure that someone else is listening. Verification in such a verbal, analog manner is very effective since it is extremely difficult for Eve to replace Alice’s and Bob’s voices in real time.

To make the GUI even cleaner, Zfone hashes both A and B together and presents only one SAS to the user (instead of hashing them individually and presenting two SASs, as in the early version). Figure 7 shows a somewhat simplified view.


Figure 7: Short authentication string
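A rough Python sketch of the single-SAS idea follows. It mirrors the simplified description in this article (hash both exchanged values together and show a few characters) rather than the exact derivation in the ZRTP draft. The use of SHA-256, the 20-bit truncation, and the alphabet are all assumptions made for illustration.

```python
import hashlib

# Illustrative base-32 alphabet: 4 characters encode 20 bits.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def _encode(value: int) -> bytes:
    """Length-prefixed big-endian encoding, so two values cannot blur together."""
    raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    return len(raw).to_bytes(4, "big") + raw


def short_auth_string(initiator_value: int, responder_value: int) -> str:
    """Hash both exchanged values together and render 20 bits as 4 characters."""
    digest = hashlib.sha256(
        _encode(initiator_value) + _encode(responder_value)
    ).digest()
    bits = int.from_bytes(digest[:3], "big") >> 4  # keep the leading 20 bits
    return "".join(ALPHABET[(bits >> shift) & 31] for shift in (15, 10, 5, 0))


if __name__ == "__main__":
    A, B, E = 1234567, 7654321, 1111111  # stand-ins for the exchanged values
    # Honest call: both ends hashed the same pair, so both read the same SAS.
    print(short_auth_string(A, B), short_auth_string(A, B))
    # MITM: Alice hashed (A, E) while Bob hashed (E, B).
    print(short_auth_string(A, E), short_auth_string(E, B))
```

Because both values feed one hash, a substitution in either direction changes the string that one of the two parties reads aloud.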

Key Continuity

Although comparing SAS verbally at the beginning of each call can be fun at first, users may start to find it troublesome after a few calls. To combat laziness, Zfone adds a convenient feature called key continuity. Ideally, users should verify the SAS when they call someone for the first time. The shared session key used in that first call is then cached on both hosts. The next time Alice calls Bob (or the other way around), the new session key is a combination of the newly calculated shared secret and the session key used in the previous call. This neat trick frees both users from the trouble of verifying the SAS in every call. The concept is simplified in Figure 8.


Figure 8: Key continuity
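The chaining can be sketched as follows. This is a minimal model of the idea as described above, assuming SHA-256 as the mixing function and caching the session key itself; the actual ZRTP key derivation is more involved, and the `Endpoint` class is hypothetical.

```python
import hashlib


class Endpoint:
    """Hypothetical model of one Zfone installation and its key cache."""

    def __init__(self):
        self.cache = {}  # peer name -> key cached from the previous call

    def start_call(self, peer: str, dh_secret: bytes) -> bytes:
        """Mix the fresh DH secret with whatever is cached for this peer."""
        previous = self.cache.get(peer, b"")  # empty on the very first call
        key = hashlib.sha256(previous + dh_secret).digest()
        self.cache[peer] = key  # becomes an input to the next call
        return key


if __name__ == "__main__":
    alice, bob, eve = Endpoint(), Endpoint(), Endpoint()

    # Two uninterrupted calls: both ends derive the same key each time.
    for dh in (b"dh-secret-call-1", b"dh-secret-call-2"):
        print(alice.start_call("bob", dh) == bob.start_call("alice", dh))

    # Third call: Eve completes a DH exchange with Alice while posing as Bob,
    # but she has nothing cached from the earlier calls.
    dh3 = b"dh-secret-call-3"
    print(alice.start_call("bob", dh3) == eve.start_call("alice", dh3))
```

The first two comparisons come out equal and the third does not, since the only difference between Eve and Bob in this model is the cached value.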

The following scenario illustrates an attempted MITM attack on this scheme (Figure 9). Eve steps in after Alice has called Bob twice without being intercepted. Eve is able to create a new shared secret with Alice. However, in order to encrypt RTP, Eve has to combine this shared secret with the session key used by Alice and Bob in the previous call. Since Eve does not have it, Zfone warns Alice that something is wrong, as shown in Figure 10.


Figure 9: Another MITM attack


Figure 10: Warning Message

Despite the warning message, Zfone still allows both parties to speak over an encrypted channel. This is because Bob may someday genuinely lose his cache of session keys (due to an OS problem, for example), which would make him look suspicious. If both parties verify the SAS on the spot and check the box labeled “verified”, they can establish key continuity from scratch again.
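One way to picture the decision behind the warning is the sketch below. It assumes that each side proves knowledge of its cached secret by sending a keyed hash over a fresh nonce, so the secret itself never crosses the wire. The function names, the status strings, and the nonce are hypothetical and are not taken from Zfone.

```python
import hashlib
import hmac


def cache_id(cached_secret: bytes, nonce: bytes) -> bytes:
    """Keyed hash that shows knowledge of the cached secret without revealing it."""
    return hmac.new(cached_secret, nonce, hashlib.sha256).digest()


def check_continuity(my_cached, peer_cache_id, nonce: bytes) -> str:
    """Decide what to tell the user; the call stays encrypted in every case."""
    if my_cached is None:
        return "first call: verify the SAS"
    if peer_cache_id is not None and hmac.compare_digest(
        cache_id(my_cached, nonce), peer_cache_id
    ):
        return "continuity intact"
    # Either an attacker is in the middle or the peer lost its cache.
    return "warning: cached secret mismatch, verify the SAS"


if __name__ == "__main__":
    nonce = b"fresh-per-call-nonce"
    shared = b"secret cached after the previous call"
    print(check_continuity(shared, cache_id(shared, nonce), nonce))
    print(check_continuity(shared, cache_id(b"something else", nonce), nonce))
    print(check_continuity(None, None, nonce))
```

The mismatch branch cannot tell an attack from an innocent cache loss, which is why the sensible response in both cases is to fall back to verbal SAS verification.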

What happens if Alice and Bob forget to verify the SAS in the first call? Let’s say that all of a sudden they remember to do so in the 11th call. If both SAS match, they can rest assured that all 10 previous calls were safe. However, if there is a mismatch (and they have never seen a warning message like the one in Figure 10), it implies that someone has been wiretapping their conversation since the first call and has cached all previous secrets shared with Alice and with Bob separately. In other words, the SAS displayed to Alice and Bob never matched, and the anomaly went unnoticed in all previous calls.

In order to manage cached secrets shared with various parties, each Zfone installation is randomly assigned a ZID. Every time Alice finishes a call session with Bob, the shared secret associated with Bob’s ZID is cached on Alice’s machine, and vice versa. Just for the fun of it, I installed Zfone in a VMware virtual machine, created a clone of it and had one call the other. In other words, I tried to let two Zfones with identical ZIDs communicate with each other. As expected, both Zfones generated error dialogs as shown in Figure 11. In the real world, I think such a collision is unlikely since each ZID has 96 bits and is randomly generated. However, there seems to be no mechanism to guarantee that each ZID is indeed globally unique.


Figure 11: ZID Error
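To put a number on "unlikely", the following Python sketch estimates the chance that any two randomly generated 96-bit ZIDs coincide, using the usual birthday approximation. The installation counts are made-up inputs, and `new_zid` merely shows one way to draw 96 random bits; it is not Zfone's generator.

```python
import secrets

ZID_BITS = 96


def new_zid() -> bytes:
    """Draw a random 96-bit identifier (12 bytes)."""
    return secrets.token_bytes(ZID_BITS // 8)


def collision_probability(installations: int, bits: int = ZID_BITS) -> float:
    """Birthday approximation: n(n-1)/2 pairs, each colliding with chance 2**-bits."""
    pairs = installations * (installations - 1) // 2
    return pairs / 2**bits


if __name__ == "__main__":
    print(new_zid().hex())
    for n in (10**6, 10**9):
        print(n, collision_probability(n))
```

Even with a billion installations the estimate stays around 6e-12. Random generation therefore makes accidental collisions negligible, although it does nothing about copies such as the cloned virtual machine in the experiment above.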

Final Remarks

Zfone is very user-friendly since it hides most of the encryption mechanism from its users. Its independence from PKI and signaling makes the technology very accessible to individuals. Zfone, being a “bump in the cord”, also allows its users to keep their favorite SIP softphones without switching to an unfamiliar one. Moreover, because only the end users are involved in key management, the service provider does not have access to any of the keys. Eavesdropping on Zfone users seems extremely difficult, as the attacker would have to be present from the first call onward, able to forge verbal SAS verification in real time, and preferably able to imitate voices.

I hope this article has been helpful to you. If you have any questions, insights, or corrections, please feel free to leave a comment.

My homepage is here.