Hank Cohen of Hifn sent along this article taking a look at cryptography in VoIP, both for signalling and for voice. He explains the various proposals (including TLS/SSL, IPsec and Datagram TLS) and provides his view of the advantages and disadvantages of each proposal. Here is a brief taste of the longer article:
I believe that we first need to divide the VoIP cryptography problem into two parts: signalling security and media security. The requirements for these two areas are quite distinct, so we need to be careful not to lump them together. Signalling connections may be persistent for long intervals, but they tend to carry only a few short messages. Furthermore, although signalling messages must be delivered in a timely manner, they are not real time in the sense that their value degrades as latency or jitter increases, provided that they are delivered soon enough for a connection to be created in a reasonable amount of time. Media, on the other hand, has stringent real-time constraints. If media packets are not delivered within strict limits of latency and jitter, their value can decrease to the point where call quality will be better if they are discarded rather than delivered late.
There is an interesting analogy between signalling security in the PSTN and VoIP. In the PSTN, in-band signalling was found to be vulnerable to all sorts of hacks through your namesake Blue Boxes. The final solution to the phone phreak problem was for the PSTN carriers to create a completely separate signalling network inaccessible from the media network: thus was SS7 born. In the world of VoIP, signalling is inherently in-band, but we can use cryptographic VPN technology to build a virtual private signalling network with the same technology that enterprises have been using for years to build virtual private data networks.
There are three proposals floating around the VoIP world for signalling VPNs: SSL or TLS secured signalling tunnels, IPsec secured signalling tunnels and, most recently, Datagram TLS secured signalling tunnels. I would like to offer some pros and cons for each method.
Follow the link below to read Hank’s full article and if you have a different view (and I expect some will) on the different proposals, please do feel free to leave a comment to this article.
We thank Hank for providing this article and please do know that we are always open to publishing guest articles such as this. Just contact me or one of the other weblog authors if you would like to have an article appear.
Comments on cryptography and VoIP by Hank Cohen
I just have a few comments on the use of cryptography in VoIP applications. I should start by saying that although I am somewhat familiar with VoIP technology, my real expertise is more in the area of cryptography. I am Product Line Manager for lookaside cryptography products at Hifn (www.hifn.com). Hifn is the leading producer of cryptography ASICs for network security applications. In the interest of full disclosure, I will also note that I am responsible for two products directed specifically at VoIP applications. Finally, I will be making some statements about the pros and cons of various cryptographic protocols, but Hifn is completely neutral on these issues. Our processors do IPsec, SSL/TLS/DTLS and SRTP/SRTCP with equal facility, so we have no vested interest in which solution wins in the marketplace. We only hope that cryptography is widely adopted.
I believe that we first need to divide the VoIP cryptography problem into two parts: signalling security and media security. The requirements for these two areas are quite distinct, so we need to be careful not to lump them together. Signalling connections may be persistent for long intervals, but they tend to carry only a few short messages. Furthermore, although signalling messages must be delivered in a timely manner, they are not real time in the sense that their value degrades as latency or jitter increases, provided that they are delivered soon enough for a connection to be created in a reasonable amount of time. Media, on the other hand, has stringent real-time constraints. If media packets are not delivered within strict limits of latency and jitter, their value can decrease to the point where call quality will be better if they are discarded rather than delivered late.
There is an interesting analogy between signalling security in the PSTN and VoIP. In the PSTN, in-band signalling was found to be vulnerable to all sorts of hacks through your namesake Blue Boxes. The final solution to the phone phreak problem was for the PSTN carriers to create a completely separate signalling network inaccessible from the media network: thus was SS7 born. In the world of VoIP, signalling is inherently in-band, but we can use cryptographic VPN technology to build a virtual private signalling network with the same technology that enterprises have been using for years to build virtual private data networks.
There are three proposals floating around the VoIP world for signalling VPNs: SSL or TLS secured signalling tunnels, IPsec secured signalling tunnels and, most recently, Datagram TLS secured signalling tunnels. I would like to offer some pros and cons for each method.
Let’s start with the question of Layer 3 (i.e. IPsec) versus Layer 4 (i.e. Transport Layer Security). The arguments put forth in favor of Layer 4 are twofold. The first is that, since Layer 4 security is generally implemented as a user library, it can easily be made portable to systems that do not have security implemented in the kernel or in hardware. This is clearly true, but by the same token this portability comes at the cost of being unable to implement the cryptography function in the kernel or with hardware acceleration. I would assert that this kind of portability is of limited interest in embedded systems, where both hardware acceleration and kernel implementations are widely available. Furthermore, implementing TLS as a user library is largely a matter of historical fiat rather than a result of logical design. There is no reason that a secure socket should not be implemented within the kernel. In fact, an in-kernel implementation would enable much more efficient hardware acceleration than can be achieved in a user-space library, since buffer copying and context switching would be minimized.
The other argument for Layer 4 cryptography is that SSL/TLS comes with an API that allows the application to control its own security policy. This is an important rationale and an important capability. Without a security API, the application is dependent on an external network management policy to achieve its security goals. Such a dependence only adds complexity and an additional point of weakness to the overall system security landscape. However, just as was the case with the user library implementation of SSL, this lack of an API for IPsec is largely a matter of historical accident rather than a logical requirement. Just as one can today specify a TCP or UDP socket, there is no reason that we should not be able to specify a UDP socket with a security policy. When the socket is opened, IKE could be invoked to establish the security associations needed to implement the desired policy. Unfortunately, application developers cannot wait for systems implementers to give them the necessary APIs, so they are left to use the APIs that are available now; thus the preference for SSL and TLS.
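To make the point about an application-controlled security policy concrete, here is a minimal sketch using the `ssl` module from Python's standard library. It is only an illustration of the kind of API a TLS library exposes, not something taken from any particular VoIP product. The function name `make_signalling_context` and the specific policy chosen (TLS 1.2 as a floor, mandatory certificate checks) are assumptions made for the example.

```python
import ssl


def make_signalling_context(ca_file=None):
    """Build a client-side TLS policy for a signalling connection.

    Hypothetical helper: the policy values below are illustrative
    assumptions, not a recommendation from the article.
    """
    # Start from the library defaults for authenticating a server.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # The application, not the network administrator, sets the floor.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require a valid certificate that matches the peer's host name.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


ctx = make_signalling_context()
print(ctx.minimum_version.name, ctx.verify_mode.name)
```

The application would then wrap its own TCP socket with `ctx.wrap_socket(sock, server_hostname=...)`. Nothing comparable is available for attaching an IPsec policy to an ordinary UDP socket, which is the gap described above.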
I would point out that neither of these reasons for using Layer 4 cryptography has anything to do with the fact that security is being applied at Layer 4. In addition to the advantages of SSL/TLS, there are some distinct disadvantages that do stem from the fact that the cryptography is being applied at Layer 4. SSL and TLS assume a reliable, connection-oriented transport, in other words TCP. TCP adds significant overhead to the connection and makes recovery in the event of system failures much more difficult. This last point can be critical for VoIP systems. Many VoIP systems establish persistent TLS sessions between a Session Border Controller (SBC) and the subscriber’s gateway or terminal. This TLS tunnel is then used for provisioning and as a secure signalling connection. In the event of a hardware failure, the highly stateful nature of both TCP and TLS means that hot swap or even warm start recovery is somewhere between very difficult and impossible. As a result, recovery generally requires the TLS session to be re-established from scratch: first a TCP connection must be established, followed by a TLS handshake. A large SBC equivalent to a central office switch could be handling thousands of subscribers, and recovery even in the best of cases might require many minutes to re-establish so many connections. VoIP system architects should consider carefully whether the overhead associated with a reliable connection is really worthwhile. SIP can operate over either a reliable connection-oriented transport or a datagram transport. The small amount of additional logic required to retransmit in case of packet loss is much less than is required for TCP reliable connections. One must ask if the convenience of a reliable transport is really worth the difficulty of recovery from major system failures. It may be that the unreliable transport actually yields a more reliable system. Finally, SSL and TLS are susceptible to DoS attacks against the TCP connection.
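To give a feel for how little logic datagram retransmission needs, the sketch below computes a doubling retransmission schedule. It is loosely modeled on the INVITE client transaction timers in RFC 3261, where T1 defaults to 500 ms and the transaction gives up after 64 × T1. The function name and its parameters are assumptions for illustration, and the sketch omits everything else a real SIP transaction layer does.

```python
def retransmit_schedule(t1=0.5, timeout_factor=64):
    """Return (resend_times, deadline) for an unanswered request.

    resend_times are seconds after the first transmission. The gap
    between sends doubles each time, and the transaction is abandoned
    at timeout_factor * t1 seconds.
    """
    deadline = timeout_factor * t1
    times = []
    interval = t1
    elapsed = interval
    while elapsed < deadline:
        times.append(elapsed)
        interval *= 2          # exponential backoff
        elapsed += interval
    return times, deadline


times, deadline = retransmit_schedule()
print(times, deadline)
```

With the default values this yields six retransmissions before the 32-second deadline. The only state involved is one timer and one counter per outstanding request, which is far less than a TCP connection carries.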
A new proposal making the rounds at the IETF holds some promise for SIP security. The Datagram TLS protocol seeks to keep the advantages of TLS (portability and an API) while removing the problems associated with TCP. Datagram TLS modifies TLS to allow records to be transported over an unreliable datagram protocol like UDP. By removing the TCP requirement, the protocol becomes much more recoverable in the event of a device failure. A redundant, fault-tolerant system can be constructed to recover the TLS session state up to some checkpoint. Some sequence numbers will probably be lost, but the TLS handshaking protocol has a provision to allow sessions to be restarted. This would greatly reduce the amount of time necessary to recover all of those subscriber lines.
Moving on to media security, the requirements are quite different. Signalling systems send a few small messages that should be handled quickly, but they are not “real time” in that their value does not degrade quickly over time. Media packets, on the other hand, must be delivered end to end within strict latency constraints. Packets that arrive too late are better discarded than delivered, since the user perception of call quality will suffer more from delayed delivery than from a few dropped milliseconds of audio. A media packet becomes valueless in less than a second, whereas a signalling transaction can take several seconds without generating complaints about the quality of service.
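The "better discarded than delivered late" rule can be sketched as a simple playout deadline check. This is a deliberately simplified illustration: it assumes the sender and receiver clocks are synchronized, which real receivers do not rely on, and the 150 ms budget, the `MediaPacket` type and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MediaPacket:
    seq: int
    capture_ms: int   # sender time at which the audio was captured
    arrival_ms: int   # receiver time at which the packet arrived


def split_by_deadline(packets, playout_delay_ms=150):
    """Separate packets that can still be played from those that are
    too late. A packet is playable if its one-way delay fits within
    the playout budget; otherwise it is dropped rather than played late."""
    playable, discarded = [], []
    for p in packets:
        if p.arrival_ms - p.capture_ms <= playout_delay_ms:
            playable.append(p)
        else:
            discarded.append(p)
    return playable, discarded


packets = [
    MediaPacket(seq=1, capture_ms=0, arrival_ms=60),
    MediaPacket(seq=2, capture_ms=20, arrival_ms=400),  # delayed in transit
    MediaPacket(seq=3, capture_ms=40, arrival_ms=110),
]
playable, discarded = split_by_deadline(packets)
print([p.seq for p in playable], [p.seq for p in discarded])
```

A retransmitted copy of packet 2 would arrive later still, which is why a reliable transport adds nothing for media.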
Media packets are generally transported using the Real-time Transport Protocol (RTP). RTP and its associated control protocol RTCP are designed to provide timely feedback on the latency and jitter of a media connection. RTP enables the endpoints to adjust transmission parameters such as packet length and frequency to adapt to changing line conditions. RTP is always carried by a datagram service, generally UDP; therefore there are no proposals to use SSL or TLS to carry secured RTP traffic. The overhead of the TCP connection adds too much uncertainty and delay, and the value of reliable service is negated by untimely delivery in the event of retransmission. So media security must be provided by either Secure RTP (SRTP) or IPsec. The argument in favor of IPsec seems to be ubiquity: IPsec implementations are available for all operating systems and in many network switching elements. Furthermore, an IPsec tunnel can be set up to protect traffic end to end without any intervention from the VoIP application. This could allow a user to set up a high security encrypted call even if the VoIP application did not support encryption as a native function.
The downside of using IPsec to protect RTP, and the advantage of SRTP, is that IPsec hides the RTP header information whereas SRTP leaves it visible. If the RTP header is visible in transit as well as at the endpoints, it is possible for internal network nodes to use that information to improve quality of service. For example, an MPLS core network might choose to send RTP traffic down a different tunnel than ordinary IP traffic. Routers could make RTP routing decisions based on link congestion information. Both RTP and IPsec can be prioritized using diffserv, but only RTP carries real time network monitoring information.
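To show what an intermediate node can still read when SRTP is used, here is a small sketch that parses the 12-byte fixed RTP header defined in RFC 3550. SRTP encrypts the payload but leaves this header in the clear, whereas under IPsec ESP the same bytes would be inside the encrypted portion of the packet. The function name and the sample packet are made up for the example.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header (first 12 bytes) of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Made-up packet: version 2, payload type 0 (PCMU), 160 payload bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1234, 160000, 0xDEADBEEF) + b"\x00" * 160
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"])
```

A router or monitoring probe that sees the payload type, sequence number and timestamp can classify the flow and estimate loss and jitter without ever decrypting the audio.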
I would like to mention one other point that may favor using IPsec over SRTP. There might be significant value to some users in obscuring the fact that a packet belongs to a phone call. RTP or SRTP is almost a dead giveaway as to the media type; IPsec packets on the other hand could be anything. This might be significant to a user who wants to obscure the fact that they are encrypting a phone call. It also might be useful to a third party service provider that wants to hide the fact that it is making phone calls over an untrusted carrier network. (Thus Vonage might not want ATT to be able to “prioritize” their calls.) Finally phone calls are treated differently under the law than other data traffic. CALEA only provides for intercept of phone calls, not of ordinary data traffic such as email or FTP.
I have tried to show some of the advantages and disadvantages of various proposals to secure VoIP applications using cryptography. I have not even touched on that briar patch of issues around key exchange, key management and user and element authentication. There is another hornet’s nest of issues surrounding legal intercept that is beyond the scope of this posting. I believe that it is in everyone’s interest to see cryptography widely adopted, but we all need to be aware that cryptography is no panacea for security. In particular, cryptography for data in flight is rather pointless if the network elements are vulnerable to attack. The cryptography that we use is very strong. Some people seem to believe that the NSA has a backdoor into AES, but lots of very knowledgeable reviewers haven’t been able to find any evidence of this. Nonetheless, strong cryptography does not confer security if the communications endpoints are weak.
Hank Cohen
Hifn Inc.
Hello,
I really like the article in this post. I’m putting a “news” blog together myself using RSS feeds and a few articles here and there, and I’ll put a link back to your blog as a resource.
Thanks for the info…
I found your article very informative. Thanks!
One question: why do you only mention DTLS as a possible candidate for signaling security but not for media security? Is the overhead of DTLS too high for encrypting media?
K,
Thanks for the comment. I’ll have to see if I can ping Hank for a response as he was a guest contributor for only this article back in June 2006. I’m not personally extremely familiar with the DTLS proposal, but my gut reaction may be that it hasn’t been considered for media encryption because pretty much most folks in the industry have at this point standardized on SRTP for encrypting RTP. DTLS is in fact one of the proposed protocols under consideration at the IETF meeting in two weeks in Prague to be used for securing the SRTP key exchange between vendors. But again, that is for signaling and not media encryption.
Thanks,
Dan