Very interesting news: Apple’s new iPhone 4 application, FaceTime, is a VoIP and IP video application that uses SIP signaling and RTP media. Security researcher and SANS instructor Josh Wright has posted a detailed analysis of FaceTime on Packetstan, a new blog developed by his SANS and InGuardians colleagues. Here are a few quick summary points from Josh’s analysis and from our own quick look at the phone so far:
- The iPhone FaceTime client doesn’t use SIP REGISTER for authentication
- Uses STUN for NAT traversal and for resolving the IP addresses of the calling and called iPhones
- After the remote party’s IP address has been resolved, SIP INVITE and MESSAGE packets are exchanged directly between iPhone devices
- Cleartext SIP and RTP
- RTP video appears to use the H.264 video codec, and audio appears to use the AAC-LD codec (the same audio codec used in Cisco TelePresence).
- FaceTime uses XMPP to authenticate each iPhone to an Apple Jabber server, using TLS and mutual certificate authentication.
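Because the SIP and RTP traffic is cleartext, a capture can be sorted into protocols just by looking at the first few bytes of each UDP payload. The sketch below is a rough illustration of that idea, not a tool from Josh's analysis. It assumes standard framing (text SIP start lines, the RFC 5389 STUN magic cookie, and RTP version 2 per RFC 3550); FaceTime's packets may deviate from these, and the function name `classify_udp_payload` is hypothetical.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined in RFC 5389
SIP_METHODS = (b"INVITE", b"MESSAGE", b"ACK", b"BYE",
               b"CANCEL", b"OPTIONS", b"REGISTER")


def classify_udp_payload(payload: bytes) -> str:
    """Guess whether a UDP payload is SIP, STUN, or RTP ('unknown' otherwise).

    Heuristic only: assumes standard framing, which FaceTime may not follow.
    """
    # SIP is text: a request starts with a method name, a response with "SIP/2.0".
    first_token = payload.split(b" ", 1)[0]
    if first_token in SIP_METHODS or payload.startswith(b"SIP/2.0 "):
        return "sip"

    # STUN: 20-byte header, top two bits zero, magic cookie in bytes 4..7.
    if len(payload) >= 20 and (payload[0] & 0xC0) == 0:
        (cookie,) = struct.unpack("!I", payload[4:8])
        if cookie == STUN_MAGIC_COOKIE:
            return "stun"

    # RTP: 12-byte fixed header whose version field (top two bits) is 2.
    if len(payload) >= 12 and (payload[0] >> 6) == 2:
        return "rtp"

    return "unknown"
```

Feeding each UDP payload from a capture through this function gives a quick first pass at which packets carry signaling and which carry media. The RTP check is deliberately loose, so it can misfire on other binary protocols.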
Josh and I were discussing this the other day because he was trying to use the ‘VideoSnarf’ tool to reconstruct the H.264-encoded media packets. The codec does appear to be H.264, but with some slightly modified reserved fields. Right now reconstruction isn’t working, but we hope to have an updated version of VideoSnarf working with FaceTime traffic in the near future.
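To give a sense of what reconstructing H.264 from RTP involves, here is a minimal sketch that parses the fixed RTP header (RFC 3550) and reads the NAL unit type from the first payload byte, as laid out in the standard H.264 RTP payload format (RFC 6184). This is not VideoSnarf's code, and the names `RtpHeader`, `parse_rtp_header`, and `h264_nal_type` are hypothetical. It assumes unmodified standard framing, which is exactly where FaceTime seems to differ, so treat it as a starting point for inspection rather than a working decoder.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    header_len: int  # bytes before the payload, including CSRCs and extension


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse a standard RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header_len = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    if b0 & 0x10:  # extension bit set: skip the header extension as well
        if len(packet) < header_len + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[header_len + 2:header_len + 4])
        header_len += 4 + 4 * ext_words
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc, header_len)


def h264_nal_type(packet: bytes) -> int:
    """Return the 5-bit NAL unit type from the first RTP payload byte."""
    hdr = parse_rtp_header(packet)
    if len(packet) <= hdr.header_len:
        raise ValueError("RTP packet has no payload")
    return packet[hdr.header_len] & 0x1F
```

In standard H.264 over RTP, types 1 to 23 are single NAL units (5 is an IDR slice), 24 is an STAP-A aggregate, and 28 is an FU-A fragment. Seeing unexpected values here would be one hint that the payload framing has been altered.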
This is so interesting because it is the first SIP client on a widely deployed consumer smartphone that is developed and supported by a vendor such as Apple. I think it signals that we are going to see more of these applications. These are exciting times. It will be interesting to see how soon other vendors follow Apple’s lead with two-camera video clients on smartphones using VoIP protocols. I am sure many others will be taking a closer look at FaceTime. The attack surface and its potential exposures are very interesting, as are the security measures that could be applied to protect FaceTime traffic.