Category Archives: SIP

VoIP on the iPhone and iPod Touch – a security warning

At first sight, using any VoIP client on the iPhone or the iPod Touch (a.k.a. iDevices) may seem like an uninteresting thing. The reason for this is that Apple does not allow 3rd party applications to run in the background. So when a user closes down his iVoIP client, he will not be able to receive any calls at all, thus defeating the reason for using VoIP on these devices in the first place.

However, if we take a look at some of the VoIP client offerings available, we notice that a few of these clients have the ability to receive incoming calls even when the software itself is not running.

At first sight this seems to be a Good Thing – however, there are severe security implications to doing this. Users will, in fact, willingly put themselves under a man-in-the-middle attack.

Tricking SIP Endpoints Into Divulging Authentication Credentials

This is a neat trick. By doing a little up-front scanning and/or guesswork, an attacker can send an INVITE directly to a SIP user agent, causing the device to ring. Then, when the user agent issues the BYE message to hang up, the attacker can respond with a 407 Proxy Authentication Required message, causing the endpoint to respond with its authentication credentials, essentially handing them directly to the attacker.
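The post does not say which authentication scheme the endpoint answers with. Assuming it is the Digest scheme that SIP borrows from HTTP (RFC 2617, here without qop), the "credentials" are a hash rather than a plaintext password, but everything that goes into the hash except the password travels on the wire. The sketch below only illustrates that calculation; the user name, realm, URI and nonce are made-up values.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 of a string, the hash used by classic Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response as defined in RFC 2617 when no qop is in use."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values standing in for what the endpoint would send back
# after the rogue 407 (all of them are assumptions for illustration).
captured = digest_response("alice", "pbx.example.invalid", "1234",
                           "BYE", "sip:alice@192.0.2.10", "abc123")

# Whoever holds the response can recompute it for a candidate password
# and compare, without ever talking to the phone again.
guess = digest_response("alice", "pbx.example.invalid", "1234",
                        "BYE", "sip:alice@192.0.2.10", "abc123")
print(guess == captured)
```

This is why a short numeric password on a SIP extension is a real exposure once the response has been handed over.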

The page linked above indicates that this attack is currently implemented in the VoIP Pack for CANVAS, so it’s essentially packaged and ready to use for you CANVAS users.  You can see a video of this being used in CANVAS here.  I would expect to see this credential-harvesting attack in other exploitation frameworks or stand-alone tools shortly…

“SIP Trunking And Security” workshop coming up at ITEXPO on February 3, 2009

If you will be in Miami at ITEXPO February 2-4, you are welcome to attend a free “SIP Trunking And Security” session I (Dan York) will be doing as part of Ingate Systems’ SIP Trunking Workshops. The SIP trunking workshops are free to all attendees even if you only register for an exhibit pass.

My session will be 11:15-12:30 on Wednesday, February 3rd, and if you do attend please feel free to come up and introduce yourself (or drop me a note in advance to let me know to look out for you). I’ll be bringing my recording gear, too, and the talk will eventually go out in my Blue Box Podcast feed so you will be able to hear it later.

P.S. If you are attending ITEXPO and your company makes a product or provides a service related to VoIP security, please feel free to let me know and perhaps we can schedule an interview to go out as a Blue Box Special Edition.

Video demo of “sipautohack” tool

Over in the VOIPSEC mailing list, Shawn Merdinger recently pointed out a video produced by the folks at Enable Security to highlight one of their new tools, “sipautohack”, that they sell as part of one of their packages of tools called “VOIPPack”. From their description page, VOIPPack includes:

  • sipscan – Scans the network for SIP devices and identifies the user-agent and if the device is a PBX
  • sipenumerate – Enumerates extensions on a PBX server
  • sipcrack – Launches password attacks on the PBX server
  • sipautohack – Given a target network, this module will scan for SIP devices, enumerate any extensions on all PBX servers found and try to guess their password

This video, then, is a demonstration of the last of the listed tools:

Demonstrating sipautohack from Sandro Gauci on Vimeo.

We here at VOIPSA have no connection to this tool or vendor and cannot say anything positive or negative about the tool or company… it’s just another entry in the very long list of VoIP security tools out there (see our Tools list). I just think it’s great to see video screencasts out there showing what tools like this can do. (And if you have a screencast related to VoIP security out there you’d like us to mention, feel free to contact me.)

5th Emergency Services Workshop to be held Oct 21-23 in Vienna

How does an emergency call to 9-1-1 or 1-1-2 (or whatever your local emergency number may be) work in a world of voice-over-IP?

It’s a topic we hardly cover at all here on this blog, yet it’s one of the security and social/cultural aspects of our migration to IP that we definitely have to get right. If we as an industry don’t, people can die. (Or the migration to VoIP will be significantly delayed.)

To that end, a number of emergency services experts are meeting in Vienna, Austria, from 21 to 23 October 2008 to discuss ongoing work on IP-based emergency services. The first workshop day focuses on tutorials to help those interested in the classical 1-1-2 (or 9-1-1) emergency call get up to speed with the architectures and standards developed for next generation emergency calling. During the second day, various recent activities of standardization organizations around the world will be presented. The third workshop day is dedicated to early warning standardization efforts and the outlook for future emergency services activities.

Participation from those working in standardization organizations, as well as from anyone with an interest in the subject, is highly appreciated. The event is open to the public and anyone may attend. An evening program has been organized for socializing. There is a nominal fee of 120 Euros to cover the cost of facilities, food, drinks, etc. Arrangements are also being made for participants to join remotely.

More information about the workshop can be found behind the following link:

This page also points to previous workshops that took place in New York, Washington, Brussels and Atlanta.

(Thanks to Hannes Tschofenig for providing the majority of this text.)

Slides: SIP Trunking and Security in an Enterprise Network

Earlier this month out at ITEXPO in Los Angeles, I participated in the Ingate SIP Trunking seminars as I have been doing for the last year or so. My talk was “SIP Trunking and Security in an Enterprise Network”. The slides are available for viewing or download from my SlideShare account and I’ll also embed them here in this post.

I did record the presentation in both audio and video and hope to be making that available as a Blue Box podcast some time soon. I’ll then sync the slides to the audio. Meanwhile… enjoy the slides!

Mark Collier and SecureLogix release new VoIP security tools

In a message to the VOIPSEC mailing list over the weekend, Mark Collier announced the release of a new suite of VoIP security test tools. Mark, as you may recall, is the co-author with (VOIPSA Chair) David Endler of the book “Hacking Exposed: VoIP” and as part of the book publication he and Dave made available a series of VoIP security tools through their website.

Now, Mark’s back with a second version of those VoIP security tools. He describes the new tools in one blog post on his VoIP security blog and announces their availability in a second blog post. Here’s his description of the new tools:

We also built several new tools:

– Several new flood-based DoS tools, which generate floods using different SIP requests, including byeflood, optionsflood, regflood, and subflood. The regflood tool is certainly the most potent of the group.

– dirsniff and dirsortmerge – a passive scanner that builds a directory of valid SIP phone addresses. By using the dirsortmerge tool, you can manage results from this tool, as well as output from the dirscan active scanner.

– Call Monitor and sipsniffer – this tool provides a GUI that shows active SIP calls. The tool allows you to select a call and terminate it (via teardown) or insert/mix in audio (via rtpinsertsound or rtpmixsound). The tool allows you to define up to 10 sound files, that can be inserted/mixed in on command. The tool also streams the call audio to the XMMS player, so you can listen in and “time” when you affect the call.

The Call Monitor tool is particularly interesting. It makes using the rtpinsertsound/rtpmixsound tools a lot easier and more effective. It makes real audio manipulation possible.

Interestingly, the tools are not being made available through the book’s website but rather directly from SecureLogix’s web site, where you have to register first to download the tools.

Mark also provides a PowerPoint presentation about the “Call Monitor” tool he mentions here. He’d mentioned this tool to me once before when we met at one of the conferences…. basically it provides a “point-and-click” interface to allow you to inject or mix in new audio into existing audio streams. Making it this easy is definitely a scary prospect (and another good argument for why you should be using SRTP to encrypt audio streams).

Anyway, the new tools are now out there if you want to try them out. (Joining the long list of existing VoIP security tools.)

Webinar on SIP Security on Thurs, Sept 11, by Audiocodes and Interactive Intelligence

Many of you may have received this in your email inbox – Audiocodes and Interactive Intelligence are jointly sponsoring a TMCnet webinar on Thursday, September 11, 2008, at 12 noon US Eastern time called “Do You Know Who is Listening? – The Truth of Enterprise SIP Security”. The abstract is here:

Session Initiation Protocol (SIP) has emerged as the predominant protocol for VoIP deployments. While SIP is gaining headway in the IP communications market, any new technology brings with it some inherent security challenges. In this webinar, we discuss these challenges, the misconceptions surrounding SIP Security, and examine the tools available to counter them. This session will also explore robust solutions that not only tackle security threats, but also empower businesses to proactively protect their networks from current and future attacks. Included in this webinar, we will examine the Interactive Intelligence suite of products as a communications platform case study that empowers businesses to tackle security threats while maintaining affordability and performance.

Obviously it is a vendor presentation with the associated perspective, but for those wishing to attend, you can register online.

[VOIPSA is a vendor-neutral organization and we do not endorse or recommend solutions from any particular vendors. However, as our interest is in elevating the level of discussion about VoIP security issues in general, we are glad to post notices here about upcoming vendor presentations.]

How Aircell is (probably) blocking VoIP phone calls on planes (hint… VoIP Whack-A-Mole)

How is Aircell blocking VoIP phone calls from systems like Skype, SightSpeed and Gizmo? (And how did Andy get through with Phweet?)

Ever since last week’s announcement of the “Gogo Inflight Internet Service” provided to American Airlines by a company called Aircell – and the ensuing coverage in the blogosphere – I’ve been getting asked about how exactly Aircell is blocking VoIP calls. Especially after Andy Abramson was able to make a call using Phweet. Aircell is very clear in the Gogo terms of service that no voice calls are allowed:

No Voice Applications. You will not use any type of voice application (including, without limitation, voice over Internet protocol) without written permission from Aircell.

And early users of the system who tried VoIP calls reported that indeed after about 5 seconds or so, their VoIP conversation was terminated. Repeatedly. They could use Skype, for instance, for IM, but not for voice.

So how is Aircell blocking VoIP? (And how did Andy get through with Phweet?)

Unfortunately, I was a wee bit busy last week when all this was breaking, so it’s taken me until now to come up for air enough to write about how Aircell could be blocking VoIP calls on the planes.


First, I should state up front – I have absolutely NO connection to Aircell, Gogo, American Airlines, etc. I don’t know exactly how they are blocking VoIP calls but am laying out here how they could be blocking VoIP calls.

[I also feel compelled to say that I personally think it’s silly of Aircell to block VoIP because there will inevitably be people who figure out ways to route around the blocking. I think Aircell’s excuse that they want to block people from talking loudly is also rather lame. I’ve been on long flights where I was trying to sleep and I’ve had two people talking very loudly in the seat behind me or next to me. (And yes, sometimes I’ve asked them to please speak quieter.) Other times I’ve had babies screaming and crying for most of the flight… or other “energetic” children carrying on. No, I don’t really want my neighbor to be in an involved VoIP call but my point is that there are disruptions already. Part of me wonders if there aren’t really more technological issues with doing streaming audio/video, but anyway… that’s their policy and if you want to use their service, you have to agree. You don’t (yet) have a choice in services to use, and you probably won’t.]


Given that Skype conversations in particular are encrypted, I’ve had several people ask me if this means that Aircell can decrypt Skype calls. How else, they ask, can Aircell differentiate between Skype text chat and Skype voice chat? It’s simple really:

Pattern recognition.

A VoIP call in progress has a distinct profile from a network packet point-of-view. In general, the audio streams of VoIP calls (as compared to the call control/signalling channels) have the following characteristics:

  1. there are a zillion small packets
  2. the packets are sent over UDP versus TCP
  3. the packets are sent using the Real-time Transport Protocol (RTP) over UDP

Now with Skype’s encryption, network software can’t know about the contents of the audio stream, so the software can’t know about my #3 here, but the software can recognize the pattern based on the first two. For other non-encrypted services, the software can very simply look for RTP streams.

Now, why do VoIP systems use a zillion small packets? Typically, a VoIP system will capture the speaker’s voice in chunks of 10, 20 or 30 milliseconds. Most I’m familiar with seem to go for 20ms. So 20ms of audio is captured, encoded digitally and sent off in an IP packet. The exact size of the IP packet will vary depending upon what codec is used to encode the audio. Standard G.711 packets will be one size and G.729 packets will be much smaller. Many of the VoIP streams I’ve captured in network traces have a total packet length of somewhere between 35 and 70 bytes.
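As a rough illustration of how the codec drives packet size, here is a small Python sketch. It assumes the nominal bit rates of 64 kbit/s for G.711 and 8 kbit/s for G.729, plain RTP over UDP over IPv4 with no header options, and no encryption overhead; a proprietary service like Skype will not match these numbers exactly.

```python
# Nominal codec bit rates in bits per second.
CODEC_BITRATES = {"G.711": 64_000, "G.729": 8_000}

# Header sizes in bytes, assuming IPv4 without options and a bare RTP header.
IPV4_HEADER, UDP_HEADER, RTP_HEADER = 20, 8, 12


def payload_bytes(bitrate_bps: int, interval_ms: int) -> int:
    """Bytes of encoded audio produced during one packet interval."""
    return bitrate_bps * interval_ms // 8000


def ip_packet_bytes(bitrate_bps: int, interval_ms: int) -> int:
    """Audio payload plus RTP, UDP and IPv4 headers."""
    return (payload_bytes(bitrate_bps, interval_ms)
            + RTP_HEADER + UDP_HEADER + IPV4_HEADER)


for name, bps in CODEC_BITRATES.items():
    print(name, payload_bytes(bps, 20), "bytes of audio,",
          ip_packet_bytes(bps, 20), "bytes on the wire per 20ms packet")
```

Under those assumptions a 20ms G.711 packet carries 160 bytes of audio (200 bytes with headers) and a G.729 packet only 20 bytes (60 with headers), both far below a full-size data packet.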

To put that in perspective, understand that an Ethernet packet typically has a maximum size of 1500 bytes. And packets sent by various protocols can be even larger and will be “fragmented” into smaller pieces (for instance, 1500-byte pieces) to be moved across the network. [Network geeks: Please give me some poetic license here… I realize I could be more precise, but I’m trying not to completely bore the readers.]

The point is that voice packets are typically tiny in comparison to other packets – and there are a lot of them.

How many? Well, if you take a 20ms packet interval, that means that you are sending a chunk of audio 50 times each second… so that’s 50 packets per second for one audio stream. Almost every voice conversation involves two audio streams (one from the caller, one from the recipient) and so you are looking at 100 packets per second for a typical two-way VoIP conversation.
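The arithmetic is simple enough to write down. This little sketch just restates it in Python; the function name is made up for illustration.

```python
def packets_per_second(interval_ms: float, streams: int = 2) -> float:
    """Packets per second for a call with the given packet interval.

    `streams` is 2 for an ordinary two-way conversation.
    """
    return streams * 1000 / interval_ms


for interval in (10, 20, 30):
    print(f"{interval}ms interval:",
          round(packets_per_second(interval, streams=1), 1), "pps one way,",
          round(packets_per_second(interval), 1), "pps both ways")
```

With the common 20ms interval that works out to 50 packets per second in each direction and 100 for the whole call.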

The reason for this is relatively simple. In a file transfer, you are looking to move a file across the network as fast as possible – but you aren’t necessarily in “real-time”. So you generally will stuff the packets with as much info as you can and push them across the network. With voice, we are making use of the fact that the human ear will deal with some lost audio, and so we are chopping the audio up into a zillion tiny pieces, tossing them in unreliable UDP packets and hoping that enough get there so that the listener can make sense of the conversation.

Put another way, if I wanted to send you this blog post via snail-mail, I could print it out, stick it in an envelope and mail it to you. That’s file transfer. On the other hand, if I were to chop this blog post up and write each word on a post card and stick them in the snail-mail to you, odds are that enough would arrive at the other end that you could assemble them into something like this post. (With luck, maybe you could even make the post shorter!)

That’s VoIP.

So once you know the pattern, it’s fairly easy to spot the calls. For instance, where’s the VoIP call in this network trace?


Do you see it? How about this trace with a filter turned on to show UDP in red?


Ta da… there’s your VoIP call!

Let’s look at this one in a bit more detail:


There it is… all in UDP… and coming in at about 100 packets per second. And if I look at the actual Wireshark traces, I can see that these 100 packets per second are all very tiny sizes. Many of them are between 37 and 50 bytes.

And this is an encrypted Skype call!

No need to decrypt it. Just see that it’s a steady stream of 100 very small packets per second (50 packets per second each way) all over UDP.

Kill the stream. Block it. Conversation dead. No more VoIP on the plane.

It’s basically the network security version of Whack-A-Mole. See a VoIP stream start up… block it. See another one… block it. See yet another… block it. Whenever anything pops up that meets the profile, stomp on it.

This explains, too, why people could talk for a few seconds and then had their conversations terminated. The pattern has to appear in the network monitoring software. The software has to be sure it’s a VoIP stream and not something else… and then the software can block it.
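To show why a detector of this kind needs a few seconds before it can act, here is a minimal sketch of the idea in Python. It is not how Aircell (or any particular product) works; the `Packet` record, the three-second window and the thresholds are all assumptions picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds since the flow was first seen
    protocol: str     # "UDP" or "TCP"
    size: int         # total packet length in bytes


def looks_like_voip(packets, window_s=3.0, min_pps=80.0, max_size=200):
    """True if the flow contains a sustained run of small UDP packets.

    All thresholds are hypothetical; a real system would tune them.
    """
    times = sorted(p.timestamp for p in packets
                   if p.protocol == "UDP" and p.size <= max_size)
    needed = int(window_s * min_pps)
    # Any run of `needed` small packets that fits inside `window_s`
    # seconds matches the profile.
    for i in range(len(times) - needed + 1):
        if times[i + needed - 1] - times[i] <= window_s:
            return True
    return False


# One second of a call is not enough evidence; five seconds is.
one_second = [Packet(i * 0.01, "UDP", 45) for i in range(100)]
five_seconds = [Packet(i * 0.01, "UDP", 45) for i in range(500)]
print(looks_like_voip(one_second), looks_like_voip(five_seconds))
```

Note that nothing here looks inside the packets, which is why encryption makes no difference to this approach.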

Now I don’t know for a fact that this is how Aircell is blocking VoIP, but it would be easy enough to do it this way.


There are, of course, easier ways to kill the conversations. If the VoIP calls use unencrypted audio streams then it’s incredibly trivial to block. Just block the RTP protocol. Period. End-of-story. Now this does involve a hair more packet inspection in that the software has to look farther into the packet headers to see the protocol type, but again this is easy to do. All RTP is typically used for is streaming audio or video… block it and the “problem” goes away.
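RTP has no fixed port or protocol number of its own, so "look for RTP" in practice means checking whether a UDP payload starts with something shaped like an RTP header. A hedged sketch of such a check follows; it relies only on the fixed 12-byte header and version field from the RTP specification, and it is a heuristic that will also match some non-RTP traffic.

```python
def looks_like_rtp(udp_payload: bytes) -> bool:
    """Heuristic test for an unencrypted RTP packet inside a UDP payload."""
    if len(udp_payload) < 12:           # the fixed RTP header is 12 bytes
        return False
    version = udp_payload[0] >> 6       # top two bits of the first byte
    payload_type = udp_payload[1] & 0x7F
    # Payload types 72-76 are left unused so RTP is not confused with RTCP.
    return version == 2 and not 72 <= payload_type <= 76


# A made-up packet: version 2, payload type 0 (G.711 mu-law), 160 audio bytes.
sample = bytes([0x80, 0x00]) + bytes(10) + bytes(160)
print(looks_like_rtp(sample))
```

A monitoring box would want to see many consecutive packets pass this test on the same flow before deciding to block it.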

They could of course go even further into packets and see if they are Session Initiation Protocol (SIP) packets and if so, what the packets are asking to do. If one endpoint sends a SIP INVITE packet to another endpoint and indicates that an audio conversation is to start, the software could again simply block the impending audio stream. (Of course this couldn’t be done if the SIP was encrypted…)
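For unencrypted SIP, the check described above comes down to reading the request line and the SDP body. Here is a rough sketch in Python; the sample message uses made-up addresses, and a real parser would have to cope with far more variation than this does.

```python
def invite_requests_audio(sip_message: str) -> bool:
    """True if the message is a SIP INVITE whose SDP body offers audio."""
    text = sip_message.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")   # a blank line ends the headers
    request_line = head.split("\n", 1)[0]
    if not request_line.startswith("INVITE "):
        return False
    # In SDP, each media stream is announced by an "m=" line.
    return any(line.startswith("m=audio ") for line in body.split("\n"))


invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "To: <sip:bob@example.com>\r\n"
    "From: <sip:alice@example.com>;tag=1\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 192.0.2.1\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
print(invite_requests_audio(invite))
```

The `m=audio` line also carries the port the audio will arrive on, which is what would let the software block the stream before it starts.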

The software could also simply block ports. Block any usage of port 5060 or 5061 and you would probably kill off most “regular” SIP conversations. (Yes, SIP endpoints can make connections on non-standard ports, but the majority of clients probably wouldn’t.) The challenge here is that some SIP endpoints also would use SIP to set up non-audio communication channels like text chat, so blocking all SIP would also block SIP-based text chat which is probably not desirable.

The software could also block on a service level… if it knew, for instance, the host names or IP addresses for the media servers and consumer services (through which the audio would be sent), the software could block all connections to those media servers.

There’s a whole range of additional layers the network monitoring software could use. Any good system will have a “defense in depth” strategy and make use of many of these different algorithms.

Of course, adding on these layers does require more computing power and will undoubtedly add some latency (even on a microscopic level). It may be that for right now they can simply do the pattern recognition approach and shut down VoIP calls.


Okay, so how did Andy make a call using Phweet? Given that this post has already gone on this long, I’ll publish my guess in a subsequent post. The text above should give you enough clues, though… any pattern recognition system is inherently fragile because it depends upon recognizing patterns. So what if the audio packets you are sending don’t match any known patterns? What if (hint) the folks at Aircell forgot to watch all protocols?

Stay tuned for some more network charts in my next post… 🙂

P.S. This is, by the way, why I think that these types of systems trying to block VoIP calls are inherently doomed… someone will inevitably find a way to “cloak” their VoIP calls so that they are unrecognizable or indistinguishable from other data traffic… it’s a cat and mouse game and inevitably people will find ways to get around the watchers…

Dan York is Best Practices Chair for the VoIP Security Alliance and writes here and at DisruptiveTelephony. You can follow him on Twitter.

Asterisk “hack” to show blocked Caller-ID points to larger trust issues with SIP

Can Asterisk really be used to “unmask” blocked Caller-ID and show the private number?

Well, yes… but it really has less to do with Asterisk than it does with not respecting the signaling sent to you by a SIP trunking provider. It’s conceivable that any IP-PBX could be configured to allow you to do this… and points to a larger issue with trust boundaries between SIP Service Providers (a.k.a. Internet Telephony Service Providers or ITSPs) and their customers.


Let’s take a step back first and explain… over the weekend FierceVoIP ran a piece about VoIP security talks at the “Last Hope” conference that referenced a demonstration by Kevin Mitnick of how you could use Asterisk to show Caller ID information for someone calling even if the caller’s ID is set to “private”. Someone (“phant0msignal”) recorded a video of the demonstration (and yes, if you listen, the audio cuts in and out) and posted the video to YouTube and the code to his blog. This might have gone somewhat unnoticed except that it got picked up by Engadget, which naturally garnered a good bit of attention. Here’s the video:


So was this really a big “hack” that exposed private information?

Not really… although it may be a clever use of scripting within Asterisk.

Here’s the thing:

Asterisk received this information as a natural part of SIP communication because the SIP Service Provider TRUSTED Asterisk to “do the right thing” and NOT display the information.

Which, normally, would be the case. Asterisk would respect the SIP privacy headers and not display the Caller ID. However, in this case Asterisk was modified to NOT respect the privacy headers and display the information that was requested to be private.

To understand this, we need to look at one of the ways that “Caller ID” is usually handled within the world of SIP communication. RFC 3325 defines a SIP header called “P-Asserted-Identity” that is typically inserted by the first SIP proxy that is interacting with the SIP endpoint. The result, within a trusted administrative domain, is the inclusion of one or more headers that look like:

P-Asserted-Identity: "Dan York" <>
P-Asserted-Identity: tel:+14155551212

The P-Asserted-Identity header, often referred to as P-A-I for short, includes this identity information that can be used by the proxy for the recipient of the call to display “Caller ID” on the recipient’s SIP endpoint (phone, softphone, etc.).

Now, when a call is to be private, there is an additional SIP header included. RFC 3323 defines the “Privacy” SIP header and section 9.3 of RFC 3325 adds an “id” value to the Privacy header. So the resulting SIP headers look like:

P-Asserted-Identity: "Dan York" <>
P-Asserted-Identity: tel:+14155551212
Privacy: id

Per RFC 3325 Section 7, this Privacy header indicates to the SIP proxy that the P-A-I information MUST be stripped off before the SIP headers are sent to an “untrusted” entity. From the RFC:

Parties who wish to request the removal of P-Asserted-Identity header
fields before they are transmitted to an element that is not trusted
may add the “id” privacy token defined in this document to the
Privacy header field. The Privacy header field is defined in [6].
If this token is present, proxies MUST remove all the P-Asserted-
Identity header fields before forwarding messages to elements that
are not trusted.

So the “hack” in this case was that Asterisk’s SIP handling was modified to NOT respect the Privacy header and instead pass along the P-A-I information to, in this case, the endpoint.
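The rule quoted from RFC 3325 can be sketched in a few lines of Python. This is only an illustration of the stripping logic, not Asterisk's code or any provider's implementation; the function name and the list-of-tuples header representation are made up for the example.

```python
def strip_asserted_identity(headers, next_hop_trusted):
    """Apply the "Privacy: id" rule before forwarding a SIP message.

    `headers` is a list of (name, value) tuples; a new list is returned.
    """
    if next_hop_trusted:
        # Inside the trust domain the identity is passed along untouched.
        return list(headers)
    wants_id_privacy = any(
        name.lower() == "privacy"
        and "id" in [token.strip().lower() for token in value.split(";")]
        for name, value in headers
    )
    if not wants_id_privacy:
        return list(headers)
    return [(name, value) for name, value in headers
            if name.lower() != "p-asserted-identity"]


incoming = [
    ("P-Asserted-Identity", "tel:+14155551212"),
    ("Privacy", "id"),
]
print(strip_asserted_identity(incoming, next_hop_trusted=False))
print(strip_asserted_identity(incoming, next_hop_trusted=True))
```

The whole scheme hinges on the `next_hop_trusted` decision: once an element is treated as trusted, it receives the identity and nothing technical stops it from showing it.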


The larger problem/issue is really this:

Why did the SIP Service Provider send the P-A-I information down to the Asterisk box in the first place?

The answer, of course, is simply this:

The SIP Service Provider assumed that it could trust the SIP server with which it was communicating.

The Service Provider extended its “trust boundary” out to encompass the SIP network of its customers. As far as the Service Provider was concerned, the customer was just another SIP network and should be trusted. The Service Provider did not apparently care whether the customer was another carrier – or just someone running Asterisk on a home system. They were simply glad to provide connectivity to the customer.

The problem is:

The trust boundary of the PSTN was then extended out to the customer system.

and there was an implicit assumption that PSTN privacy requests would be respected.


One obvious reaction is “So the Service Provider shouldn’t send that information to the customer’s SIP server!” Perhaps. Perhaps the Service Provider should not trust any of its customers with that information. (And I Am Not A Lawyer so I don’t know if in this case there are actual legal issues here.)

But I’m not sure it’s that simple.

You see, there’s a bit of a “Wild West” going on right now in the world of SIP trunking. Basically, anyone and their brother, mother, father, sister (and…) can get into the world of providing SIP trunks simply by setting up a SIP server (which could be done with Asterisk) and buying some upstream SIP connectivity from a larger SIP Service Provider… ta da… “ZZZZZ VoIP Services” is born. Simple. Easy.

If you are a larger SIP Service Provider, you will sell to smaller Service Providers and naturally extend your “trust boundary” to them. They will sell to others… and so on… and so on… until some final system is connected to some endpoints.

SIP clouds connected to SIP clouds connected to more SIP clouds.

Where do you appropriately define the “trust boundary”? Is it perhaps the “top tier” SIP Service Providers? Is it “the carriers who run the PSTN”? Should it have been stripped off at a gateway coming in from the PSTN?

We’re building this massive “interconnect” of SIP clouds… and this is just one of the many issues that it is not entirely clear that we have a consensus on. Sure, RFC 3325 defines what should happen on a technical level… but what about on a policy level? Who gets to be part of the “trusted” community? (FYI, I would strongly recommend reading RFC 3325 for a better understanding of the issue.)

In the meantime, it’s fairly safe to assume that if you are “blocking” your Caller ID, there is no actual guarantee that it won’t be seen by the recipient. In the vast majority of cases, sure, that privacy will be respected. But there’s no guarantee.

Welcome to the new world of VoIP…

P.S. And yes, if you were reading this and thinking “Gee, so can’t the ‘Caller-ID’ be easily spoofed just by modifying the SIP headers?” you are absolutely right. That’s why there’s a good amount of work going on right now in the IETF around the whole area of “strong identity”… but that’s a topic for another blog post some time…
