The Network's Network: Business Intelligence from SS7

 

White Paper by BSI

The Signalling System #7 (SS7) network is a meta-network: a network about a network. Its only purpose is to enable traffic on another network, the telephone network, to move efficiently and flexibly. Among its many functions, it provides the means for translating telephone numbers, for locating mobile users, and for two switches to negotiate how they will handle a call that needs to transit their common trunks. The SS7 message traffic generated by this last, negotiating, function is called "ISUP" traffic (short for "ISDN User Part"), and it provides an astonishingly detailed and inexpensive source of information for the traffic engineering, planning, regulatory affairs, accounting and marketing departments in a telephone company.

This white paper describes how to capture that data, how to analyze it, and some of the benefits gained by one Telco when it decided to get serious about SS7. We concentrate here on ISUP traffic, because that is quite enough to cover in one white paper of reasonable length. We provide practical examples of the problems encountered and solutions we have implemented at BSI. Together, these solutions constitute what we call Total Network Knowledge™.

A meta-network metaphor
Let's branch off into a metaphor for a moment. Suppose you wanted to collect information about air traffic in your area. You could enroll a team of observers and send them off to every airport to make notes about departures, arrivals, aircraft types, operating airlines, numbers of passengers or type of cargo. You would have to bring all the information together and collate it, making sure that one observer's departure record and another's arrival record referred to the same flight, reconciling the different ways in which observers recorded their notes, and so on. Alternatively, you could record the data traffic going through the Air Traffic Control (ATC) message switch, which would provide all the information you needed (and more) about air traffic, in a convenient format from one location. You'd probably ask ATC for their permission before you did this, of course.

The same holds for a telephone network: if you wanted to know about inter-switch traffic you could generate a Call Detail Record (CDR) on every switch for every inbound and outbound call, and then collect the data from all the switches. Alternatively, you can collect the SS7 ISUP messages that all pass through two, or rarely more, SS7 message switches, known as Signal Transfer Points (STPs).

The messages in the medium
What can ISUP traffic tell you about telephone traffic? A lot more, it turns out, than you can get from the CDRs generated by a switch:

  • Every inter-switch call attempt, whether successful or not, generates ISUP traffic; switches normally generate CDRs only for calls that are completed and answered.
  • Calls both into and out of the switch generate ISUP traffic; it is very hard to get most switches to record all inbound traffic.
  • Because events within calls generate ISUP messages which pass through a fast data network, the timings of the messages accurately reflect the timings of events within calls; a switch normally records only those events significant to billing, to the accuracy required for billing.
  • Much of the history of the call - whether or not it was forwarded, for example, and why; whether the originating or terminating numbers were ported - is recorded in the ISUP messages; if information isn't used for billing, a switch will seldom record it.

To sum up: switches aren't very good at recording information about calls unless the call is going to generate a line on a bill somewhere; ISUP traffic has to carry all the information about a call that enables a switch to handle it appropriately. Actually, if you try to configure a switch to record all the information available from the ISUP traffic you can easily bring it to its knees (we know - we tried). In contrast, recording SS7 traffic is normally entirely non-intrusive, and doesn't impact your network's ability to switch traffic.

The devil is in the details
If you're interested in studying inter-switch traffic, why bother with all this detail? After all, most switches will give you summaries of the amount of traffic into and out of different trunk groups over periods of 15 minutes, or an hour, or however you configure them to record the information. They'll also provide details on how many calls were answered, or encountered a busy, or a network blockage. They'll tell you how much traffic was long-distance, or local, or ISDN, or POTS or any number of summaries and statistics. Why go to the extra trouble of finding out more?

Well, traffic summaries and statistics were quite sufficient when the voice network was just that - a network carrying voice conversations between people. That, after all, is what the telephone network was designed to do and it does it very well. Unfortunately, that isn't what it is doing any longer. The traffic mix now includes:

  • Data traffic - most notably to the Internet, but also to access points on corporate and academic networks, and to point-of-sale (POS) authorization systems.
  • Fax traffic - sometimes concentrated from a few high-usage points of origin (fax broadcasters) or destination (financial institutions clustered around a business district).
  • Pager traffic - short calls sending numeric messages to paging servers.
  • Voice mail traffic - both recording and retrieving messages, the former function often causing calls to be forwarded and the latter function showing marked daily peaks.
  • Chatline traffic - growing rapidly in urban areas, and an increasing source of congestion on some switches.
  • Mass calling events - a big prize phone-in competition run by a radio station can bring a network to a halt, as can the offer of tickets for a popular sporting or entertainment event.
  • Voice traffic - a steadily decreasing proportion in many areas, and changing in characteristics as long distance (LD) rates continue to plummet.

If you want to engineer, or adapt, your network to deal with all these types of traffic you have to be able to identify which calls fall into which category: only then can you begin to understand where and when they originate and terminate, what the average holding times are, what routes they are taking to their destinations, how much of the network's resources each call consumes and so on.

This mix of traffic is currently characteristic of the wireline network - wireless networks still carry largely voice calls, averaging 2.15 minutes, with traditional peaks over the working day. But the wireless carriers should not wallow in the luxury of enjoying their wireline competitors' discomfort: if they are as successful at deploying wireless local loops as they hope to be, they will face exactly the same problems. It's a short-lived schadenfreude.

In summary, besides the traditional measures of traffic - how much of it, when, from where and to where - we need to add a further important dimension: purpose. What was the subscriber trying to achieve by the call? Then we can ask, and expect a detailed reply to, the next question: how well did the network respond to the subscriber's need? Trying to tease answers to these questions from traditional traffic statistics is nigh impossible. You need to look at every single call, and in detail.

Casting your net over the network
Before using all the information in ISUP traffic, of course, you have to record it, and the recording, or monitoring, system has to deal with a number of problems. In the first place, every call results in several ISUP messages being exchanged between switches, and these have to be collated to form a single record that we'll call a Call Completion Record (CCR), to distinguish it from CDR (an acronym normally applied to records generated by switches). Since the constituent ISUP messages of a CCR usually travel over different physical links, getting all the parts together is no mean juggling feat, particularly since it has to be done in real-time.
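The collation step can be illustrated with a minimal sketch. It assumes that the messages belonging to one call can be matched on the pair of point codes and the circuit identification code (CIC) they carry, and it works on a batch of captured messages rather than in real time. The class names, field names and the reduced set of message types are hypothetical choices made for illustration; they do not describe any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IsupMessage:
    timestamp: float   # seconds since the start of the capture
    opc: int           # originating point code
    dpc: int           # destination point code
    cic: int           # circuit identification code
    msg_type: str      # "IAM", "ACM", "ANM", "REL" or "RLC"
    called: str = ""   # only populated on the IAM in this sketch
    calling: str = ""


@dataclass
class CallCompletionRecord:
    calling: str
    called: str
    seized_at: float
    answered_at: Optional[float] = None
    released_at: Optional[float] = None


def circuit_key(msg):
    # Messages for one call travel in both directions, so the key uses
    # the unordered pair of point codes together with the CIC.
    return (min(msg.opc, msg.dpc), max(msg.opc, msg.dpc), msg.cic)


def collate(messages):
    """Merge ISUP messages into one record per call attempt."""
    open_calls = {}
    finished = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        key = circuit_key(msg)
        if msg.msg_type == "IAM":
            # An initial address message starts a new call on the circuit.
            open_calls[key] = CallCompletionRecord(
                msg.calling, msg.called, msg.timestamp)
            continue
        ccr = open_calls.get(key)
        if ccr is None:
            continue  # the IAM for this call was never captured
        if msg.msg_type == "ANM":
            ccr.answered_at = msg.timestamp
        elif msg.msg_type == "REL":
            ccr.released_at = msg.timestamp
        elif msg.msg_type == "RLC":
            # Release complete frees the circuit and closes the record.
            finished.append(open_calls.pop(key))
    return finished
```

A record whose `answered_at` is still empty when it closes represents an unanswered call attempt, which is exactly the kind of call a switch would normally not record.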

The next requirement is to be able to design your own CCR. You will seldom want all the information available from the ISUP messages. For a study on voice mail and call forwarding, for example, you'll need to know about forwarding numbers; for a study on number portability you'll need the location routing number (LRN). If you can't design your own CCR, you'll either be collecting far more data than you need (about which more anon) or you'll be frustrated by not being able to get the data that you do need.

Your monitoring system must be able to deal with very large volumes of messages, and to produce a CCR for every call attempt between the switches you're studying. In the case of switches that handle both local and long-distance traffic, this is usually a large multiple of the number of CDRs recorded by the switches. When you're trying to understand traffic patterns it is of no use whatever to have only a sample of calls recorded.

The issue of volume needs to be stressed, because it has important implications on how you handle the CCRs once they have been collected, as discussed in detail below. On a typical business day, one 120,000-line urban central office we've studied, with a mix of residential and business traffic, originates 1,200,000 inter-switch calls and terminates a further 900,000. Clearly, if you are monitoring several busy switches, your monitoring system will have to handle a huge amount of data. Incidentally, that switch only recorded some 35,000 CDRs on the same day.

You'll want the CCRs generated in real time, so that you can study events as they happen. It doesn't help much to learn that a switch had a routing error three days ago: you need to be able to generate alarms immediately something goes wrong. Actually, if you're monitoring the SS7 network for fraud prevention, you may need to generate real-time records when calls start, as well as a CCR when they finish.

Doing as little as possible
Once you've installed a monitoring system generating all these CCRs, you'll need somewhere to put them, and the obvious answer is to store them in a Relational Database Management System (RDBMS). Unfortunately, this often turns out to be the wrong answer as well. We have found that a modern RDBMS does so much more than CCRs require that it cannot do the little we actually need efficiently.

The problem here is one of volume. Taking the data from the switch quoted earlier, 2,100,000 CCRs, each 200 to 250 bytes in length, results in about half a gigabyte of data to be handled per day. Multiply that by the number of switches in a typical study - between four and twelve for urban areas - and the daily problems of data handling become dominant.
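The arithmetic behind these figures is easy to reproduce. The sketch below uses the call counts and record sizes quoted in the text, and makes the simplifying assumption that every switch in a study generates the same volume as the one quoted, which will not hold exactly in practice. The function name is hypothetical.

```python
def daily_bytes(records_per_day, bytes_per_record, switches=1):
    """Raw volume per day, in bytes, before any conversion or indexing."""
    return records_per_day * bytes_per_record * switches


# Originated plus terminated inter-switch calls for the quoted switch.
RECORDS_PER_DAY = 1_200_000 + 900_000

for switches in (1, 4, 12):
    low = daily_bytes(RECORDS_PER_DAY, 200, switches)
    high = daily_bytes(RECORDS_PER_DAY, 250, switches)
    print(f"{switches:2d} switch(es): {low:,} to {high:,} bytes per day")
```

Under that assumption one switch produces between 420 and 525 million bytes per day, and a twelve-switch study between roughly 5.0 and 6.3 billion bytes per day.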

An RDBMS, among other functions, will support a large variety of data types, will protect the integrity of the tables in a database with careful commitment and rollback processes for adds and updates, will provide table-, page- or record-level locking, will provide complex and flexible schemes for total and incremental backup and recovery, and will implement multiple levels of security and access control. All this, and more, is essential to support an accounting, or stock control, or enterprise resource planning (ERP) system, but it all gets in the way if your needs are restricted to the rapid conversion and loading of large numbers of CCRs, and the ability to query and row scan them.

So much does it get in the way, in fact, that we found that it could take several days to convert and load the CCRs collected in just one day - and that was before undertaking any analysis. The amount of processing power needed to convert and load the data in a timely manner, and the concomitant hardware costs, and operating system and RDBMS licence fees that went with that processing power, were prohibitive. There had to be a better way, and it turned out to be the careful application of minimization.

There aren't many different data types in a CCR: telephone numbers, switch and circuit identifiers, times, dates and various indicators. All of this information is largely numeric or logical, and a careful examination of the data types enabled us to build a very rapid conversion tool, TCX™, which converts CCRs into a highly compact and efficient storage format we call BCBRF™ ("BSI CCR Binary Record Format").
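To show why numeric and logical fields lend themselves to compact storage, here is a minimal sketch of a fixed-width binary record built with Python's standard `struct` module. It is not the BCBRF™ layout, which is not described in this paper; the field list, field widths and flag bits are all assumptions chosen for illustration. Telephone numbers are stored as integers, which assumes they contain digits only and have no significant leading zeros.

```python
import struct

# Hypothetical layout, little-endian with no padding:
#   calling number, called number   2 x unsigned 64-bit
#   originating and destination point codes   2 x unsigned 32-bit
#   circuit identifier              unsigned 16-bit
#   start time (epoch seconds)      unsigned 32-bit
#   duration (tenths of a second)   unsigned 32-bit
#   indicator flags                 unsigned 8-bit
RECORD = struct.Struct("<QQIIHIIB")

FLAG_ANSWERED = 0x01
FLAG_FORWARDED = 0x02


def pack_ccr(calling, called, opc, dpc, cic, start_epoch,
             duration_ds, answered, forwarded):
    """Encode one call record as a fixed-width byte string."""
    flags = (FLAG_ANSWERED if answered else 0) | \
            (FLAG_FORWARDED if forwarded else 0)
    return RECORD.pack(int(calling), int(called), opc, dpc, cic,
                       start_epoch, duration_ds, flags)


def unpack_ccr(blob):
    """Decode a byte string produced by pack_ccr into a dictionary."""
    calling, called, opc, dpc, cic, start, dur, flags = RECORD.unpack(blob)
    return {
        "calling": str(calling),
        "called": str(called),
        "opc": opc,
        "dpc": dpc,
        "cic": cic,
        "start_epoch": start,
        "duration_s": dur / 10,
        "answered": bool(flags & FLAG_ANSWERED),
        "forwarded": bool(flags & FLAG_FORWARDED),
    }
```

Each record in this illustrative layout occupies `RECORD.size` bytes, which is 35. Because every record has the same width, a file of them can be scanned sequentially or addressed by record number without an index.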

Having stored the data, of course, we needed to query it, extract records from it, provide summaries of it and use it to generate statistics about different types of traffic (traffic with different purposes) over various routes at various times. Maintaining the principle of doing as little as possible, we built a specialized query and analysis tool, Tau.Seven™, designed solely for handling SS7 data stored in BCBRF™.

We built this processing environment, which comprises the major components of Total Network Knowledge™, on a Windows NT™ and Intel™ platform, using machines bought by mail order from one of the larger PC manufacturers. They are powerful systems, to be sure, configured with as much processor power and memory and the fastest hard discs we can find, but they are otherwise standard systems with neither exotic nor proprietary hardware.

We chose this platform partly for the variety of development tools available to us, and because our customers would have a wide choice of platform options, without a large burden of licence fees. Our customers can scale the platform up or down according to the size of the network being monitored, the amount of historical data kept, or the number of concurrent studies being undertaken. Of equal importance is the fact that most customers will already be supporting similar platforms in their companies, so there is no additional support burden or extensive training to undertake.

Pointing and clicking
These tools weren't built in a vacuum, of course. They were built because we were trying to solve some real and pressing problems in a real network. Starting in mid-1997 we began a series of detailed and extensive studies on the BC Tel network in the province of British Columbia (BC) in Canada. BC Tel (now BC Tel.Telus) is the Incumbent Local Exchange Carrier (ILEC) in BC, and it was starting to experience the problems that many ILECs across North America were experiencing: very large increases in local traffic, with long holding times, at times of day that did not coincide with traditional peaks. Worse, some of these increases did coincide with traditional peaks, and the trunk network was becoming overburdened.

There was a lot of pointing and clicking. The clicking was the sound of modems going off-hook, and the pointing was of fingers at the probable source of the problem: the Internet. Where exactly to point the finger was the problem, however. Finding the ISP modem banks to which all the traffic was heading was harder than expected - there was no central list of access numbers. Finding where the traffic was originating was also difficult: many trunks were becoming full and it wasn't clear whether this was traffic following first-, second- or third-choice routes, traversing one or several trunks on its way to its destination.

As it turns out, ISP traffic has certain distinguishing characteristics that allowed us to identify the modem banks by collecting hundreds of millions of CCRs and subjecting them to rigorous statistical analyses. (Interestingly, personal chatlines, most of them sex chatlines, have almost exactly the same traffic profile, but that's another story.) Once we knew where the traffic was going, we were able to find out where it was coming from, by keying our searches on the ISP access numbers. We were thus able to build up a comprehensive picture of the patterns of ISP traffic in time and space.
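A much simplified version of this kind of search can be sketched as follows. It is a crude heuristic, not the statistical analysis used in the studies: it assumes only that a modem bank shows up as a called number that receives many calls with long holding times, and the two thresholds are hypothetical values that would have to be tuned against real traffic. Since chatlines share the profile, anything it flags is a candidate to be confirmed, not a finding.

```python
from collections import defaultdict


def candidate_modem_banks(ccrs, min_calls=500, min_mean_hold_s=900.0):
    """Flag called numbers with many calls and a long mean holding time.

    ccrs is an iterable of (called_number, holding_time_seconds) pairs.
    Returns {called_number: (call_count, mean_holding_time_seconds)}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # number -> [calls, seconds]
    for called, hold_s in ccrs:
        entry = totals[called]
        entry[0] += 1
        entry[1] += hold_s

    candidates = {}
    for called, (calls, seconds) in totals.items():
        mean_hold = seconds / calls
        if calls >= min_calls and mean_hold >= min_mean_hold_s:
            candidates[called] = (calls, mean_hold)
    return candidates
```

Once a list of access numbers has been confirmed, the same records can be filtered on those called numbers to find where the traffic originates.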

The numbers were, at the time, shocking. Some routes, at full capacity, were carrying more Internet traffic than voice. Some bedroom communities were generating volumes of Internet traffic that showed no correlation with known demographic data. The detailed studies we undertook enabled BC Tel to re-engineer parts of its trunk network with extremely accurate data about the characteristics of traffic it needed to carry, and with very good projections for its growth. That, however, was the short-term solution.

The longer-term solution was to design a network of Virtual Points-of-Presence (VPOPs), connected directly to the ISPs on an Asynchronous Transfer Mode (ATM) network, getting the ISP traffic off the voice network as close to its point of origin as possible. We were able to identify exactly where the VPOPs should be, and make estimates of their required capacity, by studying in detail exactly which traffic originated where, where it was headed, and which routes it took.

Solving this problem alone allowed BC Tel to reduce its network capital costs by 17%, a saving of some $8 million.

Bouncing around the network
It isn't just the Internet that causes problems. The network isn't routing traffic the way it was designed to, because traffic isn't behaving itself any more. It used to be that a call which encountered a busy condition or was unanswered was terminated fairly rapidly by the caller. An increasingly large proportion of calls don't do this. They are forwarded to another number, or to a voice mail system, and the forwarding often takes them to another central office - in some cases more than one. Every "leg" of a call between switches generates a separate CCR, and by finding all the CCRs for one call we can find out exactly which route a call took from origin to destination. The routes can often be surprisingly complex, and very inefficient.

There are two corollaries to this, fairly recent, traffic behavior. One is that a regular POTS call is likely to use more of the network's resources (more switches and more miles of trunk, and for longer periods) than in the past. The second is that understanding the routing tables in your end offices and tandems doesn't help you predict how traffic will flow in your network. Routing tables predict how traffic is switched, but it is the forwarding numbers, and the forwarding conditions, that subscribers choose, that determine how traffic will "bounce" around the network.

As voicemail traffic constituted an increasing proportion of the traffic mix, there was increasing concern that the routes calls were taking to voicemail servers were convoluted and inefficient - the servers were connected to the wrong switches. By studying in detail where voicemail traffic was originating, and when, we were able to model a number of network options, measuring in each case the amount of network resources used by the voicemail calls. Whilst it is never possible (it is mathematically impossible) to know whether any network design is optimal, we were able to produce a design that reduced the use of network resources to an acceptable level.

Balancing the books
It usually isn't hard to work out how much revenue a product generates. A Telco's accounting and billing systems will generally produce accurate and timely information about revenue from subscription services and from pay-per-use services. What is harder is to work out how much a product costs. Some elements of its cost may be easy to obtain: the marketing expenditure; the cost of hardware; software licence fees. More difficult to estimate, however, is the product's consumption of network resources.

Before we start we need some idea of what network resources may be "consumed" by a call, and how we can measure the consumption. We may not get an absolute answer in terms of dollars and cents, but we should at least be in a position to say something like "ISP traffic consumed 35% of all network resources used; LD consumed 15%; local POTS calls 32%; voicemail 7%…." The resources we can measure are the use of switches and the use of the trunks that connect them.

We start by assuming that our measure is time-based: a short call consumes fewer resources than a long call. For switch consumption, we multiply the duration of the call by the number of switches it occupied, to give us "switch minutes". For trunk consumption, we multiply the duration of the call by the number of miles of trunks the call used (calculated from Vertical and Horizontal (V&H) coordinates) to give "trunk mile-minutes". We then decide on the cost ratio between a switch minute and a trunk mile-minute, and we have the basis for comparative costing. This is only possible because the CCRs we have collected reflect all inter-switch traffic during the study period, and the detail in the records allows us to assign calls to different product categories (more than one, if necessary).
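The two measures can be sketched in a few lines. The sketch assumes that the route of a call is already known as an ordered list of the V&H coordinates of the switches it occupied, and it uses the conventional V&H mileage formula, in which the distance in miles is the square root of one tenth of the sum of the squared coordinate differences. The function names and the way the cost ratio is expressed (the cost of one trunk mile-minute in units of switch minutes) are hypothetical.

```python
import math


def vh_miles(a, b):
    """Distance in miles between two (V, H) coordinate pairs."""
    dv = a[0] - b[0]
    dh = a[1] - b[1]
    return math.sqrt((dv * dv + dh * dh) / 10)


def call_consumption(duration_min, route_vh, trunk_cost_ratio):
    """Return (switch minutes, trunk mile-minutes, combined units).

    route_vh is the ordered list of (V, H) coordinates of every switch
    the call occupied; trunk_cost_ratio is the assumed cost of one
    trunk mile-minute expressed in switch minutes.
    """
    switch_minutes = duration_min * len(route_vh)
    # Sum the length of each trunk between consecutive switches.
    trunk_miles = sum(vh_miles(a, b)
                      for a, b in zip(route_vh, route_vh[1:]))
    trunk_mile_minutes = duration_min * trunk_miles
    combined = switch_minutes + trunk_cost_ratio * trunk_mile_minutes
    return switch_minutes, trunk_mile_minutes, combined
```

Summing the combined units over all the calls assigned to a product category, and dividing by the total for all categories, gives the kind of percentage breakdown described above.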

Value in the network
There isn't space to describe all the voice network problems for which we've found SS7 data to provide a highly cost-effective diagnostic tool. In the last eighteen months we've solved numerous and varied problems, and in each case it has been the detail and comprehensiveness of SS7 traffic that has allowed us to do so. There are other application areas - fraud prevention and inter-carrier billing are two that stand out - for which SS7 traffic is already being used. Telcos have long recognized the value inherent in their voice networks, and are now beginning to recognize the value in voice's meta-network.

This White Paper was written by BSI. Direct all comments to BSI:
BSI
220-3665 Kingsway Vancouver
BC V5R 5W2 Canada
Tel: +1 (604) 713-6400
Fax: +1 (604) 713-6449
url: www.bottingsystems.com
email: rodmcintoshshand@bottingsystems.com