White Paper by BSI
The Signalling System #7 (SS7) network
is a meta-network: a network about a network. Its only
purpose is to enable traffic on another network, the telephone
network, to move efficiently and flexibly. Among its many
functions, it provides the means for translating telephone
numbers, for locating mobile users, and for two switches
to negotiate how they will handle a call that needs to
transit their common trunks. The SS7 message traffic generated
by this last, negotiating, function is called "ISUP" traffic
(short for "ISDN User Part"), and it provides
an astonishingly detailed and inexpensive source of information
for the traffic engineering, planning, regulatory affairs,
accounting and marketing departments in a telephone company.
This white paper describes how to
capture that data, how to analyze it, and some of the benefits
gained by one Telco when it decided to get serious about
SS7. We concentrate here on ISUP traffic, because that
is quite enough to cover in one white paper of reasonable
length. We provide practical examples of the problems encountered
and solutions we have implemented at BSI. Together, these
solutions constitute what we call Total Network Knowledge.
A meta-network metaphor
Let's branch off into a metaphor for a moment. Suppose you wanted to collect
information about air traffic in your area. You could enroll a team of
observers and send them off to every airport to make notes about departures,
arrivals, aircraft types, operating airlines, numbers of passengers or
type of cargo. You would have to bring all the information together and
collate it, making sure that one observer's departure record and another's
arrival record referred to the same flight, reconciling the different ways in
which observers recorded their data, and so on. Alternatively, you could record the
data traffic going through the Air Traffic Control (ATC) message switch,
which would provide all the information you needed (and more) about air
traffic, in a convenient format from one location. You'd probably ask ATC
for their permission before you did this, of course.
The same holds for a telephone network:
if you wanted to know about inter-switch traffic you could
generate a Call Detail Record (CDR) on every switch for
every inbound and outbound call, and then collect the data
from all the switches. Alternatively, you can collect the
SS7 ISUP messages that all pass through two, or rarely
more, SS7 message switches, known as Signal Transfer Points
(STPs).
The messages in the medium
What can ISUP traffic tell you about telephone traffic? A lot more, it turns
out, than you can get from the CDRs generated by a switch:
- Every inter-switch call attempt, whether
successful or not, generates ISUP traffic; switches normally
generate CDRs only for calls that are completed and answered.
- Calls both into and out of the switch
generate ISUP traffic; it is very hard to get most switches
to record all inbound traffic.
- Because events within calls generate
ISUP messages which pass through a fast data network,
the timings of the messages accurately reflect the timings
of events within calls; a switch normally records only
those events significant to billing, to the accuracy
required for billing.
- Much of the history of the call -
whether or not it was forwarded, for example, and why;
whether the originating or terminating numbers were ported
- is recorded in the ISUP messages; if information isn't
used for billing, a switch will seldom record it.
To sum up: switches aren't very good
at recording information about calls unless the call is
going to generate a line on a bill somewhere; ISUP traffic
has to carry all the information about a call that enables
a switch to handle it appropriately. Actually, if you try
to configure a switch to record all the information available
from the ISUP traffic you can easily bring it to its knees
(we know - we tried). In contrast, recording SS7 traffic
is normally entirely non-intrusive, and doesn't impact
your network's ability to switch traffic.
The devil is in the details
If you're interested in studying inter-switch traffic, why bother with all
this detail? After all, most switches will give you summaries of the amount
of traffic into and out of different trunk groups over periods of 15 minutes,
or an hour, or however you configure them to record the information. They'll
also provide details on how many calls were answered, or encountered a
busy, or a network blockage. They'll tell you how much traffic was long-distance,
or local, or ISDN, or POTS or any number of summaries and statistics. Why
go to the extra trouble of finding out more?
Well, traffic summaries and statistics
were quite sufficient when the voice network was just that
- a network carrying voice conversations between people.
That, after all, is what the telephone network was designed
to do and it does it very well. Unfortunately, that isn't
what it is doing any longer. The traffic mix now includes:
- Data traffic - most notably to the
Internet, but also to access points on corporate and
academic networks, and to point-of-sale (POS) authorization
systems.
- Fax traffic - sometimes concentrated
from a few high-usage points of origin (fax broadcasters)
or destination (financial institutions clustered around
a business district).
- Pager traffic - short calls sending
numeric messages to paging servers.
- Voice mail traffic - both recording
and retrieving messages, the former function often causing
calls to be forwarded and the latter function showing
marked daily peaks.
- Chatline traffic - growing rapidly
in urban areas, and an increasing source of congestion
on some switches.
- Mass calling events - a big prize
phone-in competition run by a radio station can bring
a network to a halt, as can the offer of tickets for
a popular sporting or entertainment event.
- Voice traffic - an ever smaller
proportion in many areas, and changing in characteristics
as long distance (LD) rates continue to plummet.
If you want to engineer, or adapt, your
network to deal with all these types of traffic you have
to be able to identify which calls fall into which category:
only then can you begin to understand where and when they
originate and terminate, what the average holding times
are, what routes they are taking to their destinations,
how much of the network's resources each call consumes
and so on.
This mix of traffic is currently characteristic
of the wireline network - wireless networks still carry
largely voice calls, averaging 2.15 minutes, with traditional
peaks over the working day. But the wireless carriers should
not wallow in the luxury of enjoying their wireline competitors'
discomfort: if they are as successful at deploying wireless
local loops as they hope to be, they will face exactly
the same problems. It's a short-lived schadenfreude.
In summary, besides the traditional measures
of traffic - how much of it, when, from where and to where
- we need to add a further important dimension: purpose.
What was the subscriber trying to achieve by the call?
Then we can ask, and expect a detailed reply to, the next
question: how well did the network respond to the subscriber's
need? Trying to tease answers to these questions from traditional
traffic statistics is nigh impossible. You need to look
at every single call, and in detail.
Casting your net over the network
Before using all the information in ISUP traffic, of course, you have to record
it, and the recording, or monitoring, system has to deal with a number
of problems. In the first place, every call results in several ISUP messages
being exchanged between switches, and these have to be collated to form
a single record that we'll call a Call Completion Record (CCR), to distinguish
it from CDR (an acronym normally applied to records generated by switches).
Since the constituent ISUP messages of a CCR usually travel over different
physical links, getting all the parts together is no mean juggling feat,
particularly since it has to be done in real-time.
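The collation step can be sketched in a few lines. What follows is a
minimal illustration, not the logic of any particular monitoring product.
It assumes that each captured message has already been decoded into a
timestamp, the two point codes (OPC and DPC), the circuit identification
code (CIC) and a message type, and it keys every call on the pair of
point codes plus the CIC, since that combination identifies one circuit
between two switches. The class and field names (IsupMessage, Ccr,
collate) are our own inventions for the example. A production system
would work on a live stream rather than a sorted batch, and would handle
far more message types than the five shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class IsupMessage:
    """One decoded ISUP message (hypothetical, simplified layout)."""
    timestamp: float              # seconds, as stamped at capture
    opc: int                      # originating point code
    dpc: int                      # destination point code
    cic: int                      # circuit identification code
    msg_type: str                 # "IAM", "ACM", "ANM", "REL" or "RLC"
    called: str = ""              # carried on the IAM
    calling: str = ""             # carried on the IAM
    cause: Optional[int] = None   # carried on the REL


@dataclass
class Ccr:
    """A Call Completion Record built from one call's messages."""
    opc: int
    dpc: int
    cic: int
    called: str
    calling: str
    seized_at: float
    answered_at: Optional[float] = None
    released_at: Optional[float] = None
    cause: Optional[int] = None

    @property
    def conversation_seconds(self) -> float:
        if self.answered_at is None or self.released_at is None:
            return 0.0
        return self.released_at - self.answered_at


def collate(messages: Iterable[IsupMessage]) -> List[Ccr]:
    """Group messages by circuit and emit one CCR per call attempt."""
    open_calls: Dict[Tuple[frozenset, int], Ccr] = {}
    finished: List[Ccr] = []
    for m in sorted(messages, key=lambda msg: msg.timestamp):
        # Messages flow in both directions, so the key ignores direction.
        key = (frozenset((m.opc, m.dpc)), m.cic)
        if m.msg_type == "IAM":
            open_calls[key] = Ccr(m.opc, m.dpc, m.cic,
                                  m.called, m.calling, m.timestamp)
            continue
        ccr = open_calls.get(key)
        if ccr is None:
            continue              # we never saw the IAM for this call
        if m.msg_type == "ANM":
            ccr.answered_at = m.timestamp
        elif m.msg_type == "REL":
            ccr.released_at = m.timestamp
            ccr.cause = m.cause
        elif m.msg_type == "RLC":
            finished.append(open_calls.pop(key))
    return finished
```

Note that an attempt which is released without ever being answered still
produces a CCR, with its release cause, which is exactly the kind of call
a switch would normally not record.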
The next requirement is to be able to
design your own CCR. You will seldom want all the information
available from the ISUP messages. For a study on voice
mail and call forwarding, for example, you'll need to know
about forwarding numbers; for a study on number portability
you'll need the location routing number (LRN). If you can't
design your own CCR, you'll either be collecting far more
data than you need (about which more anon) or you'll be
frustrated by not being able to get the data that you do
need.
Your monitoring system must be able to
deal with very large volumes of messages, and to produce
a CCR for every call attempt between the switches you're
studying. In the case of switches that handle both local
and long-distance traffic, this is usually a large multiple
of the number of CDRs recorded by the switches. When you're
trying to understand traffic patterns it is of no use whatever
to have only a sample of calls recorded.
The issue of volume needs to be stressed,
because it has important implications on how you handle
the CCRs once they have been collected, as discussed in
detail below. On a typical business day, one 120,000-line
urban central office we've studied, with a mix of residential
and business traffic, originates 1,200,000 inter-switch
calls and terminates a further 900,000. Clearly, if you
are monitoring several busy switches, your monitoring system
will have to handle a huge amount of data. Incidentally,
that switch only recorded some 35,000 CDRs on the same
day.
You'll want the CCRs generated in real
time, so that you can study events as they happen. It doesn't
help much to learn that a switch had a routing error three
days ago: you need to be able to generate alarms immediately
something goes wrong. Actually, if you're monitoring the
SS7 network for fraud prevention, you may need to generate
real-time records when calls start, as well as a CCR when
they finish.
Doing as little as possible
Once you've installed a monitoring system generating all these CCRs, you'll
need somewhere to put them, and the obvious answer is to store them in
a Relational Database Management System (RDBMS). Unfortunately, this often
turns out to be the wrong answer as well. We have found that modern RDBMSs
do so much more than CCRs require that they can't handle CCRs
efficiently.
The problem here is one of volume. Taking
the data from the switch quoted earlier, 2,100,000 CCRs,
each 200 to 250 bytes in length, results in about half
a gigabyte of data to be handled per day. Multiply that
by the number of switches in a typical study - between
four and twelve for urban areas - and the daily problems of
data handling become dominant.
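The arithmetic behind these figures is easy to check. The short
calculation below uses only the numbers quoted in the text. The one
assumption is ours: that every switch in a study is as busy as the
central office described earlier, which makes the study totals an upper
estimate rather than a measurement.

```python
# Figures quoted in the text for one 120,000-line central office.
CCRS_PER_DAY = 1_200_000 + 900_000   # originated + terminated calls


def daily_megabytes(ccrs: int, bytes_per_ccr: int, switches: int = 1) -> float:
    """Raw CCR volume per day, in megabytes (10**6 bytes)."""
    return ccrs * bytes_per_ccr * switches / 1_000_000


one_switch_low = daily_megabytes(CCRS_PER_DAY, 200)       # 420.0 MB
one_switch_high = daily_megabytes(CCRS_PER_DAY, 250)      # 525.0 MB

# Assumption: each switch in the study matches the one above.
study_low = daily_megabytes(CCRS_PER_DAY, 200, switches=4)     # 1680.0 MB
study_high = daily_megabytes(CCRS_PER_DAY, 250, switches=12)   # 6300.0 MB
```

On these assumptions a study produces somewhere between 1.7 and 6.3
gigabytes of raw CCRs every day.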
An RDBMS, among other functions, will
support a large variety of data types, will protect the
integrity of the tables in a database with careful commitment
and rollback processes for adds and updates, will provide
table-, page- or record-level locking, will provide complex
and flexible schemes for total and incremental backup and
recovery, and will implement multiple levels of security
and access control. All this, and more, is essential to
support an accounting, or stock control, or enterprise
resource planning (ERP) system, but it all gets in the
way if your needs are restricted to the rapid conversion
and loading of large numbers of CCRs, and the ability to
query and row scan them.
So much does it get in the way, in fact,
that we found that it could take several days to convert
and load the CCRs collected in just one day - and that
was before undertaking any analysis. The amount of processing
power needed to convert and load the data in a timely manner,
and the concomitant hardware costs, and operating system
and RDBMS licence fees that went with that processing power,
were prohibitive. There had to be a better way, and it
turned out to be the careful application of minimization.
There aren't many different data
types in a CCR: telephone numbers, switch and circuit identifiers,
times, dates and various indicators. All of this information
is largely numeric or logical, and a careful examination
of the data types enabled us to build a very rapid conversion
tool, TCX, which converts CCRs into a highly compact
and efficient storage format we call BCBRF ("BSI
CCR Binary Record Format").
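To show why a narrow set of numeric data types lends itself to a compact
format, here is a minimal sketch of a fixed-width binary record. It is
not the BCBRF layout, which is not described in this paper: the field
list, field widths and flag bits below are all hypothetical, chosen only
for illustration. Telephone numbers are held as integers, which assumes
they carry no significant leading zeros.

```python
import struct
from typing import NamedTuple

# Hypothetical layout, little-endian with no padding:
#   calling, called          unsigned 64-bit integers
#   opc, dpc                 unsigned 32-bit integers
#   cic                      unsigned 16-bit integer
#   seized_epoch             unsigned 32-bit, seconds since the Unix epoch
#   duration_tenths          unsigned 32-bit, tenths of a second
#   flags, cause             one byte each
RECORD = struct.Struct("<QQIIHIIBB")

FLAG_ANSWERED = 0x01     # illustrative flag bits
FLAG_FORWARDED = 0x02


class CompactCcr(NamedTuple):
    calling: int
    called: int
    opc: int
    dpc: int
    cic: int
    seized_epoch: int
    duration_tenths: int
    flags: int
    cause: int


def pack(ccr: CompactCcr) -> bytes:
    """Serialize one record to its fixed-width binary form."""
    return RECORD.pack(*ccr)


def unpack(buf: bytes) -> CompactCcr:
    """Rebuild a record from its binary form."""
    return CompactCcr(*RECORD.unpack(buf))
```

With this particular layout each record occupies RECORD.size bytes, which
is 36, and because every record has the same length a scan can step
through a file without parsing anything.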
Having stored the data, of course,
we needed to query it, extract records from it, provide
summaries of it and use it to generate statistics about
different types of traffic (traffic with different purposes)
over various routes at various times. Maintaining the principle
of doing as little as possible, we built a specialized
query and analysis tool, Tau.Seven, designed solely
for handling SS7 data stored in BCBRF.
We built this processing environment, which
makes up the major components of Total Network Knowledge, on
Windows NT and Intel systems bought by
mail order from one of the larger PC manufacturers. They
are powerful systems, to be sure, configured with as much
processor power and memory, and the fastest hard discs, we
can find, but they are otherwise standard systems with
neither exotic nor proprietary hardware.
We chose this platform partly for the
variety of development tools available to us, and because
our customers would have a wide choice of platform options,
without a large burden of licence fees. Our customers can
scale the platform up or down according to the size of
the network being monitored, the amount of historical data
kept, or the number of concurrent studies being undertaken.
Of equal importance is the fact that most customers will
already be supporting similar platforms in their companies,
so there is no additional support burden or extensive training
to undertake.
Pointing and clicking
These tools weren't built in a vacuum, of course. They were built because we
were trying to solve some real and pressing problems in a real network.
Starting in mid-1997 we began a series of detailed and extensive studies
on the BC Tel network in the province of British Columbia (BC) in Canada.
BC Tel (now BC Tel.Telus) is the Incumbent Local Exchange Carrier (ILEC)
in BC, and it was starting to experience the problems that many ILECs across
North America were experiencing: very large increases in local traffic,
with long holding times, at times of day that did not coincide with traditional
peaks. Worse, some of these increases did coincide with traditional peaks,
and the trunk network was becoming overburdened.
There was a lot of pointing and clicking.
The clicking was the sound of modems going off-hook, and
the pointing was of fingers at the probable source of the
problem: the Internet. Where exactly to point the finger
was the problem, however. Finding the ISP modem banks to
which all the traffic was heading was harder than expected
- there was no central list of access numbers. Finding
where the traffic was originating was also difficult: many
trunks were becoming full and it wasn't clear whether this
was traffic following first-, second- or third-choice routes,
traversing one or several trunks on its way to its destination.
As it turns out, ISP traffic has certain
distinguishing characteristics that allowed us to identify
the modem banks by collecting hundreds of millions of CCRs
and subjecting them to rigorous statistical analyses. (Interestingly,
personal chatlines, most of them sex chatlines, have almost
exactly the same traffic profile, but that's another story.)
Once we knew where the traffic was going, we were able
to find out where it was coming from, by keying our searches
on the ISP access numbers. We were thus able to build up
a comprehensive picture of the patterns of ISP traffic
in time and space.
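A first screening pass over the CCRs might look like the sketch below. It
is far cruder than the statistical analyses referred to above, and the
criteria are our assumptions for the example: a called number is flagged
when it receives many answered calls and their mean holding time is long,
the two traits mentioned earlier as typical of the new traffic. The
function name and both thresholds are hypothetical. As noted, chatlines
would pass the same test, so the output is a list of candidates to
investigate and nothing more.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def candidate_modem_banks(
    answered_calls: Iterable[Tuple[str, float]],
    min_calls: int = 1000,            # hypothetical threshold
    min_mean_minutes: float = 15.0,   # hypothetical threshold
) -> Dict[str, Tuple[int, float]]:
    """Map each suspect called number to (call count, mean holding minutes).

    answered_calls yields (called_number, holding_minutes) pairs taken
    from the CCRs of answered calls.
    """
    holding: Dict[str, List[float]] = defaultdict(list)
    for called, minutes in answered_calls:
        holding[called].append(minutes)

    suspects: Dict[str, Tuple[int, float]] = {}
    for called, durations in holding.items():
        average = mean(durations)
        if len(durations) >= min_calls and average >= min_mean_minutes:
            suspects[called] = (len(durations), average)
    return suspects
```

Once a list of access numbers has been confirmed, the same CCRs can be
searched by called number to find where the traffic originates.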
The numbers were, at the time, shocking.
Some routes, at full capacity, were carrying more Internet
traffic than voice. Some bedroom communities were generating
volumes of Internet traffic that showed no correlation
with known demographic data. The detailed studies we undertook
enabled BC Tel to re-engineer parts of its trunk network
with extremely accurate data about the characteristics
of traffic it needed to carry, and with very good projections
for its growth. That, however, was the short-term solution.
The longer-term solution was to design
a network of Virtual Points-of-Presence (VPOPs), connected
directly to the ISPs on an Asynchronous Transfer Mode (ATM)
network, getting the ISP traffic off the voice network
as close to its point of origin as possible. We were able
to identify exactly where the VPOPs should be, and make
estimates of their required capacity, by studying in detail
exactly which traffic originated where, where it was headed,
and which routes it took.
Solving this problem alone allowed BC
Tel to reduce its network capital costs by 17%, a saving
of some $8 million.
Bouncing around the network
It isn't just the Internet that causes
problems. The network isn't routing traffic the way it was
designed to, because traffic isn't behaving itself any more.
It used to be that a call which encountered a busy condition
or was unanswered was terminated fairly rapidly by the caller. An increasingly
large proportion of calls don't do this. They are forwarded to another
number, or to a voice mail system, and the forwarding often takes them
to another central office - in some cases more than one. Every "leg" of
a call between switches generates a separate CCR, and by finding all the
CCRs for one call we can find out exactly which route a call took from
origin to destination. The routes can often be surprisingly complex, and
very inefficient.
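One way to picture the search for all the legs of a call is the sketch
below. The linking rule is an assumption made for the example and not a
description of our tools: a following leg is taken to be one with the
same calling number, leaving the switch at which the previous leg
arrived, and seized within a short window after the previous leg. The
Leg fields, the trace_route name and the 30-second window are all
hypothetical. Real traffic would also call for the forwarding
information carried in the messages to separate calls that happen to
overlap.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Leg:
    """The parts of one CCR needed to chain legs (hypothetical fields)."""
    calling: str
    called: str
    opc: int           # switch that sent this leg's IAM
    dpc: int           # switch that received it
    seized_at: float   # seconds


def trace_route(first: Leg, legs: Iterable[Leg], window: float = 30.0) -> List[Leg]:
    """Follow a call from its first leg through any forwarded legs."""
    route = [first]
    remaining = [leg for leg in legs if leg != first]
    while True:
        last = route[-1]
        followers = [
            leg for leg in remaining
            if leg.calling == last.calling
            and leg.opc == last.dpc
            and 0 <= leg.seized_at - last.seized_at <= window
        ]
        if not followers:
            return route
        nxt = min(followers, key=lambda leg: leg.seized_at)
        route.append(nxt)
        remaining.remove(nxt)
```

The switches visited are then the opc of the first leg followed by the
dpc of every leg in the returned route.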
There are two corollaries to this,
fairly recent, traffic behavior. One is that a regular
POTS call is likely to use more of the network's resources
(more switches and more miles of trunk, and for longer
periods) than in the past. The second is that understanding
the routing tables in your end offices and tandems doesn't
help you predict how traffic will flow in your network.
Routing tables predict how traffic is switched, but it
is the forwarding numbers, and the forwarding conditions,
that subscribers choose, that determine how traffic will "bounce" around
the network.
As voicemail traffic constituted an increasing
proportion of the traffic mix, there was increasing concern
that the routes calls were taking to voicemail servers
were convoluted and inefficient - the servers were connected
to the wrong switches. By studying in detail where voicemail
traffic was originating, and when, we were able to model
a number of network options, measuring in each case the
amount of network resources used by the voicemail calls.
Whilst it is never possible (it is mathematically impossible)
to know whether any network design is optimal, we were
able to produce a design that reduced the use of network
resources to an acceptable level.
Balancing the books
It usually isn't hard to work out how much revenue a product generates. A Telco's
accounting and billing systems will generally produce accurate and timely
information about revenue from subscription services and from pay-per-use
services. What is harder is to work out how much a product costs. Some
elements of its cost may be easy to obtain: the marketing expenditure;
the cost of hardware; software licence fees. More difficult to estimate,
however, is the product's consumption of network resources.
Before we start we need some idea
of what network resources may be "consumed" by a
call, and how we can measure the consumption. We may not
get an absolute answer in terms of dollars and cents, but
we should at least be in a position to say something like "ISP
traffic consumed 35% of all network resources used; LD
consumed 15%; local POTS calls 32%; voicemail 7%." The
resources we can measure are the use of switches and the
use of the trunks that connect them.
We start by assuming that our measure
is time-based: a short call consumes fewer resources than
a long call. For switch consumption, we multiply the duration
of the call by the number of switches it occupied, to give
us "switch minutes". For trunk consumption, we
multiply the duration of the call by the number of miles of trunk it used (based
on V&H (Vertical and Horizontal) coordinate calculations)
to give "trunk mile-minutes". We then decide
on the cost ratio between a switch minute and a trunk mile-minute,
and we have the basis for comparative costing. This is
only possible because the CCRs we have collected reflect
all inter-switch traffic during the study period, and the
detail in the records allows us to assign calls to different
product categories (more than one, if necessary).
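The two measures can be written down directly. In the sketch below the
distance between two switches uses the conventional V&H mileage formula,
the square root of one tenth of the sum of the squared coordinate
differences. The function names are our own, and the default weight that
converts trunk mile-minutes into switch-minute equivalents is a made-up
figure: as the text says, the cost ratio is something you have to decide
for yourself.

```python
from math import sqrt
from typing import List, Tuple

VH = Tuple[float, float]   # (V, H) coordinates of a switch


def vh_miles(a: VH, b: VH) -> float:
    """Airline miles between two points, from their V&H coordinates."""
    (v1, h1), (v2, h2) = a, b
    return sqrt(((v1 - v2) ** 2 + (h1 - h2) ** 2) / 10)


def call_consumption(duration_minutes: float, switches: List[VH]) -> Tuple[float, float]:
    """Return (switch minutes, trunk mile-minutes) for one call.

    switches lists the V&H coordinates of every switch the call
    occupied, in the order it passed through them.
    """
    switch_minutes = duration_minutes * len(switches)
    trunk_miles = sum(vh_miles(a, b) for a, b in zip(switches, switches[1:]))
    return switch_minutes, duration_minutes * trunk_miles


def relative_cost(switch_minutes: float, trunk_mile_minutes: float,
                  trunk_weight: float = 0.1) -> float:
    """Combine both measures; trunk_weight is a hypothetical cost ratio."""
    return switch_minutes + trunk_weight * trunk_mile_minutes
```

Summing relative_cost over all the calls assigned to a product category,
and dividing by the total for all calls, gives that category's share of
the resources measured.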
Value in the network
There isn't space to describe all the voice network problems for which we've
found SS7 data to provide a highly cost-effective diagnostic tool. In the
last eighteen months we've solved numerous and varied problems, and in
each case it has been the detail and comprehensiveness of SS7 traffic that
has allowed us to do so. There are other application areas - fraud prevention
and inter-carrier billing are two that stand out - for which SS7 traffic
is already being used. Telcos have long recognized the value inherent in
their voice networks, and are now beginning to recognize the value in voice's
meta-network.
This White Paper was written by BSI.
Direct all comments to BSI:
BSI
220-3665 Kingsway Vancouver
BC V5R 5W2 Canada
Tel: +1 (604) 713-6400
Fax: +1 (604) 713-6449
url: www.bottingsystems.com
email: rodmcintoshshand@bottingsystems.com