Cisco Tips and Tricks: Traffic Analysis

Tuesday, 9 February 2010

Embedded Packet Capture via Routers.

Traditionally, getting packet-level information beyond the protocol headers available from Netflow meant one of two things: capturing packets on the wire with Wireshark, or enabling SPAN on a local switch and picking up the traffic that way.

Fortunately, from IOS 12.4(20)T onwards Cisco has extended packet capture to routers, meaning you can capture traffic locally or remotely as long as you have a router in line. Furthermore, any capture taken can be exported from the router, opened in Wireshark and analysed.

An example of when this would be useful is a user with an issue on a particular protocol. I've seen this recently: via Netflow and SNMP we identified the site with the load, drilled into the data, found the traffic flow and noted that it was pulling 5 times more data than it should, which suggested protocol issues. Using packet capture we can then drill into protocol error codes and functions in Wireshark to determine what is causing the issue.

It is worth noting at this early stage that this process uses quite a chunk of processor and memory, as the capture happens in memory and in effect copies each packet before it is dumped to flash or exported.

Below is how you set up a capture on a router.

You will need to define a Capture Buffer and a Capture Point in order for the data to be processed. It is also recommended to use a named access list as the filter criterion for the data you are looking to capture.

Setting the Capture buffer

monitor capture buffer buffer-name [circular | clear | export export-location | filter access-list {ip-access-list | ip-expanded-list | access-list-name} | limit {allow-nth-pak nth-packet | duration seconds | packet-count total-packets | packets-per-sec packets} | linear | max-size element-size | size buffer-size [max-size element-size]]
Example:

Router# monitor capture buffer pktrace1 size 58 max-size 256 circular

Setting up the Capture Point
monitor capture point {ip | ipv6}{cef capture-point-name interface-name interface-type {both | in | out} | process-switched capture-point-name {both | from-us | in | out}}
Example:

Router# monitor capture point ip cef ipceffa0/1 fastEthernet 0/1 both

Link the Capture Point and the Buffer

monitor capture point associate capture-point-name capture-buffer-name
Example:

Router# monitor capture point associate ipceffa0/1 pktrace1

If you haven't already, now would be the time to set up the access list you are filtering on and attach it to the buffer with the filter access-list option shown in the buffer syntax above. At the very least, a source and destination IP conversation taken from information already gleaned via flow would be a good starting point. Alternatively, filter on a fixed destination protocol, or go as wide as a source interface, VLAN or subnet.

Starting and Stopping the capture

Starting the capture

monitor capture point start {capture-point-name | all}
Example:

Router# monitor capture point start ipceffa0/1

Stopping the capture

monitor capture point stop {capture-point-name | all}
Example:

Router# monitor capture point stop ipceffa0/1

Exporting the data set

monitor capture buffer buffer-name export export-location
Example:

Router# monitor capture buffer pktrace1 export tftp://88.1.88.9/pktrace1

Although there are some options for reading the data on the router itself, this is really not recommended: it can be like trying to read the Matrix and can really put you off.

I'll try and expand on this article with some videos when I get them up and running (I'm considering my provider options for the few I've already done).

Saturday, 6 February 2010

Understanding Serial Interface output

Serial0 is up
Indicates whether the interface hardware is currently active (whether carrier detect is present) or if it has been taken down by an administrator.

line protocol is up
Indicates whether the software processes that handle the line protocol consider the line usable (that is, whether keepalives are successful) or if it has been taken down by an administrator.

Hardware is HD64570
Specifies the hardware type.

Internet address is x.x.x.x/x
Specifies the Internet address and subnet mask.

MTU xxxx bytes
Maximum transmission unit of the interface.

BW xxxx Kbit
Indicates the value of the bandwidth parameter that has been configured for the interface (in kilobits per second). The bandwidth parameter is used to compute IGRP metrics only. If the interface is attached to a serial line with a line speed that does not match the default (1536 or 1544 for T1 and 56 for a standard synchronous serial line), use the bandwidth command to specify the correct line speed for this serial line.

DLY xxxxx usec
Delay of the interface in microseconds.

reliability 255/255
Reliability of the interface as a fraction of 255 (255/255 is 100% reliability), calculated as an exponential average over 5 minutes by default.

txload 1/255, rxload 1/255
Load on the interface as a fraction of 255 (255/255 is completely saturated), calculated as an exponential average over 5 minutes by default.
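As a quick illustration of the "fraction of 255" idea used by both reliability and load, here is a minimal Python sketch that turns a value such as 1/255 into a percentage. The function name is my own and purely illustrative; it is not anything the router provides.

```python
def fraction_of_255_to_percent(value: str) -> float:
    """Convert an 'n/255' field (e.g. '1/255') into a percentage."""
    numerator, denominator = value.split("/")
    if int(denominator) != 255:
        raise ValueError("expected a value expressed as a fraction of 255")
    return int(numerator) / 255 * 100


# Values as they appear in the interface output above.
print(round(fraction_of_255_to_percent("255/255"), 1))  # reliability
print(round(fraction_of_255_to_percent("1/255"), 1))    # txload / rxload
```

So a txload of 1/255 is well under half a percent of the configured bandwidth, while 255/255 is fully saturated.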

Encapsulation HDLC
Encapsulation method assigned to interface.

loopback not set
Indicates whether loopback is set or not.

Keepalive set (10 sec)
Indicates whether keepalives are set or not.

Last input 00:00:00
Number of hours, minutes, and seconds since the last packet was successfully received by an interface. Useful for knowing when a dead interface failed.

output 00:00:00
Number of hours, minutes, and seconds since the last packet was successfully transmitted by an interface.

output hang never
Number of hours, minutes, and seconds (or never) since the interface was last reset because of a transmission that took too long. When the number of hours in any of the "last" fields exceeds 24 hours, the number of days and hours is printed. If that field overflows, asterisks are printed.

Input queue: 0/75/0 (size/max/drops); Total output drops: 0
Number of packets in output and input queues. Each number is followed by a slash, the maximum size of the queue, and the number of packets dropped due to a full queue.
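If you are scraping this output with a script, the size/max/drops triplet is easy to pull apart. The following is a rough Python sketch, assuming the line is formatted exactly as in the example above; the function and pattern names are hypothetical.

```python
import re

# Matches the "Input queue: 0/75/0 (size/max/drops)" part of the line.
INPUT_QUEUE_RE = re.compile(r"Input queue: (\d+)/(\d+)/(\d+) \(size/max/drops\)")


def parse_input_queue(line: str):
    """Return the input queue counters as a dict, or None if the line does not match."""
    match = INPUT_QUEUE_RE.search(line)
    if match is None:
        return None
    size, maximum, drops = (int(group) for group in match.groups())
    return {"size": size, "max": maximum, "drops": drops}


sample = "Input queue: 0/75/0 (size/max/drops); Total output drops: 0"
print(parse_input_queue(sample))
```

A drops value that keeps climbing between two readings is the figure to watch, as it means packets are arriving while the queue is already full.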

Queueing strategy: weighted fair
Weighted fair queuing strategy (other queueing strategies you might see are priority-list, custom-list, and fifo).

Output queue: 0/1000/64/0 (size/max total/threshold/drops)
Number of packets in the output queue, followed by the maximum total size of the queue, the congestive-discard threshold, and the number of packets dropped due to a full queue.
The congestive-discard threshold is the number of messages in the queue after which new messages for high-bandwidth conversations are dropped.

5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
Average number of bits and packets received/transmitted per second in the last 5 minutes (the default setting)

The 5-minute input and output rates should be used only as an approximation of traffic per second during a given 5-minute period. These rates are exponentially weighted averages with a time constant of 5 minutes. A period of four time constants must pass before the average will be within two percent of the instantaneous rate of a uniform stream of traffic over that period.
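To see why four time constants are needed, here is a small Python sketch of an exponentially weighted average with a 5-minute time constant. The 5-second update interval and the starting value of zero are assumptions for illustration only, not a statement of how IOS samples internally.

```python
import math

TIME_CONSTANT_S = 300.0   # 5-minute time constant, as described above
UPDATE_INTERVAL_S = 5.0   # assumed update interval, for illustration only


def ewma_after(seconds: float, rate_bps: float, start_bps: float = 0.0) -> float:
    """Average shown after `seconds` of a uniform stream running at `rate_bps`."""
    weight = math.exp(-UPDATE_INTERVAL_S / TIME_CONSTANT_S)
    average = start_bps
    for _ in range(int(seconds / UPDATE_INTERVAL_S)):
        # Each update keeps most of the old average and blends in the new sample.
        average = average * weight + rate_bps * (1.0 - weight)
    return average


line_rate = 1_000_000.0
for minutes in (5, 10, 20):
    shown = ewma_after(minutes * 60, line_rate)
    print(f"after {minutes:2d} min the counter shows {shown / line_rate:.1%} of the real rate")
```

After one time constant (5 minutes) the counter has only reached roughly 63% of the real rate, and it takes the full 20 minutes to get within two percent, which is why these figures lag behind sudden bursts.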

packets input
Total number of error-free packets received by the system.

bytes
Total number of bytes, including data and MAC encapsulation, in the error-free packets received by the system.

no buffer
Number of received packets discarded because there was no buffer space in the main system. Compare with ignored count. Broadcast storms on Ethernet networks and bursts of noise on serial lines are often responsible for no input buffer events.

Received x broadcasts
Total number of broadcast or multicast packets received by the interface.

runts
Number of packets that are discarded because they are smaller than the medium's minimum packet size.

giants
Number of packets that are discarded because they exceed the medium's maximum packet size.

throttles
Number of times the receiver on the port was disabled, possibly due to buffer or processor overload.

input errors
Total number of no buffer, runts, giants, CRCs, frame, overrun, ignored, and abort counts. Other input-related errors can also increment the count, so that this sum might not balance with the other counts.

CRC
Cyclic redundancy checksum generated by the originating station or far-end device does not match the checksum calculated from the data received. On a serial link, CRCs usually indicate noise, gain hits, or other transmission problems on the data link.

frame
Number of packets received incorrectly having a CRC error and a noninteger number of octets. On a serial line, this is usually the result of noise or other transmission problems.

overrun
Number of times the serial receiver hardware was unable to hand received data to a hardware buffer because the input rate exceeded the receiver's ability to handle the data.

ignored
Number of received packets ignored by the interface because the interface hardware ran low on internal buffers. Broadcast storms and bursts of noise can cause the ignored count to be increased.

abort
Illegal sequence of one bits on a serial interface. This usually indicates a clocking problem between the serial interface and the data link equipment.

packets output
Total number of messages transmitted by the system.

bytes
Total number of bytes, including data and MAC encapsulation, transmitted by the system.

underruns
Number of times that the transmitter has been running faster than the router can handle. This might never be reported on some interfaces.

output errors
Sum of all errors that prevented the final transmission of datagrams out of the interface being examined. Note that this might not balance with the sum of the enumerated output errors, as some datagrams can have more than one error, and others can have errors that do not fall into any of the specifically tabulated categories.

collisions
Number of messages retransmitted due to an Ethernet collision. This usually is the result of an overextended LAN (Ethernet or transceiver cable too long, more than two repeaters between stations, or too many cascaded multiport transceivers). Some collisions are normal. However, if your collision rate climbs to around 4 or 5%, you should consider verifying that there is no faulty equipment on the segment and/or moving some existing stations to a new segment. A packet that collides is counted only once in output packets.
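For the 4 to 5% rule of thumb, a rough way to work out the rate is to compare the collisions counter against packets output. This Python sketch assumes that definition of "collision rate" and uses made-up counter values.

```python
def collision_rate_percent(collisions: int, packets_output: int) -> float:
    """Collisions as a percentage of packets output (assumed definition)."""
    if packets_output <= 0:
        return 0.0
    return collisions / packets_output * 100


# Hypothetical counters read from the interface output.
rate = collision_rate_percent(collisions=5200, packets_output=104000)
print(f"collision rate: {rate:.1f}%")
if rate >= 4.0:
    print("worth checking for faulty equipment or splitting the segment")
```

Take the counters over a known interval (for example, clear them and read them again later) so that old history does not mask a recent problem.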

interface resets
Number of times an interface has been completely reset. This can happen if packets queued for transmission were not sent within several seconds' time. On a serial line, this can be caused by a malfunctioning modem that is not supplying the transmit clock signal, or by a cable problem. If the system notices that the carrier detect line of a serial interface is up, but the line protocol is down, it periodically resets the interface in an effort to restart it. Interface resets can also occur when an interface is looped back or shut down.

output buffer failures, output buffers swapped out
Number of failed buffers and number of buffers swapped out (packets swapped to DRAM).

carrier transitions
Number of times the carrier detect signal of a serial interface has changed state. For example, if data carrier detect (DCD) goes down and comes up, the carrier transition counter will increment two times. Indicates modem or line problems if the carrier detect line is changing state often.

Friday, 5 February 2010

Enabling Flow Top Talkers

Following on from my traffic analysis introduction, one of the tools listed (and in my opinion one of the most useful) is Flow Top Talkers.

This tool allows live, on-the-fly analysis of traffic passing through the router and is very useful when trying to find out things like:

What is eating up all my bandwidth?
Is anyone using protocol (x)?
What is user/IP (n) doing right now?
Can I use service (x) currently?

Effectively, this tool leverages the native Netflow data cache in the router's memory and presents it through the CLI, saving all that tedious export and setting up of servers/DBs to gather the data, which for most short-term or low-level queries is a little over the top.

It won't give you the long-term data a proper flow export can, but to be honest you'd only use it for on-the-fly troubleshooting.

First things first: enable CEF on your router if you haven't already (and if it's capable of it, why haven't you, to be frank!).

Next, enable the caching of the flow data on whichever interface(s) you are looking to gather data from. This can be a single interface in a single direction or every interface on the device depending on the scope of your query and the traffic flow direction.

This is done as follows:

conf t
interface (x)
ip flow ingress
ip flow egress
ip route-cache flow

*** PLEASE NOTE ***
(x) is the interface name (e.g. fastethernet0/0).
The flow can be ingress only, egress only or both, depending on your needs. ip route-cache flow is the legacy form of ip flow ingress and should only be used when the commands above are not supported.

Once flow caching is enabled, you need to set up Flow Top Talkers. It has its own sub-menu, which can be reached via:

conf t
ip flow-top-talkers

Then from the sub-menu there are a number of options:

cache-timeout Configure cache timeout
default Set a command to its defaults
exit Exit from top talkers configuration mode
match Configure match criteria
no Negate a command or set its defaults
sort-by Configure top talker sort criteria
top Configure number of top talkers

In order to set up Top Talkers correctly you will need to specify a minimum of:

top (x)
match (x)
sort-by (x)

The context-sensitive help is always good for this, but an example config would be:

TLAN-MAIN-1(config-flow-top-talkers)#top 20
TLAN-MAIN-1(config-flow-top-talkers)#match destination port 25 25
TLAN-MAIN-1(config-flow-top-talkers)#sort-by bytes
TLAN-MAIN-1(config-flow-top-talkers)#

The above config would list all SMTP traffic traversing the interfaces flow was enabled on, giving you source and destination addresses and how much was transferred.

It would list the top 20 conversations and stack the table by bytes transferred.
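To make the roles of top, match and sort-by concrete, here is a small Python model of what the three settings do to the flow cache. It is only a sketch of the logic, not how IOS implements it; the Flow class, the function name and the sample addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    dst_port: int
    packets: int
    byte_count: int


def top_talkers(flows, top, port_min, port_max, sort_by="bytes"):
    """Model of 'match destination port', 'sort-by' and 'top' applied in turn."""
    sort_keys = {
        "bytes": lambda flow: flow.byte_count,
        "packets": lambda flow: flow.packets,
    }
    matched = [flow for flow in flows if port_min <= flow.dst_port <= port_max]
    return sorted(matched, key=sort_keys[sort_by], reverse=True)[:top]


# Hypothetical cache contents: two SMTP conversations and one web session.
example_flows = [
    Flow("192.0.2.10", "198.51.100.5", 25, 10, 4_000),
    Flow("192.0.2.11", "198.51.100.5", 25, 40, 90_000),
    Flow("192.0.2.12", "198.51.100.9", 80, 500, 700_000),
]

for flow in top_talkers(example_flows, top=20, port_min=25, port_max=25):
    print(flow.src, "->", flow.dst, flow.byte_count, "bytes")
```

Note that the web session is the biggest flow in the cache but never appears, because the match criteria are applied before the sort and the top limit.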

You can review the results with the exec-level command:

show ip flow top-talkers

The output table shows the ports in hex rather than decimal (to save space, I guess), so you'll have to break out the hex-to-decimal converter.

Output example below (addresses masked for security)

ROUTER#sh ip flow top-talkers

SrcIf SrcIPaddress DstIf DstIPaddress Pr SrcP DstP Bytes
Gi0/1 **.**.**.** Gi0/0* **.**.**.** 01 0000 0303 646K
Gi0/1 **.**.**.** Gi0/0 **.**.**.** 01 0000 0303 646K
Gi0/1 **.**.**.** Gi0/0* **.**.**.** 06 0D3D 2130 627K
Gi0/1 **.**.**.** Gi0/0 **.**.**.** 06 0D3D 2130 627K
Gi0/1 **.**.**.** Local **.**.**.** 01 0000 0303 21K
(there is more but you get the idea)
20 of 20 top talkers shown. 21 of 42 flows matched.

ROUTER#
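If you'd rather not convert the hex columns by hand, a few lines of Python will do it. This is a minimal sketch that assumes each row has the eight whitespace-separated columns shown in the header above; the function name is my own.

```python
def decode_top_talker_line(line: str) -> dict:
    """Split one output row and convert the Pr, SrcP and DstP columns from hex."""
    src_if, src_ip, dst_if, dst_ip, proto, src_port, dst_port, volume = line.split()
    return {
        "src_if": src_if,
        "src_ip": src_ip,
        "dst_if": dst_if,
        "dst_ip": dst_ip,
        "protocol": int(proto, 16),   # e.g. 06 is TCP, 01 is ICMP
        "src_port": int(src_port, 16),
        "dst_port": int(dst_port, 16),
        "volume": volume,
    }


row = "Gi0/1 **.**.**.** Gi0/0 **.**.**.** 06 0D3D 2130 627K"
decoded = decode_top_talker_line(row)
print(decoded["protocol"], decoded["src_port"], decoded["dst_port"])
```

For the TCP row used here, 0D3D works out to source port 3389 and 2130 to destination port 8496. SMTP on port 25 would show up as 0019.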

How to analyse traffic via the CLI.

Depending on the situation and the router/IOS version there are a number of ways to analyse traffic passing through a Cisco router. I'll list those methods below with some of the Pros and Cons and then run individual articles on how to configure each part.

IP Accounting.

In the old days this was one way to look at the Layer 3 conversations on the router. Effectively, it collects the source and destination IP address of each packet, either during the recursive lookup (on older routers) or via the packet inspection for CEF switching (on newer routers). You can then query the router for the table of conversations during the period of capture to see the bytes/packets passing per conversation.

Pros - Can be done on just about any router. Simple to enable. Doesn't require export or specialist config/mibs. Can be done per interface.

Cons - Can be very resource intensive, particularly on older routers. Doesn't show what the traffic is. The table is not always easy to extract. The output is purely cumulative, so it does not always give the data you would expect for short-term bursts.

NBAR

NBAR (or Network Based Application Recognition, to give it its full title) is a method for analysing protocol-level data on a router. It is in many respects the predecessor to Netflow and, as a result, is not as powerful.

NBAR looks at the protocol-level data but only recognises a set of well-known applications. There is a basic application set with any IOS version, which can be added to by downloading PDLMs from the Cisco site and uploading them to the router flash.

NBAR won't tell you the hosts having the conversation but will just give you an aggregated table of the recognised traffic passing through it and an "unknown" category.

Pros - Can be run on a lot of routers, including many non Flow capable routers. Can be updated via PDLM. Can be used with other functions (like QoS). Can be queried via SNMP GETs.

Cons - Doesn't tell you which node is passing the traffic, doesn't cover full protocol suite. Data is cumulative so doesn't show flows well.

Netflow

By capturing the top 30 bytes or so of each Layer 3 packet passing through the router, Netflow is able to construct traffic flow tables using the source IP, destination IP, source port, destination port, protocol and, in some versions, as far as URLs and similar.

The router constructs tables of unique flows and cumulatively builds traffic amounts for the flows which can then be aggregated and exported via UDP or queried directly on the router.
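The idea of a flow table can be sketched in a few lines of Python. This is a simplified model using only the five fields named above as the key, with made-up packets; the real cache on the router tracks more than this.

```python
from collections import defaultdict


def build_flow_table(packets):
    """Accumulate packet and byte counts per unique flow key."""
    table = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_ip, dst_ip, src_port, dst_port, protocol, length in packets:
        key = (src_ip, dst_ip, src_port, dst_port, protocol)
        table[key]["packets"] += 1
        table[key]["bytes"] += length
    return dict(table)


# Hypothetical packets: (src IP, dst IP, src port, dst port, protocol, length)
example_packets = [
    ("192.0.2.10", "198.51.100.5", 40000, 25, 6, 1500),
    ("192.0.2.10", "198.51.100.5", 40000, 25, 6, 1500),
    ("192.0.2.11", "198.51.100.9", 40001, 80, 6, 600),
]

for key, counters in build_flow_table(example_packets).items():
    print(key, counters)
```

The first two packets share all five fields, so they collapse into a single flow with cumulative counters, which is exactly what makes the table so much smaller than a full packet capture.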

Most people choose to export the data and store it in a database for simple or complex queries as the newer versions can collect a vast amount of data which builds up very quickly (into gigabytes).

However many people don't realise it is possible to do queries based on short term captures from the router itself using a service called Flow Top Talkers.

Pros - Can capture very detailed traffic information. Can be exported and stored. Can be leveraged for a number of applications.

Cons - Data can build up very quickly when exporting, uses bandwidth to export, can use a lot of system resources in older versions or if improperly configured.

CEF Traffic Statistics

Although CEF is not as new a feature as it was, it is still new to a lot of people. CEF creates a layer 3 behaviour on routing more akin to switching to improve packet transit through devices and is an important precursor in things like MPLS. I will run a series of articles on CEF as leveraging it requires some knowledge but it can be a very useful tool.

SPAN/Packet Capture

Tools like Netflow are incredibly useful but once you've identified the traffic flowing over a network and resolved any underlying network issues you can still have situations where traffic sessions don't work or you need a deeper packet inspection.

Historically the logical choice would be to run Wireshark or similar near the source or destination device, depending on the issue, to get a full packet breakdown and use deep inspection to see what's going on "on the wire". Using this tool you can see protocol error codes, packet corruption, retransmits etc.

If you cannot get directly in line, it is possible to use a process called SPAN to duplicate traffic on switchports to an alternative port/vlan/svi for analysis. Whilst this is resource intensive, it helps you analyse out-of-band (see article on in-band/out-of-band).

Historically this has been the reserve of the switch, but with hybrid devices (like the 877 SOHO router) or with newer devices/IOS (such as the 2800 series running 12.4(20) or higher) you can now perform Wireshark-compatible packet grabs, save to flash and export via the usual methods for analysis.

Pros - This represents the deepest level of packet inspection possible and allows you to gain a complete picture of what is going on in the network.

Cons - It requires specific device/IOS combinations so is not currently widely available. Depending on the capture you need large memory cards or good export paths. It requires good understanding of tools like Wireshark.

Each item above will have a configuration article generated along with expanded articles, so check back for more and look for the tags.
