Anomaly Detection Studies in the IP Backbone Tao Ye Sprint Burlingame, CA 2007-09-19.


Anomaly Detection Studies in the IP Backbone

Tao Ye

Sprint

Burlingame, CA

2007-09-19

2

24: “Stop that packet at the router!”

• Detected an anomaly

• Specify and activate a new ACL

• At OC-192 routers? @500Kpps? In 26μs?

Anomaly Detection at the IP Backbone

3

Outline

• Tier-1 backbone: an overview

• TAPS: connectionless port scan detection and tracking on the backbone

• Scaling up: sampling and anomaly detection

4

Today’s Tier-1 Backbones

• Topology – high-speed routers in points-of-presence (POPs) connected by long-haul fiber
> numerous small POPs (e.g., UUNet)
> relatively few large POPs (e.g., Sprint)

• Technologies
> IP over SONET (POS)
> IP over ATM (phasing out)
> MPLS, VPN tunnel

• Common Engineering Practice
> failure protection implemented at IP layer
> “over-provisioned” core

5

What we (Research group @ Sprint) do

• Measurement: Collect a lot of data from the Internet backbone, understand the current state

• Monitoring: Use of measurement to detect events of (operational) interest

• Hardware
> CMON monitoring boxes in the field, @5 POPs
> Storage (30T) and analysis platform at the lab
> Website for sharing results

• Algorithms and software tools
> Continuous monitoring
> Anomaly detection
> Active measurement

• Other:
> Wireless: paging attacks, fairness implementations, TCP over wireless

6

Outline

• Measurement and Monitoring at a tier-1 backbone: an overview from the industry perspective

• TAPS: Connectionless port scan detection and tracking on the backbone

• Scaling up: sampling and anomaly detection

7

Motivation and Challenges

• Our goals
> Detect and track
> Understand long term behavior of scanners
> On the backbone network

• Why backbone?
> Detection: existing work is mostly at stub networks, with limited visibility
> Tracking: honeypots can be evaded
> More scanning activity is visible at the core
> A peering point is a unique vantage point

• Challenges
> Backbone traffic is unidirectional and asymmetric
> High-speed (OC-48, OC-192) links need fast algorithms
> Diverse traffic mix needs efficient data structures

8

Intuition: Access Patterns

9

TAPS: Time-based Access Pattern Sequential hypothesis testing

• Based on 5-tuple flow summary on unidirectional link

• Scanner suspects: source IPs whose accessed IP/port (or port/IP) ratio exceeds k in a time bin

• Sequential Hypothesis Testing

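\[
\Lambda_i =
\begin{cases}
\Lambda_{i-1} \cdot \dfrac{P[Y_i = 1 \mid H_1]}{P[Y_i = 1 \mid H_0]} & \text{if } \mathrm{IP}/\mathrm{Port} > k \\[1ex]
\Lambda_{i-1} \cdot \dfrac{P[Y_i = 0 \mid H_1]}{P[Y_i = 0 \mid H_0]} & \text{if } \mathrm{IP}/\mathrm{Port} \le k
\end{cases}
\]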

10

TAPS

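[Figure: likelihood ratio \Lambda(Y) tracked per source (SrcIP), starting at \Lambda(Y) = 1]

• Increment when IP/port > k

• Decrement when IP/port < k

• Upper threshold \eta_1 for tagging source as scanner: Scanner if \Lambda_i > \eta_1

• Lower threshold \eta_0 for tagging source as benign: Benign if \Lambda_i < \eta_0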
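The per-source update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TAPS implementation: the function name `taps_sketch`, the flow tuple layout, the bin length, and the values chosen for `theta0` = P[Y=1|H0], `theta1` = P[Y=1|H1], `eta0` and `eta1` are all hypothetical.

```python
from collections import defaultdict

def taps_sketch(flows, k=3.0, bin_seconds=10.0,
                theta0=0.2, theta1=0.8, eta0=0.01, eta1=100.0):
    """flows: iterable of (timestamp, src_ip, dst_ip, dst_port) tuples.

    All parameter values are illustrative assumptions, not values from the talk.
    """
    # Per time bin and per source: the sets of distinct destination IPs and ports.
    bins = defaultdict(lambda: defaultdict(lambda: (set(), set())))
    for ts, src, dst_ip, dst_port in flows:
        ips, ports = bins[int(ts // bin_seconds)][src]
        ips.add(dst_ip)
        ports.add(dst_port)

    ratio = defaultdict(lambda: 1.0)   # likelihood ratio starts at 1
    verdict = {}
    for b in sorted(bins):
        for src, (ips, ports) in bins[b].items():
            if src in verdict:
                continue               # already tagged, stop testing
            # IP/port or port/IP, whichever is larger
            spread = max(len(ips) / len(ports), len(ports) / len(ips))
            if spread > k:             # Y_i = 1: ratio moves up
                ratio[src] *= theta1 / theta0
            else:                      # Y_i = 0: ratio moves down
                ratio[src] *= (1 - theta1) / (1 - theta0)
            if ratio[src] >= eta1:
                verdict[src] = "scanner"
            elif ratio[src] <= eta0:
                verdict[src] = "benign"
    return verdict
```

With these assumed parameters, a source needs four consecutive "scanner-like" bins to cross the upper threshold, and four "benign-like" bins to cross the lower one. Sources that never cross either threshold simply stay undecided.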

11

Performance: TCP

[Figure: Detection Algo Comparison. Bar chart; y-axis: percentages (0% to 100%); groups: Success, False Positive, False Negative; series: TAPS, TRWSYN, SNORT. Bar labels in extraction order: 90.20%, 81.70%, 35.10%, 9.80%, 12.10%, 45.60%, 25.70%, 74.30%, 64.90%.]

12

Online Implementation Architecture

• Use CMON to produce flows in NetFlow v5 format

• Flow Daemon distributes flows

• Keep flows in a circular buffer

[Diagram labels: CMON, Flow Collector, Flow Daemon, Core, App Handler, TAPS, Other, Disk Writer, Disk Reader, Circular Buffer, Disk]
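As a rough illustration of keeping flows in a circular buffer, the sketch below uses a bounded `collections.deque`. The class name `FlowRingBuffer`, its methods, and the overwrite-oldest policy are assumptions made for the example; the talk does not say how the Flow Daemon handles a full buffer.

```python
from collections import deque

class FlowRingBuffer:
    """Fixed-capacity buffer; the oldest flow is overwritten when full (assumed policy)."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # number of flows overwritten before a consumer read them

    def push(self, flow):
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1          # deque discards the oldest entry on append
        self._buf.append(flow)

    def drain(self, max_items):
        """Hand up to max_items flows, oldest first, to a consumer such as a detector."""
        out = []
        while self._buf and len(out) < max_items:
            out.append(self._buf.popleft())
        return out
```

A bounded buffer of this kind keeps memory constant on a fast link; the `dropped` counter makes it visible when the consumer cannot keep up.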

13

Detector and Tracker Architecture

14

Design choices: Approximation Counters

• Issues:
> Need to keep the fan-out count for each IP
> Heap implementation has prohibitively high memory requirements

• Probabilistic Counters:
> Many recently proposed counters: small SRAM implementation (multi-resolution bitmap, trigger bitmap)
> Simple Flajolet-Martin counter

• FM counter performance
> 8 hash functions accurate enough for the > k / < k test
> 256, 32 and 8 hash functions
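A minimal sketch of a Flajolet-Martin distinct counter, which could estimate the fan-out (distinct destinations) of a source without storing the destinations. The class name `FMCounter`, the use of salted BLAKE2b as the hash family, and the simple averaging across hash functions are assumptions for illustration; the talk does not describe the implementation details.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant

class FMCounter:
    def __init__(self, num_hashes=8, bits=32):
        self.num_hashes = num_hashes
        self.bits = bits
        self.bitmaps = [0] * num_hashes   # one small bitmap per hash function

    def _hash(self, item, seed):
        # Salted BLAKE2b stands in for a family of independent hash functions.
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=seed.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little")

    def add(self, item):
        for j in range(self.num_hashes):
            h = self._hash(item, j)
            # Index of the lowest set bit; bit r is hit with probability 2^-(r+1).
            r = (h & -h).bit_length() - 1 if h else self.bits - 1
            self.bitmaps[j] |= 1 << min(r, self.bits - 1)

    def estimate(self):
        total = 0
        for bm in self.bitmaps:
            r = 0
            while bm & (1 << r):          # position of the lowest zero bit
                r += 1
            total += r
        return 2 ** (total / self.num_hashes) / PHI
```

Adding the same destination twice sets the same bits, so duplicates do not change the estimate. More hash functions reduce the variance at the cost of more memory and hashing work per flow.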

15

Results

• Data set
> OC48 peering link incoming, ~320Mbps, 22 days
> OC48 peering link outgoing, ~560Mbps, 3 days

16

Scanner Duration

[Figure panels: 22 days, 3 days]

17

Scanner Rate

18

Number of Scanners Detected (1)

• Time series of the number of scanners detected (3 days)

19

Scanning Ports

• Ports accessed

20

Conclusion

• Online Scan Detection and Tracking
> Targets unidirectional backbone link
> Detector: Time-based Access Pattern Sequential Hypothesis Testing (TAPS), which combines rate limiting with statistical tests on destination IP and port access patterns
> Implementation design: Queue model and FM counter

• Scanner Behavior
> 90-10 split of scanning rate, scanning duration behavior
> Spike in number of scanners detected

21

Outline

• Tier-1 backbone: an overview

• TAPS: connectionless port scan detection on the backbone

• Scaling up: sampling and anomaly detection

22

Motivation

• Sampling to reduce processing overhead in traffic monitoring

• Sampled data used in:
> Traffic Engineering -- computing traffic matrices
> Inferring flow statistics from sampled data (Duffield03, Hohn03)

• Anomaly Detection (DDoS attacks, worm scans): Does sampled data contain sufficient information for effective anomaly detection?

• The brief answer … it depends
> On sampling method
> On sampling rate

• The impact of sampling
> Number of anomalies detected: decreased
> False positives: increased

23

Methodology

[Diagram: traffic traces feed an Anomaly Detection Module directly and, through a Sampling Module, a second Anomaly Detection Module; the two sets of Results are compared.]

24

Anomalies and Detection Algorithms

Type of Anomaly                              | Detection Algorithms
Volume anomaly: DoS attacks, flash crowds    | 1. Wavelet-based change detection [Barford02]
Port scanning: worm/virus propagation        | 2. Threshold Random Walk [Jung04]
                                             | 3. Access Pattern: TAPS [Sridharan06]


25

Sampling Methods

• Random packet sampling: each packet is sampled with probability r < 1
> Simple implementation (good for busy routers)
> Widely deployed (Cisco NetFlow)
> Flow statistics hard to recover

• Random flow sampling: classify packets into flows, each flow is sampled with probability p < 1
> High resource requirement
> Accurate estimation of flow statistics
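The difference between the two random sampling methods can be made concrete with a small sketch. The packet representation `(flow_id, seq)` and the function names are hypothetical; the point is only that packet sampling thins every flow, while flow sampling keeps or drops flows whole.

```python
import random

def packet_sample(packets, r, rng):
    """Random packet sampling: keep each packet independently with probability r."""
    return [p for p in packets if rng.random() < r]

def flow_sample(packets, p, rng):
    """Random flow sampling: decide once per flow, then keep all of its packets."""
    keep = {}
    out = []
    for pkt in packets:
        fid = pkt[0]
        if fid not in keep:              # first packet of a flow: draw the decision
            keep[fid] = rng.random() < p
        if keep[fid]:
            out.append(pkt)
    return out

def flow_sizes(packets):
    """Packets per flow, as a flow-based detector would see them."""
    sizes = {}
    for fid, _seq in packets:
        sizes[fid] = sizes.get(fid, 0) + 1
    return sizes

if __name__ == "__main__":
    rng = random.Random(7)
    packets = [(f"flow{f}", s) for f in range(100) for s in range(10)]
    print(flow_sizes(packet_sample(packets, 0.1, rng)))  # surviving flows look shorter
    print(flow_sizes(flow_sample(packets, 0.5, rng)))    # surviving flows keep all 10 packets
```

The per-flow decision table in `flow_sample` is the reason for the higher resource requirement: the sampler has to classify every packet into a flow before it can decide whether to keep it.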


26

Sampling (continued)

• Designer flow sampling: for catching heavy-hitters
> Smart Sampling [Duffield02] – flow records selected with a probability
> Sample-and-Hold [Estan02]: each byte of a packet is sampled with a small probability h. Once a packet in a flow gets sampled, all following packets in that flow are sampled.
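A hedged sketch of both heavy-hitter-oriented samplers. For sample-and-hold, a packet of `size` bytes starts a flow entry with probability 1 - (1 - h)^size, which follows from sampling each byte with probability h. For smart sampling, the selection probability min(1, x/z) for a flow record of size x and a threshold z is the commonly published form and is an assumption here, since the slide does not show it. Function names and data layouts are hypothetical.

```python
import random

def sample_and_hold(packets, h, rng):
    """packets: iterable of (flow_id, size_in_bytes). Returns bytes counted per held flow."""
    table = {}
    for fid, size in packets:
        if fid in table:
            table[fid] += size                      # flow is held: count every later packet
        elif rng.random() < 1 - (1 - h) ** size:    # at least one byte of the packet sampled
            table[fid] = size
    return table

def smart_sample(flow_records, z, rng):
    """flow_records: dict flow_id -> bytes. Keep a record with probability min(1, x / z)."""
    return {fid: x for fid, x in flow_records.items()
            if rng.random() < min(1.0, x / z)}
```

Both schemes favour large flows by design: a flow with many bytes is almost certain to be caught, while small flows, including single-packet scan probes, are mostly discarded.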

27

Comparing Sampling Algorithms

• How to compare: normalizing CPU load, or memory consumption

• Our choice – the percentage of flows sampled
> Input to the anomaly detection is based on flows
> Number of flows translates to memory consumption

• Example of sampling parameter settings:

28

Impact of Sampling on Volume Anomaly Detection (1)

• Wavelet-based change detection on flow rate

• Decomposition

• Re-synthesize into three bands
> High ~ 1 sec
> Mid ~ 1 min
> Low ~ 15 min

• Detection on high/mid
> Sliding window
> Deviation score
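The sliding-window deviation score can be illustrated as follows. To keep the example short, the band split uses moving averages as a stand-in for the wavelet decomposition, so this is not the [Barford02] algorithm itself. The window lengths, the function names, and the choice to normalise local variance by the variance of the whole band are assumptions.

```python
from statistics import pvariance

def moving_average(xs, w):
    """Trailing moving average over the last w samples (fewer at the start)."""
    out, acc = [], 0.0
    for i, x in enumerate(xs):
        acc += x
        if i >= w:
            acc -= xs[i - w]
        out.append(acc / min(i + 1, w))
    return out

def split_bands(xs, short_w=4, long_w=32):
    """Moving-average stand-in for the three-band split; high + mid + low == xs."""
    smooth = moving_average(xs, short_w)
    low = moving_average(xs, long_w)
    high = [x - s for x, s in zip(xs, smooth)]
    mid = [s - l for s, l in zip(smooth, low)]
    return high, mid, low

def deviation_scores(band, window=16):
    """Local variance in a sliding window, relative to the variance of the whole band."""
    total_var = pvariance(band) or 1.0   # avoid division by zero on a flat band
    return [pvariance(band[max(0, i - window + 1): i + 1]) / total_var
            for i in range(len(band))]
```

A detector would flag time points where the score on the high or mid band exceeds a chosen threshold. A sampling method that adds local variance raises the background score, and one that flattens the series lowers the peaks, which is how sampling can hide anomalies that were visible in the original trace.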

29

Impact of Sampling on Volume Anomaly Detection (2)

• Original detection: 21

• False negatives
> Random flow sampling introduces more local variance
> Random packet sampling introduces even more variance
> Smart sampling and sample-and-hold flatten the time series

30

Impact of Sampling on Port Scan Detection

• Performance Metrics Definition
> Success Ratio Rs = Num True Scanners Detected / Num True Scanners
> False Positive Ratio Rf+ = Num False Scanners Detected / Num True Scanners

• Rs => effectiveness, Rf+ => errors

• Ground truth: True scanner set examined by hand.
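The two ratios translate directly into set operations. The function name and the use of Python sets are illustrative choices; the definitions are the ones given above.

```python
def scan_detection_ratios(detected, true_scanners):
    """Return (Rs, Rf+) for a set of detected sources against hand-verified ground truth."""
    detected, true_scanners = set(detected), set(true_scanners)
    if not true_scanners:
        raise ValueError("ground-truth scanner set must not be empty")
    rs = len(detected & true_scanners) / len(true_scanners)    # true scanners found
    rf = len(detected - true_scanners) / len(true_scanners)    # false scanners reported
    return rs, rf
```

Because both ratios are normalised by the number of true scanners, Rf+ is not bounded by 1: a detector that reports many benign hosts can have Rf+ greater than 100%.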

31

TRWSYN results

32

TAPS results

• Flow count reduction – false negatives

• Flow shortening – false positives shoot up in random packet sampling.
> A multi-packet TCP flow shrunk to a single SYN-packet flow
> The result: scanners and benign hosts are statistically indistinguishable.

33

Conclusion

• Implications of Our Results:
> Random flow sampling is generally robust for both volume anomaly and port scan detection.
> Random packet sampling is oblivious to any underlying traffic features, and causes information loss and distortion which degrade the performance of anomaly detection algorithms.

• Smart sampling and sample-and-hold target heavy-hitters, and are thus not well suited to anomaly detection.

• Ongoing work:
> Design anomaly detection algorithms robust to sampling
> Design new anomaly-detection-friendly sampling methods

34

The End!

• Tier-1 backbone: an overview

• TAPS: Connectionless port scan detection on the backbone and scanner profiling

• Sampling data is not NOT sufficient for anomaly detection purposes

http://research.sprintlabs.com

35

A Backbone POP

[Diagram labels: Peer, Core Router, Other POPs, Edge Router]