[ Plab 2.3-beta2 ]

Copyright (c) 2004-2006 Alberto Dainotti, Antonio Pescapè, Alessio Botta
Email: alberto@unina.it, pescape@unina.it, a.botta@unina.it
DIS - Dipartimento di Informatica e Sistemistica (Computer Science Department)
University of Naples "Federico II"
All rights reserved.

Students collaborating on the project: Alessandro De Peppo (depeppo@unina.it)


Plab
====
Plab is a software platform for packet capture and analysis. It can extract
several measures, either from live traffic or from network traffic trace files,
by decomposing traffic according to different definitions of
sessions: by host, flow, and conversation.
Within each session, Inter Packet Time (IPT) and Packet Size (PS) series and
other statistics are extracted.
All the data is written to text files that can be loaded directly into software
for statistical analysis (e.g. Matlab).
Several traffic filtering features have been implemented. Moreover, filters in
tcpdump/bpf syntax can be used.
Plab runs under Linux, FreeBSD, and Mac OS X. It tries to use as few processing resources
as possible and has been tested even with traffic traces of hundreds of millions
of packets associated with millions of conversations.


* Documentation summary:
README		- this file. Read it first
OUTPUT		- specifications of the output files generated by Plab
EXAMPLES	- some examples of command line options (more to come)


Installation
============
Plab requires the libpcap library (downloadable at http://www.tcpdump.org).
Once libpcap is installed, Plab can be compiled as follows:

* LINUX:
	tar zxvf plab.tar.gz
	cd plab/src
	make

* FREEBSD:
	tar zxvf plab.tar.gz
	cd plab/src
	make OSFLAGS=FREEBSD
	
* MACOSX:
	tar zxvf plab.tar.gz
	cd plab/src
	make OSFLAGS=MACOSX


Traffic analysis
================
Traffic traces can be analyzed in 3 different modes: host, flow and
conversation. 

- host mode:	analyze all packets generated from a single host;
		IPT and PS are relative to a single host (inbound and outbound
		traffic are separated, see below). 

- flow mode:	traffic is decomposed into flows. A traffic flow is
		identified by source host, source port, destination
		host, destination port, protocol and the maximum
		allowed interpacket time (default is 60s).
		IPT and PS are relative to a single flow.

- conversation mode:	consider traffic between a client and a
		server, where the client requests a specific service.
		A conversation is identified by client IP, server IP,
		service and timeout. A conversation is bidirectional, so
		traffic from client to server (upstream) and from server
		to client (downstream) is considered separately.
		IPT and PS are relative to the two directions of a single conversation.


Summary of how a session is identified:
 
mode          | parameters 
-----------------------------------------
host          | (Source IP)
flow          | (Source IP, Source Port, Destination IP, Destination Port, L4 Protocol, Timeout)
conversation  | (Client IP, Server IP, L4 Protocol, Service, Timeout)
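The session identifiers summarized above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Plab's C implementation: the Packet record, its field names, and both functions are hypothetical. Timeout handling and the separation of traffic directions are deliberately left out, so each session here yields a single PS series and a single IPT series.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record; the field names are illustrative, not Plab's.
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port proto size")


def session_key(pkt, mode, server_port=None):
    """Return the tuple identifying the session a packet belongs to."""
    if mode == "h":
        return (pkt.src_ip,)
    if mode == "f":
        return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.proto)
    if mode == "c":
        # Assumption: the server is the endpoint using the service port.
        if pkt.dst_port == server_port:
            client, server = pkt.src_ip, pkt.dst_ip
        elif pkt.src_port == server_port:
            client, server = pkt.dst_ip, pkt.src_ip
        else:
            return None  # packet does not belong to the requested service
        return (client, server, pkt.proto, server_port)
    raise ValueError("unknown mode: %r" % (mode,))


def series_by_session(packets, mode, server_port=None):
    """Group packets by session and collect PS and IPT series for each one."""
    last_ts = {}
    series = defaultdict(lambda: {"ps": [], "ipt": []})
    for pkt in packets:
        key = session_key(pkt, mode, server_port)
        if key is None:
            continue
        series[key]["ps"].append(pkt.size)
        if key in last_ts:
            # IPT is the gap between consecutive packets of the same session.
            series[key]["ipt"].append(pkt.ts - last_ts[key])
        last_ts[key] = pkt.ts
    return dict(series)
```

In this sketch, a request and its reply fall into two different sessions in flow mode, but into the same session in conversation mode when server_port matches the service port.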


Moreover, IPT and PS related to the overall aggregated traffic are dumped into a separate file.


Command line options
====================

All command line options are summarized below. See file "EXAMPLES" for some typical examples
of Plab usage.
Besides these options, it is possible to specify filters in tcpdump/bpf syntax at the end of the command line
(as in tcpdump). This allows some packets to be discarded a priori.


-0          zero out IP addresses in the output file
-a          filter duplicate segments in trace (see Section "trace sanitization")
-C num      skip the first 'num' packets
-c num      stop after 'num' packets
-D 0-6      consider only packets from a specific day of week
-d path     output directory
-f          disable filter
-F path     use BPF file specified in 'path' to filter out packets
-g type     enable worm packet fingerprinting, based on worm characteristics.
		Predefined mark functions are:
		'1' Slammer (default)
		'2' CodeRed
		'3' Witty
-H path     read host table from file
-h          print help and exit
-i string   read packets from interface 'string'
-I          don't dump IPT and PS
-k          dump flow or conversation IPs and Ports (only in flow and conversation mode)
-M num      dump flow table to save memory. Table is dumped every (num * 100000) packets read
-m          dump the TCP Maximum Segment Size (MSS) option
-P          dump payload contents
-p          disable packet processing
-q mode     type of traffic analysis: by host, flow, or conversation.
            Mode can be:
		'h'	by host		(default)
		'f'	by flow
		'c'	by conversation
-R num      calculate the rates of some parameters (packets, bytes, flows, sessions) every 'num' seconds
-r path     read from file 'path'
-s num      set snaplen when capturing traffic
-t num      set flow or conversation timeout (in seconds)
-T string   set the time range to analyze. The string must be in the form 'hh:mm-HH:MM'
-v num      specify the port number that identifies the server when the analysis is in conversation mode.
	    The Layer-4 protocol should be specified with a tcpdump/bpf style filter at the end of the
	    command line (e.g. for HTTP, launch: "./plab [....] -q c -v 80 [....] tcp").
-W num      dump 'num' bytes from headers
-w path     dump traffic to file 'path'
-x          skip pure TCP packets (SYN, ACK, etc.)
-Z num      set a custom timezone offset
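
As an illustration of the 'hh:mm-HH:MM' format accepted by -T, the following Python sketch parses such a string and tests whether a time of day falls inside the range. It is not Plab's code: both function names are hypothetical, and treating the endpoints as inclusive and letting a range wrap past midnight are assumptions, not documented Plab behavior.

```python
from datetime import time


def parse_time_range(spec):
    """Parse 'hh:mm-HH:MM' into a (start, end) pair of datetime.time objects."""
    start_s, end_s = spec.split("-")

    def to_time(s):
        hh, mm = s.split(":")
        return time(int(hh), int(mm))

    return to_time(start_s), to_time(end_s)


def in_range(t, start, end):
    """True if time of day t lies in [start, end] (assumed inclusive)."""
    if start <= end:
        return start <= t <= end
    # Assumption: a range such as 22:00-02:00 wraps past midnight.
    return t >= start or t <= end
```

For example, parse_time_range("08:30-17:00") yields the pair (08:30, 17:00), and in_range can then be applied to the time of day of each packet timestamp.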


Output
======
See file "OUTPUT" for information about the output files produced by Plab.


Signals
=======
Plab ignores the SIGHUP signal, so it is possible to exit from the shell and
leave the program running.
You can stop it with a SIGINT signal.


Fingerprinting
==============
Plab can mark packets according to specific characteristics using the "-g type" option,
where "type" is an integer that selects the marking function to use.
There are 3 predefined marking functions in Plab, but you can easily add more if required
(see the "For developers" notes below).

type = 1 (Slammer): udp AND payload size is 376 bytes
type = 2 (CodeRed): tcp AND syn flag is on AND destination port is 80
type = 3 (Witty): udp AND source port is 4000 AND payload size is greater than 767 bytes
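
The three rules above can be written as simple predicates. The sketch below is in Python for illustration only and does not reproduce the fingerprint() function of "plab.c": the Pkt record, its field names, and the MARKERS table are hypothetical.

```python
from collections import namedtuple

# Hypothetical packet summary; the field names are illustrative, not Plab's.
Pkt = namedtuple("Pkt", "proto src_port dst_port payload_len syn")


def is_slammer(p):
    return p.proto == "udp" and p.payload_len == 376


def is_codered(p):
    return p.proto == "tcp" and p.syn and p.dst_port == 80


def is_witty(p):
    return p.proto == "udp" and p.src_port == 4000 and p.payload_len > 767


# Maps the integer given to "-g type" to a predicate (illustrative table).
MARKERS = {1: is_slammer, 2: is_codered, 3: is_witty}


def mark(p, marker_type=1):
    """Return True if packet p matches the selected marking rule."""
    return bool(MARKERS[marker_type](p))
```

Under this layout, adding a new rule amounts to writing one more predicate and registering it under a new type number.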

Marked packets are reported in the "pkts_up" and "pkts_dw" output files.
Flows containing matching packets are marked in the "flow_data" output file.

Every time a host sends a matching packet, a counter for that host is incremented.
Every time a non-matching packet arrives at a host, the counter is decremented.
The counter is bounded between -20 and 20. Every host with a counter greater than 5
is reported in the "host" file as matched.
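
The counter rule can be sketched as follows, in Python for illustration. This is one reading of the description above, not Plab's code: the function names are hypothetical, and assigning the values -20, 20 and 5 to COUNT_MIN, COUNT_MAX and COUNT_LEV (the names found in "common.h") is an assumption.

```python
from collections import defaultdict

# Assumed mapping of the documented bounds and threshold to the common.h names.
COUNT_MIN, COUNT_MAX, COUNT_LEV = -20, 20, 5


def update_counters(counters, src_ip, dst_ip, matched):
    """Apply the counter rule for one packet."""
    if matched:
        # A matching packet raises the sender's counter, capped at COUNT_MAX.
        counters[src_ip] = min(COUNT_MAX, counters[src_ip] + 1)
    else:
        # A non-matching packet lowers the receiver's counter, floored at COUNT_MIN.
        counters[dst_ip] = max(COUNT_MIN, counters[dst_ip] - 1)


def matched_hosts(counters):
    """Hosts whose counter exceeds the reporting threshold."""
    return sorted(h for h, c in counters.items() if c > COUNT_LEV)
```

A counters table would typically be created as defaultdict(int), so that hosts seen for the first time start from zero.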

A note on worm traffic:
Further identification of worm traffic can be based on the analysis performed in Plab's flow mode. Indeed,
both Witty and Slammer generate a single packet for each scanned host. By selecting flows
with more than one packet, it is therefore easy to spot non-worm traffic, and vice versa.
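
A minimal Python sketch of this selection is shown below. The input is a hypothetical dictionary mapping a flow identifier to its packet count; it is not the format of the "flow_data" file, which is described in "OUTPUT".

```python
def split_by_packet_count(flow_packet_counts):
    """Split flows into single-packet and multi-packet groups.

    Single-packet flows are candidates for worm scans; multi-packet
    flows are likely non-worm traffic.
    """
    single = [k for k, n in flow_packet_counts.items() if n == 1]
    multi = [k for k, n in flow_packet_counts.items() if n > 1]
    return single, multi
```

A single-packet flow is only a candidate: ordinary traffic can also produce one-packet flows, so the result is best combined with the packet marks.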

For developers:
Parameters related to the counters can be set in "common.h" (see COUNT_MIN, COUNT_MAX, COUNT_LEV).
You can write a new custom marking function by editing the fingerprint() function in "plab.c".
A list of all macros for easily accessing packet header information is available in "pkt_macros.h".


Trace sanitization
==================

We discovered that some publicly available traffic traces contain
spurious data, probably due to hw/sw errors. Among them, there are whole chunks of packet
sequences that are duplicated inside the trace. The "-a" option
removes such duplicates. The checks are based on inconsistencies in packet timestamps.
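
The exact checks performed by "-a" are not described here. As a rough illustration of one kind of timestamp inconsistency, the Python sketch below finds the positions where a timestamp is earlier than the one before it, which is what a chunk duplicated later in a trace would produce at its first packet. The function name is hypothetical and this is not Plab's algorithm.

```python
def find_timestamp_regressions(timestamps):
    """Return the indexes i where timestamps[i] < timestamps[i - 1]."""
    return [
        i
        for i in range(1, len(timestamps))
        if timestamps[i] < timestamps[i - 1]
    ]
```

A regression only marks where a suspicious chunk begins; deciding how many of the following packets are duplicates requires further checks.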

