Unified Summaries for Internet traffic

Cristian Estan
June 15, 2004

Traffic analysis is important to the operation of IP networks. The input to the analysis is raw data such as packet header traces or NetFlow records, and the output is often the sizes of aggregates, such as the traffic generated by various applications or by individual customers. Storing the raw data preserves the flexibility to run arbitrary new analyses in the future, but the sheer amount of raw data is often a challenge. Sampling-based techniques such as smart sampling aim to reduce the amount of raw data while preserving the ability of future analyses to accurately estimate the traffic of any large aggregate. There are three important measures of the traffic of an aggregate: the number of bytes, the number of packets, and the number of flows. Current data reduction solutions allow estimating only one of these measures. In this paper we propose the idea of unified summaries, which allow analyses to obtain unbiased estimates for all three measures. Our unified summary that takes flow records as input is based on smart sampling, and the one that reads packet header traces is based on sample and hold. The most important contributions of this paper are the development of novel unbiased statistical estimators for the number of flows, the development of methods for combining summaries that measure bytes and packets using less memory than separate summaries, and an experimental evaluation of the proposed solutions based on traces of traffic.
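
As background for the abstract's mention of smart sampling, the following is a minimal sketch of the basic size-dependent sampling idea for byte counts only, not the unified summary proposed in the report. It assumes the standard formulation in which a flow record of size x is kept with probability min(1, x/z) and, when kept, is credited with max(x, z) bytes, which makes the byte estimate for any aggregate unbiased. The names smart_sample, estimate_bytes, and the threshold z are illustrative and do not come from the report.

```python
import random

def smart_sample(flows, z, rng=None):
    """Keep each (key, size) record with probability min(1, size / z).

    A kept record is credited with max(size, z) bytes, so the expected
    credited size of every record equals its true size.
    The threshold z is an assumed tuning parameter.
    """
    rng = rng or random.Random()
    kept = []
    for key, size in flows:
        p = min(1.0, size / z)
        # rng.random() is in [0, 1), so records with p == 1 are always
        # kept and records with size 0 are never kept.
        if rng.random() < p:
            kept.append((key, float(max(size, z))))
    return kept

def estimate_bytes(sample, predicate):
    """Estimate the bytes of the aggregate selected by predicate(key)."""
    return sum(weight for key, weight in sample if predicate(key))
```

Records at or above the threshold are stored exactly, while small records are thinned out and each survivor stands in for z bytes. This is why estimates for large aggregates stay accurate even when most small records are discarded.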

How to view this document

The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, techreports@cs.ucsd.edu.
