A New Direction in Tree Based Search Engine Architectures Using Balanced Single Port Memories

Florin Baboescu and Dean Tullsen
CS2004-0799
October 15, 2004

This paper examines the microarchitecture of a novel network search processor which provides both high execution throughput and balanced memory distribution. Pipelined forwarding engines are used in core routers to meet speed demands. Most algorithmic-based solutions for these engines use tree based search structures. The tree traversal is pipelined across a number of stages to achieve high throughput. Prior work has shown that the pipelining of these trees results in unevenly distributed memory. To address this imbalance, conventional approaches use either complex dynamic memory allocation schemes (which dramatically increase hardware complexity) or over provision each of the pipeline stages (which results in memory waste). This paper has three primary contributions: i) a novel logical pipeline architecture in which search operations can start execution at {\em any} stage, ii) a new allocation algorithm that leverages this degree of freedom to eliminate memory imbalance and thus memory waste, iii) a practical implementation of our logical pipeline which eliminates non-neighbor communication and guarantees in-order completion without using a dedicated task scheduler. The implementation also minimizes interconnect complexity by having searches enter and exit the pipeline through one location (while still allowing the search to begin at any stage). We validate our new scheme by implementing and simulating state of the art solutions for IPv4 lookup, VPN forwarding and packet classification. In our simulation we use both real life and synthetically generated routing tables and classifiers. We show that our new pipeline scheme and memory allocator can provide searches with a memory allocation efficiency that is within 1\% of non-pipelined schemes. This allows us to obtain a forwarding rate of 1 packet every 6 ns using memories with 2 ns cycle time, with a constant latency of 48 ns and near-perfect memory efficiency.


How to view this document


The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, techreports@cs.ucsd.edu.


[ Search ]


NCSTRL
This server operates at UCSD Computer Science and Engineering.
Send email to webmaster@cs.ucsd.edu