Spamscatter: Characterizing Internet Scam Hosting Infrastructure

David S. Anderson, Chris Fleizach, Stefan Savage and Geoffrey M. Voelker
March 21, 2007

Few Internet security issues have attained the universal public recognition, or contempt, of unsolicited bulk email: spam. The engine that drives this enormous activity is not spam itself, which is simply a means to an end, but the various money-making "scams" (legal or illegal) that extract value from Internet users. In this paper, we focus on the Internet infrastructure used to host and support such scams. Unlike mail relays or bots, scam infrastructure is directly implicated in the spam profit cycle and is thus considerably rarer and more valuable. Our goal is to measure and analyze this scam infrastructure to better understand the dynamics and business pressures exerted on spammers. To identify scam infrastructure, we employ an opportunistic technique called spamscatter. The underlying principle is that each scam is, by necessity, identified in the link structure of its associated spam messages. To this end, we have built a system that mines email, identifies URLs in real time, and follows those links to their eventual destination servers. We further identify individual scams by clustering scam servers whose rendered Web pages are graphically similar, using a technique called image shingling. Applying spamscatter to a large real-time spam feed (roughly 150,000 messages per day), we identify and analyze over 2,000 distinct scams hosted across more than 7,000 distinct servers.
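The abstract names image shingling but does not spell out its mechanics. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: a rendered page screenshot is modeled as a grayscale pixel grid, cut into fixed-size tiles, and each tile is hashed into a "shingle". Two pages are then compared by the Jaccard similarity of their shingle sets, and pages above a threshold are grouped into the same scam. The function names (`shingles`, `similarity`, `cluster`), the 4x4 tile size, the SHA-1 hash, the 0.7 threshold, and the greedy representative-based clustering are all hypothetical, illustrative choices rather than parameters taken from the paper.

```python
import hashlib
from typing import Dict, List, Set

# Hypothetical image model: a grayscale screenshot as rows of 0-255 ints.
Image = List[List[int]]


def shingles(img: Image, block: int = 4) -> Set[str]:
    """Hash each non-overlapping block x block tile into a set of shingles."""
    out: Set[str] = set()
    height = len(img)
    width = len(img[0]) if img else 0
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            # Serialize the tile's pixels row by row, then hash the bytes.
            tile = bytes(
                img[r][c]
                for r in range(top, top + block)
                for c in range(left, left + block)
            )
            out.add(hashlib.sha1(tile).hexdigest())
    return out


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(pages: Dict[str, Set[str]], threshold: float = 0.7) -> List[List[str]]:
    """Greedily assign each page to the first cluster whose representative
    (its first member) is at least `threshold` similar; else start a new one."""
    clusters: List[List[str]] = []
    for name, sh in pages.items():
        for group in clusters:
            if similarity(pages[group[0]], sh) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Because shingles are compared as sets, two screenshots that differ only in a small region (for example, a changed product image or price) still share most of their tiles and land in the same cluster, while pages with unrelated layouts share almost none.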


The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the authors' copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego.
