Abstract
A major challenge in digital forensics is the handling of very large amounts of data. Since forensic investigators often have to analyze several terabytes of data in a single case, efficient and effective tools for automatic data identification and filtering are required. A common data identification technique is to match the cryptographic hashes of files with hashes stored in blacklists and whitelists in order to identify contraband and harmless content, respectively. However, blacklists and whitelists are never complete and they miss most of the files encountered in investigations. Also, cryptographic hash matching fails when file content is altered even very slightly. This paper analyzes several distributed systems for their ability to support file content identification. A framework is presented for automated file content identification that searches for file hashes and collects, aggregates and presents the search results. Experiments demonstrate that the framework can provide identifying information for 26% of the test files from their hashed content, helping reduce the workload of forensic investigators.
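The hash-matching technique described in the abstract, and its sensitivity to small content changes, can be illustrated with a minimal sketch. The sketch is not the framework presented in the paper: the choice of SHA-256, the function names `sha256_hex` and `classify`, and the sample byte strings are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (algorithm choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def classify(data: bytes, blacklist: set, whitelist: set) -> str:
    """Classify content by exact hash lookup in a blacklist and a whitelist."""
    digest = sha256_hex(data)
    if digest in blacklist:
        return "contraband"
    if digest in whitelist:
        return "harmless"
    return "unknown"


# Hypothetical sample contents standing in for real files.
known_bad = b"example contraband payload"
known_good = b"example operating system file"

blacklist = {sha256_hex(known_bad)}
whitelist = {sha256_hex(known_good)}

print(classify(known_bad, blacklist, whitelist))
print(classify(known_good, blacklist, whitelist))
# Appending a single byte changes the digest, so the exact match is lost.
print(classify(known_bad + b"\x00", blacklist, whitelist))
```

The third call returns "unknown" because the modified content hashes to a different digest, which is the limitation of exact cryptographic hash matching noted above.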
Copyright information
© 2013 IFIP International Federation for Information Processing
About this paper
Cite this paper
Yannikos, Y., Schluessler, J., Steinebach, M., Winter, C., Graffi, K. (2013). Hash-Based File Content Identification Using Distributed Systems. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics IX. DigitalForensics 2013. IFIP Advances in Information and Communication Technology, vol 410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41148-9_8
DOI: https://doi.org/10.1007/978-3-642-41148-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41147-2
Online ISBN: 978-3-642-41148-9
eBook Packages: Computer Science (R0)