Enigma 2019 has ended
Wednesday, January 30 • 12:00pm - 12:30pm
When the Magic Wears Off: Flaws in ML for Security Evaluations (and What to Do about It)

Sign up or log in to save this to your schedule and see who's attending!

Academic research on machine learning-based malware classification appears to leave very little room for improvement, boasting F1 performance figures of up to 0.99. Is the problem solved? In this talk, we argue that there is an endemic issue of inflated results due to two pervasive sources of experimental bias: spatial bias, caused by distributions of training and testing data not representative of a real-world deployment, and temporal bias, caused by incorrect splits of training and testing sets (e.g., in cross-validation) leading to impossible configurations. To overcome this issue, we propose a set of space and time constraints for experiment design. Furthermore, we introduce a new metric that summarizes the performance of a classifier over time, i.e., its expected robustness in a real-world setting. Finally, we present an algorithm to tune the performance of a given classifier. We have implemented our solutions in TESSERACT, an open source evaluation framework that allows a fair comparison of malware classifiers in a realistic setting. We used TESSERACT to evaluate two well-known malware classifiers from the literature on a dataset of 129K applications, demonstrating the distortion of results due to experimental bias and showcasing significant improvements from tuning.


Lorenzo Cavallaro

King's College London
Lorenzo Cavallaro is a Full Professor of Computer Science, Chair in Cybersecurity (Systems Security) in the Department of Informatics at King's College London, where he leads the Systems Security Research Lab. He received a combined BSc-MSc (summa cum laudae) in Computer Science from... Read More →

Wednesday January 30, 2019 12:00pm - 12:30pm
Grand Peninsula Ballroom ABCD

Twitter Feed