Expert-in-the-Loop Supervised Learning for Computer Security Detection Systems
by Anaël Beaugnon
Book details
The standard supervised learning pipeline consists of data annotation, feature extraction, training and evaluation. Security experts must carry out all these steps to set up supervised detection models ready for deployment. In this thesis, we adopt an end-to-end approach. We work on the whole machine learning pipeline with security experts at its core, since this is crucial for real-world impact.
First of all, security experts may have little knowledge about machine learning. They may therefore have difficulty taking full advantage of this data analysis technique in their detection systems.
This thesis provides methodological guidance to help security experts build supervised detection models that suit their operational constraints. Moreover, we design and implement DIADEM, an interactive visualization tool that helps security experts apply the methodology set out. DIADEM deals with the machine learning machinery to let security experts focus mainly on detection.
In addition, most research work assumes that a representative annotated dataset is available for training, whereas such datasets are particularly expensive to build in computer security. Active learning has been introduced to reduce expert effort in annotation projects. However, it usually focuses on minimizing only the number of manual annotations, while security experts would rather minimize the overall time spent annotating. Moreover, user experience is often overlooked, even though active learning is an interactive procedure that should ensure a good expert-model interaction.
This thesis proposes a solution to effectively reduce the labeling cost in computer security annotation projects. We design and implement an end-to-end active learning system, ILAB, tailored to security experts' needs. Our user experiments on a real-world annotation project demonstrate that security experts can gather an annotated dataset with a low workload thanks to ILAB.
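To make the idea of active learning concrete, here is a minimal sketch of uncertainty sampling, a generic query strategy in which the expert is asked to annotate the instances the current model is least sure about. This is not ILAB's strategy, which the abstract does not describe. The function name `uncertainty_sampling`, its parameters, and the toy probabilities are all assumptions made for illustration.

```python
def uncertainty_sampling(probabilities, already_labeled, batch_size):
    """Pick the unlabeled instances whose predicted probability of being
    malicious is closest to 0.5, i.e. where a binary model is least certain.

    probabilities   -- hypothetical model outputs, one per instance, in [0, 1]
    already_labeled -- set of instance indices the expert has annotated
    batch_size      -- number of annotation queries to return
    """
    candidates = [i for i in range(len(probabilities))
                  if i not in already_labeled]
    # Smallest distance to the decision boundary first.
    candidates.sort(key=lambda i: abs(probabilities[i] - 0.5))
    return candidates[:batch_size]


# Toy scores for five instances; instance 3 was annotated in an earlier round.
probs = [0.02, 0.48, 0.97, 0.55, 0.10]
queries = uncertainty_sampling(probs, already_labeled={3}, batch_size=2)
print(queries)  # [1, 4]
```

A strategy like this minimizes only the number of annotations requested. It says nothing about how long each annotation takes or how the queries are presented to the expert, which are the two gaps the abstract points out.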
Finally, feature extraction is usually implemented manually for each data type. Nonetheless, detection systems process many data types and designing a feature extraction method for each of them is tedious. Automatic feature generation would significantly ease, and thus foster, the deployment of machine learning in detection systems.
In this thesis, we define the constraints that such methods should meet to be effective in building detection models. We compare three state-of-the-art methods based on these criteria, and we point out some avenues of research to better tailor these techniques to computer security experts' needs.