Two-Stage Pre-Execution Detection
(Segmentation of Object Space)
To analyze files before they are executed, we use a similarity hashing approach combined with other algorithms that we have trained. This method requires a large labeled dataset of both benign and malicious files.
The image above illustrates the idea. For simplicity, the object space is shown in only two dimensions. The index of each cell corresponds to a particular similarity hash value, and each cell of the grid represents a region of objects that share that similarity hash value, also known as a hash bucket. The dot colors distinguish malicious objects from benign/unknown ones. For each region, two options are available: add the region's hash to the malware database (simple regions), or use the hash as the first stage of a two-stage detector combined with a region-specific classifier (hard regions).
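The segmentation into simple and hard regions can be sketched as follows. This is a minimal illustration under assumptions: a plain grid quantization of a 2-D feature vector stands in for the similarity hash (real similarity hashes are more elaborate), and the names `similarity_hash`, `segment`, and `CELL_SIZE` are hypothetical. A bucket containing only malware is treated as simple, and a bucket mixing malware with benign/unknown objects is treated as hard.

```python
import math
from collections import defaultdict

# Hypothetical 2-D "similarity hash": quantize each feature into a grid cell.
# Objects that land in the same cell share a hash value (a hash bucket).
CELL_SIZE = 1.0  # assumed cell width; a real system would tune this


def similarity_hash(features, cell_size=CELL_SIZE):
    """Map a 2-D feature vector to the index of its grid cell."""
    x, y = features
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def segment(dataset):
    """Split buckets into 'simple' (malware only) and 'hard' (mixed) regions.

    dataset: iterable of (features, label), label is 'malicious' or 'benign'.
    """
    buckets = defaultdict(lambda: {"malicious": 0, "benign": 0})
    for features, label in dataset:
        buckets[similarity_hash(features)][label] += 1

    simple, hard = set(), set()
    for h, counts in buckets.items():
        if counts["malicious"] == 0:
            continue  # clean-only region: nothing to detect here
        if counts["benign"] == 0:
            simple.add(h)  # safe to add this hash to the malware database
        else:
            hard.add(h)  # mixed region: needs a region-specific classifier
    return simple, hard


training = [
    ((0.2, 0.3), "malicious"), ((0.7, 0.1), "malicious"),  # cell (0, 0)
    ((1.5, 0.4), "malicious"), ((1.2, 0.9), "benign"),     # cell (1, 0)
    ((2.5, 2.5), "benign"),                                 # cell (2, 2)
]
simple, hard = segment(training)
print(sorted(simple), sorted(hard))  # [(0, 0)] [(1, 0)]
```

Only the hash of a simple region is safe to add to the database directly; flagging a hard region's hash outright would also flag the benign object that shares its bucket.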
Using a two-stage design reduces the likelihood of false positives, which makes the results more accurate. The two stages work as follows:
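First, regions that could produce false positives are not added to the malware database directly. In these hard regions, the similarity hash acts only as a first-stage filter that is deliberately biased toward the "malicious" class: any object falling into such a region is passed on to the next stage instead of being flagged outright. Because the first stage only pre-selects candidates and the final verdict is left to the second stage, this bias toward the "malicious" class reduces the likelihood of false positives rather than increasing it.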
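The decision flow of the two stages can be sketched as follows. This is a minimal illustration, not the production detector: it assumes the same hypothetical 2-D grid hash as a stand-in for the similarity hash, and the regional classifier here is a toy rule invented for the example. Names such as `two_stage_verdict`, `malware_db`, and `regional_classifiers` are hypothetical.

```python
import math
from typing import Callable, Dict, Set, Tuple

Hash = Tuple[int, int]
Features = Tuple[float, float]


def similarity_hash(features: Features, cell_size: float = 1.0) -> Hash:
    """Hypothetical 2-D grid hash: the cell index is the bucket."""
    x, y = features
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def two_stage_verdict(
    features: Features,
    malware_db: Set[Hash],
    regional_classifiers: Dict[Hash, Callable[[Features], bool]],
) -> str:
    h = similarity_hash(features)
    # Stage 1: hash lookup only. Simple regions are detected outright.
    if h in malware_db:
        return "malicious"
    classifier = regional_classifiers.get(h)
    if classifier is None:
        # The object is not in any region associated with malware.
        return "clean"
    # Stage 2: the object falls in a hard region, so the stage-1 match is only
    # a candidate; the region-specific classifier makes the final decision.
    return "malicious" if classifier(features) else "clean"


# Hypothetical setup: (0, 0) is a simple region; (1, 0) is a hard region whose
# toy classifier flags objects with a low second feature value.
malware_db = {(0, 0)}
regional_classifiers = {(1, 0): lambda f: f[1] < 0.5}

for obj in [(0.5, 0.5), (1.5, 0.4), (1.2, 0.9), (3.0, 3.0)]:
    print(obj, two_stage_verdict(obj, malware_db, regional_classifiers))
```

The point of the structure is that a hash match in a hard region never produces a verdict on its own; it only routes the object to the classifier trained for that region.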
Next, the classifier for each hard region is trained on malware from that one bucket only, but on all clean objects available across every bucket of the training set. This lets each regional classifier detect the malware of its particular hard-region bucket more precisely. Training on the full set of clean objects also helps prevent unexpected false positives when the model runs in products on real-world data.
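The composition of a regional training set can be sketched as follows. This is a minimal illustration under assumptions: it reuses a hypothetical 2-D grid hash, and since the text does not say which model the regional classifiers use, a simple 1-nearest-neighbour rule stands in for it. The function names `build_regional_training_set` and `train_nearest_neighbour` are hypothetical.

```python
import math


def similarity_hash(features, cell_size=1.0):
    """Hypothetical 2-D grid hash: the cell index is the bucket."""
    x, y = features
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_regional_training_set(dataset, region):
    """Malware from `region` only, plus clean objects from every bucket."""
    positives = [f for f, label in dataset
                 if label == "malicious" and similarity_hash(f) == region]
    negatives = [f for f, label in dataset if label == "benign"]
    return positives, negatives


def train_nearest_neighbour(positives, negatives):
    """Stand-in regional model: label an object like its closest example."""
    labelled = [(p, True) for p in positives] + [(n, False) for n in negatives]

    def classify(features):
        _, is_malware = min(labelled,
                            key=lambda item: math.dist(features, item[0]))
        return is_malware

    return classify


training = [
    ((1.1, 0.1), "malicious"), ((1.3, 0.2), "malicious"),  # hard region (1, 0)
    ((0.5, 0.5), "malicious"),  # malware from another bucket: excluded
    ((1.4, 0.9), "benign"),     # clean object inside region (1, 0)
    ((2.5, 2.5), "benign"),     # clean object from another bucket: still used
]
positives, negatives = build_regional_training_set(training, region=(1, 0))
classify = train_nearest_neighbour(positives, negatives)
print(classify((1.2, 0.2)), classify((1.45, 0.9)))
```

Note the asymmetry: the positive class is narrowed to a single bucket, while the negative class keeps every clean object, so the regional model sees as much benign variety as the training set offers.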