Malware often evades detection by scanning engines, escaping unscathed by changing its structure and behavior. However, this same attribute (when present in large volumes) can be used to determine relationships between different types of malware and to detect new strains. A recent study published by security researcher Silvio Cesare emphasizes that malware strains can be identified by their heritage. The researcher developed a model called Simseer that is capable of identifying plagiarised software and establishing relationships between malware samples.
The Simseer website tracks and categorizes the heritage of different strains of malware. During his research, Cesare found that even moderate changes to malware do not change its underlying structure. He used this as the basis for detecting approximate matches of malware and for identifying an entire family of malware from that one structure. The tool's analysis helped the Melbourne-based security researcher determine relationships between malware samples by assessing their similarity to existing malicious code, and find out whether a malware outbreak had links to previous outbreaks. He did this by tabulating the analysis results and visualizing the program relationships as an evolutionary tree.
How does Simseer work?
You submit a Zip archive containing the malware samples to Simseer. The maximum file size is 100,000 bytes. Sample filenames may contain only alphanumeric characters and periods, and only PE-32 and ELF-32 executables are accepted. A maximum of 20 submissions is permitted per day.
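The submission rules above can be checked locally before uploading. The following is a minimal sketch, not Simseer's own validation: the function names are hypothetical, the 100,000-byte limit is assumed to apply to each file inside the archive, and the PE-32 and ELF-32 checks only inspect the standard header fields of those formats.

```python
import io
import re
import struct
import zipfile

MAX_BYTES = 100_000                      # limit quoted in the article (assumed per file)
NAME_RE = re.compile(r"[A-Za-z0-9.]+")   # alphanumeric characters and periods only


def is_elf32(data: bytes) -> bool:
    # ELF magic, then EI_CLASS == 1 (ELFCLASS32) at byte offset 4.
    return len(data) >= 5 and data[:4] == b"\x7fELF" and data[4] == 1


def is_pe32(data: bytes) -> bool:
    # DOS header "MZ", then the PE header offset stored at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False
    magic_at = pe_offset + 24            # 4-byte signature + 20-byte COFF header
    if len(data) < magic_at + 2:
        return False
    (magic,) = struct.unpack_from("<H", data, magic_at)
    return magic == 0x10B                # optional-header magic for PE32


def check_sample(name: str, data: bytes) -> list:
    """Return a list of rule violations for one sample (empty if none)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("filename must be alphanumeric characters or periods")
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 100,000 bytes")
    if not (is_pe32(data) or is_elf32(data)):
        problems.append("not a PE-32 or ELF-32 executable")
    return problems


def check_archive(zip_bytes: bytes) -> dict:
    """Map each file name in a Zip archive to its list of violations."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                report[info.filename] = check_sample(info.filename, zf.read(info))
    return report
```

Calling `check_archive` on the raw bytes of a Zip file returns a dictionary keyed by member name. An empty list means the sample passed all three checks in this sketch.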
Simseer's servers group the samples into clusters, then scan an unknown sample for similarities with known malware families and identify new ones. The site then displays an evolutionary tree on the left, showing the relationships between existing and new code. The closer two programs are in the tree, the more closely they are related and the more likely they are to belong to the same family. A new strain, if found, is cataloged separately when it is less than 98% similar to any existing strain.
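The cataloging rule can be illustrated with a small sketch. This is not Simseer's algorithm: the Jaccard index over feature sets is a stand-in similarity measure chosen for brevity, and the names `jaccard`, `closest_strain` and `file_sample` are hypothetical. Only the 98% cut-off comes from the description above.

```python
NEW_STRAIN_BELOW = 0.98   # below this similarity a sample is cataloged as a new strain


def jaccard(a: frozenset, b: frozenset) -> float:
    """Stand-in similarity: shared features divided by all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_strain(features: frozenset, catalog: dict):
    """Return (name, score) of the most similar cataloged strain, or (None, 0.0)."""
    best_name, best_score = None, 0.0
    for name, known in catalog.items():
        score = jaccard(features, known)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def file_sample(name: str, features: frozenset, catalog: dict) -> str:
    """Attach a sample to an existing strain, or catalog it as a new one."""
    best_name, best_score = closest_strain(features, catalog)
    if best_name is None or best_score < NEW_STRAIN_BELOW:
        catalog[name] = features          # less than 98% similar: new entry
        return name
    return best_name
```

In this sketch a sample whose best match scores 0.98 or higher is treated as an instance of that strain, and anything lower gets its own catalog entry.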
A score of 1.0 means the programs are identical. A score of 0.0 means the programs are not at all similar. Programs with a similarity greater than or equal to 0.60 are considered variants of each other and are highlighted in green in the results. The brighter the green, the more similar the programs are.
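The scoring rule translates into a few lines of code. In the sketch below, the 0.60 threshold is taken from the description above, while the function names and the linear mapping from score to green brightness are assumptions made for illustration. The article does not say how Simseer actually picks its shades.

```python
from typing import Optional

VARIANT_THRESHOLD = 0.60   # scores at or above this mark two programs as variants


def is_variant(score: float) -> bool:
    """True when a similarity score in [0.0, 1.0] indicates a variant."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity score must be between 0.0 and 1.0")
    return score >= VARIANT_THRESHOLD


def green_shade(score: float) -> Optional[str]:
    """Hypothetical colour mapping: brighter green for higher similarity."""
    if not is_variant(score):
        return None                       # not highlighted
    t = (score - VARIANT_THRESHOLD) / (1.0 - VARIANT_THRESHOLD)
    green = round(128 + t * 127)          # 128 at the threshold, 255 at 1.0
    return f"#00{green:02x}00"
```

For example, `green_shade(0.6)` gives the darkest highlighted shade and `green_shade(1.0)` the brightest, while a score below the threshold yields no highlight.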
To maintain Simseer's database, Cesare downloads raw malware code from the open malware-sharing network VirusShare and other sources, feeding between 600 MB and 16 GB of data into his algorithms every night.
Via AusCERT 2013.