A Graph Similarity Calculation Algorithm and Its Application for Comparing Binary Executable Files

P. D Borisov; Борисов П. Д; D. V Varlamov; Варламов Д. В; Yu. V Kosolapov; Косолапов Ю. В

doi:10.15622/ia.24.5.9

A Graph Similarity Calculation Algorithm and Its Application for Comparing Binary Executable Files

作者: Borisov P.D¹, Varlamov D.V², Kosolapov Y.V²
隶属关系:
1. FSASE SRI «Specvuzavtomatika»
2. Southern Federal University (SFedU)
期: 卷 24, 编号 5 (2025)
页面: 1506-1531
栏目: Information security
URL: https://ogarev-online.ru/2713-3192/article/view/350765
DOI: https://doi.org/10.15622/ia.24.5.9
ID: 350765

如何引用文章

全文:

详细
作者简介
参考
补充文件
统计

详细

The paper addresses the task of static (without execution) comparison of binary executable files. A program and any of its procedures can be represented as a directed graph. For a program, the corresponding graph is a function (procedure) call graph, where the nodes are the functions themselves, and an edge from vertex a to b describes a call to function b from function a. For a procedure, such a graph is a control flow graph, where the vertices are basic blocks, and an edge between nodes a and b means that the commands of block b can be executed after the commands of block a. The study proposes an algorithm for comparing directed graphs, which is then applied to program comparison. The graph comparison algorithm is based on applying a node similarity function. For comparing procedure graphs, a fuzzy hash function and a cryptographic hash function are used as this similarity function. Subsequently, this method of comparing procedure graphs is used as the node similarity function for comparing program graphs. Based on the proposed algorithm, a method for program comparison has been developed, and its investigation was conducted through two experiments. In the first experiment, the method’s behavior was studied when comparing programs compiled with different optimization options (O0, O1, O2, O3, and Os). In the second experiment, the possibility of identifying effective and resilient obfuscating transformations within a previously developed model was investigated. Within the first experiment, evidence was obtained supporting the hypothesis that similarity decreases as optimization increases from O1 to O3. Within the second experiment, some previously obtained results concerning the effectiveness (ineffectiveness) and resilience (non-resilience) of obfuscating transformations were confirmed.

关键词

graph comparison, program similarity, effectiveness and resilience of obfuscation

参考

Van Tilborg H., Jajodia S. Encyclopedia of cryptography and security. Springer Science & Business Media. 2014. 1416 p.
Li B., He J., Huang J., Shi Y.Q. A survey on image steganography and steganalysis // Journal of Information Hiding and Multimedia Signal Processing. 2011. vol. 2. no. 2. pp. 142–172.
Chabot C. Recognition of a code in a noisy environment // IEEE International Symposium on Information Theory. IEEE, 2007. pp. 2211–2215. doi: 10.1109/ISIT.2007.4557548.
Crawford M., Khoshgoftaar T.M., Prusa J.D, Richter A.N., Al Najada H. Survey of review spam detection using machine learning techniques // Journal of Big Data. 2015. vol. 2. doi: 10.1186/s40537-015-0029-9.
Forrest S., Hofmeyr S., Somayaji A. The evolution of system-call monitoring // Proceedings of the 2008 Annual Computer Security Applications Conference (ACSAC). 2008. pp. 418–430. doi: 10.1109/ACSAC.2008.5.
Khraisat A., Gondal I., Vamplew P., Kamruzzaman J. Survey of intrusion detection systems: techniques, datasets and challenges // Cybersecurity. 2019. vol. 2. doi: 10.1186/s42400-019-0038-7.
Kosolapov Y.V. On detecting code reuse attacks // Automatic Control and Computer Sciences. 2020. vol. 54. no. 7. pp. 573–583. doi: 10.3103/S0146411620070111.
Kiger J., Ho S.-S., Heydari V. Malware binary image classification using convolutional neural networks // Proceedings of the 17th International Conference on Cyber Warfare and Security (ICCWS). 2022. vol. 17. pp. 469–478. doi: 10.34190/iccws.17.1.59.
Polsani H., Jiang H., Liu Y. DeepGray: Malware Classification Using Grayscale Images with Deep Learning // The 37th International FLAIRS Conference. 2024. pp. 1–5.
Борисов П.Д., Косолапов Ю.В. Способ количественного сравнения обфусцирующих преобразований // Информатика и автоматизация. 2024. Т. 23. № 3. С. 684–726. doi: 10.15622/ia.23.3.3.
Борисов П.Д., Косолапов Ю.В. Способ оценки похожести программ методами машинного обучения // Труды Института системного программирования РАН. 2022. Т. 34. № 5. С. 63–76. doi: 10.15514/ISPRAS-2022-34(5)-4.
Kornblum J. Identifying almost identical files using context triggered piecewise hashing // Digital investigation. 2006. vol. 3. pp. 91–97. doi: 10.1016/j.diin.2006.06.015.
Breitinger F., Baier H. Similarity preserving hashing: Eligible properties and a new algorithm mrsh-v2 // Digital Forensics and Cyber Crime: 4th International Conference (ICDF2C 2012). 2013. pp. 167–182. doi: 10.1007/978-3-642-39891-9_11.
Roussev V. An evaluation of forensic similarity hashes // Digital investigation. 2011. vol. 8. pp. S34–S41. doi: 10.1016/j.diin.2011.05.005.
Pagani F., Dell’Amico M., Balzarotti D. Beyond precision and recall: understanding uses (and misuses) of similarity hashes in binary analysis // Proc. of the Eighth ACM Conference on Data and Application Security and Privacy. 2018. pp. 354–365. doi: 10.1145/3176258.3176306.
BinDiff. URL: https://www.zynamics.com/ (дата обращения: 23.06.2025).
Aslanyan H., Avetisyan A., Arutunian M., Keropyan G., Kurmangaleev S., Vardanyan V. Scalable Framework for Accurate Binary Code Comparison // Ivannikov ISPRAS Open Conference (ISPRAS). 2017. pp. 34–38. doi: 10.1109/ISPRAS.2017.00013.
Machoc hash. URL: https://github.com/ANSSI-FR/polichombr/blob/dev/ (дата обращения: 23.06.2025).
Machoke. URL: https://github.com/conix-security/machoke (дата обращения:
06.2025).
Li Y., Jang J., Ou X. Topology-aware hashing for effective control flow graph similarity analysis // Security and Privacy in Communication Networks: 15th EAI International Conference (SecureComm). 2019. pp. 278–298.
Borisov P.D., Kosolapov Y.V. On the Automatic Analysis of the Practical Resistance of Obfuscating Transformations // Aut. Control Comp. Sci. 2020. vol. 54. pp. 619–629. doi: 10.3103/S0146411620070044.
Борисов П.Д., Косолапов Ю.В. О функции похожести графических представлений исполняемых файлов в модели оценки обфусцирующих преобразований // Известия ЮФУ. 2025. № 3(245). С. 264–273.
Naville Z. Hikari–an improvement over Obfuscator-LLVM. 2017. URL: https://github.com/HikariObfuscator/Hikari (дата обращения: 26.11.2024).
Holder W., McDonald J.T., Andel T.R. Evaluating optimal phase ordering in obfuscation executives // Proceedings of the 7th Software Security, Protection, and Reverse Engineering/Software Security and Protection Workshop. 2017. pp. 1–12. doi: 10.1145/3151137.3151140.
small-programs. A set of small programs for experiments with obfuscations. URL: https://github.com/Boriskin61/small-programs (дата обращения: 22.06.2025).

补充文件

附件文件

动作

1. JATS XML

下载

用户名
密码
记住我

忘记您的密码?	注册

用户名
密码
记住我

忘记您的密码?	注册

卷 24, 编号 5 (2025)

A Graph Similarity Calculation Algorithm and Its Application for Comparing Binary Executable Files

全文:

详细

关键词

作者简介

P. Borisov

D. Varlamov

Yu. Kosolapov

参考

补充文件