Unknown Binary Protocol Format Inference Method Based on Longest Continuous Interval

Computer Science, 2020, Vol. 47, Issue (8): 313-318. doi: 10.11896/jsjkx.190700031

CHEN Qing-chao1, WANG Tao1, FENG Wen-bo2, YIN Shi-zhuang1, LIU Li-jun1   

  1 Equipment Simulation Training Center, Army Engineering University, Shijiazhuang 050003, China
  2 College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Online: 2020-08-15; Published: 2020-08-10
  • About author: CHEN Qing-chao, born in 1996, postgraduate. His main research interests include cyber security.
    WANG Tao, born in 1964, Ph.D., professor. His main research interests include cyber security and cryptography.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2017YFB0802900) and Natural Science Foundation of Jiangsu Province, China (BK20161469).

Abstract: Format inference for unknown binary protocols often relies on a large amount of prior knowledge, involves complex experimental procedures, and produces results of low accuracy. To address these problems, this paper proposes a method for inferring unknown binary protocol formats that needs fewer manually set parameters, is simple to operate, and achieves higher accuracy. The preprocessed protocol data is clustered hierarchically, and the optimal clustering is selected using the CH (Calinski-Harabasz) coefficient as the evaluation criterion. An improved sequence alignment is then applied to the clustering results to obtain protocol data sequences containing intervals, and the continuous intervals are counted and merged to analyze the protocol format. Experimental results show that the proposed method can infer more than 80% of the field intervals in an unknown binary protocol. Compared with the format inference method of the AutoReEngine algorithm, the proposed method improves the overall F1-Measure by about 30%.
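The cluster-selection step can be illustrated with a small sketch. The abstract does not state the preprocessing, distance, or linkage the authors use, so the following Python code rests on stated assumptions: each message is truncated or zero-padded to a fixed number of bytes (the hypothetical helper `to_vector`), messages are compared by Euclidean distance on byte values, and clusters are merged with naive average linkage. Every cut of the resulting hierarchy from k = 2 to n - 1 is scored with the Calinski-Harabasz coefficient CH = [B/(k-1)] / [W/(n-k)], where B and W are the between-cluster and within-cluster dispersions, and the cut with the highest score is kept. It demonstrates CH-based selection of a hierarchical clustering, not the authors' implementation.

```python
from itertools import combinations


def to_vector(message: bytes, length: int) -> list[float]:
    """Hypothetical preprocessing: keep the first `length` bytes, zero-pad short messages."""
    return [float(b) for b in message[:length].ljust(length, b"\x00")]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[list[float]]) -> list[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def ch_index(points: list[list[float]], clusters: list[list[int]]) -> float:
    """Calinski-Harabasz score (B / (k - 1)) / (W / (n - k)); requires 2 <= k < n."""
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for members in clusters:
        pts = [points[i] for i in members]
        c = centroid(pts)
        between += len(pts) * sq_dist(c, overall)  # B: between-cluster dispersion
        within += sum(sq_dist(p, c) for p in pts)  # W: within-cluster dispersion
    if within == 0.0:
        return float("inf")  # every cluster is perfectly tight
    return (between / (k - 1)) / (within / (n - k))


def average_linkage(points, a: list[int], b: list[int]) -> float:
    """Mean Euclidean distance between the members of clusters a and b."""
    total = sum(sq_dist(points[i], points[j]) ** 0.5 for i in a for j in b)
    return total / (len(a) * len(b))


def hierarchical_levels(points) -> dict[int, list[list[int]]]:
    """Naive agglomerative clustering; records the partition at every level k."""
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


def best_clustering(messages: list[bytes], length: int = 8):
    """Cut the hierarchy at the level k with the highest CH score."""
    if len(messages) < 3:
        raise ValueError("need at least 3 messages to compare cluster counts")
    points = [to_vector(m, length) for m in messages]
    levels = hierarchical_levels(points)
    best_k = max(range(2, len(points)), key=lambda k: ch_index(points, levels[k]))
    return best_k, levels[best_k]


if __name__ == "__main__":
    # Two made-up message types with different leading bytes.
    msgs = [
        b"\x03\x00\x00\x16\x11", b"\x03\x00\x00\x16\x12", b"\x03\x00\x00\x17\x11",
        b"\x80\x01\xff\x00\x01", b"\x80\x01\xfe\x00\x02", b"\x80\x01\xff\x00\x03",
    ]
    k, clusters = best_clustering(msgs)
    print(k, clusters)  # 2 [[0, 1, 2], [3, 4, 5]]
```

Because every cut of a single dendrogram is scored, the only value left to choose by hand in this sketch is the vector length, which fits the abstract's aim of fewer manually set parameters. The naive merge loop is roughly cubic in the number of messages, so it is only suitable for small samples; a real tool would use an optimized linkage routine.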

Key words: Binary protocol, Format inference, Hierarchical clustering, Interval, Sequence alignment

CLC Number: TP393