Abstract
In the current international climate, every country attaches great importance to Internet information security, and protecting webpages against tampering is among the top priorities. Webpage tampering detection presupposes that versions of a webpage with different timestamps can be obtained. Because website structures are diverse, obtaining webpages with a traditional crawler requires a crawling strategy tailored to each site, which makes the approach inflexible to apply. To address this, this paper adopts a deep learning approach that detects webpage text and thereby acquires the webpage information. An improved Faster-RCNN model is used to detect text in webpages, and a ResNet network is used to extract text features. Because text in webpage images forms long, narrow regions, the square convolution kernel of the traditional network is replaced by a rectangular convolution kernel that better fits these features; because text lines are dense, the traditional NMS algorithm is replaced by the Soft-NMS algorithm to reduce missed detections in dense regions. Experiments show that the algorithm achieves better detection results, which is important for network information security.
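To illustrate why replacing NMS with Soft-NMS can reduce missed detections for dense text lines, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which Soft-NMS variant is used, so the Gaussian score decay is an assumption here, as are the function names `iou`, `hard_nms`, and `soft_nms`, the parameters `sigma`, `score_thresh`, and `iou_thresh`, and the example boxes.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Classic NMS: discard any box overlapping a kept box above iou_thresh."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant): decay scores instead of discarding.

    Returns (index, decayed_score) pairs in the order the boxes were selected.
    """
    candidates = [(i, s) for i, s in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining box with the highest (possibly decayed) score.
        pos = max(range(len(candidates)), key=lambda p: candidates[p][1])
        best_idx, best_score = candidates.pop(pos)
        kept.append((best_idx, best_score))
        survivors = []
        for idx, score in candidates:
            overlap = iou(boxes[best_idx], boxes[idx])
            # Higher overlap with the selected box means stronger decay.
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                survivors.append((idx, score))
        candidates = survivors
    return kept


# Hypothetical example: two adjacent, heavily overlapping text lines and one
# isolated text line further down the page.
boxes = [(0, 0, 100, 20), (0, 5, 100, 25), (0, 60, 100, 80)]
scores = [0.9, 0.8, 0.7]

print(hard_nms(boxes, scores))
print(soft_nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of 0.6. Hard NMS with a 0.5 threshold discards it outright, whereas Soft-NMS keeps it with a reduced score, so a downstream confidence threshold decides whether the neighbouring text line survives.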
Acknowledgements
We acknowledge funding from the sub-project of the National Key R&D Plan, "COVID-19 patient rehabilitation training posture monitoring bracelet based on 4G network" (Grant No. 2021YFC0863200–6); the Hebei College and Middle School Students Science and Technology Innovation Ability Cultivation Special Project (Grant No. 22E50075D); and Grant Nos. 2021H011404 and 2021H010203.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, J., Zhao, M., Wang, J., Zhao, Y., Ma, L., Yu, P. (2023). Webpage Text Detection Based on Improved Faster-RCNN Model. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol 13657. Springer, Cham. https://doi.org/10.1007/978-3-031-20102-8_5
DOI: https://doi.org/10.1007/978-3-031-20102-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20101-1
Online ISBN: 978-3-031-20102-8
eBook Packages: Computer Science, Computer Science (R0)