Abstract
Sketch-based 3D model retrieval suffers from a large visual discrepancy between 3D models and 2D sketches. Most existing methods directly project samples from both modalities into the same semantic embedding space to alleviate this discrepancy. We argue that learning the two modalities simultaneously restricts the discriminability of the 3D model representation, resulting in inferior retrieval results. In this work, we propose a novel sequential learning (SL) framework for sketch-based 3D model retrieval that learns the 3D model representation and the 2D sketch representation separately and sequentially. Specifically, the SL framework is composed of two modules: a 3D model network (3DMN) and a 2D sketch network (2DSN). First, we train 3DMN with a discriminative loss formulated only on 3D models to promote discrimination. Then, the learned representations of 3D models guide 2DSN to learn discriminative 2D sketch representations. In this second phase, we further mine the implicit fine-grained class information of 3D models with unsupervised clustering algorithms, and formulate an alignment loss between 2D sketches and the corresponding fine-grained class centers of 3D models. Extensive experiments on three large-scale benchmark datasets for 3D model retrieval validate the efficacy of the proposed SL framework and fine-grained class representations.
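The second phase described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the exact form of the alignment loss, so the code below makes two assumptions: fine-grained centers are obtained by running plain k-means inside each class of (already learned, frozen) 3D model features, and the alignment loss is the mean squared Euclidean distance from each sketch feature to the nearest fine-grained center of its own class. All function names and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the paper's clustering."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the previous center when a cluster receives no points.
        centers = [mean(groups[j]) if groups[j] else centers[j] for j in range(k)]
    return centers


def fine_grained_centers(model_feats, model_labels, k):
    """Cluster the 3D model features of each class into at most k sub-class centers."""
    by_class = defaultdict(list)
    for feat, label in zip(model_feats, model_labels):
        by_class[label].append(feat)
    return {label: kmeans(feats, min(k, len(feats))) for label, feats in by_class.items()}


def alignment_loss(sketch_feats, sketch_labels, centers):
    """Mean squared distance from each sketch to the nearest center of its class (assumed form)."""
    total = 0.0
    for feat, label in zip(sketch_feats, sketch_labels):
        total += min(sq_dist(feat, c) for c in centers[label])
    return total / len(sketch_feats)


# Toy 2D features standing in for the output of a trained 3D model network.
model_feats = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0], [10.0, 0.0], [10.0, 0.4]]
model_labels = ["chair", "chair", "chair", "chair", "table", "table"]
centers = fine_grained_centers(model_feats, model_labels, k=2)

# Toy sketch features standing in for the output of the 2D sketch network.
sketch_feats = [[0.1, 0.0], [4.1, 5.0]]
sketch_labels = ["chair", "chair"]
loss = alignment_loss(sketch_feats, sketch_labels, centers)
print(sorted(centers["chair"]), loss)
```

In an actual training loop the sketch features would come from a differentiable network and the loss would be minimized by gradient descent while the 3D centers stay fixed; the pure-Python version here only shows how the centers and the loss relate.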












Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by B-K Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61772108, No. 61572096, and No. 61733002, and by the Dalian Science and Technology Innovation Fund under Grant No. 2019J11CY004.
Cite this article
Yang, H., Tian, Y., Yang, C. et al. Sequential learning for sketch-based 3D model retrieval. Multimedia Systems 28, 761–778 (2022). https://doi.org/10.1007/s00530-021-00871-w
DOI: https://doi.org/10.1007/s00530-021-00871-w