A self-attention model for viewport prediction based on distance constraint

Lan, ChengDong; Qiu, Xu; Miao, Chenqi; Zheng, MengTing

doi:10.1007/s00371-023-03149-6

Original article
Published: 28 November 2023

The Visual Computer, volume 40, pages 5997–6014 (2024)

ChengDong Lan¹,²,³ (ORCID: orcid.org/0000-0002-7843-8577),
Xu Qiu¹,
Chenqi Miao²,³ &
MengTing Zheng³

Abstract

Panoramic video technology has advanced significantly in recent years, giving users an immersive experience by displaying the entire 360° spherical scene centered on their virtual location. However, because panoramic video carries more data than traditional video formats, transmitting it in high quality requires more bandwidth. Users do not see the whole 360° content at once, only the portion that falls within their viewport. To save bandwidth, viewport-based adaptive streaming transmits in high quality only the viewports of interest to the user, so the accuracy of viewport prediction plays a crucial role. Prediction performance, however, depends on the size of the prediction window and decreases significantly as the window grows. To address this issue, we propose a self-attention viewport prediction model based on a distance constraint. First, by analyzing the existing viewport trajectory dataset, we find that viewport trajectories exhibit both randomness and continuity. Second, to handle the randomness, we design a viewport prediction model based on a self-attention mechanism, which provides more trajectory information for long inputs. Third, to preserve the continuity of the predicted viewport trajectory, we modify the loss function with a distance constraint that reduces breaks in the continuity of the prediction results. Finally, experimental results on real viewport trajectory datasets show that the proposed algorithm achieves higher prediction accuracy and stability than advanced models.
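The abstract does not give the form of the distance-constrained loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes the loss is a mean squared error plus a penalty on the squared distance between consecutive predicted viewport centers. The function name `distance_constrained_loss`, the `weight` coefficient, and the choice of a squared Euclidean step penalty are all hypothetical.

```python
import numpy as np


def distance_constrained_loss(pred, target, last_observed, weight=0.1):
    """Hypothetical loss: MSE plus a penalty on jumps between consecutive predictions.

    pred, target  : arrays of shape (T, D), T future time steps of D-dimensional
                    viewport centers (for example, unit vectors on the sphere).
    last_observed : array of shape (D,), the final point of the input trajectory.
    weight        : assumed trade-off coefficient, not taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    last_observed = np.asarray(last_observed, dtype=float)

    # Accuracy term: mean squared error against the ground-truth trajectory.
    mse = np.mean((pred - target) ** 2)

    # Continuity term: prepend the last observed point so that the first
    # predicted step is penalised as well, then average the squared step lengths.
    path = np.vstack([last_observed[None, :], pred])
    step_lengths = np.linalg.norm(np.diff(path, axis=0), axis=1)
    continuity = np.mean(step_lengths ** 2)

    return mse + weight * continuity


if __name__ == "__main__":
    target = np.zeros((4, 1))
    smooth = np.array([[1.0], [1.0], [1.0], [1.0]])
    jumpy = np.array([[1.0], [-1.0], [1.0], [-1.0]])
    start = np.zeros(1)
    # Both predictions have the same MSE; only the continuity term differs.
    print(distance_constrained_loss(smooth, target, start))
    print(distance_constrained_loss(jumpy, target, start))
```

With equal point-wise error, the oscillating prediction receives the larger loss, which is the behavior a continuity constraint is meant to encourage. A penalty of this kind also biases the model toward small movements, so the weight would need tuning in practice.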

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

School of Advanced Manufacturing, Fuzhou University, Quanzhou, 362200, Fujian, China
ChengDong Lan & Xu Qiu
Fujian Provincial Key Laboratory of Media Information Intelligent Processing and Wireless Transmission, Fuzhou, 350108, China
ChengDong Lan & Chenqi Miao
College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
ChengDong Lan, Chenqi Miao & MengTing Zheng

Corresponding author

Correspondence to ChengDong Lan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lan, C., Qiu, X., Miao, C. et al. A self-attention model for viewport prediction based on distance constraint. Vis Comput 40, 5997–6014 (2024). https://doi.org/10.1007/s00371-023-03149-6

Accepted: 23 October 2023
Published: 28 November 2023
Issue Date: September 2024
DOI: https://doi.org/10.1007/s00371-023-03149-6
