Abstract
The goal of spatio-temporal data mining is to discover previously unknown but useful patterns in spatial and temporal data. The explosive growth of spatio-temporal data, however, calls for novel, computationally efficient methods for large-scale data mining applications. Because many spatio-temporal data mining problems can be converted into optimization problems, in this paper we propose an efficient parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining. Most previous optimization methods are based on gradient descent, which iteratively updates the model and provides model-level convergence control for all parameters. That is, they treat all parameters equally and keep updating every parameter until all of them have converged. However, we find that during the iterative process the model parameters converge at different rates, so continuing to update parameters that have already converged causes redundant computation and reduces performance. To solve this problem, we propose a parameter-level stochastic gradient descent (plpSGD), in which the convergence of each parameter is considered independently and only the parameters that have not yet converged are updated in each iteration. Moreover, the updating of model parameters is parallelized in plpSGD to further improve the performance of SGD. We have conducted extensive experiments to evaluate the performance of plpSGD. The experimental results show that, compared to previous SGD methods, plpSGD significantly accelerates the convergence of SGD and achieves excellent scalability with little sacrifice in solution accuracy.
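The idea of parameter-level convergence control can be illustrated with a minimal sketch. The code below is not the plpSGD algorithm from the paper; it is a sequential toy version for least-squares linear regression, written under stated assumptions. The function name `parameter_level_sgd`, the convergence test (an update smaller than `tol` for `patience` consecutive steps), and all default values are hypothetical choices made for illustration, and the parallel update of parameters is omitted entirely.

```python
import random


def parameter_level_sgd(X, y, lr=0.1, tol=1e-6, patience=10,
                        max_epochs=200, seed=0):
    """SGD for least-squares regression that freezes parameters one by one.

    Assumption (not taken from the paper): a parameter is treated as
    converged once its update magnitude stays below `tol` for `patience`
    consecutive steps. Converged parameters are skipped afterwards.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    active = [True] * d       # parameters that are still being updated
    small_steps = [0] * d     # consecutive small updates per parameter
    updates = 0               # scalar parameter updates actually performed
    order = list(range(n))

    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            if not any(active):
                return w, active, updates   # every parameter has converged
            # Prediction error on the sampled example.
            err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            for j in range(d):
                xij = X[i][j]
                # Skip converged parameters; a zero feature carries no
                # gradient information, so it is ignored as well.
                if not active[j] or xij == 0.0:
                    continue
                step = lr * err * xij
                w[j] -= step
                updates += 1
                if abs(step) < tol:
                    small_steps[j] += 1
                    if small_steps[j] >= patience:
                        active[j] = False   # parameter-level convergence
                else:
                    small_steps[j] = 0
    return w, active, updates
```

Calling `parameter_level_sgd(X, y)` with `X` as a list of feature rows and `y` as a list of targets returns the weights, the per-parameter activity mask, and the number of scalar updates performed. A model-level scheme would instead keep updating all `d` parameters on every step until a single global criterion is met, which is the redundant work the abstract describes. Freezing a parameter early trades a small amount of accuracy for fewer updates, and a threshold on the step size is only one of several plausible per-parameter tests.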
Acknowledgements
This work is supported by the National Key Research and Development Plan (No. 2017YFC0803700) and NSFC (Nos. 61772218 and 61832006).
Cite this article
Liu, Z., Shi, X., He, L. et al. A parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining. Distrib Parallel Databases 38, 739–765 (2020). https://doi.org/10.1007/s10619-020-07287-x
DOI: https://doi.org/10.1007/s10619-020-07287-x