Abstract
We explore the problem of learning and predicting the popularity of articles from online news media. The only information we exploit is the textual content of the articles and whether or not they became popular, as measured by users clicking on them. First, we show that this problem cannot be solved satisfactorily by naively modelling it as a binary classification problem. Next, we cast the problem as a ranking task over pairs of popular and non-popular articles and show that this approach can reach an accuracy of up to 76%. Finally, we show that prediction performance can improve if more content-based features are used. All experiments use Support Vector Machine approaches.
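The pairwise formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes articles are already encoded as fixed-length numeric feature vectors (for example bag-of-words counts), and it replaces a full SVM solver with plain stochastic subgradient descent on the regularised hinge loss over difference vectors, which is the usual linear ranking-SVM objective. The function names, hyperparameters, and toy data are all hypothetical.

```python
import random


def train_pairwise_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn weights w so that w.popular > w.non_popular for each pair.

    Minimises an L2-regularised hinge loss on the difference vectors
    (popular - non_popular) by stochastic subgradient descent.
    This is a simplified stand-in for a dedicated SVM solver.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for popular, non_popular in pairs:
            diff = [p - n for p, n in zip(popular, non_popular)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Regularisation step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: only pairs inside the margin move w.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear appeal score of one article."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_accuracy(w, pairs):
    """Fraction of pairs in which the popular article scores higher."""
    correct = sum(score(w, pop) > score(w, non) for pop, non in pairs)
    return correct / len(pairs)


# Hypothetical toy data: 3 term-count features per article,
# each pair is (popular article, non-popular article).
toy_pairs = [
    ([1, 0, 1], [0, 1, 1]),
    ([1, 1, 0], [0, 1, 0]),
    ([1, 0, 0], [0, 0, 1]),
]
w = train_pairwise_ranker(toy_pairs, dim=3)
print(pairwise_accuracy(w, toy_pairs))
```

Training on differences means the model never has to decide whether a single article is popular in absolute terms; it only has to order the two articles in each pair, which is the distinction the abstract draws between the classification and ranking settings.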
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Hensinger, E., Flaounas, I., Cristianini, N. (2011). Learning Readers’ News Preferences with Support Vector Machines. In: Dobnikar, A., Lotrič, U., Šter, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2011. Lecture Notes in Computer Science, vol 6594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20267-4_34
DOI: https://doi.org/10.1007/978-3-642-20267-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20266-7
Online ISBN: 978-3-642-20267-4
eBook Packages: Computer Science (R0)