{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T01:00:38Z","timestamp":1740099638477,"version":"3.37.3"},"publisher-location":"Cham","reference-count":26,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783030438227"},{"type":"electronic","value":"9783030438234"}],"license":[{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,3,28]],"date-time":"2020-03-28T00:00:00Z","timestamp":1585353600000},"content-version":"vor","delay-in-days":87,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020]]},"abstract":"An exploratory data analysis system should be aware of what a user already knows and what the user wants to know of the data. Otherwise it is impossible to provide the user with truly informative and useful views of the data. In our recently introduced framework for human-guided data exploration (Puolam\u00e4ki et al. [20]), both the user\u2019s knowledge and objectives are modelled as distributions over data, parametrised by tile constraints. This makes it possible to show the users the most informative views given their current knowledge and objectives. 
Often the data, however, comes with a class label and the user is interested only in the features that are informative about the class. In non-interactive settings there exist dimensionality reduction methods, such as supervised PCA (Barshan et al. [1]), to make such visualisations, but no such method takes the user\u2019s knowledge or objectives into account. Here, we formulate an information criterion for supervised human-guided data exploration to find the most informative views about the class structure of the data by taking both the user\u2019s current knowledge and objectives into account. We study experimentally the scalability of our method for interactive use, and its stability with respect to the size of the class of interest. We show that our method gives understandable and useful results when analysing real-world datasets, and a comparison to SPCA demonstrates the effect of the user\u2019s background knowledge. The implementation will be released as an open source software library.","DOI":"10.1007\/978-3-030-43823-4_8","type":"book-chapter","created":{"date-parts":[[2020,3,27]],"date-time":"2020-03-27T22:02:35Z","timestamp":1585346555000},"page":"85-101","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Supervised Human-Guided Data 
Exploration"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9623-6282","authenticated-orcid":false,"given":"Emilia","family":"Oikarinen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1819-1047","authenticated-orcid":false,"given":"Kai","family":"Puolam\u00e4ki","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6317-9453","authenticated-orcid":false,"given":"Samaneh","family":"Khoshrou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4955-0743","authenticated-orcid":false,"given":"Mykola","family":"Pechenizkiy","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,3,28]]},"reference":[{"issue":"7","key":"8_CR1","doi-asserted-by":"publisher","first-page":"1357","DOI":"10.1016\/j.patcog.2010.12.015","volume":"44","author":"E Barshan","year":"2011","unstructured":"Barshan, E., Ghodsi, A., Azimifar, Z., Jahromi, M.Z.: Supervised principal component analysis: visualization, classification and regression on subspaces and submanifolds. Pattern Recogn. 44(7), 1357\u20131371 (2011)","journal-title":"Pattern Recogn."},{"key":"8_CR2","unstructured":"The British National Corpus, v. 3 (BNC XML Edition). Distributed by Oxford University Computing Services on Behalf of the BNC Consortium (2007). http:\/\/www.natcorp.ox.ac.uk\/"},{"key":"8_CR3","doi-asserted-by":"crossref","unstructured":"Boley, M., Mampaey, M., Kang, B., Tokmakov, P., Wrobel, S.: One click mining: interactive local pattern discovery through implicit preference and performance learning. In: KDD-IDEA, pp. 27\u201335 (2013)","DOI":"10.1145\/2501511.2501517"},{"key":"8_CR4","doi-asserted-by":"crossref","unstructured":"Chau, D., Kittur, A., Hong, J., Faloutsos, C.: Apolo: making sense of large network data by combining rich user interaction and machine learning. In: CHI, pp. 
167\u2013176 (2011)","DOI":"10.1145\/1978942.1978967"},{"key":"8_CR5","unstructured":"De Bie, T., Lijffijt, J., Santos-Rodriguez, R., Kang, B.: Informative data projections: a framework and two examples. In: ESANN, pp. 635\u2013640 (2016)"},{"issue":"3","key":"8_CR6","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1007\/s10618-010-0209-3","volume":"23","author":"T De Bie","year":"2011","unstructured":"De Bie, T.: Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min. Knowl. Discov. 23(3), 407\u2013446 (2011)","journal-title":"Data Min. Knowl. Discov."},{"issue":"11","key":"8_CR7","doi-asserted-by":"publisher","first-page":"2842","DOI":"10.1109\/TKDE.2016.2599168","volume":"28","author":"K Dimitriadou","year":"2016","unstructured":"Dimitriadou, K., Papaemmanouil, O., Diao, Y.: AIDE: an active learning-based approach for interactive data exploration. IEEE Trans. Knowl. Data Eng. 28(11), 2842\u20132856 (2016)","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"8_CR8","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1007\/978-3-642-41398-8_14","volume-title":"Advances in Intelligent Data Analysis XII","author":"V Dzyuba","year":"2013","unstructured":"Dzyuba, V., van Leeuwen, M.: Interactive discovery of interesting subgroup sets. In: Tucker, A., H\u00f6ppner, F., Siebes, A., Swift, S. (eds.) IDA 2013. LNCS, vol. 8207, pp. 150\u2013161. Springer, Heidelberg (2013). https:\/\/doi.org\/10.1007\/978-3-642-41398-8_14"},{"key":"8_CR9","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","volume":"7","author":"RA Fisher","year":"1936","unstructured":"Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179\u2013188 (1936)","journal-title":"Ann. 
Eugen."},{"key":"8_CR10","first-page":"1307","volume":"3","author":"A Globerson","year":"2003","unstructured":"Globerson, A., Tishby, N.: Sufficient dimensionality reduction. J. Mach. Learn. Res. 3, 1307\u20131331 (2003)","journal-title":"J. Mach. Learn. Res."},{"key":"8_CR11","doi-asserted-by":"crossref","unstructured":"Hanhij\u00e4rvi, S., Ojala, M., Vuokko, N., Puolam\u00e4ki, K., Tatti, N., Mannila, H.: Tell me something I don\u2019t know: randomization strategies for iterative data mining. In: KDD, pp. 379\u2013388 (2009)","DOI":"10.1145\/1557019.1557065"},{"key":"8_CR12","doi-asserted-by":"crossref","unstructured":"Kang, B., Lijffijt, J., Santos-Rodr\u00edguez, R., De Bie, T.: Subjectively interesting component analysis: data projections that contrast with prior expectations. In: KDD, pp. 1615\u20131624 (2016)","DOI":"10.1145\/2939672.2939840"},{"issue":"4","key":"8_CR13","doi-asserted-by":"publisher","first-page":"949","DOI":"10.1007\/s10618-018-0558-x","volume":"32","author":"B Kang","year":"2018","unstructured":"Kang, B., Lijffijt, J., Santos-Rodr\u00edguez, R., De Bie, T.: SICA: subjectively interesting component analysis. Data Min. Knowl. Disc. 32(4), 949\u2013987 (2018). https:\/\/doi.org\/10.1007\/s10618-018-0558-x","journal-title":"Data Min. Knowl. Disc."},{"issue":"3","key":"8_CR14","first-page":"37","volume":"5","author":"DW Lee","year":"2001","unstructured":"Lee, D.W.: Genres, registers, text types, domain, and styles: clarifying the concepts and navigating a path through the BNC jungle. Lang. Learn. Technol. 5(3), 37\u201372 (2001)","journal-title":"Lang. Learn. Technol."},{"key":"8_CR15","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-39351-3","volume-title":"Nonlinear Dimensionality Reduction","author":"JA Lee","year":"2007","unstructured":"Lee, J.A., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, New York (2007). 
https:\/\/doi.org\/10.1007\/978-0-387-39351-3"},{"key":"8_CR16","series-title":"Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence)","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1007\/978-3-319-23461-8_42","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"M van Leeuwen","year":"2015","unstructured":"van Leeuwen, M., Cardinaels, L.: VIPER \u2013 visual pattern explorer. In: Bifet, A., et al. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 333\u2013336. Springer, Cham (2015). https:\/\/doi.org\/10.1007\/978-3-319-23461-8_42"},{"key":"8_CR17","unstructured":"Lijffijt, J., Nevalainen, T.: A simple model for recognizing core genres in the BNC. In: Studies in Variation, Contacts and Change in English, vol. 19 (2017)"},{"key":"8_CR18","series-title":"Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence)","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1007\/978-3-319-46227-1_14","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"K Puolam\u00e4ki","year":"2016","unstructured":"Puolam\u00e4ki, K., Kang, B., Lijffijt, J., De Bie, T.: Interactive visual data exploration with subjective feedback. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9852, pp. 214\u2013229. Springer, Cham (2016). https:\/\/doi.org\/10.1007\/978-3-319-46227-1_14"},{"key":"8_CR19","doi-asserted-by":"crossref","unstructured":"Puolam\u00e4ki, K., Papapetrou, P., Lijffijt, J.: Visually controllable data mining methods. In: ICDMW, pp. 409\u2013417 (2010)","DOI":"10.1109\/ICDMW.2010.141"},{"key":"8_CR20","unstructured":"Puolam\u00e4ki, K., Oikarinen, E., Henelius, A.: Guided visual exploration of relations in data sets. 
arXiv preprint arXiv:1905.02515 (2019)"},{"key":"8_CR21","doi-asserted-by":"crossref","unstructured":"Puolam\u00e4ki, K., Oikarinen, E., Kang, B., Lijffijt, J., Bie, T.D.: Interactive visual data exploration with subjective feedback: an information-theoretic approach. In: ICDE, pp. 1208\u20131211 (2018)","DOI":"10.1109\/ICDE.2018.00112"},{"issue":"1","key":"8_CR22","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1145\/2656334","volume":"58","author":"T Ruotsalo","year":"2015","unstructured":"Ruotsalo, T., Jacucci, G., Myllym\u00e4ki, P., Kaski, S.: Interactive intent modeling: information discovery beyond search. CACM 58(1), 86\u201392 (2015)","journal-title":"CACM"},{"issue":"1","key":"8_CR23","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1109\/TVCG.2016.2598495","volume":"23","author":"D Sacha","year":"2017","unstructured":"Sacha, D., et al.: Visual interaction with dimensionality reduction: a structured literature analysis. IEEE Trans. Visual Comput. Graphics 23(1), 241\u2013250 (2017)","journal-title":"IEEE Trans. Visual Comput. Graphics"},{"key":"8_CR24","volume-title":"Exploratory Data Analysis","author":"JW Tukey","year":"1977","unstructured":"Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading (1977)"},{"issue":"3","key":"8_CR25","first-page":"2182","volume":"8","author":"M Vartak","year":"2015","unstructured":"Vartak, M., Rahman, S., Madden, S., Parameswaran, A., Polyzotis, N.: SeeDB: efficient data-driven visualization recommendations to support visual analytics. PVLDB 8(3), 2182\u20132193 (2015)","journal-title":"PVLDB"},{"key":"8_CR26","unstructured":"Xing, E.P., Jordan, M.I., Russell, S.J., Ng, A.Y.: Distance metric learning with application to clustering with side-information. In: NIPS, pp. 
521\u2013528 (2003)"}],"container-title":["Communications in Computer and Information Science","Machine Learning and Knowledge Discovery in Databases"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-43823-4_8","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T17:02:41Z","timestamp":1737133361000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-43823-4_8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020]]},"ISBN":["9783030438227","9783030438234"],"references-count":26,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-43823-4_8","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2020]]},"assertion":[{"value":"28 March 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ECML PKDD","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"W\u00fcrzburg","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2019","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"16 September 
2019","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"20 September 2019","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"ecml2019","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/ecmlpkdd2019.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Microsoft CMT","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"733","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"130","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"18% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole 
number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.04","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"5.3","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"ECML PKDD Workshops Information: single-blind review, submissions: 200, full papers accepted: 70, short papers accepted: 46","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}