Abstract
With the onset of massive cosmological data collection through surveys such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. However, an analysis of one of the Galaxy Zoo morphological classification data sets shows that a significant majority of all classified galaxies are, in fact, labelled as “Uncertain”. This has driven us to conduct experiments with data obtained from the SDSS database using each galaxy’s right ascension and declination values, together with the Galaxy Zoo morphology class label and the k-means clustering algorithm. This paper identifies the best attributes for clustering using a heuristic approach and then applies an unsupervised learning technique to improve the classification of galaxies labelled as “Uncertain” and to increase the overall accuracy of such clustering processes. With this heuristic approach, selecting the best combination of attributes via information gain improves the accuracy of classes-to-clusters evaluation by a further 10-15%, approximately. An accuracy of 82.627% was also achieved after conducting various experiments on the galaxies labelled as “Uncertain” and reinserting them into the original data set. It is concluded that the vast majority of these galaxies are, in fact, of spiral morphology, with a small subset potentially consisting of stars, elliptical galaxies or galaxies of other morphological variants.
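As an illustration of the classes-to-clusters evaluation mentioned in the abstract, the following minimal sketch clusters a handful of points with k-means and scores the clusters against known class labels. It is not the authors' implementation: the two-dimensional feature values, the labels, the seeding of centroids from the first k points, and the simple majority-vote mapping from clusters to classes are all assumptions made for illustration, and the function names `kmeans` and `classes_to_clusters_accuracy` are hypothetical.

```python
from collections import Counter
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (toy choice)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def classes_to_clusters_accuracy(assignments, labels):
    """Map each cluster to its majority class, then score the agreement."""
    correct = 0
    for c in set(assignments):
        members = [lab for a, lab in zip(assignments, labels) if a == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical, made-up attribute values; the class label is hidden from
# k-means and used only afterwards to evaluate the clusters.
points = [
    (0.10, 0.20), (0.90, 0.80), (0.20, 0.10), (0.80, 0.90),
    (0.15, 0.25), (0.85, 0.95), (0.30, 0.30),
]
labels = [
    "spiral", "elliptical", "spiral", "elliptical",
    "spiral", "elliptical", "elliptical",
]

assignments = kmeans(points, k=2)
accuracy = classes_to_clusters_accuracy(assignments, labels)
print(f"classes-to-clusters accuracy: {accuracy:.3f}")
```

The class label plays no part in the clustering itself, which is what makes the evaluation a check on how well the chosen attributes separate the morphologies. Repeating the run with different attribute subsets and comparing the resulting accuracies is one way to picture the attribute selection the abstract describes.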
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Edwards, K.J., Gaber, M.M. (2013). Identifying Uncertain Galaxy Morphologies Using Unsupervised Learning. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_14
DOI: https://doi.org/10.1007/978-3-642-38610-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38609-1
Online ISBN: 978-3-642-38610-7
eBook Packages: Computer Science, Computer Science (R0)