Abstract
The voices of racial minority groups have rarely been examined systematically with large-scale text analysis in political science. This study fills that gap by applying an integrated classification framework to the commonalities and differences in the political issues that appeared in 78,305 articles from Asian American and African American newspapers published between the 1960s and the 1980s. The automated text classification shows that Asian American newspapers focused on promoting collective gains more often than African American newspapers did, whereas African American newspapers concentrated on preventing collective losses more often than Asian American newspapers did. The content analysis demonstrates that issue priorities varied between the two corpora, especially with respect to policy contexts: gaining access to government resources was a more urgent issue for Asian Americans, while reducing or ending state violence, such as police brutality, was a more pressing matter for African Americans. The content analysis also helped avoid extreme interpretations of the machine coding: the misalignment of political agendas between the two corpora widened by up to a factor of 10 when the training data were measured using the minimum, rather than the maximum, reliability threshold.
Data availability
Not available due to copyright restrictions.
Code availability
All replication files can be found at https://github.com/jaeyk/content-analysis-for-evaluating-ML-performances.
Notes
For more information, see https://www.icpsr.umich.edu/icpsrweb/ICPSR/series/163.
For more information, see https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/6841.
For more information, see https://www.pewresearch.org/topics/national-survey-of-latinos/.
For more information, see https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/3832.
For more information, see https://naasurvey.com/data/.
For more information, see https://cmpsurvey.org/.
For more information, see https://www.proquest.com/products-services/ethnic_newswatch.html.
Greaves, Kay, “Davis Bail is Canceled; Poindexter Out on Bail,” Oakland Post, October 22, 1970: 13.
Chin, Karen, “Pacific/Asian Elderly Conference: Social Service Providers Must Get Their ‘Act Together’,” International Examiner, April 30, 1979.
Fleming, Thomas, “Thomas Fleming’s Weekly Report,” Sun Reporter, August 2, 1975: 7.
Berling, Lynn, “‘Post’ Tries to be Only Daily for Black Community,” Oakland Post, February 15, 1981: 6.
For more information, see https://www.proquest.com/products-services/ethnic_newswatch.html.
In practice, a kappa less than or equal to 0 indicates no agreement; a kappa of 0.01–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect agreement ([76], 279).
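The agreement bands in the note above can be made concrete with a small worked computation. The sketch below is a plain-Python illustration, not the author's replication code (that is in the repository linked under Code availability): it computes Cohen's kappa for two coders as κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's label frequencies, and then maps κ onto the verbal bands, treating each band's upper edge as inclusive and reading the slight band as 0.01–0.20. The function names and the coder labels ("gain", "loss", "none") are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning nominal labels to the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement p_o: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement p_e: product of the two coders' marginal label shares, summed over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both coders used one identical label throughout; treated as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Map a kappa value onto verbal agreement bands (upper band edges inclusive)."""
    if kappa <= 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return f"{label} agreement"
    return "almost perfect agreement"


# Hypothetical labels from two human coders on six articles.
coder_a = ["gain", "loss", "gain", "none", "loss", "gain"]
coder_b = ["gain", "loss", "loss", "none", "loss", "gain"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3), interpret_kappa(kappa))  # → 0.739 substantial agreement
```

Here five of six labels match (p_o = 5/6) and chance agreement is p_e = 13/36, so κ = 17/23 ≈ 0.739, which falls in the substantial band. Because p_e discounts the agreement expected from the label distributions alone, κ is lower than the raw percentage agreement of about 0.833.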
Anonymous, “Cairo, Illinois: From Exploitation To Freedom,” Sun Reporter, March 27, 1971: 8.
Iwamoto, Gary, “A Picture of the 70’s,” International Examiner, December 31, 1979: 8.
References
Alexander, M. (2012). The New Jim Crow: Mass incarceration in the age of colorblindness. New York: The New Press.
Bailey, M. J., & Danziger, S. (2013). Legacies of the war on poverty. New York: Russell Sage Foundation.
Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2019). Automated text classification of news articles: A practical guide. Political Analysis, 1–24.
Bartels, L. M. (1999). Panel effects in the American National election studies. Political Analysis, 8(1), 1–20.
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587–604.
Berelson, B. (1952). Content analysis in communication research. Free Press.
Beretta, E., Vetrò, A., Lepri, B., & De Martin, J. C. (2018). Ethical and socially-aware data labels. In Annual International Symposium on Information Management and Big Data, 320–327. Springer.
Birkimer, J. C., & Brown, J. H. (1979). Back to basics: Percentage agreement measures are adequate, but there are easier ways. Journal of Applied Behavior Analysis, 12(4), 535–543.
Brady, H. E. (2019). The challenge of big data and data science. Annual Review of Political Science, 22, 297–323.
Breiman, L. (1997). Arcing the edge (Technical Report 486). Statistics Department, University of California, Berkeley.
Brilliant, M. (2010). The color of America has changed: How racial diversity shaped civil rights reform in California, 1941–1978. Oxford: Oxford University Press.
Brodersen, K. H., Ong, C. S., Stephan, K. E., & Buhmann, J. M. (2010). The balanced accuracy and its posterior distribution. In 2010 20th International Conference on Pattern Recognition, 3121–3124. IEEE.
Brooks, C. (2009). Alien neighbors, foreign friends: Asian Americans, housing, and the transformation of Urban California. Chicago: University of Chicago Press.
Campbell, A., Converse, P. E., Miller, W. E., & Stokes, D. E. (1980). The American voter. Chicago: University of Chicago Press.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81.
Chae, D. H., Takeuchi, D. T., Barbeau, E. M., Bennett, G. G., Lindsey, J., & Krieger, N. (2008). Unfair treatment, racial/ethnic discrimination, ethnic identification, and smoking among Asian Americans in the National Latino and Asian American Study. American Journal of Public Health, 98(3), 485–492.
Chan, A. B. (1983). Gold mountain: The Chinese in the new world. Vancouver: New Star Books.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In KDD ’16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Chin, D. (2001). Seattle’s international district: The making of a Pan-Asian American community. Seattle: University of Washington Press.
Chin, G. (2015). Building community, Chinatown style: a half century of leadership in San Francisco Chinatown. San Francisco: Friends of Chinatown Community Development Center.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Covin, D. (2009). Black politics after the civil rights movement: Activity and beliefs in Sacramento, 1970–2000. Jefferson, NC: McFarland.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281.
Danziger, S., & Haveman, R. (1981). The Reagan budget: A sharp break with the past. Challenge, 24(2), 5–13.
Dawson, M. (1994a). A Black Counterpublic?: Economic earthquakes, racial agendas, and black politics. Public Culture, 7(1), 195–223.
Dawson, M. (1994b). Behind the Mule: Race and class in African–American politics. Princeton: Princeton University Press.
Dawson, M. (2001). Black visions: The roots of contemporary African–American political ideologies. Chicago: University of Chicago Press.
Denny, M., & Spirling, A. (2017). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Elish, M. C., & Boyd, D. (2018). Situating methods in the magic of big data and AI. Communication Monographs, 85(1), 57–80.
Espiritu, Y. L. (1992). Asian American panethnicity: Bridging institutions and identities. Philadelphia: Temple University Press.
Fraga, L. R., Garcia, J. A., Hero, R. E., Jones-Correa, M., Martinez-Ebers, V., & Segura, G. M. (2011). Latinos in the new millennium: An almanac of opinion, behavior, and policy preferences. Cambridge: Cambridge University Press.
Freund, Y., & Schapire, R. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 1189–1232.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–407.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Geiger, R. S., Yu, K., Yang, Y., Dai, M., Qiu, J., Tang, R., & Huang, J. (2020). Garbage in, garbage out? Do machine learning application papers in social computing report where human-labeled training data comes from? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 325–336.
Gitelman, L. (Ed.). (2013). “Raw data” is an oxymoron. Cambridge, MA: MIT Press.
Goth, G. (2016). Deep or shallow, NLP is breaking out. Communications of the ACM.
Gottschalk, M. (2016). Caught: The Prison State and the lockdown of American Politics. Princeton: Princeton University Press.
Grimmer, J., Messing, S., & Westwood, S. J. (2012). How words and money cultivate a personal vote: The effect of legislator credit claiming on constituent credit allocation. American Political Science Review, 106(4), 703–719.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297.
Grumbach, J. M. (2018). From backwaters to major policymakers: Policy polarization in the states, 1970–2014. Perspectives on Politics, 16(2), 416–435.
Gurin, P., Hatchett, S., & Jackson, J. S. (1990). Hope and independence: Blacks’ response to electoral and party politics. New York: Russell Sage Foundation.
Harris-Lacewell, M. V. (2010). Barbershops, bibles, and BET: Everyday talk and black political thought. Princeton: Princeton University Press.
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Hinton, E. (2015). “A war within our own boundaries”: Lyndon Johnson’s great society and the rise of the carceral state. The Journal of American History, 102(1), 100–112.
Hinton, E. (2016). From the war on poverty to the war on crime. Cambridge, MA: Harvard University Press.
Hirschman, C., & Wong, M. G. (1981). Trends in socioeconomic achievement among immigrant and native-born Asian–Americans, 1960–1976. The Sociological Quarterly, 22(4), 495–514.
Ho, F., & Mullen, B. V. (2008). Afro Asia: Revolutionary political and cultural connections between African Americans and Asian Americans. Durham, NC: Duke University Press.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1), 229–247.
Hwang, W.-C., & Goto, S. (2008). The impact of perceived racial discrimination on the mental health of Asian American and Latino College Students. Cultural Diversity and Ethnic Minority Psychology, 14(4), 326.
Ishizuka, K. (2016). Serve the people: Making Asian America in the long sixties. Brooklyn: Verso Books.
Joseph, P. E. (2006). The black power movement: Rethinking the civil rights-black power era. Taylor & Francis.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
Kannegaard, J. S. (2008). The press of a people: The evolution of Spanish-language news and the changing political community. PhD diss., Massachusetts Institute of Technology.
Kaufmann, K. M. (2003). Cracks in the rainbow: Group commonality as a basis for Latino and African–American political coalitions. Political Research Quarterly, 56(2), 199–210.
Kim, J. (2020). Racism is not enough: Minority coalition building in San Francisco, Seattle, and Vancouver. Studies in American Political Development, 34(2), 195–215.
King, D. S., & Smith, R. M. (2005). Racial orders in American political development. American Political Science Review, 99(1), 75–92.
Kuramoto, F. H. (1976). Lessons learned in the federal funding game. Social Casework, 57(3), 208–218.
Kwong, P. (1996). The new Chinatown. New York: Macmillan.
Lai, D. C. (2003). From downtown slums to suburban malls: Chinese migration and settlement in Canada. In L. J. C. Ma & C. L. Cartier (Eds.), The Chinese diaspora: Space, place, mobility, and identity, 311–336. Lanham, MD: Rowman & Littlefield.
Lee, E. (2003). At America’s gates: Chinese immigration during the exclusion era, 1882–1943. Chapel Hill: University of North Carolina Press.
Li, W. (2006). From urban enclave to ethnic suburb: New Asian communities in Pacific Rim countries. Honolulu: University of Hawaii Press.
Lien, P., Conway, M. M., & Wong, J. (2004). The politics of Asian Americans: Diversity and community. Abingdon: Routledge.
Linder, F., Desmarais, B., Burgess, M., & Giraudy, E. (2018). Text as policy: Measuring policy similarity through bill text reuse. Policy Studies Journal.
Ling, H., & Austin, A. W. (2015). Asian American history and culture: An encyclopedia. Abingdon: Routledge.
Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2002). Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4), 587–604.
Maeda, D. (2005). Black Panthers, red guards, and Chinamen: Constructing Asian American identity through performing blackness, 1969–1972. American Quarterly, 57(4), 1079–1103.
Maeda, D. (2012). Rethinking the Asian American Movement. Abingdon: Routledge.
Maron, M. E. (1961). Automatic indexing: An experimental inquiry. Journal of the ACM, 8(3), 404–417.
Mason, L., Baxter, J., Bartlett, P. L., & Frean, M. R. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems, 512–518.
McClain, P. (2018). Can we all get along?: Racial and ethnic minorities in American politics. Abingdon: Routledge.
McDaniel, E. L. (2009). Politics in the pews: The political mobilization of black churches. Ann Arbor: University of Michigan Press.
McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica, 22(3), 276–282.
Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2), 685–726.
Mikhaylov, S., Laver, M., & Benoit, K. R. (2012). Coder reliability and misclassification in the human coding of party manifestos. Political Analysis, 20(1), 78–91.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229.
Mora, G. C. (2014). Making Hispanics: How activists, bureaucrats, and media constructed a new American. Chicago: University of Chicago Press.
Muñoz, C. (1989). Youth, identity, power: The Chicano movement. Brooklyn: Verso.
Murakawa, N. (2014). The first civil right: How liberals built prison America. Oxford: Oxford University Press.
Nelson, L.K. (2017). Computational grounded theory: A methodological framework. Sociological Methods & Research: 0049124117729703.
Nelson, L. K. (2019). To measure meaning in big data, don’t give me a map, give me transparency and reproducibility. Sociological Methodology, 49(1), 139–143.
Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2017). The future of coding: A comparison of hand-coding and three types of computer-assisted text analysis methods. Sociological Methods & Research: 0049124118769114.
Ngai, M. M. (2014). Impossible subjects: Illegal aliens and the making of modern America. Princeton: Princeton University Press.
Omi, M., & Winant, H. (1986). Racial formation in the United States: From the 1960s to the 1980s. New York: Routledge.
Omi, M., & Winant, H. (1994). Racial formation in the United States: from the 1960s to the 1990s (2nd ed.). New York: Routledge.
Orleck, A. (2011). The war on poverty from the grass roots up. In A. Orleck & L. G. Hazirjian (Eds.), The war on poverty: A new grassroots history, 1964–1980. Athens: University of Georgia Press.
Pierson, P. (2003). Big, slow-moving, and . . . invisible: Macrosocial processes in the study of comparative politics. In J. Mahoney & D. Rueschemeyer (Eds.), Comparative historical analysis in the social sciences, 177–207. Cambridge: Cambridge University Press.
Prashad, V. (2002). Everybody was Kung Fu fighting: Afro-Asian connections and the myth of cultural purity. Boston: Beacon Press.
Reardon, S. F., Kalogrides, D., & Shores, K. (2019). The geography of racial/ethnic test score gaps. American Journal of Sociology, 124(4), 1164–1221.
Roberts, M. E., Stewart, B. M., & Tingley, D. (2015). stm: R package for structural topic models. R package version 1.1.0.
Rodriguez, A. (1999). Making Latino news: Race, language, class. Thousand Oaks: SAGE Publications.
Rothstein, R. (2017). The color of law: A forgotten history of how our government segregated America. New York: Liveright Publishing.
Self, R. O. (2005). American babylon: Race and the struggle for postwar Oakland. Princeton: Princeton University Press.
Sides, J. (2006). LA city limits: African American Los Angeles from the great depression to the present. Berkeley: University of California Press.
Skocpol, T. (1979). States and social revolutions: A comparative analysis of France, Russia, and China. Cambridge: Cambridge University Press.
Slater, D., & Ziblatt, D. (2013). The enduring indispensability of the controlled comparison. Comparative Political Studies, 46(10), 1301–1327.
Soss, J., Hacker, J. S., & Mettler, S. (2007). Remaking America: Democracy and public policy in an age of inequality. New York: Russell Sage Foundation.
Suen, H. K., & Lee, P. S. C. (1985). Effects of the use of percentage agreement on behavioral observation reliabilities: a reassessment. Journal of Psychopathology and Behavioral Assessment, 7(3), 221–234.
Tate, K. (1993). From protest to politics: The new black voters in American elections. Cambridge, MA: Harvard University Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Trounstine, J. (2018). Segregation by design: Local politics and inequality in American Cities. Cambridge: Cambridge University Press.
Umemoto, K. (1989). “On Strike!” San Francisco state college strike, 1968–69: The role of Asian American students. Amerasia Journal, 15(1), 3–41.
Vincent, T. G. (1973). Voices of a Black nation: Political journalism in the Harlem Renaissance. Ramparts Press.
Watkins, R. (2012). Black power, yellow power, and the making of revolutionary identities. Jackson: University Press of Mississippi.
Wei, W. (1993). The Asian American movement. Philadelphia: Temple University Press.
Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20, 529–544.
Williams, D. R., Lawrence, J. A., & Davis, B. A. (2019). Racism and health: Evidence and needed research. Annual Review of Public Health, 40, 105–125.
Wolman, H. (1986). The Reagan urban policy and its impacts. Urban Affairs Quarterly, 21(3), 311–335.
Wong, J. S., Ramakrishnan, S. K., Lee, T., & Junn, J. (2011). Asian American political participation: Emerging constituents and their political identities. New York: Russell Sage Foundation.
Yu, B. (2013). Stability. Bernoulli, 19(4), 1484–1500.
Zaller, J. R. (1992). The nature and origins of mass opinion. Cambridge: Cambridge University Press.
Zhang, H. (2005). Exploring conditions for the optimality of Naive Bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(02), 183–198.
Zhou, M. (2010). Chinatown: The socioeconomic potential of an urban enclave. Philadelphia: Temple University Press.
Zipf, G. K. (1936). The psycho-biology of language: An introduction to dynamic philology. Abingdon: Routledge.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Boston: Addison-Wesley.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kim, J.Y. Integrating human and machine coding to measure political issues in ethnic newspaper articles. J Comput Soc Sc 4, 585–612 (2021). https://doi.org/10.1007/s42001-020-00097-2