Natural Language Processing for Theoretical Framework Selection in Engineering Education Research

C. Berdanier, C. McComb, and W. Zhu
2020, IEEE Frontiers in Education Conference

This research paper presents recent work exploring the power of natural language processing (NLP) methods applied to qualitative engineering education data. As NLP and other machine learning methods are developed for qualitative data, it is important to prioritize the role that theory plays in rigorous qualitative research, where the selection of a theoretical framework serves as the lens by which the research project is framed, results are analyzed, and findings are brought to light. Indeed, the view from a different theoretical lens can highlight novel findings. In this work, we explore the viability of NLP methods for helping researchers select appropriate frameworks. We present our method for training a Python-based NLP algorithm to analyze an existing set of interview data using one theoretical lens: Community of Practice theory, an oft-used theory in the literature on graduate education, the topic of the interview corpus under investigation. We present and test two methods for developing the dictionaries used to train the algorithm: an expert-curated dictionary and a machine-generated dictionary compiled by mining the theoretical framework sections of published literature employing Community of Practice theory. We apply these two dictionaries to analyze a corpus of 54 interview transcripts investigating graduate engineering attrition. The high-dimensional data from NLP can be compared using Principal Component Analysis (PCA) visualization and pairwise distance plots to determine which method results in the most well-defined structure, indicating agreement between the dictionary and the corpus of interview transcripts. In the discussion, we highlight opportunities for using these automated methods to help researchers with qualitative data analysis and warn against potential dangers and ethical ramifications of using machine learning and NLP for social science data.
This work will have impact on the disciplinary communities working to embed computational language-based methods into engineering education research and on the qualitative methods communities across social science and education disciplines.
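
To make the dictionary-based comparison described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each transcript is represented by the relative frequency of each dictionary term and that transcripts are compared by cosine distance; the abstract specifies neither choice. The dictionary terms, the example transcripts, and all function names are hypothetical and are not drawn from the study. The PCA step is omitted to keep the sketch dependency-free.

```python
import math
import re
from collections import Counter


def dictionary_vector(text, dictionary):
    """Return the relative frequency of each dictionary term in `text`.

    Assumption: terms are single lowercase words; multi-word phrases
    would need a different matching strategy.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[term] / total for term in dictionary]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 and norm_v == 0:
        return 0.0  # two all-zero vectors are treated as identical
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no dictionary terms in one text: maximally distant
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(vectors):
    """Return the full symmetric matrix of cosine distances."""
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical dictionary and transcripts, invented for illustration only.
EXAMPLE_DICTIONARY = ["community", "practice", "mentor", "belonging", "identity"]
EXAMPLE_TRANSCRIPTS = [
    "My advisor was a mentor who made me feel part of the lab community.",
    "The lab community and my mentor shaped my identity as a researcher.",
    "Funding ran out and the commute was long, so I left the program.",
]

vectors = [dictionary_vector(t, EXAMPLE_DICTIONARY) for t in EXAMPLE_TRANSCRIPTS]
distances = pairwise_distances(vectors)

for row in distances:
    print(["%.3f" % d for d in row])
```

In this toy example, the first two transcripts share dictionary terms and so sit closer together than either does to the third, which contains none. A distance matrix of this kind is what a pairwise distance plot would visualize; the same vectors could also be passed to a PCA routine for a low-dimensional view.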