Date of Award

May 2022

Degree Type


Degree Name

Doctor of Philosophy



First Advisor

Kristen Murphy

Committee Members

Anja Blecking, Joseph Aldstadt, Jorg Woehl, Thomas Pentecost


Assessment Validity Issues, Chemistry Education, Differential Item Functioning, Item Response Theory, Multiple-Choice Assessments, Student Feedback


The following series of experiments was designed and implemented with the goal of understanding and improving undergraduate students' understanding of chemistry. To accurately assess changes in students' chemistry understanding, the first chapter focuses on how a student's chemistry proficiency can be measured most effectively. This includes practical recommendations for the average instructor as well as advanced methods more appropriate for researchers who require highly precise scores. With a better understanding of how to measure student proficiency, the next step was to improve student understanding through various types of feedback following assessments. In addition to establishing a ranking of feedback types from most to least successful, an investigation was conducted into which types of students benefited (low versus high performing) and where the benefit occurred (introductory versus complex concepts). However, one limitation of the methods employed thus far to determine student proficiency and track its change is that both assume the assessment results reflect only the students' chemistry reasoning. To investigate the legitimacy of this assumption, a search for validity issues in many of the previously used multiple-choice questions was conducted, which ultimately led to the development of a new, more robust method for flagging validity threats. Another factor, apart from proficiency, that could influence student performance on assessments is the student's sex. To address this potential impact, differential item functioning (DIF) analysis was performed to flag differences in chemistry performance based on question attributes (conceptual/algorithmic, presence of diagrams, question format, etc.) and content areas. However, traditional DIF methods such as these are often limited by the assumption that the chosen grouping variable is the only factor influencing performance.
To address this limitation, a new method was developed that calculates and corrects for the impact of external factors prior to conducting the desired DIF analysis. This new method was refined through a pilot analysis that corrected for students' incoming math/reading proficiency prior to conducting sex-based DIF. Lastly, given the utility of DIF procedures, DIF was also used to compare a control group and a treatment group following an institutional transformation in course prerequisite structures. The impact of changing chemistry prerequisites was first investigated through course-level statistics (e.g., course grade, credits accrued, overall GPA) and then followed up with a more refined DIF analysis of exam-level results. The DIF results not only reproduced the findings of the course-level statistics but also revealed further insight into where specifically differences occurred between the control and treatment groups.