Page 93 - Balancing between the present and the past

teacher behavior, (3) reformulate unclear items, and (4) formulate new items that they thought were missing. In total, the experts excluded 24 items, reformulated 12 items, and created no new items, resulting in a list of 58 items.
Subsequently, we trained 10 student history teachers on the use of the observation
instrument, and they observed one videotaped history lesson using the instrument.
We calculated Cronbach’s alpha (jury alpha) for their observation scores to explore
the instrument’s internal consistency. This jury alpha was .58 (poor internal
consistency). After deleting 10 items that threatened internal consistency, the jury
alpha increased to .81 (good internal consistency). Examples of the deleted items are
“appoints relations between historical phenomena,” “uses substantive concepts when explaining historical phenomena,” and “uses general schemas to explain historical phenomena.” We asked the experts in the first panel session to determine whether
the 10 deleted items could jeopardize the instrument’s face and content validity; they found no threats. The same experts were also asked to observe three videotaped history lessons taught by three different history teachers using the remaining 48 items. In the discussions that followed each lesson, three items (“explains the importance of placing phenomena in a chronological framework,” “explains the importance of placing phenomena in a spatial framework,” and “explains the importance of viewing phenomena from different dimensions”) led to strong disagreement among the experts; we therefore deleted these items. This resulted in a total of 45 items in the first version of the Framework for Analyzing the Teaching of Historical Contextualization (FAT-HC).
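The internal-consistency check described above can be illustrated with a minimal sketch of Cronbach’s alpha and of the alpha obtained when each column is dropped in turn. The function names, the layout of the score matrix (rows as scored cases, columns as the components whose consistency is assessed), and the toy scores are assumptions made for illustration only; they are not the study’s data or its actual procedure.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix given as a list of rows.

    Each row holds one case's scores on k components (columns).
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two rows and two columns")
    # Variance of each column taken on its own.
    column_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the row totals.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(column_vars) / total_var)


def alpha_if_deleted(scores):
    """Alpha recomputed with each column removed in turn."""
    k = len(scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in scores])
        for drop in range(k)
    ]


# Hypothetical toy scores: columns 0 and 1 agree perfectly, column 2 does not.
toy_scores = [
    [1, 1, 3],
    [2, 2, 1],
    [3, 3, 2],
]

alpha_all = cronbach_alpha(toy_scores)
alpha_dropped = alpha_if_deleted(toy_scores)
```

In this toy matrix, dropping the third column raises alpha, which mirrors the logic of removing components that threaten internal consistency. The sketch does not guard against a zero variance of the row totals.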
4.4.3 Research design
Following Hill et al. (2012), we adopted generalizability theory to explore the instrument’s dimensionality and to determine its reliability (Brennan, 2001; Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991). Compared with classical test theory, generalizability theory is more informative and useful in educational systems because classical test theory considers only one source of measurement error at a time. Additionally, classical test theory does not yield specific information on how many forms, items, occasions, or observers are required (Shavelson, Webb, & Rowley, 1989). A generalizability study (G-study) can accommodate any observational situation and is restricted only by the practical limitations of data collection and software (Lei, Smith, & Suen, 2007). A G-study views a behavioral measurement (for example, an observed score) as a sample from a universe of admissible observations. Each aspect (called a facet) in the measurement procedure is considered a possible