Page 95 - Balancing between the present and the past
Table 12. Observers’ characteristics
Observer  Gender  Age  Educational qualification  Years' work experience  Nationality
1         Female  29   Master's                   7                       Dutch
2         Female  32   Master's                   7                       Dutch
3         Male    32   Master's                   8                       Dutch
4         Male    33   Master's                   7                       Dutch
5         Male    29   Master's                   8                       Dutch
We videotaped two different lessons for each teacher (n = 5), and all lessons were taught in the two highest tracks of secondary education in the Dutch educational system. We observed only the lessons for upper secondary school students in the two highest tracks because the Dutch formal exam program considers the ability to perform historical contextualization to be an important aim for these students (Board of Tests and Examinations, 2015). A total of 267 students, with a mean age of 16.2 years (SD = 0.7), were involved. The mean duration of the analyzed lessons was 39 min (SD = 2.4). Each observer individually evaluated the 10 videotaped lessons using the developed observation instrument, yielding a total of 50 observations.
4.4.5 Training observers to use the instrument
All observers received a 4-hour training session. In this training, we used three videotaped history lessons taught by three history teachers (one female teacher with more than 15 years of work experience, one male teacher with 4 years of work experience, and one male teacher with more than 25 years of work experience) from three different schools as training materials. One lesson was about the Ancient Roman period, one was about the Middle Ages, and one was about the Second World War. These three lessons were not used in our data analyses. The observers received an explanation of the 45 items and evaluated the videotaped lessons using a training version of the observation instrument that included more in-depth explanations of the items. After the observers observed each videotaped lesson, their results were discussed, and some items were clarified by the trainers to minimize inter-rater bias.
4.4.6 Data analysis
To explore the instrument’s dimensionality, we conducted a G-study at the item level with seven facets in a crossed design. To estimate the reliability of our instrument and produce a composite of scores with maximum generalizability, we conducted a new G-study and employed multivariate generalizability using a “t × l × o” design, where t represents the observed history teachers, l represents the number of observed