Reliability and validity of Cambridge English exams

p. 62
Photo: the Cambridge English test storeroom. Source: CELA

Exams and language tests produced by Cambridge English can be used in various ways: “locally” by teachers, for example to assess learners’ progress or for diagnostic purposes (enabling, say, lesson planning), or on a large scale as high-stakes examinations whose results may be taken into account in university admissions or by employers. Cambridge English Language Assessment, the team responsible for constructing such examinations, aims to produce tests that meet widespread demand and serve as valid and reliable evaluation instruments. The English text is preceded by a short synthesis, originally in Polish, prepared by Dr Agnieszka Dryjańska, JOwS editor.

Item writing draws on a set of Item-Writer Guidelines, which are test specification documents produced for each exam and used by the item writers who are commissioned to produce tasks. (The slimmer Handbooks published for teachers are also test specification documents, in this case intended to support teaching and learning rather than to generate test items.) The Guidelines take a practical approach and include information about the test construct and task requirements, as well as example items and guidelines on topic choice. There is also information, based on past experience, which can help item writers in the writing process; for example, advice on what to look for – and what to avoid – when searching for a source text. The Guidelines contain information relating to the relevant Common European Framework of Reference (CEFR) level, including appropriate grammar, vocabulary and functions, and the amount of support or scaffolding to be included. For example, tests at A1 and A2 tend to offer more visuals to provide support, and there are empirically sourced wordlists at these levels to guide learners’ lexical development. At all levels, minimum and maximum word lengths are specified for the input reading of each task.
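Specification constraints of this kind lend themselves to simple automated checks. The sketch below is purely illustrative: the wordlist, the length limits and the `check_source_text` helper are invented for this example and are not actual Cambridge English specifications.

```python
# Illustrative sketch of a check an item writer might run on a candidate
# source text: length limits and wordlist coverage, in the spirit of the
# Guidelines described above. The wordlist and limits are invented.

A2_WORDLIST = {"the", "a", "cat", "sat", "on", "mat", "and", "dog", "ran"}
MIN_WORDS, MAX_WORDS = 5, 12   # per-task length limits (hypothetical)

def check_source_text(text):
    words = text.lower().split()
    off_list = [w for w in words if w not in A2_WORDLIST]
    return {
        "word_count": len(words),
        "length_ok": MIN_WORDS <= len(words) <= MAX_WORDS,
        "off_wordlist": off_list,   # words the writer may need to replace
    }

report = check_source_text("The cat sat on the mat and the dog ran")
print(report["length_ok"], report["off_wordlist"])  # True []
```

In practice such a check would only flag material for human review; the item writer’s judgement remains central.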

Questions that do not meet the quality criteria from the outset are rejected or rewritten. Questions that are accepted are taken through a thorough editing process. A key component is pretesting, where material is tested on student populations who are as similar as possible to the future candidate populations. This provides performance data for each task, including how difficult the sample of candidates found the questions and how well the questions discriminated between stronger and weaker candidates. These statistics, along with the expert judgement of a pretest-review panel, enable further adaptations to be made.

Materials which finally meet all requirements go into an item bank, the database for the management of test content, ready for test construction. Along with the item-writing process, the item bank supports test comparability and thus plays a pivotal role in the test construction process. The quality of an item bank depends not only on the number of items it contains, but also on the quantity and quality of the data it holds about the items: the more information stored about an item, the more the selection process can be automated. Item features such as task type, topic, word count, testing focus and target age-group are logged. Other details, such as accent for Listening tasks, can also be recorded.

Statistical information, produced from candidate responses after a test administration, is uploaded to the relevant item. This data includes the difficulty of the item; the facility, which is the proportion of candidates who answered the item correctly; and the discrimination index, which indicates how well the item distinguishes between strong and weak candidates. Items are also classified according to their calibration status. An uncalibrated item has no item statistics; a part-calibrated item has been pretested, and so has statistics indicating how it is expected to perform in a live test; fully calibrated items have been included in a live test session and taken by a sufficiently high number of candidates with different first-language backgrounds. The information stored in the item bank is then used for test construction, when a test is compiled for live use.
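The facility and discrimination statistics described above have standard classical-test-theory definitions, which can be sketched as follows. This is illustrative only: Cambridge English’s actual calibration procedures are more sophisticated, and the point-biserial correlation used for discrimination below is just one common choice of index.

```python
# Illustrative sketch: classical item statistics from a set of candidate
# responses to one item (1 = correct, 0 = incorrect) and each candidate's
# total test score. Not Cambridge English's actual psychometric pipeline.

def facility(item_responses):
    """Proportion of candidates who answered the item correctly."""
    return sum(item_responses) / len(item_responses)

def discrimination(item_responses, total_scores):
    """Point-biserial correlation between the item score and the total
    score: high values mean strong candidates tend to get the item right
    while weak candidates tend to get it wrong."""
    n = len(item_responses)
    mean_i = sum(item_responses) / n
    mean_t = sum(total_scores) / n
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_responses, total_scores)) / n
    var_i = sum((i - mean_i) ** 2 for i in item_responses) / n
    var_t = sum((t - mean_t) ** 2 for t in total_scores) / n
    return cov / (var_i ** 0.5 * var_t ** 0.5)

# Five candidates, one item; totals are their scores on the whole test.
responses = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 4, 3]
print(round(facility(responses), 2))                 # 0.6
print(round(discrimination(responses, totals), 2))
```

Here the three strongest candidates answer correctly and the two weakest do not, so the item discriminates well; an item everyone (or no one) answered correctly would carry no discrimination information at all.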

Quality assurance for marking

Objectively scored items are those which do not require expert judgement for marking, in other words which can be reliably marked using automated processes or where a key containing all possible answers can be supplied to a human marker. This type of marking is used for the task types currently found in the Reading, Listening and Use of English components of Cambridge English examinations.
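A minimal sketch of how marking against such a key might work, assuming each item’s key lists every acceptable answer as the text describes; the key contents and the normalization rules below are invented for illustration.

```python
# Illustrative sketch of objective marking against an answer key that
# lists all acceptable answers per item. Key contents are invented.

answer_key = {
    1: {"went"},
    2: {"colour", "color"},   # all acceptable variants are listed
    3: {"a"},
}

def mark(responses):
    """Return the number of items answered acceptably, after trimming
    whitespace and ignoring case."""
    return sum(1 for item, answer in responses.items()
               if answer.strip().lower() in answer_key.get(item, set()))

print(mark({1: "went", 2: "Color ", 3: "b"}))  # 2
```

Because the key is exhaustive, the same responses produce the same score whether marked by machine or by a human marker working from the key, which is what makes this type of marking “objective”.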

For the Writing and (face-to-face) Speaking components, assessment scales tied to the Common European Framework of Reference are used, applied by expert examiners, in this case experienced language teaching professionals. It is self-evident that wherever marks are given by human raters using open scales, a comprehensive quality assurance (QA) system needs to be in place to ensure a standardized application. For both Writing and Speaking in Cambridge English examinations, examiners must fulfil Minimum Professional Requirements in order to be considered. They commit to a process of induction, training and ongoing (annual) certification, as well as monitoring during live marking sessions and regular statistical reliability checks afterwards. Examiners receive feedback about their performance and may only continue to mark if the checks indicate they are on track.

To support these QA processes, Cambridge English Language Assessment employs a highly structured Team Leader System which works on a cascade principle. Speaking Examiners are grouped under Team Leaders. Team Leaders are grouped under Regional Team Leaders, who in turn are overseen by Professional Support Leaders. This system is critically important, as Speaking examiners are recruited locally by exam centres. In other words, quality and consistency of marking have to be maintained across a worldwide network of 20,000+ Speaking examiners. The system is similar for Writing, but less elaborate in terms of the Team Leader System, due to the smaller number of examiners involved, in fewer places.

For the examiners’ yearly certification requirements, a series of exemplar videos for Speaking, and a series of scripts for Writing, are marked by groups of senior examiners. The submissions are analyzed and the outcome is a set of robust, standardized marks. All examiners are given a selection of the exemplar scripts/videos: first a set to analyse, then sets to mark. The marks they award are compared to the standardized marks, and examiners must meet a specified level of accuracy before they are certified to conduct live sessions. All examiners, up to and including Professional Support Leaders, must undergo the standardization process, including marks collection, much of which today takes place online.
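The comparison of an examiner’s marks against the standardized marks can be sketched as a simple accuracy check. The tolerance and pass threshold below are illustrative assumptions only; the article does not specify Cambridge English’s actual certification criteria.

```python
# Illustrative sketch of an examiner certification check: marks awarded on
# exemplar performances are compared to the standardized marks agreed by
# senior examiners. Tolerance and threshold values are invented.

def certification_accuracy(examiner_marks, standardized_marks, tolerance=1):
    """Fraction of exemplars on which the examiner's mark falls within
    `tolerance` of the standardized mark."""
    within = sum(1 for e, s in zip(examiner_marks, standardized_marks)
                 if abs(e - s) <= tolerance)
    return within / len(standardized_marks)

def is_certified(examiner_marks, standardized_marks,
                 tolerance=1, required_accuracy=0.8):
    """Certify only examiners who meet the required accuracy level."""
    return certification_accuracy(
        examiner_marks, standardized_marks, tolerance) >= required_accuracy

standard = [4, 3, 5, 2, 4]    # marks agreed by senior examiners
candidate = [4, 4, 5, 1, 2]   # marks awarded by the examiner under review
print(certification_accuracy(candidate, standard))  # 0.8
print(is_certified(candidate, standard))            # True
```

An examiner falling below the threshold would, per the QA process described above, receive feedback and further standardization rather than continue marking live sessions.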

In the case of the Speaking tests, the QA system covers not only reliability of marking but also standardization of procedure. Here again, the worldwide Team Leader System ensures that expertise cascades through the system. Team Leaders guide new examiners through a practical training process, familiarizing them with the procedures as well as the assessment of Cambridge English Speaking tests. This includes examiner roles, test security, test format, materials handling, and the function of the standardized interlocutor scripts (“frames”). Training is followed by certification which, as with assessment, then becomes an annual standardization process. Monitoring during live tests, which takes place every two years either face-to-face or through audio recordings, covers procedure as well as marking. Thus candidates can rely not only on standardized marking but also on standardized administration of the Speaking tests.

Further reading

This article has outlined processes routinely implemented to produce each Cambridge English test paper version, as well as some of the quality assurance processes governing the use of examiners. If you would like to read more about these and other aspects involved in the production and processing of a Cambridge English examination, including grading and score reporting, use of the Cambridge English Scale, and malpractice detection, you will find longer and more detailed articles in Research Notes 59, February 2015. Research Notes is a quarterly published by Cambridge English Language Assessment, reporting on learning, teaching and assessment. Issue 59 centres on the life cycle of Cambridge English language tests, explaining which analyses and processes are used to ensure delivery of accurate and meaningful results. It is downloadable free of charge from http://www.cambridgeenglish.org/research-notes/.

The following five volumes of the Studies in Language Testing (SiLT) series set out the theories of communicative language ability underpinning Cambridge English examinations and how they feed into test development and design (all published by Cambridge University Press):

  • Geranpayeh, A. and Taylor, L. (eds) (2013) Examining Listening: Research and Practice in Assessing Second Language Listening. SiLT, vol. 35.
  • Khalifa, H. and Weir, C. J. (2009) Examining Reading: Research and Practice in Assessing Second Language Reading. SiLT, vol. 29.
  • Shaw, S. D. and Weir, C. J. (2007) Examining Writing: Research and Practice in Assessing Second Language Writing. SiLT, vol. 26.
  • Taylor, L. (ed.) (2011) Examining Speaking: Research and Practice in Assessing Second Language Speaking. SiLT, vol. 30.
  • Weir, C. J., Vidakovic, I. and Galaczi, E. D. (2013) Measured Constructs: A History of Cambridge English Language Examinations 1913–2012. SiLT, vol. 37.