New Zealand Principal Magazine

NZCER – Explaining Star Assessment Changes

The New Zealand Council for Educational Research (NZCER) · 2013 Term 2 June Issue · Practice

The New Zealand Council for Educational Research (NZCER) team discusses standardised test development and addresses concern about the revised STAR test.

Anyone working in schools today will be aware of how much assessment practice has changed in recent years. Teachers and school leadership teams need to have in place robust processes for assessment and for reporting and analysis of student achievement data at all levels and over time. It is a big ask to do this, let alone to keep up with 21st century assessment practice and the continually changing assessment landscape.

NZCER has been involved in test development for many years and has taken a lead role in assessment thinking and practice. The Progressive Achievement Tests (PATs) date from the 1970s and, from 2006, the organisation has invested a lot of time and money redeveloping them to draw upon current international thinking and research in test development. This process began with PAT Maths, and the most recent test redeveloped was STAR Reading (the long name is the Supplementary Tests of Achievement in Reading). We are currently working on a new PAT, Punctuation and Grammar, which is being piloted and is scheduled for release in term four, in time for use in 2014.

What is a standardised test?

Standardised tests are designed to be administered and scored in a consistent (standard) manner. Standardised tests are often developed to produce information that allows us to compare a student’s level of achievement with a nationally representative sample of students at the same year level. There are a number of reasons teachers might use a standardised test, but there are also other assessment options. NZCER has developed a number of standardised tests such as the PATs and STAR, and there are others available to schools such as e-asTTle.

Using the scale score

One of the important new features of the revised PATs and STAR is the development of underlying measurement scales. A scale score provides:

■■ a clear sense of progress along an equal interval scale
■■ the ability to identify the average expected progress for students
■■ the ability to describe the knowledge and skills a student is typically able to demonstrate at a given point on the scale (because the test questions each have a difficulty rating that can be placed on the same scale as the student’s score).

Many of you will be familiar with how scales work. Students sit the test and receive a raw test score (e.g. 22 out of 40), which is converted onto the relevant scale (for instance, the PAT: Reading Comprehension scale or e-asTTle mathematics scale). The same scale is used for all tests in the series. For instance, all of the PAT: Reading Comprehension tests report onto the PAT: Reading Comprehension scale. This can happen because the process used to convert raw scores to scale scores takes into account the difficulty of the questions in the test. Because each scale covers achievement across several year levels, an individual student’s progress can be tracked over time. As students learn more and move through the year levels, they should move up the scale. For example, in vocabulary most students in Year 4 start off with a score of around 28 units on the PAT: Reading Vocabulary scale. By Year 10, most will score around 66 units.

Scale scores are reported as a range, such as 35 plus or minus 3 units (range of 32 to 38). This indicates the level of precision of the result. The key message here is that test scores can never be perfectly precise. A student’s score will generally vary from one administration to the next. When thinking about a student’s score it is best to think of a probable range on the scale.

Once a raw score has been converted to a scale score, it is possible to see how your students’ results compare to the achievement of nationally representative groups of students at different year levels. NZCER works hard to construct nationally representative samples of students by carefully selecting schools to be involved in national trials. This allows us to describe the distribution of achievement for each year level on the scale. For instance, when students sit PAT: Mathematics we can report that a scale score of 60 represents very high achievement for a Year 4 student but represents below average achievement for a student in Year 10.

The move to reporting using the scale has been a substantial shift for many schools. As noted, a real advantage of a scale score is that at points along the scale, the types of knowledge and skills that a student is capable of are described. This can be invaluable for teachers, but also useful in a parent interview. Teachers can discuss the types of skills and knowledge the student is comfortable with, working at, or still finds challenging, and indicate some next steps to work towards. The descriptions of the scales are in the teacher manuals for each test.
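The arithmetic behind a scale score reported as a range, and behind progress along the scale, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of any NZCER tool, and the average below is a simple division that assumes nothing about whether growth is even from year to year. The numbers are the ones quoted in this article.

```python
def score_range(scale_score, margin):
    """Return the (low, high) range for a scale score reported as score +/- margin."""
    return (scale_score - margin, scale_score + margin)


def average_yearly_progress(start_score, end_score, start_year, end_year):
    """Average number of scale units gained per year level between two scores."""
    return (end_score - start_score) / (end_year - start_year)


# The article's example: 35 plus or minus 3 units.
print(score_range(35, 3))  # (32, 38)

# Typical PAT: Reading Vocabulary scores quoted in the article:
# around 28 units in Year 4 and around 66 units by Year 10.
print(round(average_yearly_progress(28, 66, 4, 10), 1))  # 6.3
```

On those figures, a typical student gains a little over 6 scale units per year level, which is about twice the plus or minus 3 unit margin in the example range.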

The issues with STAR

STAR (Supplementary Tests of Achievement in Reading) was originally developed by Emeritus Professor Warwick Elley and published in 2001. Dr Elley was also involved in the redevelopment, which began in 2010. There were three main drivers for the revision:

■■ The original STAR was a collection of three independent pairs of parallel tests, each targeted at a specific year level or range of levels (e.g. Year 4 to 6). It was difficult to measure progress when students moved between some year levels, in particular from years 3 to 4 and 6 to 7.
■■ There was some evidence that the Year 4 to 6 test did not discriminate well between higher achieving students, particularly in Year 6 (a ceiling effect).
■■ The content needed updating. For example, it referred to a floppy disk, which means nothing to most students in 2013.

The revised test was published in time for use in the 2012 year. Towards the end of 2011 we started to be contacted by a number of schools concerned at the results their students were getting in STAR when they were tested in term four. We soon realised there was some confusion about how to convert end-of-year test results into stanines. We produced a brochure with more detailed information and tried to get the message out to schools, but we know that some schools were left confused and felt they had got inflated results from STAR.

It is great to be able to communicate directly with principals on this issue. While we have had opportunities to respond to various media articles, we have not been able to go into the kind of detail that is possible here. For example, in an article that appeared in the Listener, none of our explanations involving stanines and scale scores were used as they were seen as being too complicated for a general audience. That made it difficult to get to the nub of the issue.

NZCER stands behind the normative information produced for STAR. For the standardisation trial, schools across New Zealand were systematically sampled to ensure that the range of students involved represented the range of achievement present in each year level. In all, over 7,500 students took part in the trial (about 1,000 at each year level). This process gives us the empirical data required for generating a robust scale and reliable national reference data.

Since then, we’ve been able to check the normative information for STAR by using data processed through the NZCER marking service (see diagram). That exercise showed that the published norms for STAR strongly agree with norms generated from the data we have from schools that are making use of the tests as part of their normal classroom practice. When schools want to use the revised STAR to make normative comparisons they can be confident that the published norms for the new tests accurately reflect achievement norms for New Zealand students.

[Diagram: 2012 Marking Service Data vs STAR. Scale Score (STAR) by Year Level, for two series: Marking service data and Norms.]

Comparison of norms based on the 2012 Marking Service data (36,000 students) with published norms for STAR

So why did many schools feel their results were inflated?

Reporting with stanines

Under the original STAR test, schools reported results using stanines. These were generated using data from a norming study undertaken for each level of the test. Although the data was collected at one point in time, by mathematically modelling the growth between year levels it was possible to estimate stanines for different times of the year. That meant, for instance, that students could do the test in term four and the teacher was able to look up a table that gave a term four stanine.

When we revised the test we again did one national norming study, which was held in March. This time we did not estimate stanines related to taking the test at different times of the year. Instead, we provided stanines for the different year levels and recommended that, when normative comparisons were needed, teachers should select the most relevant national reference group. For instance, for end of year testing, we recommend using the norms for the next year level up. This is because the student has had almost a full year of learning and is “more like” a student sitting a test at the beginning of the next year. See diagram for a further explanation.

[Diagram: STAR Scale vs Stanine. Labels: Scale Score (STAR), Test Score, Year 3, Year 4.]

This diagram shows the stanine distribution for each year alongside the scale. You can see how a STAR scale score of 55 is a stanine 5 using the year 3, term one norms, but stanine 2 for a year 4 student in term one. That is because students make considerable progress between years 3 and 4, so the term one norms on which stanines are based move up the scale at each year level. It is important to use the correct stanine lookup table for the year level. For example, when testing year 3 students in term 4, NZCER recommends using the year 4 stanine table, as the students have benefited from a year of teaching and learning.

Schools using the revised PAT tests are used to doing this, but we know that for STAR, many schools continued to reference their students to the beginning of year norms when testing in term four. If a lot of good teaching and learning has occurred over the year, you would expect the students to be well ahead of the beginning of year norms. It turns out they were, hence the impression of inflated results.

We’re often asked why we didn’t simply calculate norms for the end of the year. Firstly, the only true norms are based on empirical student data from the March norming studies. Norms for different times of the year were always only calculated using a formula rather than real data from students.

Secondly, we have the expertise and technology now to work with scale scores, which provide a measure of progress for students within or across years that doesn’t depend on normative comparisons and that allows teachers to get a sense of the competencies associated with different score levels. Using scale scores can be quite a shift from what schools were used to for many years, but it is a shift that we believe needs to happen to improve the quality of reporting at a range of levels.

We advise schools to use the reference year level above when looking at Term 4 data if reporting with stanines, because it gives a more realistic picture given that students at this stage have benefited from almost a full year of schooling. However, we believe that scale scores rather than stanines should be emphasised when reporting on achievement. A scale score locates a student’s achievement level on the scale regardless of what time of the year the test was taken. At any point in time, you can look at a scale score and see the progress of a student or group of students. As students move up through the years, one can expect to see an increased level of achievement through an increasing scale score. This makes it easier to communicate progress than with a stanine result, where students can often stay at the same stanine level from year to year.

We are constantly looking for ways to improve the usability of our tests and the formats for reporting and analysis. For example, we have developed a new report that shows the spread of students at each year level (and point in time) for a school. The concerns over STAR also show we need to do a better job of supporting schools with the use of our standardised tests. We have a number of ideas for how to do this. One initiative we have started recently is a blog on assessment, which can be found at http://www.nzcer.org.nz/nzcer-on-assessment
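The advice above on which year level's stanine table to use for term four results can be written as a small rule. The following Python sketch is illustrative only: the function name is hypothetical, and using the student's own year level for terms one to three is an assumption made here, since the article gives a specific recommendation only for end of year testing.

```python
def reference_year_level(year_level, term):
    """Pick the year-level norm table to use for a stanine lookup.

    Term four results use the next year level's norms, as the article
    recommends. Using the student's own year level for terms one to
    three is an assumption made for this sketch.
    """
    if term not in (1, 2, 3, 4):
        raise ValueError("term must be 1, 2, 3 or 4")
    return year_level + 1 if term == 4 else year_level


print(reference_year_level(3, 1))  # 3
print(reference_year_level(3, 4))  # 4
```

The second call mirrors the article's example of testing year 3 students in term four against the year 4 stanine table.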
