Demystifying Exam Scoring: Translating Raw Scores to Scaled Scores

Liberty Munson (Microsoft)

As part of my "Dissecting Score Reports" series, I've received a lot of questions and comments, and seen some confusion, about how questions are scored and how that translates to the score you receive. Here are the key points about question and exam scoring:

1) All questions are worth 1 point unless otherwise noted in the text of the question. We recently added some polytomously scored questions on several of our exams. What does "polytomously scored" mean? These are questions that are worth multiple points, and you can earn all, none, or some of those points. Usually a point is awarded for each action that you take. For example, if we ask you to match a word to its definition, you would get a point for each correct match. Currently, most polytomously scored items are really just several multiple choice questions that we've combined into a single question because this is a better experience for the test taker. Imagine if we had 4 word/definition matches that we wanted to test. It makes more sense to include all 4 in a single question rather than asking 4 separate questions to assess this knowledge. This is different from weighting. Weighting is an "all or none" proposition. You either get all the points for that question or none of them. We currently do not have any items like this on our exams.

2) Your final score is simply the sum of the points that you earned on each item. This is called a "raw score."

3) Your raw score is translated to a scaled score that ranges from 0 to 1000 using a simple mathematical conversion. How points are distributed across this range depends on where the passing score is set. Because we use 700 as our common passing score, the raw points below the passing score are equally distributed between 0 and 700, while the raw points at and above the passing score are equally distributed from 700 to 1000.
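Here is a minimal sketch of that conversion in Python. The function name and its arguments are made up for illustration, and the rounding (dropping any fraction) is an assumption chosen because it matches the sample tables in this post; this is not the actual scoring code.

```python
def scaled_score(raw: int, cut: int, max_raw: int) -> int:
    """Map a raw score onto the 0-1000 scale so that the raw cut score lands on 700.

    Hypothetical helper: names and rounding are assumptions for illustration.
    """
    if not 0 < cut < max_raw:
        raise ValueError("cut must be strictly between 0 and max_raw")
    if not 0 <= raw <= max_raw:
        raise ValueError("raw must be between 0 and max_raw")
    if raw < cut:
        # Raw points below the cut share the 0-700 band equally.
        return (700 * raw) // cut
    # Raw points at or above the cut share the 700-1000 band equally.
    return 700 + (300 * (raw - cut)) // (max_raw - cut)


# On a 25-point exam with a raw cut score of 17:
print(scaled_score(0, cut=17, max_raw=25))   # lowest possible score
print(scaled_score(17, cut=17, max_raw=25))  # exactly at the cut
print(scaled_score(25, cut=17, max_raw=25))  # perfect score
```

Integer arithmetic is used on purpose so that the endpoints come out as exactly 0, 700, and 1000 with no floating-point surprises.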

Why do we scale scores? Passing scores are not arbitrarily set! (Remember 700 does NOT mean 70%!) We use input from subject matter experts who review the difficulty of the questions in the item pool in relation to the skills and abilities of the target audience and provide guidance on where the passing score should be set. As a result, the actual number of questions that you have to answer correctly to pass may vary from one attempt to another if the difficulty of the question set changes. In other words, if you see a more difficult set of questions, it's hardly fair to expect you to be able to answer the same percentage correct as someone who sees an easier set of questions. Because of this, if we simply reported percentages, you wouldn't be able to compare your scores across time because a higher percentage on an easier set of items doesn't mean that you are doing better on the exam than a lower percentage on a more difficult set of items. By the way, this is an industry standard/best practice. If you take an exam and they don't provide scaled scores, the first question you should ask is "how are they ensuring that each administration is psychometrically equivalent and equally difficult?"

This is why you can't interpret your score as a percent of questions answered correctly, and it should never be related to a "grade" that you might get in school.

A (simple?) example:

Imagine that I have a 25-point exam. My scores across these items look like this: 1,0,0,1,1,1,1,3,0,0,1,2,1,0,0,1,1 (yes this is only 17 items but there are some polytomously scored items on this exam; the question where I earned 3 points was worth 5, the next question where I earned 0 was worth 3, and the last question was worth 3 points; all other questions were worth 1 point = 25 total points).
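If you want to check the tally yourself, a couple of lines of Python will do it. The variable names are just for illustration.

```python
# Points earned on each of the 17 items, in the order listed above.
item_scores = [1, 0, 0, 1, 1, 1, 1, 3, 0, 0, 1, 2, 1, 0, 0, 1, 1]

# The raw score is nothing more than the sum of the points earned.
raw_score = sum(item_scores)
print(len(item_scores), "items, raw score", raw_score)
```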

So, my raw score is 14 (add up the numbers). Imagine that the raw cut score that I needed to pass was 17. The score that I see on my score report is 576. The scoring table looks like this:

Raw Score   Scaled Score   Passing Status
25          1000           pass
24          962            pass
23          925            pass
22          887            pass
21          850            pass
20          812            pass
19          775            pass
18          737            pass
17          700            pass
16          658            fail
15          617            fail
14          576            fail
13          535            fail
12          494            fail
11          452            fail
10          411            fail
9           370            fail
8           329            fail
7           288            fail
6           247            fail
5           205            fail
4           164            fail
3           123            fail
2           82             fail
1           41             fail
0           0              fail

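As you can see, the raw scores below the cut score (0-16) are evenly spaced across the 0-700 range, about 41 scaled points apart (700/17), and those from 17-25 are evenly spaced across the 700-1000 range, 37.5 scaled points apart (300/8). The table drops any fraction, which is why 18 shows as 737 rather than 737.5.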
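For anyone who wants to rebuild the table above, here is a rough sketch. The names are hypothetical, and truncating fractions is an assumption that happens to reproduce every row shown; the real scoring system may do its rounding differently.

```python
MAX_RAW, CUT = 25, 17  # total points on the exam and the raw cut score


def scaled(raw: int) -> int:
    """Piecewise-linear conversion with the cut score pinned at 700 (sketch)."""
    if raw < CUT:
        return 700 * raw // CUT
    return 700 + 300 * (raw - CUT) // (MAX_RAW - CUT)


# Build the whole scoring table, then print it from the top score down.
table = {raw: scaled(raw) for raw in range(MAX_RAW + 1)}
for raw in range(MAX_RAW, -1, -1):
    status = "pass" if raw >= CUT else "fail"
    print(f"{raw:>2}  {table[raw]:>4}  {status}")
```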

What happens if the raw passing score changes to 12?

Raw Score   Scaled Score   Passing Status
25          1000           pass
24          976            pass
23          953            pass
22          930            pass
21          907            pass
20          884            pass
19          861            pass
18          838            pass
17          815            pass
16          792            pass
15          769            pass
14          746            pass
13          723            pass
12          700            pass
11          641            fail
10          583            fail
9           525            fail
8           466            fail
7           408            fail
6           350            fail
5           291            fail
4           233            fail
3           175            fail
2           116            fail
1           58             fail
0           0              fail

Again, the raw points below the passing score are evenly distributed from 0-700, while those at the passing score and above are evenly distributed between 700-1000. Don't confuse this process with weighting. Each point earned on the exam is worth 1 point regardless of whether it's earned through a dichotomously scored item (correct or incorrect) or a polytomously scored item (multiple points possible), and those points are scaled through a mathematical conversion that allows for comparisons of your testing events across time. Even though it looks like points are given a weight in this process, they are NOT weighted. This is just math.
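To see the "just math" part side by side, this small sketch runs the same raw score of 14 through both cut scores. As before, the helper name and the truncating rounding are assumptions for illustration only.

```python
def scaled(raw: int, cut: int, max_raw: int = 25) -> int:
    """Piecewise-linear conversion with the cut score pinned at 700 (sketch)."""
    if raw < cut:
        return 700 * raw // cut
    return 700 + 300 * (raw - cut) // (max_raw - cut)


for cut in (17, 12):
    below = 700 / cut         # scaled points per raw point below the cut
    above = 300 / (25 - cut)  # scaled points per raw point at or above the cut
    print(f"cut={cut}: raw 14 -> {scaled(14, cut)}, "
          f"step below={below:.1f}, step above={above:.1f}")

# Prints:
# cut=17: raw 14 -> 576, step below=41.2, step above=37.5
# cut=12: raw 14 -> 746, step below=58.3, step above=23.1
```

Within each band every raw point moves the scaled score by the same step, no matter which question it came from. Only the cut score changes the size of the step.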

So, clearer? Muddier? What other questions do you have about this?

Comments
  • eric.willman.com

    So, the scores are not weighted, they are just distributed across a differing mathematical range depending on the passing score which is scaled based on the difficulty of the question set.  Hmm, I am pretty sure that is actually the definition of weighting.

  • dimitri.shvorob.gmail.com

    Hi Liberty,

    What is the probability of getting the same score twice? I shared my statistically puzzling recent experience with 070-467 in this post:

    social.technet.microsoft.com/.../dear-microsoft-could-070467-exam-scoring-be-broken

    Can you actually calculate the number (for 070-467) and, should it be sufficiently low, investigate if there could be an operational problem with the test scoring?

    Thank you.