
Item 411: What’s In and What’s Out—Answers to Some Questions (Part 3)


I’ve had some comments to the previous “parts” of this blog post that required answers longer than a comment should be (although I’ve broken that rule more than once). Will this be the last blog post of this series? Hard to tell… Depends on what other questions I get and what I’ve forgotten to tell you.

Industry Standards for Exam Development: Microsoft develops all MCP exams according to the standards described in the Standards for Educational and Psychological Testing. This is the industry-recognized source for exam development and was co-written by a number of well-recognized associations, including the American Psychological Association. I think a more descriptive resource for developing personnel selection tests (and certification exams that are developed for use by hiring managers are selection tests) is the Principles for the Validation and Use of Personnel Selection Procedures, written by the Society for Industrial and Organizational Psychology. But I’m an I/O psychologist by education, so I might be a little biased.

Industry Standards for Psychometrics: The standards that define a good, bad, and questionable item depend on the purpose of the exam, which is why it’s difficult to find specific guidance online (psychometric consulting companies will provide this guidance for a fee, of course, and as a result, they are the most likely results when you do an online search for “psychometric standards”). In high-stakes testing programs designed to screen out the majority of candidates (or applicants), the guidelines defining good, bad, and questionable items will be different from, and usually more stringent than, those for classroom exams. High-stakes programs may choose to keep items that fewer than 10% of candidates answer correctly, where most other programs would remove those items because they are too difficult. In academic programs, professors may choose to keep items that more than 90% of students answer correctly, assuming (perhaps incorrectly) that this means the content is well covered during the course rather than that the item is easily guessed because it’s poorly written.

So, what’s a good item at Microsoft? Good items are those that 25%-90% of candidates answer correctly. Items that fall outside that range are reviewed by SMEs and retained if they cover new functionality that candidates will become familiar with over time (in the case of overly difficult items) or if they are needed for face validity (items that are needed because their exclusion would lead to questions about the quality of the exam although nearly everyone answers them correctly) AND they discriminate between high and low performers. In today’s world of political correctness, “discriminate” is a bad word; in the world of psychometrics, it’s a very good thing.
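To make the 25%-90% rule concrete, here is a minimal Python sketch of that first screening step. It is only an illustration, not the tooling Microsoft actually uses: the function names, the 0/1 response matrix, and the sample data are all made up for the example. Only the 25% and 90% cutoffs come from the text above.

```python
def proportion_correct(responses):
    """Return the share of candidates answering each item correctly.

    `responses` is a list of candidate rows; each row holds one 0/1 score
    per item (1 = correct). This layout is an assumption for the example.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_candidates
        for i in range(n_items)
    ]


def flag_for_review(p_values, low=0.25, high=0.90):
    """Return indexes of items whose proportion correct falls outside [low, high]."""
    return [i for i, p in enumerate(p_values) if p < low or p > high]


# Hypothetical data: 5 candidates, 4 items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]

p_values = proportion_correct(responses)
flagged = flag_for_review(p_values)
print(p_values)
print(flagged)
```

Flagged items are not dropped automatically in this sketch. As described above, they go to SMEs for review, and they can stay on the exam if they are justified and still discriminate.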

So, good items also discriminate between high and low performers. This is determined by a statistic called the point biserial correlation, which is the correlation between how candidates perform on that item and how they perform on the overall exam. This correlation should be positive and moderate or strong (above .20; remember that correlations range from –1 to +1), meaning that candidates who answer the question correctly are doing well on the exam and those who answer it incorrectly are doing poorly on the exam. Negative correlations are bad: they mean that someone who answers the question correctly is doing poorly on the exam and someone who answers it incorrectly is doing well. For some reason, the item is rewarding poor performers and penalizing high performers. Even if SMEs tell me it’s a great question, I cannot keep items that perform like this on the exam…it’s clearly unfair to high performers to have items that are essentially penalizing them.
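For readers who want to see the arithmetic, the point biserial correlation is simply the ordinary Pearson correlation computed between a 0/1 item score and the total exam score. The sketch below is a rough illustration under a few assumptions of mine: the function name and data are hypothetical, and the total score here includes the item itself (some programs exclude the item from the total before correlating; the post does not say which is used).

```python
import math


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total exam score.

    Returns None when the correlation is undefined, for example when every
    candidate got the item right (no variance in the item scores).
    """
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum(
        (x - mean_item) * (y - mean_total)
        for x, y in zip(item_scores, total_scores)
    )
    var_item = sum((x - mean_item) ** 2 for x in item_scores)
    var_total = sum((y - mean_total) ** 2 for y in total_scores)
    if var_item == 0 or var_total == 0:
        return None
    return cov / math.sqrt(var_item * var_total)


# Hypothetical data: total exam scores for five candidates.
total_scores = [9, 8, 7, 4, 3]

# An item that the three highest scorers answered correctly.
r_good = point_biserial([1, 1, 1, 0, 0], total_scores)

# An item that only the two lowest scorers answered correctly.
r_bad = point_biserial([0, 0, 0, 1, 1], total_scores)

print(r_good, r_bad)
```

In this toy data the first item comes out strongly positive and would clear the .20 bar, while the second comes out negative, which is the pattern that gets an item removed.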

Occasionally, additional digging into the option analysis for items with negative correlations suggests a “psychometric miskey.” This means that an option that was not keyed as the correct answer (known as a distractor) is performing as if it should be the correct answer (a high proportion of candidates is selecting it; it’s discriminating between high and low performers better than the keyed correct answer). In these cases, we ask SMEs to verify the correct answer, and change it if the SMEs tell us that we had it keyed incorrectly. This doesn’t happen very often (less than once per exam), beta candidates are not scored on those items, and it is corrected before the exam is published.
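An option analysis of this kind can be sketched in a few lines of Python. Again, this is an illustration under my own assumptions, not Microsoft's procedure: the function names, the rule used to flag a suspect (a distractor whose correlation with the total score is positive and higher than the key's), and the data are all hypothetical. For each answer option it reports the share of candidates who chose it and the correlation between choosing it and the total exam score.

```python
import math


def correlation(xs, ys):
    """Pearson correlation; returns None if either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def option_analysis(choices, total_scores):
    """Map each option to (share of candidates choosing it, discrimination)."""
    stats = {}
    for option in sorted(set(choices)):
        picked = [1 if c == option else 0 for c in choices]
        stats[option] = (sum(picked) / len(picked), correlation(picked, total_scores))
    return stats


def possible_miskeys(stats, key):
    """List distractors that discriminate positively and better than the key."""
    key_r = stats.get(key, (0.0, None))[1]
    if key_r is None:
        key_r = float("-inf")  # key never chosen, or chosen by everyone
    return [
        option
        for option, (_share, r) in stats.items()
        if option != key and r is not None and r > 0 and r > key_r
    ]


# Hypothetical data: each candidate's chosen option and total exam score.
choices = ["A", "B", "B", "B", "A", "C"]
total_scores = [3, 9, 8, 7, 4, 2]

stats = option_analysis(choices, total_scores)
suspects = possible_miskeys(stats, key="A")
print(stats)
print(suspects)
```

A flagged option is only a prompt to ask the SMEs. The statistics cannot say which answer is actually correct.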

This is just the tip of the iceberg, but I’m starting to get into the minutiae of statistics. For those of you who stuck with this post this far: you probably know more than you ever wanted to about item selection, but please let me know if you have questions. I’m happy to answer them or extend this series into another post. Thanks for hanging in there!

Posted by libertymunson
