Rather than do some real work, I thought I’d blog some more on our item selection process. (Seriously, is there an intervention for blog addicts?)
As I mentioned previously, item selection begins by looking at the numbers. Next, we read through all of the beta comments to catch any issues that the psychometric results didn't reveal. Many times the comments confirm or help us understand the psychometric results, and sometimes they point to issues that aren't obvious in the numbers themselves.
Here’s what we look for in beta comments:
All of these criteria help us prioritize the comments because, although we read them all, we rarely have time to discuss them all with subject matter experts. (By the way, we don't know which items you flagged for comment; we only know which items you actually commented on, so be sure to provide your most important comments first.)
Based on the statistical information and comments, we identify a set of items that need further review. We review these items with subject matter experts during an item selection meeting. Based on their feedback, we finalize the pool of items from which the live forms will be assembled.
Stay tuned for Part 3, where I'll address the comment on my first post about the psychometric standards that we use, as well as any other questions you might have about this process. Ask away… I'd love to make this a four-part series. Who needs to do real work anyway?
Posted by libertymunson
Wednesday, March 18, 2009 7:31 PM by Edward Laverick
Have you noticed any statistical changes in the way question selection works since you opened the beta process beyond the usual pre-selected invitees? Without vetting of beta participants, are we likely to see exams becoming easier due to more incorrect answers?
Friday, March 20, 2009 2:06 PM by Steve Maier
Great information, Liberty. I agree with Edward in wanting to know whether having a lot of people who are not trained on a topic muddies the waters when it comes to the questions. I have also seen questions in the past, such as on database exams, where people complained that the answer being looked for is what Microsoft is promoting while other database professionals promote a different concept. For example, on a cross-reference table, the exams typically lean toward having a separate index for the table, while other professional database users use a composite key made of the keys from the two tables it references. How do you resolve this type of thing?
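To make the two junction-table designs in that example concrete, here is a minimal sqlite3 sketch; the schema and all the names in it are hypothetical, not taken from any exam:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: students, courses, and a many-to-many enrollment.
cur.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# Design 1: composite primary key built from the two referenced keys.
cur.execute("""
    CREATE TABLE enrollment_composite (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    )
""")

# Design 2: surrogate key plus a separate unique index on the pair.
cur.execute("""
    CREATE TABLE enrollment_surrogate (
        enrollment_id INTEGER PRIMARY KEY,
        student_id    INTEGER NOT NULL REFERENCES student(student_id),
        course_id     INTEGER NOT NULL REFERENCES course(course_id)
    )
""")
cur.execute("""
    CREATE UNIQUE INDEX ix_enrollment_pair
        ON enrollment_surrogate (student_id, course_id)
""")
conn.close()
```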
Saturday, March 21, 2009 10:32 PM by Michael Dragone
I have the same thought as Steve and Edward - I know the "right way" to answer the question is with the by-the-book Microsoft answer, so what do you do when everyone answers it with the "real world" answer?
Monday, March 23, 2009 12:29 PM by libertymunson
I started writing my response to your questions, but, as I've mentioned several times, I tend to get excited (and as a result verbose) when I talk about this stuff, and it was running toooooo long for a comment. I'm going to address your questions in a future blog. My next blog is an addition to this one, but I'll blog about your questions later this week.
Thanks for the great questions!
Wednesday, March 25, 2009 6:42 PM by libertymunson
As I was writing my blog post to address some of these questions, I realized that the beta question doesn't really fit with what I was blogging about, so I will answer that question here.
We still typically target specific populations for our beta exams. The populations of interest are those candidates who are likely to have some experience with the technology; in some cases, that population is very large (and may seem like it includes everyone), but in other cases, it's more limited. When the number of beta participants is not large enough for robust statistics, we will open the beta to a larger group of people.
The most likely result of these "open" betas is that the psychometrics make items look more difficult than they actually are. As part of our normal process, though, we discuss items that were flagged as too difficult with SMEs to get a better understanding of why they are difficult: is the item tricky (which is bad), or does it only appear difficult because the beta candidates may not have the experience needed to answer it correctly (which is OK)? So our process doesn't really change, but our interpretation of what's psychometrically too hard might.
In open betas, overly easy items are more likely to be removed because they clearly are not adding value to the exam if everyone, even those with little to no experience with the technology, can answer them correctly.
All in all, open betas change the interpretation of what's good and bad from a p-value (percent of people answering correctly) perspective, but not our process.
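If you're curious what that looks like in practice, here's a tiny illustrative sketch of computing item p-values and flagging items for review; the response data and the 0.25/0.90 thresholds are made-up examples, not our actual criteria:

```python
# Illustrative sketch only: the response data and the flagging
# thresholds below are assumptions, not actual exam criteria.

# rows = candidates, columns = items; 1 = correct, 0 = incorrect
responses = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]

TOO_HARD = 0.25  # hypothetical lower bound on proportion correct
TOO_EASY = 0.90  # hypothetical upper bound on proportion correct

n_candidates = len(responses)
for item in range(len(responses[0])):
    # p-value here means "proportion of candidates answering correctly"
    p = sum(row[item] for row in responses) / n_candidates
    if p < TOO_HARD:
        flag = "review: too hard (tricky, or beta pool lacked experience?)"
    elif p > TOO_EASY:
        flag = "review: too easy (adds little value?)"
    else:
        flag = "ok"
    print(f"item {item}: p = {p:.2f} -> {flag}")
```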
Because the cut score is based on SME input that takes into consideration the difficulty of the item pool in relation to the skills of the minimally qualified candidate, exams aren't easier as the result of an "open" beta. Setting the cut score, though, is a complicated topic that will get its own blog.
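In the meantime, as a rough teaser: one common standard-setting approach is a modified Angoff method, where each SME estimates the probability that a minimally qualified candidate answers each item correctly, and those estimates are combined into a recommended cut score. The sketch below illustrates that idea with made-up numbers; I'm not claiming this is exactly our process:

```python
# Rough illustration of a modified-Angoff style calculation.
# All numbers are made up; this shows one common standard-setting
# method, not necessarily the exact process used for these exams.

# Each inner list: one SME's estimates of the probability that a
# minimally qualified candidate answers each item correctly.
sme_ratings = [
    [0.70, 0.55, 0.80, 0.40, 0.65],
    [0.75, 0.50, 0.85, 0.45, 0.60],
    [0.65, 0.60, 0.75, 0.50, 0.70],
]

n_items = len(sme_ratings[0])

# Average the SMEs' estimates per item, then sum across items:
# the result is the expected raw score of a minimally qualified
# candidate, which serves as the recommended cut score.
item_means = [
    sum(sme[i] for sme in sme_ratings) / len(sme_ratings)
    for i in range(n_items)
]
cut_score = sum(item_means)
print(f"recommended cut score: {cut_score:.1f} of {n_items} points")
```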