How do you know if your evaluation instrument is “good”? Or if the instrument you find on csedresearch.org is a decent one to use in your study?

Evaluation instruments (like surveys, questionnaires, and interview protocols) can go through their own evaluation to assess whether or not they have evidence of reliability or validity. In the Filters section on the Evaluation Instruments page, you can find a category called Assessed where you can include instruments in your search that have been previously shown to have evidence of reliability and validity. So, what do these measures mean? And, what is the difference between them?

Evaluation instruments are often designed to measure the impact of outreach activities, curriculum, and other interventions in computing education. But how do you know if these evaluation instruments actually measure what they say they are measuring? We gain confidence in these instruments by assessing evidence of their reliability and validity.

Instruments with evidence of reliability yield the same results each time they are administered. Let’s say that you created an evaluation instrument in computing education research, and you gave it to the same group of high school students four times at (nearly) the same time. If the instrument was reliable, you would expect that the results of these tests to be the same, statistically speaking.

Instruments with evidence of validity are those that have been checked in one or more ways to determine whether or not the instrument measures what it is supposed to measure. So, if your instrument is designed to measure whether or not parental support of high school students taking computer science courses is positively correlated with their grades in these courses, then statistical tests and other steps can be taken to ensure that the instrument does exactly that.

Those are still very broad definitions. Let’s break it down some more. But before we do, there is one very important caveat.

Evidence of reliability and/or validity are assessed for a specified particular demographic in a particular setting. Using an instrument that has evidence for reliability and/or validity does not mean that the evidence applies to your usage of the instrument. It can provide, however, a greater measure of confidence than an instrument that has no evidence of validity or reliability. And, if you are able to find an instrument that has evidence of validity with a population similar to your own (e.g. Hispanic students in an urban middle school), this can provide even greater confidence.

Now, let’s take a look at what each of these terms mean and how they can be measured.

Select here to go to next page to learn about Reliability.