Assessment Practice
The practice of assessment begins with determining "What is assessment?" Assessment of student learning can be defined as the systematic collection of information about student learning, using the time, knowledge, expertise, and resources available, in order to inform decisions about how to improve learning (Barbara E. Walvoord, Assessment Clear and Simple, Stylus, 2004, p. 2). Assessment results can be used to improve a program's curriculum, pedagogy, structure, advising, and resources.
Other elements that require definition are goals, objectives, outcomes and competencies.
- Goals – broad statements that are aligned with university and/or school mission statements.
- Objectives – statements of accomplishments necessary to achieve the goals.
- Outcomes – statements of the desired result or tangible destination. Student learning outcomes speak to the particular level of knowledge, skills, and abilities that a student has attained at the completion of an academic program, course, or experience. Outcomes focus on one or a combination of three areas: content (cognitive learning), skill acquisition (behavioral learning), and attitudes (affective learning).
- Competencies – statements indicating adequate demonstration of outlined tasks, skill sets, or knowledge.
Next, choose the assessment method(s) to be utilized. According to The NPEC Sourcebook on Assessment, Volume 1 (2000), the following elements need to be reviewed.
Formative vs. Summative Assessment
- Formative Evaluation - Goal is to provide feedback, with the aim of improving teaching, learning, and the curricula; to identify individual students' academic strengths and weaknesses; or to assist institutions with appropriate placement of individual students based on their particular learning needs.
- Summative Evaluation – Goal is to facilitate decision making at the program level and to determine the allocation of resources (e.g., personnel and funds).
Internally vs. Externally Developed
- Internally Developed – If no existing measure adequately examines the forms of student achievement that have been the focus of curriculum objectives, an assessment tool may need to be developed internally. For formative assessment, the outcome data obtained from locally developed tests may provide enough congruence with the learning objectives and curricular aims, yield a sufficient quantity of information, and thus guide decision making.
- Externally Developed - A commercially produced test that samples content and/or skill areas that are being emphasized in a program and typically provides detailed student reports. Assessment conducted for external purposes often employs such tests.
Conceptual Considerations
- Decision Making - Determine whether the outcome data will be used to make a decision on an important policy issue, and how relevant the outcome is to the issue at hand. For example, if an assessment is conducted to determine the writing skills that college graduates need to function effectively in the business world, the context of an essay test should probably include products such as letters and formal reports rather than a literary analysis of a poem.
- Utility – Determine if the data generated from a particular measure will guide action on achieving a policy objective. For instance, a policy objective might involve provision of resources based on institutions' sensitivity to the learning needs of students from demographically diverse backgrounds. It would be difficult to convince funding agencies that students' individual needs are being diagnosed and addressed with a measure that is culturally biased.
- Applicability – Determine if the assessment outcome measures relate to multiple stakeholder groups. In other words, to what extent will data generated from a critical thinking, problem solving, or writing assessment yield information that can be used by multiple groups, such as faculty and administrators who wish to improve programs, or government officials and prospective employers who desire documentation of skill level achievement or attainment?
- Interpretability – Determine if the outcome data will be provided in a format that is comprehensible to individuals with different backgrounds.
- Credibility – Determine if the information generated by a particular assessment method is believable. Credibility is based on the amount of time, energy, and expertise that goes into a particular measure; the psychometric qualities associated with a test; the ease of interpretation of the materials and results; the amount of detail provided pertaining to student outcomes; and the cultural fairness of the test. Credibility of outcome data is tied closely to the degree to which the assessment information is conceptually related to the actual skills deemed important.
- Cultural Fairness – Determine if the information yielded by a particular assessment approach is not biased or misleading in favor of particular groups. Bias can be subtle, requiring extensive analysis of item content and analysis of performance by students with comparable abilities, who differ only in terms of group association, to ensure fairness. A measurement analysis, Differential Item Functioning (DIF), allows for the control of ability level so that bias can be detected. In this way, cultural fairness is a measurement issue.
- Other - Sample size, time of testing, the audience, and assessment design (pre/post-testing) are just a few examples of variables that greatly affect assessment outcomes. Methodological and conceptual considerations should also guide the assessment tool selection.
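To make the Differential Item Functioning idea under Cultural Fairness more concrete, the sketch below computes a Mantel-Haenszel common odds ratio for a single test item. The source does not say which DIF procedure it has in mind; Mantel-Haenszel is simply one widely used option, chosen here as an assumption. The function name, the group labels "reference" and "focal", the ability bands, and the data are all hypothetical. Students are first matched on ability level, and the two groups' odds of answering the item correctly are then compared within each level. A ratio near 1 suggests the item behaves similarly for both groups, while a ratio far from 1 flags the item for closer review.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Return the Mantel-Haenszel common odds ratio for one test item.

    records: iterable of (group, ability_level, correct) tuples, where group
    is "reference" or "focal", ability_level is the matching variable
    (for example a total-score band), and correct is True or False.
    """
    # One 2x2 table (group by right/wrong) per ability level.
    tables = defaultdict(lambda: {"reference": [0, 0], "focal": [0, 0]})
    for group, ability_level, correct in records:
        tables[ability_level][group][0 if correct else 1] += 1

    numerator = 0.0
    denominator = 0.0
    for table in tables.values():
        ref_right, ref_wrong = table["reference"]
        focal_right, focal_wrong = table["focal"]
        n = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n

    if denominator == 0:
        raise ValueError("odds ratio is undefined for these data")
    return numerator / denominator


# Hypothetical responses to one item, grouped into two ability bands.
# Within each band both groups have the same odds of a correct answer.
records = (
    [("reference", "low", True)] * 2 + [("reference", "low", False)] * 2
    + [("focal", "low", True)] * 1 + [("focal", "low", False)] * 1
    + [("reference", "high", True)] * 3 + [("reference", "high", False)] * 1
    + [("focal", "high", True)] * 3 + [("focal", "high", False)] * 1
)
odds_ratio = mantel_haenszel_odds_ratio(records)
```

Because the groups are compared only within the same ability band, a difference in overall ability between groups does not by itself register as bias. In practice a statistical test and an effect-size rule would accompany the ratio; those are omitted from this sketch.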
Measures
Assessment of student learning in higher education has traditionally taken two forms: indirect (multiple-choice) and direct (constructed response) measurement.
- Indirect Assessment Methods - Questionnaires, interviews, focus groups, satisfaction studies, advisory boards, retention rates, job and graduate school placement data.
- Direct Assessment Methods - Exams, performance assessments, standardized tests, licensure exams, oral presentations, projects, demonstrations, case studies, simulations, portfolios, research papers, and juried activities.
It is also important to determine the measurement criteria. Decisions must be made about the levels of the student learning outcome and about how the data will be interpreted. As to the latter, one consideration is whether the data will be compared or benchmarked against comparable programs.
Reliability & Validity
The reliability and validity of a test cover an immense amount of information regarding the consistency and usefulness of scores. As a first step in the review process, it should be noted that reliability must be established before validity issues are addressed. If scores are not consistent, then the inferences made will also be inconsistent. Once reliability is determined, the content of a test, most specifically the definition and domains covered by the test, should be examined for fit with the purpose of testing. Any outcome information regarding the content and inferences made from the test should help to guide the content review. Correlations with other measures can also help to clarify the test's relationships with other well-known variables. Perhaps the most important information comes from studies that investigate gains in ability not only across time, but across treatment.
- Reliability – Reliability is an estimate of test takers' performance consistency internally, across time, test forms, and raters. Generally, reliability estimates above .70 indicate an acceptable level, although values of .80 and above are more commonly accepted.
- Validity – Validity involves "building a case" that a test is related to the construct it is intended to measure. There are three types of validity: content, criterion, and construct. The most important type of validation is construct validity, because it encompasses both content and criterion validity.
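As an illustration of the internal-consistency side of reliability, the sketch below computes Cronbach's alpha, one common reliability estimate, and compares it with the .70 guideline mentioned above. The source does not name a specific coefficient, so the choice of alpha is an assumption, and the function name and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per test taker, one column per item.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two test takers")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every test taker needs a score on every item")

    # Sample variance of each item (column) across test takers.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each test taker's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four test takers, two items.
example_scores = [[2, 3], [4, 4], [3, 5], [5, 6]]
alpha = cronbach_alpha(example_scores)
meets_threshold = alpha > 0.70  # the .70 guideline for acceptable reliability
```

For these made-up scores alpha works out to 8/9, or roughly .89, which clears both the .70 and the .80 guidelines. Consistency across time, test forms, or raters calls for other estimates (for example test-retest correlations or inter-rater agreement), which this sketch does not cover.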
