EdFix Episode 30: Should Tests be Socioculturally Responsive?

RANDY BENNETT:
Every individual brings to school and to assessment a sociocultural foundation.

MICHAEL J. FEUER:
Welcome to EdFix, your source for insights on the promise and practice of education. I am your host, Michael Feuer. I'm the Dean of the Graduate School of Education and Human Development at George Washington University, and today I am more than delighted to be speaking with Dr. Randy Bennett. Randy is the Norman Frederiksen Chair in Assessment Innovation in the R&D, Research and Development, Division of ETS. Randy's work focuses on integrating advances in cognitive science, technology, and measurement in pursuit of equitable assessment approaches that have a positive effect on teaching and learning.

Randy has a long and distinguished career in the world of testing and assessment. He directed the National Assessment of Educational Progress Technology-Based Assessment Project, which included the first administration of computer-based assessments to nationally representative samples of school students. Randy is a past president of the International Association for Educational Assessment, past president of the National Council on Measurement in Education, a fellow of the American Educational Research Association, and was recently elected to the National Academy of Education. And to think that, with all of that, he still had time to meet and talk with us for this podcast. What a pleasure. Hello, Randy.

RANDY BENNETT:
Michael, it's great to be here. Thank you for having me. I'm really looking forward to our conversation.

MICHAEL J. FEUER:
First of all, just in case some of our listeners need a little bit of help with the various acronyms and organizations, can you give a reminder about ETS, where it is, what it is?

RANDY BENNETT:
ETS is Educational Testing Service. It's located in Princeton, New Jersey, and has offices in other locations in the United States and throughout the world. ETS was established in 1947. It's a nonprofit 501(c)(3) organization, and it was created by three other nonprofit organizations: the College Entrance Examination Board, the American Council on Education, and the Carnegie Foundation for the Advancement of Teaching.

It was created because each of those institutions at that time had fledgling testing programs that had grown into fairly sizable operations, and none of those organizations had as their primary mission the operation of testing programs. So the leaders of those organizations, along with James Conant, who was then the president of Harvard, advocated for the establishment of a single national organization that could operate those testing programs and, in addition, could do measurement research and research in related areas of education that would further advance the then-developing science of educational measurement.

MICHAEL J. FEUER:
It's good to have the broader historical context of this over those, now, let's see, more than seven decades since the founding of ETS. What have the trends been in this, what I would call unfinished business of fulfilling certain aspirations for equity and access to higher education?

RANDY BENNETT:
Well, I think one important trend to recognize is that the demographics of our country are shifting very dramatically. The primary indication of that is the ways in which our population is changing. In 1920, which would be about 27 years prior to the establishment of ETS, the U.S. population was 90% Caucasian. By 1980, which would be 60 years later, we were 80% Caucasian. By 2020, 40 years after that, the most recent census, we were 66% Caucasian. So a greater difference, a greater change in the percentage, 14 percentage points in a shorter period of time. And by 2060, the projection is that our population will be 44% white. Our public school population as of 2018 was 53% students of color and 52% eligible for free and reduced price lunch. Our largest state by population, California, had a public school population in 2019 that was 78% students of color.

The point is that the demographics of our population are changing quite dramatically, and have changed quite dramatically over only relatively short periods of time. The second point is that our public school populations are today quite diverse from a racial and ethnic perspective. The implication of that is that greater demographic diversity suggests greater cultural diversity, and that fact is really important. It's important because tests are themselves cultural artifacts. They privilege certain competencies, certain ways of knowing, certain forms of representation, and certain modes of expression that are rooted in our nation's common culture. Today's tests emerged from a period when we were far less diverse than we are today. They weren't designed for the multicultural, pluralistic society that we now find ourselves within. Tests have not substantively evolved to the same degree as our society has.

MICHAEL J. FEUER:
What about other uses of testing and assessment overall these decades, and what's your sense of the extent to which we have a good common public understanding of the different uses and possibly misuses of the kind of data that come from standardized testing programs?

RANDY BENNETT:
Well, first, I think it might be valuable to understand what a standardized test is and what an assessment is. The way I think about assessment, as the more general category, is as a type of inquiry used to inform decision-making, usually about individuals, groups, or institutions, that involves four fundamental acts. The first act is to engineer opportunities to observe evidence of the competencies we care about. The second act is to connect that evidence, the evidence we observe, to inferences or judgments about those competencies. The third act is to communicate or otherwise use those inferences for making some sort of decision. Finally, the fourth act is to evaluate the quality and impact of those opportunities, the evidence we've gathered, and the inferences and decisions we've made. Those four fundamental acts can be carried out for many different purposes, and the acts can take many different forms. Different implementations of those acts, I would call assessment methods.

A standardized test done for, say, school accountability is a particular implementation or method of assessment. A classroom formative assessment done to help plan and adjust instruction is a different implementation. Both the formative assessment and the standardized accountability test engineer evidence-gathering opportunities, but of different kinds. They both gather evidence of different kinds, and they both facilitate inferences of different kinds: in the case of the accountability test, about the extent to which students have mastered curriculum standards broadly; in the case of formative assessment, about what a particular student, or perhaps a class, might understand about a single concept.

That's how I think of assessment. I think of a standardized test as a particular type of assessment that can be used for different purposes. So methods and purposes are distinct from one another, and it's important not to conflate the two. It's also important to understand that all assessment methods have their advantages and limitations. They all require that we make inferences about what students know and can do, because we can't see inside a student's head. So our inferences are always uncertain regardless of the methods that we use.

Another limitation is that all of those methods bring with them sociocultural influences of one type or another. Standardized tests bring the sociocultural influences of those who create them, who score them, and who interpret the results, as well as the sociocultural perspectives inherent in the tools and the knowledge representations that are used in a given test. Formative assessments do the same, including grades. They are not immune from sociocultural influences because both formative assessments and grades are given through the sociocultural lens of the teacher, which may or may not fit the sociocultural background of all of the students that that teacher is in fact instructing.

MICHAEL J. FEUER:
This is wonderful clarification, Randy, on the notion that assessment is a broad rubric for mental processing that we do in order to help us make decisions and we gather evidence to support those kinds of decisions. To what extent do you think the measurement community and the general public appreciate the difference between measuring tire pressure and measuring a student's performance on third grade arithmetic?

RANDY BENNETT:
I don't think the public, or educators in general, appreciate the difference. There's far more uncertainty in the measurement of students' academic capabilities, academic skills, and academic knowledge than there is in the measurement of their height and weight, for example. When we make decisions about measuring students' academic competencies, one of the first decisions we make is what to measure. By making that decision, we're immediately excluding other important things that we could have measured. So with respect to elementary and secondary education, what we measure for purposes of school accountability is English language arts, mathematics, and, less frequently, science. But we leave out the measurement of what many in the public and in the education community would argue are also important attributes to know about. Even within English language arts, mathematics, and science, we focus on particular standards, standards that have been defined by states or, in the case of NAEP, by framework committees. Those standards, again, represent a subset of what those three domains are constituted of.

When we make design decisions with respect to the ways in which we are going to measure those standards, we again reduce what it is we are measuring to a significant degree. Because we choose particular test formats, we choose particular ways of measuring those individual standards. In fact, in some cases, we exclude standards entirely, as in the case of English language arts, we typically do not measure speaking and listening. So we progressively get to the point where what it is we do measure represents a very limited subset of what it is we could measure about an individual had we infinite time and resources of course. But the point is that we have reduced an important characterization of what an individual knows and can do to something that is very small relative to a physical measurement.

MICHAEL J. FEUER:
Let's talk about this push toward bringing the apparatus of assessment using tests and other mechanisms in line with what you refer to and others refer to as a more socioculturally responsive overarching framework. What do we really mean by that in terms of socioculturally responsive assessment, and to what extent does it collide with rather more simplified or perhaps simplistic views of fairness when one makes inferences across different individuals or groups?

RANDY BENNETT:
Good question. Socioculturally responsive assessment begins with the premise that every individual brings to school and to assessment a sociocultural foundation, which is formed as a function of their development and growth within a family and within a community that values certain ways of knowing, certain modes of expression, and other things that form that individual's views and ways of operating and making sense of the world. That's important because not all sociocultural backgrounds align with the ones our tests are based upon. As I said, our tests are cultural artifacts, and that misalignment creates the potential for perceptions of bias; it creates the potential for causing some individuals to disengage, to be demotivated, to lower their sense of self-worth because of the ways that they perform on those assessments. It creates the potential for underestimating competencies, and it may even contribute to limiting opportunity to learn and life chances for those individuals.

Socioculturally responsive assessment is an attempt to address these issues. You can think of it as one of a family of related approaches that includes culturally responsive assessment, universal design for assessment, anti-racist assessment, justice-oriented assessment, and culturally and socially responsible assessment. Those approaches differ in the target populations they address, the literatures on which they're built, their underlying premises and principles, the goals they aspire to achieve, and the rhetorical stances that they take. But they all emphasize at least one key idea, and that is to design for the social, cultural, and other relevant characteristics of individuals and the contexts from which they come.

What that means does collide with traditional notions of fairness and standardization, which suggest the idea of treating each person identically: giving each person the same test content, the same items in the same test formats, under the same administrative conditions. That idea of sameness is the foundation for comparability, for being able to compare one individual's performance to another's.

Socioculturally responsive assessment takes a different point of view. It argues that instead of treating individuals identically, we ought to be treating individuals differently so as to help them show what it is they know and can do. For example, by including problems that connect to the cultural identity, background, and lived experiences of each individual, especially individuals from traditionally underserved groups. For example, by allowing forms of expression and representation in problem presentation and solution that help individuals better demonstrate their competencies. For example, by adapting to individuals' personal characteristics, including their cultural identity. The point, then, is to try and treat individuals differently in ways that allow them to evidence their best performance, as opposed to trying to treat them identically in ways that might not allow them to show their best performance, in ways that fit the majority population as opposed to every individual. So that's the idea, and that's the collision.

MICHAEL J. FEUER:
Do you encounter any kind of resistance to this sensitivity to sociocultural differences? If the inference that you want or that you're looking for is whether young people are having difficulty with a certain kind of subject matter, take dividing fractions or multiplying three-digit numbers or something like that. If you take the position that all children should be learning so that they can do well in those kinds of tests, and you now have an assessment that somehow adjusts based on certain kinds of social or cultural or economic or other differences, are we in some sense undermining the idea of creating opportunities for all children to learn key information and key skills?

RANDY BENNETT:
I don't think so, Michael. The reason I don't think so is that I think what might be happening here is a conflation of purpose and method, which is why I talked about and tried to make a distinction between purpose and method earlier. No one is arguing that we shouldn't be attempting to understand what students know and can do with respect to the constituents of important domains. The argument is over what methods we use to make those judgments and whether those methods are suited to effectively representing the competencies of diverse students, students coming from the wide range of cultures that currently characterize our public school population. The proposal is to try and adjust our methods so that they are more sensitive to what students from that diversity of sociocultural backgrounds know and can do.

That's the issue. It's not a question of purpose; it's a question of method. We can use different methods depending upon the individual to better understand what they know and can do. Some methods will be better suited to some individuals than to others. You wouldn't, for example, attempt to assess a blind individual's competencies in reading via a printed assessment unless you had a reading machine. You would suit the measurement method to the characteristics of that individual, which would mean either Braille presentation or the use of a reading machine. But you wouldn't ask that individual to demonstrate reading competency with a printed test on its own. The idea is the same.

MICHAEL J. FEUER:
Yeah, no, that's very helpful, and I think you very eloquently make the point that reactions to the idea of socioculturally responsive assessment sometimes miss the point about method rather than the underlying construct that we are seeking evidence about. That's not something that is going to be easy, especially in the rather tense political environment we are currently living in, which brings me to the point that you made in your most recent essay: even if many of us in the research world have been thinking about these kinds of problems for a long time, the general public may be becoming more aware of some of these complexities because of the COVID pandemic, which has had such a devastating effect on so many communities, and in particular on communities of disadvantaged and minority status, and because of the episodes of racial violence concurrent with the pandemic and the awakening in a large segment of the American population to some of these issues and disparities that have been with us for a long time.

So am I picking up on one of the themes that you were trying to convey in that essay? What else would you want to say about where we are today post COVID in terms of these various issues?

RANDY BENNETT:
Well, I think one of the effects of the COVID-19 pandemic was to fuse testing to social injustice in a way that could actually help us reinvent our field, that is, the field of educational measurement. Because part of what the COVID-19 pandemic did was awaken those of us in society who hadn't experienced that social injustice directly to the vast inequities that characterize our society. Those inequities extend very significantly to educational opportunity, which we know is not equal, certainly not in our public school system, which is driven by localized approaches to funding, which is characterized by degrees of segregation that have increased over the past several decades, and which is shaped by other factors that cause students to have unequal opportunities to learn. Those unequal opportunities are then reflected in standardized test performance, of course, and in performance on measures that have been used until recently by virtually all institutions to make decisions about admission for post-secondary experiences.

What that pandemic has alerted us to is the need to rethink how it is we go about characterizing what students know and can do. That has led to proposals like the one for socioculturally responsive assessment in the hope that we might be able to better represent the competencies that students from diverse backgrounds bring to school and bring to educational assessment.

MICHAEL J. FEUER:
To say that all of this is complicated would of course be the understatement of the year. I just want to say, Randy, that I'm very grateful to you both individually and because of what you have done professionally and in your years at ETS for persisting in this pursuit of an improved science over topics that are really very, very hard.

Hey, Randy, thank you so, so much for being with me today. Your article* in Educational Measurement: Issues and Practice we're going to include as a link so that listeners can take this small appetizer portion of the history and future of educational testing and assessment and turn it into a more complete meal by reading more of your work and getting into these issues more deeply. It's been a pleasure.

If our listeners have enjoyed this as much as I have, then I would encourage them to subscribe to our EdFix podcast. We have a website, edfixpodcast.com. You can get it on Apple Podcasts, Spotify, iHeartRadio, or other such platforms. Special thanks to our executive producer, designer, engineer, and all-purpose podcast maven Touran Waters. Randy, thank you so very much.

RANDY BENNETT:
Michael, it was a pleasure. Thanks so much for having me.

*To request a copy of Randy Bennett's article, "The Good Side of COVID-19," please email him at rbennett@ets.org.