EdFix Episode 39: ChatGPT and Beyond - Teaching in the AI Era
Transcript
RYAN WATKINS:
It's going to push us as instructors to go back and really look at what we are trying to teach and how we are trying to assess that.
MICHAEL J. FEUER:
Welcome to EdFix, your source for insights about the promise and practice of education. I'm your host, Michael Feuer. I am the Dean of the Graduate School of Education and Human Development at George Washington University, and I have a special privilege today and a joy to introduce our listeners to one of my esteemed colleagues, Dr. Ryan Watkins, Professor of Educational Technology, and also the lead faculty member in our PhD program in Human-Technology Collaboration. Dr. Watkins is an expert in a range of fields, and we're going to try to touch on at least some of them today. He works on artificial intelligence, needs assessment, monitoring, evaluation, website design, and coding, and he teaches courses on instructional design, research methods, and technology integration. Then after lunch he writes books, and he has written 11 of them so far.
Ryan is a prolific contributor to our field of education. He's had close to 100 articles published, and in the performance improvement literature he is among the most cited authors of journal articles in the field. For the past year or so, he has been kept pretty busy with the explosion of AI in education. AI, of course, is artificial intelligence. On this topic, Ryan has published journal articles, he's given interviews on radio and television, and he has taken part in panel discussions. He also has been creating tools for instructors to help them understand the practical and also the ethical uses of AI in the classroom. I emphasize the word ethical because I plan to get back to that in the course of our conversation.
Thank you so much for joining me on EdFix. We're looking forward to this conversation for all kinds of reasons. Ryan, help us here with a little bit of unpacking of what AI really is and what it isn't.
RYAN WATKINS:
Well, thank you for the kind introduction and the invitation to be on your show. Yeah, there are lots of definitions of AI floating around, and some are more technical from the computer science side, but some are very pragmatic as well. The one that I settled on, which I think captures a good middle ground, is that AI is when we're using computational tools that can do things that until recently we thought only humans could do. It could be a variety of things. ChatGPT and large language models are kind of the hot topic now because they're doing things that until recently we thought only humans could do, such as write poetry, create song lyrics, write essays on topics, and show some basic forms of causal reasoning.
They're clearly within the definition of AI, but then there are other types of AI that also fit within that definition. If you think of what they're doing with driverless car technology and vision technologies, it's a different type of AI than what would write a song or a poem, but it still falls within that. Until recently, we thought a human had to be there to do some of these tasks, and now we're finding that we can build tools that can do them as well.
That's been very exciting, but it also comes with some worries, and I think striking a good balance of being both excited and worried is where most of us probably want to end up. Sometimes I see the technologies and I shift a little more to the worried side. Should we really be doing this? Then I try to come back to the middle ground of a good balance: it's very exciting, the things that we can do. It's going to bring lots of exciting advancements in science and in medicine. Our lives will get better, I believe, because of these technologies, but they can equally be used to do harm. Finding that balance is what we'll struggle with as humans as we go forward.
MICHAEL J. FEUER:
Give us a good example of something that you would ask ChatGPT and then what happens when you ask that question, if we could imagine ourselves inside the black box with those fast moving gears that AI is using.
RYAN WATKINS:
Maybe I'll start with what ChatGPT is and how they built it. I'll use ChatGPT just as an example. They had prior models and we were using those in research contexts. We were using GPT-2 prior to the release of ChatGPT, which is built on GPT-3.5, and I'd say GPT-2 was responding about like you would expect an elementary or middle school student to respond. It wasn't great. It was interesting, but it wasn't all that useful or exciting. Then there was a big jump with the release a year ago of ChatGPT, which gives responses more like what you might get from an advanced undergraduate student or someone early on in graduate school. We made some big leaps in the last year.
Part of that comes from the technologies that are underlying it. When we say that it is a model, what we mean is that there is a series of numbers representing words, in this case, because these are language models. It could represent other things, and we see that with the models that create art and images, but if you take a word, let's say the word is “bank,” we can assign a number to that. If you remember back to middle school or high school when you took geometry, you would plot points on x, y, and z axes, and you would say that a point can be at three on the x-axis, two on the y-axis, and four on the z-axis, and you could plot it in a three-dimensional space.
We could take the word bank and do the same thing. We could put it in a three-dimensional space and have three numbers representing it. We could take the word boy, so if the sentence was “the boy went to the bank,” we would want another dot in our space to represent where boy is, and we could put that at another set of numbers, four, five, seven. Then we start reading, and the more we read about what a boy is, we may shift that point around, and it may move further from the word bank or it may move closer to the word bank.
Taking that very basic idea, now have it not just read a few things, but have it read about 300 million words published on the internet. It read books, it read websites, it read all of Wikipedia, and it's continuing to shift the relationships of those words around. The more it reads, the more it keeps fine-tuning. As you can imagine, three data points, an x, a y, and a z axis, only gets you so far. My understanding with ChatGPT, and they don't publish all of their background information, is that they are using about 1,200 dimensions. If x, y, and z were three dimensions, imagine now you have 1,200 of those. It's beyond human comprehension, but each word has 1,200 numbers, and every time the model read something new, it made slight adjustments to those to work out the relationships between those words, or what we would call tokens.
That's what we have as a model. It read all of that, and computers pounded away for weeks, computers with very expensive GPU chips that are about $40,000 a pop, and they had thousands of them running for weeks on end to create this model. It ends at a point where they say, we have 40,000 words, and we have each of them represented with these 1,200 numbers. That's what we mean by a model, which is great, and that's very useful because now you can make predictions. We know how words reside in a space in relationship to each other.
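To make the geometry concrete, here is a minimal Python sketch of the idea Dr. Watkins describes: each word gets a small list of numbers, and words that show up in similar contexts end up closer together in that space. The words and the three-dimensional values below are invented for illustration; a real model like the one behind ChatGPT learns its numbers from text and uses on the order of a thousand dimensions per token rather than three.

```python
import math

# Invented 3-dimensional "embeddings" for a few words (illustration only;
# a real language model learns these values from text and uses far more
# dimensions than three).
embeddings = {
    "bank":  [3.0, 2.0, 4.0],
    "boy":   [4.0, 5.0, 7.0],
    "river": [2.8, 1.9, 4.2],
    "money": [3.1, 2.2, 3.9],
}

def distance(a, b):
    """Straight-line distance between two points in the word space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Words used in similar contexts sit near each other; unrelated words sit
# further away. Smaller numbers mean "more related" in this toy picture.
for word in ["river", "money", "boy"]:
    print(f"distance(bank, {word}) = {distance(embeddings['bank'], embeddings[word]):.2f}")
```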
What we can do now is set up a ChatGPT system where people can put in prompts and ask questions, and it can make predictions based on those words. If the sentence starts with “the boy went to the bank,” what should the next word be? Well, it turns out that's a very complicated task, even if you have this nice model that tells you what all those relationships are, because it could be a river bank or it could be a financial bank, and the computer has to figure out what type of bank it is talking about.
That's where the statistics come in. It's making a prediction about what word should follow the previous word, but it's doing that by understanding the context. ChatGPT, we think, goes back around 8,000 to 10,000 words, or tokens, to understand the context. Computer scientists refer to tokens and not words because they break words up in different ways. When you use these systems and you see how fast they are, you can just imagine how many calculations are being made for each word. It's using that dictionary of about 40,000 words, with all those 1,200 numbers, to figure out which word should then follow. So it has a dictionary, it has the context, and it's making predictions about which word should follow.
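Here is an equally simplified sketch of the prediction step: given a context, the model assigns a probability to every token in its vocabulary and then either takes the most likely one or samples in proportion to those probabilities. The numbers below are made up, and this is not the transformer arithmetic ChatGPT actually runs; it only illustrates the "dictionary plus context plus prediction" idea.

```python
import random

context = "The boy went to the"

# Invented probabilities for a handful of candidate next tokens. A real
# model would score every token in its roughly 40,000-token vocabulary.
next_token_probs = {
    "bank": 0.32,
    "store": 0.27,
    "park": 0.21,
    "moon": 0.01,
}

# Greedy choice: always take the single most probable token.
greedy = max(next_token_probs, key=next_token_probs.get)

# Sampling: pick in proportion to the probabilities, which is part of why
# the same prompt can produce different wording on different runs.
tokens, weights = zip(*next_token_probs.items())
sampled = random.choices(tokens, weights=weights, k=1)[0]

print(f"context: {context!r}")
print(f"greedy next token:  {greedy}")
print(f"sampled next token: {sampled}")
```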
Now if you go to a system like ChatGPT, you put something into the prompt area, something like, what were the differences between the causes of the French Revolution and the American Revolution? Unlike Google, which will come back and give you lots of links and sources to go to, these systems will give you an answer, and they will start writing the words they believe should be in response to that.
There are some additional elements to this, and this is a rather simplified version of it, but it will respond and it will say, the American Revolution and the French Revolution were very different; one was caused by these five things and one was caused by these other five things. It will write a comparison, and it'll be like reading a student's paper if that was the question you had asked them. All of it is based on these statistical models. I'll stop there. I could go on further, but I think that's probably a good summary.
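For listeners who want to see what asking that question looks like in code rather than in the chat window, here is a rough sketch using OpenAI's Python client. The model name and the exact method names are assumptions that change across library versions, so treat this as illustrative rather than as the definitive interface.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set; the client and
# method names follow the current OpenAI Python library and may differ in
# older or newer versions.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name; substitute whatever is current
    messages=[{
        "role": "user",
        "content": "What were the differences between the causes of the "
                   "French Revolution and the American Revolution?",
    }],
)

# Unlike a search engine returning links, the model returns generated prose.
print(response.choices[0].message.content)
```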
MICHAEL J. FEUER:
It sounds to me that from your perspective, Ryan, there is enough about this technology that is on the side of us being excited and anticipating some remarkable progress, such as in health and the diagnosis of conditions, and even, I suppose, things like improved air traffic control and other complex systems. If I understand you right, given the potential benefits and given the fact that the models are not yet and maybe never will be completely autonomous, we should be willing to see where this is going to go.
The reason I ask is that apparently the European Union has now made some decisions about regulating this technology, which I suppose is a different approach to managing the question of the benefits and the risks. What's your sense of whether and how governments should get involved in any of this now, or when do you think they should get involved?
RYAN WATKINS:
I think it's a typical challenge between science and the political and social environment that it functions within: science makes advancements, and then society has to figure out how to deal with them. Regulation is one mechanism that we can use through our governments to set up guardrails, to regulate it in a way that shapes it in the direction that we want as a society. Europe has a different society than we do in the US, as does China, as does Russia, as do a lot of countries. I think we'll come up with different types of regulations that make sense within those contexts.
It is a very challenging thing to regulate. You could go online today and download one of these models and run it on your computer if you had a sufficiently powerful computer. A lot of the regulation is more about controlling the companies, the major players, the OpenAIs, the Googles, the Microsofts, and Europe has had a contentious relationship over the past few decades with them anyway, so it doesn't surprise me that their regulation leans more heavily in that direction. Of course, they're trying to balance the economic challenges that regulation brings.
Of course, I have a bias, as someone who grew up in the United States, towards a different type of position on the market than most Europeans do, but it's going to be expensive to introduce models into the European marketplace. Depending on how you're using them and in which sector, they're going to have to be audited, so you're going to have to pay people large amounts of money to audit your system before you can release it, and, as I understand it, you're going to have to have continuous audits done, ongoing auditing.
It's going to make it hard for small players to get into the market, and it's going to make it expensive for Google, Microsoft, and some of these big players to stay in the market, and they're trying to figure out how to balance that. I think that in the United States we'll probably lean more heavily towards innovation and have a different type of regulatory market, where we'll want more pathways for small new startups to roll out new types of products without necessarily having to go through expensive auditing processes and security processes. I think we're just starting to see those balances. Of course, you can look at what China's doing and it's very different from their perspective. I wouldn't say that any of them are right or wrong. They just represent different social values and different perspectives on how to balance safety, security, economic progress, scientific advancement, and that's hard. That's hard to strike a good balance of all of those different factors.
MICHAEL J. FEUER:
If we can, let's bring this now into the more specific realm of the work that you and I do most of the time, which is engaging with students, teaching, having them hopefully learn, and then, in addition to that, having them participate in what I would call knowledge creation, or at least interpretation and creative thought. One of the things that comes up a lot is, well, suppose you give an interesting assignment to students: develop an argument for or against the proposition that the Civil War was worth fighting, leaving aside some obvious sensitivities that a topic like that would provoke even without any technology.
The question that comes up, of course, is this: if the answer that we get is either completely or even partially generated by one of the AI systems, what does that mean in terms of our expectations and our responsibility to develop critical thinking skills among our students?
RYAN WATKINS:
Listeners should know that it does make things up at times, especially if you're working in areas where there is not that much knowledge out there. Asking about the Civil War, where there have been thousands of pages written, books after books about it, it's going to give you a pretty good answer. If you ask it about a specific law passed in 1866 and there's just not much out there, it's going to make stuff up, and we call that hallucinating. When you're at the boundaries of knowledge, where there isn't much, it'll make stuff up, so you have to keep an eye out for that.
MICHAEL J. FEUER:
But coming back to the part about wanting to encourage students to become critical thinkers, what do we do in our classrooms then to distinguish among students who are trying to develop some of these answers on their own from students who are taking advantage of the AI interface? Where are we in trying to untangle some of those, shall we say, conundrums of the classroom?
RYAN WATKINS:
Yeah. There are many conundrums, and they vary some by discipline. In some disciplines it's much more challenging than in others, I find, but I would say that it's going to push us as instructors to go back and really look at what we are trying to teach and how we are trying to assess that. I think in many instances, maybe most instances, we relied on essays as a tool for assessment when it really probably wasn't the best tool, but it was a convenient tool. If I really want to understand a student's critical thinking, just asking for a paper at the end is probably not the best way to get at that in the first place. So we should go back and re-look at what we are teaching, why we're teaching it, and whether we have an assessment strategy that's going to truly give us a window into how the student is thinking and whether they are applying the types of critical thinking patterns we're looking for.
I think that a lot of faculty are finding that they have to break their assignments up and get more into the process and not the product, so having more scaffolding. Have students provide maybe three different outlines of different argument strategies they could take on whether or not the Civil War was a necessary event, and then use a tool like ChatGPT to look at the strengths and weaknesses, where the logic might be breaking down between those three strategies. Then assess that, and then move them to the next stage of looking at how you draft arguments and how you structure an argument in a way that's going to be persuasive to the reader.
Again, having students do the work and then utilizing these tools to enhance it. Just like we might have them use spell-check and grammar-check tools, why not have them use a tool that can help at the bigger scale and say, you're jumping from A to C; you forgot to cover B for the reader. The tools can find those types of jumps in the logic if you prompt them correctly, and then you can move on to writing content. That, I believe, will give the instructor a much richer assessment of the development of critical thinking skills in their students than just saying, in six weeks turn in a paper, I will read the paper, I'll make an assessment of where I think your critical thinking skills have gained and where maybe they haven't, and then I'll put a grade to that.
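As one hedged sketch of what that kind of scaffolded feedback could look like in practice, an instructor or a student might wrap the three outlines in a prompt that asks the model to flag missing steps. The outlines, the prompt wording, and the model name here are all hypothetical; only the general pattern of student work in, formative critique out, is the point.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; interface may vary by version

# Hypothetical student-written outlines, abbreviated for illustration.
outlines = {
    "Outline A": "Thesis: the Civil War was necessary. 1) ... 2) ... 3) ...",
    "Outline B": "Thesis: the war could have been avoided. 1) ... 2) ... 3) ...",
    "Outline C": "Thesis: necessity depends on the moment. 1) ... 2) ... 3) ...",
}

prompt = (
    "A student drafted three outlines arguing whether the Civil War was a "
    "necessary event. For each outline, describe its strengths, its "
    "weaknesses, and any place where the argument jumps from one claim to "
    "another without covering the step in between (for example, going from "
    "A to C while skipping B).\n\n"
    + "\n\n".join(f"{name}:\n{text}" for name, text in outlines.items())
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```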
Again, I don't think that was a good assessment strategy to begin with, and this is a push on us to be more holistic and pragmatic, but it is going to be a challenge. It's going to be more work to make that assessment than just reading a five-page essay at the end of the term and assuming that it represents their critical thinking.
MICHAEL J. FEUER:
I know that you have actually developed some tools to help teachers figure out the best or the most promising uses of AI in their work. We're not going to have time to get into how those tools work, but I do think it's important for our listeners to know that you and perhaps others in the field are really thinking about this from the standpoint also of professional development for the improvement of instruction. I think that's very reassuring to people who are otherwise worried about this.
Anyway, this has been very interesting. I don't want to leave you without giving our listeners a little bit of a hint about your own background. You have three degrees from Florida State, but mainly what I would love you to do is give us a few sentences about when and what it was in your background that got you involved in this particular line of work.
RYAN WATKINS:
I guess if I go way back, I was teaching seventh grade math after my bachelor's degree and determined early on that it was not the career choice for me to spend my next 30 years teaching seventh graders. While I was well-prepared for the math, I was not well-prepared for the social work nature of the job of a middle school teacher. I went back to school and studied instructional design and I focused primarily in the area of needs assessment, so how to determine when instruction is the appropriate tool for getting the change in human performance that you're looking for. It turns out that training is expensive and not often what we should be doing in order to get the changes in performance within organizations or schools.
I'd been doing that for quite a while. Then, I guess it was 2014 or 2015, I had the opportunity to go on sabbatical. During my sabbatical I did an online collaboration with colleagues who study needs in different fields. We had 12 different disciplines and I think 13 different countries involved, with people who are needs scholars, so needs within social work, needs within medicine, needs within economic perspectives. We spent about nine months in online dialogue just on the single question of what needs are. Coming out of that dialogue, I realized that we are nowhere close to overlap in how different disciplines view what needs are, which is a fundamental issue. If you don't know what is necessary versus what is not necessary, then it gets very challenging to make decisions, and things that are wants or desires can easily be pushed into being thought of as necessary when they are not.
Part of that realization was that it's very complex, and that humans have lots of needs, and we have lots of things that we elevate to needs even if they're not really necessary. This doesn't even get into what is then sufficient, which is another related concept that we often deal with in needs assessment. I realized that technology is probably our best hope for understanding this. Though I had been working with technology and had developed websites and done some basic coding by then, I decided to double down on those efforts and focus in on AI technologies. This was around the time when Google DeepMind did the challenge where their computer beat the best Go player in the world. I've never played Go, but it's supposed to be one of the most challenging games ever invented, and every decision has 10,000 implications for the next decision. They were able to use reinforcement learning to come up with a strategy where it became the best Go player after just a few weeks of training, but of course when I say it trained, it trained on a thousand games an hour, whereas a human could never play that many games.
Early on in that time, you were beginning to see that AI could identify patterns in data that humans were not going to be able to. No matter how good we got, we weren't going to be able to do that. It was, I believed and still believe, a promising area of technology for us to use in understanding what our needs are and how we prioritize resources based on what's necessary versus what is desired; I've written a couple of articles about this. So here I am, excited to be here, excited to see where it goes next, and to pursue it as an educator, as an instructional designer, as someone who's dedicated to helping others learn.
It's a fascinating time, and I think we'll make great advancements fairly quickly, but it's going to be a lot of work. There's no denying it's going to change all of our jobs. We won't be doing the same things we did a decade ago, but I think the benefits of doing it the new way will be substantial and well worth it. There will be discomfort because of that, as with any new technology. I was there when we were in the middle of the debates around whether we should have calculators being used in the classroom. Again, I was teaching seventh graders and it was a big issue. Now I have a student who's a seventh grader, and they use calculators on their tests. They can use calculators when taking the SAT, and there have been benefits. I think we'll see similar types of struggles, and then benefits, out of this technology as well.
MICHAEL J. FEUER:
Well, that is a wonderful, and I should say refreshingly optimistic, outlook. Ryan, it's been a great pleasure talking to you. I'm so grateful that you could take the time. Just so that my listeners know, you can hear any of the episodes of EdFix on your favorite podcast platform: Apple Podcasts, Spotify, iHeartRadio, Google Podcasts, or wherever you listen to them. We have a website called edfixpodcast.com, and most importantly, we have Touran Waters, our technical maven, who makes this all possible, for which we are very grateful. Again, Professor Ryan Watkins, thank you so much for being with us. Good luck, and let's hope for a bright future.
RYAN WATKINS:
Thank you so much.
EdFix: A Podcast About the Promise and Practice of Education
Hosted by Michael Feuer, Dean of GW's Graduate School of Education and Human Development (GSEHD), EdFix highlights the effective strategies and provocative ideas of researchers, practitioners and policymakers on how to improve our education system. Listen in as Dean Feuer connects their worlds to take on some of education's most complex issues.
From preschool to postsecondary, get your fix with EdFix!
Subscribe on Apple Podcasts, Spotify, iHeartRadio, Google Podcasts, YouTube, or wherever you listen to podcasts.