AI, Drones, and the Iran War
Event date
Speakers
- Adjunct Senior Fellow, Technology and National Security Program, Center for a New American Security; Former Inaugural Director, Joint Artificial Intelligence Center, U.S. Department of Defense
- Senior Fellow, Carnegie Endowment for International Peace; Author, Bytes and Bullets: Global Rivalries, Big Tech, and the New Shape of Modern Warfare
- Distinguished Visiting Professor of the Practice, Keough School of Global Affairs, University of Notre Dame; Former Senior Executive, Central Intelligence Agency
Presider
- Staff Writer, The Atlantic; CFR Member
Panelists discuss how artificial intelligence and autonomous drone systems are reshaping the battlefield in conflict, and what these technological developments reveal about the future of warfare and the limits of international law.
RYAN: Thanks very much. Welcome, everyone. My name is Missy Ryan. I’m a staff writer at the Atlantic and I’m really excited to be joined here by three fantastic panelists: Jack Shanahan from the Center for a New American Security; Steven Feldstein from the Carnegie Endowment for International Peace; Amy McAuliffe, who is a professor at Notre Dame and former CIA senior official. And we are talking today about “AI, Drones, and the Iran War.”
We are now more than three months into the war that President Trump launched against Iran in late February. We’re in a protracted state of what sort of resembles a ceasefire being tested on a kind of continual basis. It’s a conflict that I think is especially significant because we’re seeing new ways that the United States military is putting into use these high-powered AI tools and systems, and some of its new autonomous weapons, on the battlefield. So it’s a moment where many of us who are in the national security space are actually seeing—getting to see and understand what military employment of artificial intelligence and autonomous weapons looks like for the first time and how the Pentagon is or is not employing the weapons of this new era of drone warfare. Admiral Brad Cooper, the U.S. Central Command commander, has confirmed that the command is using AI in various ways, but noted that humans make the final kinetic decisions.
So, with that, we’re going to dive in. We’ve got thirty minutes of moderated Q&A and then we’re going to open up to the audience. I’m sure that you have lots of questions.
I’m going to start by asking each panelist to tell us one way in which this conflict—that involves the United States, Iran, and Israel—ways in which this conflict is showcasing novel uses of AI or drones/unmanned weapons. Why don’t we start with you, Steven? Steven.
FELDSTEIN: Great. Thanks for—thanks for having me.
So the point that I would make in terms of the Iran war and what’s unique is I would use the word “scale.” And you know, I think that applies on a—in a lot of different ways, but in particular it applies to the AI targeting that we’ve seen under Palantir’s Maven Smart System. What I think is unique about that—this is not the first time that AI targeting has been used. In fact, we’ve seen many instances of that in other recent conflicts, including the ongoing conflict in Ukraine as well as Israel’s war in Gaza and Lebanon. But what I think is unique here in the Iran war is that we haven’t seen it so fully transparent and used at such—with such a large reach and with such scale.
So we know the figures are something like over 13,000 strikes have been conducted with AI assistance. You know, we know that the pace of that has been unmatched: The 13,000 strikes have taken place within the course of about thirty-eight days. And I think that’s a demonstration not only for the larger public, but particularly for other militaries, showcasing the utility of these systems and how they can be effectively used in a modern conflict by a major military.
RYAN: All right.
I’m going to go to you, Jack. What is the novel usage that you’re seeing?
SHANAHAN: Yeah. So I want to give—maybe a different view on this is the technology itself that Iran is using is not novel. What is novel is essentially they decided at the beginning of the war that they were going to cede the conventional fight to the United States. They were not going to try to fight military on military, air force against air force, navy against navy. They knew they would lose that fight, and they did. They were decimated right off the bat. What they chose to do, though, was to use thousands of very low-cost drones that, by the way, did not have AI capability, as far as we can tell, but they used them very effectively to decide to fight a different kind of conflict, sort of an economic war as opposed to sort of a conventional military conflict. And then—so that’s one part of it.
And then perhaps narrow down into what I think is a very novel, unfortunate use of the drones, is to attack in this case an Amazon datacenter. And this is a case of going after what we would say typically is a dual-use infrastructure. Is it being used for military purposes or civilian? Well, you could easily make the case it could be both. So, as these lines get increasingly blurred in the future between military and commercial use of something—in this case, a datacenter—then you’re going to see states less and less reluctant to go after what we would normally say you shouldn’t hit that; that’s commercial infrastructure. Whether or not they were making a point and decided not to do more strikes than they carried out, it reverberated not only through the region but globally when Amazon suddenly—(audio break)—Iran did hitting hotels in Dubai that were more commercial. But this case of dual use with a drone that did not have AI but undoubtedly had precision coordinates, perhaps supplied or assisted by Russia and/or China. So the use of the technology in creative ways, we have to say, really made an impact that will last long after this conflict is over.
RYAN: Thanks, Jack. And I do think, hopefully, we can come back after this first question to some of the terminology and definitions to make sure we’re all on the same page about that.
Over to you, Amy.
MCAULIFFE: Thanks. Thanks to CFR for having me.
I’d really focus on the adversary collaboration and the proliferation angle. So I completely agree with Jack. Iran has done an effective job waging asymmetric warfare using its missile force and essentially its, quote, “dumb” drone force, meaning the bulk of the attacks have been done by preprogrammed attack drones. Think of that bat-winged Shahed-136. Most people have seen pictures of that.
But the proliferation angle is that starting in about 2022 Iran supplied Russia with prefabricated parts, blueprints to set up its own production plant in Russia. So now we have Russia pumping out Shahed-136 attack drones, Shahed-131 attack drones under their own nomenclature. But what they’re doing is they’re enhancing the capabilities of the drones.
So to get at the terminology angle or one of the terms, what they’re doing is using some AI-enabled technology. So you can improve, for instance, the guidance. You can improve the targeting. And what we’re seeing, there’s some at least initial—and they seem to be credible—reports that Russia is now giving back or selling back to Iran its own drones, but that are more enhanced with these AI-enabled capabilities. And Russia, as we all know, has learned a fair amount on the battlefield in Ukraine. And it sees the utility of these enhancements, unfortunately, in its ability to terrorize Ukrainian civilians.
RYAN: Let me go to you again, Amy. Can you talk a little bit about—so you’ve talked about some of the drone usage by Iran in this conflict. What is its strategy in the swarms that we’ve seen, in the horizontal attacks that it has? And just to clarify, what can the AI—the AI-enabled drones that Iran may be getting from Russia, what can it do that the pre-programmed ones can’t do?
MCAULIFFE: Sure, so I’ll start with the—with the first question, which—we’ve touched on it a bit. What Iran is doing is successfully waging asymmetric warfare, a term that’s thrown around a lot. But in this case they’re going after the vulnerabilities of the U.S. and its allies. They’re potentially and on purpose targeting the enablers of U.S. airpower, and they’re targeting Gulf oil and gas infrastructure.
So the other part of this equation—and you started to use the term—it’s horizontal escalation. So rather than going and vertically escalating with maybe more advanced technology, they’re taking the battle essentially to not just Israel but U.S. Arab allies in the Gulf.
You know, and in the first few weeks of the war, as Jack talked about, really, the drones were used as part of these large barrages of ballistic missiles, cruise missiles, and drones, and they were going after things like hotels in Israel and the Gulf states. Pretty quickly they changed to a strategy of using the drones—and they’re not that long of distance—to purposely target U.S. military infrastructure in the Gulf, and to target Gulf oil and gas infrastructure, with some success.
On the U.S. side of the house I found it very interesting, again, that they’re targeting some of the enablers for our air superiority, some of the U.S. tanker aircraft, going after radars and communication systems.
And on the Gulf side of the house they have had some success, especially in Qatar, in undercutting and having some destruction in the LNG—the liquefied natural gas—facilities, setting back Qatar’s ability probably for a number of years to continue producing LNG.
So what can they do with more capable aircraft? With some of the enhancements that we think Iran is working on itself and that could potentially be provided by the Russians, it can result in, for instance, terminal guidance systems, more precise navigation, easier to get to the targets, and a more automated way to get to the targets.
But I’ll stop there.
RYAN: OK.
SHANAHAN: Missy? Missy, can I just—I want to extend something here.
RYAN: Please. Yeah.
SHANAHAN: So it’s a very important point of, until pretty recently—and I say “recently” being the last couple of years—in general, sort of the intersection of the drone circle and the AI circle had very little overlap. That Venn diagram, that intersection, is now expanding—and it’s accelerating, the expansion.
We used to—people generally hear drones and automatically think AI. There has not been a lot of AI in drones. We’re seeing that acceleration between Ukraine and Russia as we speak. We still don’t see examples that I can point to globally of true AI-enabled drone swarms—like, intelligent swarms. What we are seeing in the case of Ukraine is maybe one operator controlling five, potentially up to ten drones simultaneously. That’s different than swarms. So the idea of—there wasn’t a whole lot of AI-enabled drone capability globally until recently, but I think we should expect to see that acceleration pretty dramatically over the next couple of years.
RYAN: But that’s not occurring on the Iran battlefield yet.
SHANAHAN: No. No, it’s not. That’s actually a very important point. That’s the difference in the two conflicts.
RYAN: Right. OK.
MCAULIFFE: And I would just—I would jump in just with one final point. Like, for instance, Jack very appropriately brought up the swarming capability. That’s a capability that we know Iran since, like, 2021 or so was displaying at defense trade fairs, the idea that they were trying to go for some type of swarm capability, which at its utmost extreme—and I’m not saying this is coming anytime soon—it’s fully autonomous. So the drones themselves can do the surveillance, pick the target, and do the kinetic targeting.
I’ll stop there.
RYAN: So, Jack, I’m going to go back to you. Can you—and especially given your background with JAIC and Project Maven, now Maven Smart System, can you just very briefly summarize for our audience here how the United States is employing AI—what do we know about how they’re using it in targeting, data processing, anything like that? And how does this differ, for example, from what the special operators were doing in OIR in Iraq and Syria?
SHANAHAN: Yeah. It’s arguable that the U.S. is behind in this idea of what Mike Horowitz—a friend to CFR, of course, well-known—calls the era of inexpensive, precise mass. I think the United States, you could make the case that they’re behind on that. However, what the U.S. does extremely well is put AI throughout the entire kill chain. The idea of all-domain command and control, command and control being a secret sauce that doesn’t get enough credit, but how do you actually direct this entire force in all domains—joint and combined precision, stealth, all the things United States bring(s) to it? There are elements of AI throughout that entire process. It may be in an individual weapons system like in Aegis, that does have AI—whether or not it’s sort of kind of old, traditional AI, doesn’t matter; it’s there—and then AI being used specifically in Maven Smart System.
So you’ve got this idea of Claude as the large language model being integrated into Maven Smart System. And what Maven Smart System is doing is taking a hundred and seventy different sources of information intelligence and using some AI in there to do the things we were doing back in the early days of computer vision, which is object detection, classification, and tracking. So the core of that still resides at the center of Maven Smart System, but it’s doing an awful lot more than that. How much of that is AI-enabled, and how much is just sort of ontology and correlation and fusion is a fair question to ask. But there’s no question that the U.S. is using AI through almost every element of the kill chain, from the intelligence piece of it all the way to the actual targeting piece of it, in some of the weapons systems that are being used.
RYAN: Just really quickly, just to clarify, you know, to set the table for the audience, the policy, as I understand it from—Pete Hegseth has reminded Congress of this in recent weeks, and then we can talk later about the execution of that, but the policy is that the United States will retain, or the U.S. military retain—isn’t it appropriate human oversight? And that’s colloquially known as humans in the loop. Can you just very briefly tell us what that is?
SHANAHAN: Yeah. So the governing directive is called 3000.09. It’s autonomy in weapons systems. And just to be very upfront about this, there is nothing in that document that prohibits the use of AI-enabled lethal autonomous weapons. It does call for appropriate levels of human judgment, and you can parse that all day long what that means. But below that, it has—that directive has quite a bit of restrictions, or constraints and restraints, built into it to say: If you’re going to do this idea of independent target selection by a(n) autonomous system—and that’s the difference, not a pre-planned target, not going after a coordinate; independently selecting and targeting—then you have to do all these things. And that includes two different senior-level reviews within the Pentagon before system design and before system fielding. So it does not prohibit the use of lethal autonomous weapons systems, including AI-enabled, but it does put significant constraints on to make sure there is this thing that we would call appropriate level—or, appropriate levels of human judgment, which in the international community they’ll often hear the term “meaningful human control.” And those two are different terms, but they still get conflated quite a bit.
RYAN: And just one point, Jack. When you say independent nomination or whatever the word you used was, that’s a reference to what many people have deduced looking at this conflict, which is that to the extent we think AI is playing a more decisive role in target selection or targeting generally, it would be in the dynamic targeting—
SHANAHAN: Yes.
RYAN: —versus the deliberate targeting. Just for the audience, the deliberate targeting would be something, you know, that is nominated and vetted ahead of time, designated as a target. And the dynamic targeting was something that, you know, operators see on the battlefield, someone emerges from a bunker, or something like that.
SHANAHAN: Yeah. And very quickly—very quick on this, because Steve probably may weigh in on this on the Israeli part as well because there’s some similarities here, to make the distinction, as you just did, Missy, on dynamic targeting.
So there’s target discovery, which could be AI-assisted or -enabled, but then there has to be a process to nominate the target to actually be struck. A human, usually the three-star component commander, will then decide whether or not to hit the targets that have been nominated. That process could play out extremely quickly in dynamic targets, single-digit minutes. So the question really comes down to a policy question about what guardrails are put in place for human review of this maybe AI-enabled target discovery process. That’s the distinction. Like, a human still should review and approve targets to be hit, how the targets are found in the first place.
And then I’ll just leave it at this, that if you really put a lot of pressure on the system, on the command, to find more and more targets, now you could be in a case of are you concerned about rubber-stamping targets?
RYAN: Right. And I think it’s—I want to get to that a little bit later about the oversight points in the process that may be changing.
OK. Steven, over to you. Can you talk about how Israel is using AI-enabled warfare in this conflict, and how that differs or does not differ from what we saw in Gaza with Lavender, and some of the concerns about accuracy of targeting that came up during that conflict?
FELDSTEIN: Yeah, absolutely. And let me just quickly build off a point that Jack made in terms of dynamic targeting. I mean, I think what’s important to emphasize for the audience is that what AI—I mean, you have many more streams, much—many more streams of information and data sources coming in when it comes to identifying and acquiring visibility and details about particular targets.
And you know, I think in the past one of the big challenges was that it was impossible in a rapid manner to process and analyze and derive insights that could go into, potentially, a target-generating package for strikes, because there was just too much information going through. There was too much noise. I mean, we’re talking at different points about, you know, I think in Israel there was something like a million phone calls a day were collected and were—needed to be housed in a server, right? I mean, it’s impossible to kind of sift through all that information without using some type of AI-enhanced analytic ability to process information. So that’s one key thing that these new capabilities, whether it’s Claude paired with Maven or something else, allows you to do—to actually derive insights in a much quicker way, process those, and then generate targeting packages that can be signed off on by different human operators.
Now, I think one of the questions, whether it’s with Maven or whether it’s in the Israeli situation, is what is enough time to allow for sufficient and meaningful human review? Is it seventy seconds? Is it a couple minutes? To what extent do you need to dig deeper and get greater granularity in terms of how these targets were generated or nominated, you know, for review? And I think that’s where a lot of the questions come about.
In the Israeli context, you know, what we—what we understand is that—and this is coming from reporting and so forth from the Guardian, +972, Local Call, other sources along those lines. But you have several—a couple different systems, Gospel and Lavender, that have been employed by the IDF in Gaza and have been doing this sort of—has this sort of same dynamic at play where, essentially, you have a significant number of targets that are nominated, dynamic targeting, whether it’s for individual suspects—so, Palestinians who potentially are suspected to be part of either Hamas or a related militant group—or, in the case of Gospel, I believe, in structures that are viewed as being assets for Hamas that are then nominated for being destroyed.
And you know, the question, similar to Maven, has been: To what extent are there errors associated with that process? To what extent is the system as accurate and precise as it needs to be, even in terms of what’s nominating them for human review? And to what extent is there meaningful—is there a meaningful opportunity for humans to then look through this information and say, yes, that’s a legitimate target; let’s go ahead and strike? And there’s been, you know, reporting that shows that that period of time for review has been pretty quick, and that’s where some of the questions have evolved in terms of that question.
RYAN: But, Steven, do we know what—whether Israel is using those same systems in the Iran context? Do we know anything about that?
FELDSTEIN: Again, you know, Jack or Amy might have, you know, better sources on this front. I’ve heard anecdotally that they are, that they are using similar AI targeting systems that they’ve been used—been using in Gaza. But you know, that’s just from outside reporting.
SHANAHAN: Yeah. There’s no question in my mind. They built the system to do—to do war. A crisis, peacetime, war, they’re going to use it no matter what the conflict is. So, yes, I fully expect they’re using versions of it, whatever versions they’re on now, to do similar targeting support against Iran.
MCAULIFFE: I’d just make a general point—and this might help tee us up, perhaps, for later questions—is some principles of targeting, whether you’re in DOD or in the intelligence community, have to do with the certainty that the target is who you think the target is and collateral damage. There’s also debates, I think—ethical, moral, and utility-wise—about what level of person you might be targeting, right, senior Hamas official versus someone who might be, like, an errand person. And I think the faster and more dynamic targeting that you’re involved in, it’s hard to adhere to some of those overarching principles.
I’ll stop there.
RYAN: Yeah, no, that’s great, and we’re going to move into that now. I do think it’s just maybe interesting to note that Admiral Cooper, when he was pressed in the Senate in the last couple of weeks about potential U.S.-caused civilian casualties in Iran, he pointed to what he said was a number—a dozen at least—incidents that he believed were results of partner-force munitions, or that they had only—he very strongly implied that a number of the incidents that were flagged, essentially, by the media or by open reporting were the result of Israeli strikes. And I think that that, you know, as a joint conflict then raises these other kind of accountability questions for the U.S. military.
So let’s jump, actually, into that. I’m interested—from any of you, please jump in—what are you seeing in Iran that deepens your kind of going in concerns, your preexisting concerns about the potential misuse of these new technologies, either AI-enabled warfare or some of the asymmetric drone warfare—about the misuse or negative consequences of these technologies? You know, for example, we’ve already heard people like Frank Bradley, the SOCOM commander, express concerns about AI use. There isn’t a great deal of detail that we have about the extent to which, if at all, some of the moments of doubt and skepticism and oversight that are built into the targeting process are being now automated and potentially foregone in this system.
SHANAHAN: Yeah. I’m happy to start because I have something to say on this one is—the first one I’m most concerned about would be the temporal dimension, the pressure to—every conflict the U.S. has ever been in, after about two weeks of going through the deliberate target list you start hearing things like find more targets, get more targets. If AI is now playing an integral role of that, then there’s going to be a pressure of more and more.
The idea of essentially bragging that you’re hitting a thousand targets a day, I have questions about that. It’s to do what? Maybe you only need to hit ten targets if they’re the ten targets that collapses the regime. So we should be very careful about just number of targets connecting ends, ways, and means. So that would be my first one, is this time dimension to it.
The second piece—and I think there’s a lot of unknowns here—is the integration of frontier models into anything related to combat operations or intelligence analysis. I would prefer that there is a period of sort of experimentation, sandbox use of these systems, because they’re coming straight from a commercial company. The companies themselves have said: We don’t think these systems are mature enough to do certain things, so be very careful. We recommend to the Pentagon that you do your independent evaluation, third-party assessment. I’m not convinced that is happening before these things are being used in combat, really for the first time. Now, they may be used for some mundane things like help me write an order or something like that. But when you start talking about target prioritization or course-of-action development, we really have to start thinking about the impacts on the humans, whether you call it cognitive surrender, cognitive offloading, reliance too much on the machine results.
These are serious questions. These are no longer you and I using these systems at home; you’re using them for life and—life-and-death consequences. So we ought to have some deliberate method to evaluate them, to govern them, and to understand how they get integrated into the system. Those would be two of my big concerns.
RYAN: Yeah, no, just really briefly, anybody could go on the internet and on YouTube and watch Cameron Bradley, the CEO—
SHANAHAN: Yeah, Stanley.
RYAN: —Stanley, excuse me—talk about—on a Palantir presentation talk about—walk through kind of the model that you would go through in order to approve targets. And it is very much like you’re going through, you know, your kind of HR training, and you get—it seems like it would be very easy to click—
SHANAHAN: Right-click to kill.
RYAN: Exactly.
FELDSTEIN: Yeah, four clicks. It’s like a CRM, business software that’s repackaged for military use. It’s very odd.
I mean, let me just build on one point—put a finer point on what Jack said, which is that, I mean, one of the things that we have heard from reporting is that there is something like a 10 percent error rate associated with the strikes that have taken place in Gaza using the AI targeting systems like Gospel and Lavender, which are similar to Maven, that we’ve seen. Now, I don’t know what the targeting error rate is in the Iran war. That information either is classified or hasn’t been done. But I would say that it would be incumbent to provide a greater level of transparency to test and understand as an after-action: How well did these systems perform? To what extent were the targets that were generated legitimate targets? To what extent were they struck, and struck effectively? None of that to me is clear. And until we have a better sense of how well these systems are actually working in warfare, in a—in wartime conditions as opposed to sort of thinking about them more in isolation or abstractly, I think to me, like, those are—those are grave concerns.
MCAULIFFE: And then I—oh, go ahead.
SHANAHAN: And I just want to echo—sorry, I’ve just got to say how much I strongly agree with that, to include there should be—there must be a comparison of human performance, machine performance, and then the integration of human-machine. That has to be part of the after-action.
Sorry, Amy.
MCAULIFFE: Oh, no problem.
I would agree with everything that Jack and Steven just said. I would just maybe make a more strategic point. And it’s just the moral and ethical point. When regular, old kind of dumb drone warfare looked like it was going to be on the horizon, you know, people pointed out it’s a lot easier to not consider the moral, the political, the economic consequences when you’re waging warfare maybe very remotely or somewhat remotely.
And there are human beings involved here. It’s decisions to kill human beings. And I’m concerned that the battlefield is marching forward before we have any real discussions between and among countries like Russia, China, Iran, and the United States. Certainly, there are discussions about lethal autonomous weapons, et cetera, but nobody really wants to sign up to a prohibition, which may make sense. But I think it’s really important to get the discussions going in a significant way. And it means sharing information, so it’s quite difficult. I’ll stop there.
RYAN: I’m going to sneak in one final question before we open it up, and maybe just for one person to answer. How far is the U.S. military from having the counter-drone systems, counter-UAS systems that it needs to handle this new—this new—the scale, the new usage? Anyone want to take that? Jack?
SHANAHAN: I don’t think there’s going to—I don’t think there’s going to be a finish line. I think it’ll be constantly evolving. They have individual systems in different types, but they haven’t fully figured out how to integrate all of them together—directed energy, nets, very old-fashioned drone interceptors going after other drones. The idea is, how do you integrate all of that together?
RYAN: Great. OK. So we are ready to open it up to audience questions. We are going to get questions from our online audience. And please remember to identify yourself and make sure the question is a question.
OPERATOR: (Gives queuing instructions.)
RYAN: Think we’re ready for our first question.
OPERATOR: We will take our first question from Ken Morse.
Q: This has been great. Thank you, guys, for enlightening us.
Most of what you covered was above ground or on the ground. A rapidly growing area of interest is autonomous underwater vehicles, who can’t communicate back in anything like real time before they have to make a decision. Could you comment on AUVs?
SHANAHAN: Yes, I’d love to comment on that. There’s a lot going on in this space. I was associated with an IEEE subcommittee that spent over a year looking at a scenario, and this was the scenario, because it brings in a number of challenges. The most important, I think, is what you’re—you just suggested, is the ability to communicate with an underwater system that is operating autonomously. So there are some things going on with this. DIU, Defense Innovation Unit, on the West Coast, has a project called Project AMMO. And that is part of what that project is designed to do, is you’re putting AI on a UUV, an underwater uncrewed vehicle, and then figuring out how to pass updates to it.
There are different ways they’re exploring to do it. One of them, of course, you come up and you’re able to transmit while it’s out there. But absent an effective way of timely updates—first of all, any AI model on a system drifts over time. It gets exposed to new operational data. It just naturally is never going to perform exactly the way you intend it to. So you have to assume you will need updates, regular periodic updates. How to do that is going to be a challenge. And I think it’s easy to say it’s a very big ocean and small UUVs. Not going to collide into things. But they will. And the idea of sending it out with one version of a model and getting updates to it while it’s underwater is going to be one of the most difficult challenges to work through. But the good news is the U.S. military is doing that right now as we speak, figuring out how to push those updates into it, because more and more we’re going to see the U.S. Navy begin to integrate not just the surface UUVs—or, surface vessels, but also the underwater piece of it.
RYAN: All right. I think we’re ready for the next question.
OPERATOR: We will take our next question from Larry Rubin.
Q: Hi, there. Larry Rubin from Georgia Tech.
My question is really about how to try to figure out comparing human error rates on similar types of operations versus machine ones. Because it seems like the discourse has changed considerably. It’s always about what the error rate of, say, the machine is. And how do you have that type of conversation without it sounding like you’re justifying some type of action there? And I’m just curious about some of your thoughts on this.
MCAULIFFE: I can make a quick introductory point, and then my colleagues should jump in. I would just say that, as a matter of course, different parts of the U.S. government/the intelligence community do do battle damage assessment for lethal strikes. And a best practice is to have that done by the unit that is not the operational unit so you get a really clear picture of did you hit the target that you were seeking to hit? And if not, why? And it gets back to those issues that I raised earlier. A degree of certainty about the target that you had, generally not wanting any or low collateral damage. So those processes are in place. I think an interesting question would be, is there a way to work kind of AI into that type of analysis in a way that helps kind of the unbiased, independent analyst make perhaps more accurate judgments? I’ll stop there.
RYAN: Or maybe use it for the investigations, the civilian casualty investigations, that the, you know, center of excellence at the Pentagon or the CENTCOM civilian casualties cell conducts.
SHANAHAN: Yeah, I’ll just—I’ll say I agree. Larry, it’s really—you framed the question the right way. I used to think about this all the time in the early days of Project Maven and the Joint AI Center. We sort of assume that humans don’t make a bunch of mistakes, and machines should be held infallible. So the question is, where do you set the bar? And there are times when I felt like we were trying to set the bar higher for machines than humans. And I understand why you would do that, without trying to make the conclusion or draw the leading—or ask the leading question about shouldn’t machines be held to a similar standard, as opposed to a standard that they can’t meet?
So the only way you can answer that question is data-based evidence. It cannot be heuristics or instinct. Well, I think. It has to be—it has to be based on real data. And as much as I am genuinely concerned about automation capture, more and more so as frontier models get more and more advanced, I am equally concerned about human biases. There are people who write books on biases in the intel community because we’re humans and we have all sorts of biases. How you filter all that out and compare human-machine—and, I believe, one of the best things we ought to be doing is what does human smart machine interaction look like, and can we really get the best out of both? I think the answer to that is yes, but I’d like to see more evidence. That, to me, is a little bit of an open book.
RYAN: I just want to ask a quick—
FELDSTEIN: And just to—
RYAN: Oh, go ahead, Steven.
FELDSTEIN: Yeah, I was just going to say, just to add, you know, I mean, I think one of the issues is that we seem to be kind of moving kind of in a more binary fashion, where you either say it’s human targeting or it’s machine and fully autonomous devices. And I think trying to kind of find somewhere more in the middle where you’re sort of, you know, continuing to keep the human in the loop and enhancing what they’re able to do with AI technology would be a more ideal scenario.
I also think on the point, Larry, that you mentioned, which is that while we ascribe a higher standard to machines, I mean, that that may not be objectively rational, but it’s an emotional attunement in terms of how we approach these things. I mean, you look at, you know, autonomous—you know, self-driving cars. I mean, we already know from the data that we have that they’re probably far safer than—you know, in the aggregate, than humans driving cars. And yet the standard that we set is so significantly high because of our discomfort allowing machines to fully control something. And you sort of translate to combat operations, and I think that same emotional view is going to remain fixed in place.
RYAN: I just want to follow up, Jack, as a former senior Air Force officer, what are the points in the targeting chain where humans typically use judgment that are now being potentially taken over by AI in ways that you feel like could eliminate the skepticism and doubt that’s baked into the system? Is it collateral damage estimate, like, pattern of life? Please talk about that. And why do you—why do you think that the Minab strike, the school strike, was not AI error?
SHANAHAN: Well, let me address that first. And I have no—I don’t have a lot of facts associated with this. My instinct is based on thirty-six years in uniform being involved in every single part of the cycle for targeting—from delivering weapons on the fighter aircraft end to having responsibility for the Air Force’s premier targeting squadrons that did the targeting and building JASSM targeting for a living, to being in the Pentagon trying to revive a moribund targeting profession which had sort of gone dormant because of the kind of fight we were fighting in the counterterrorism, counterinsurgency fight. That was an uphill battle, by the way. I think we’ll see this ties into the school strike. We probably didn’t have enough targeting experts in Central Command. We don’t have enough globally, period. But what I think is it was a deliberate target in an existing database, probably had newer imagery but the newer imagery might not have been so different from the previous imagery because, essentially, you had the school move into an IRGC facility, and humans did not catch that mistake. And there’s no other way to describe this other than a tragic—it’s a tragic mistake.
I would like to believe—and this is more opinion than anything else—that in the future that AI could actually be used to flag there are additional sources of information available, such as Google Maps, that show clearly a school is at that location. But that might not have been taken into the Maven Smart System. I don’t know. That’s speculation on my part. But I see enough there to suggest that AI wasn’t the root cause of this. In general, there are so many different ways—Steve was getting to this earlier—there’s just too much information for humans to process, beginning in the targeting cycle itself, that AI could help get through that information and correlate, and fuse, and say, this is a target, that’s not a target, here’s why it’s not a target. And then as you get ready to nominate the targets for approval, all of that—there should be a second and third check of that, that AI could be very helpful for.
I’ve watched the intel analysts, it’s been years now, do the collateral damage estimates and civilian casualty estimates using pretty good automation programs. They could be AI-enhanced to make those even better. What I’d love to see is a human targeteer, a weaponeer, and intel analysts, have a little bit more time back in their lives to review things. I don’t know. I used to say that all the time. I’m not sure now, because the pressure has become so intense to do more, do more, do more. Now we’re into, is there a danger, as I said earlier, of kind of rubber stamping this. We ought to be very concerned about that piece.
RYAN: OK. Let’s take our next question.
OPERATOR: We will take the next question from Ken Kraetzer.
Q: Oh, good morning. Ken Kraetzer, CaMMVets Media.
Yeah, I was just going to ask about the school strike on February 28, because it sounded like there were a thousand strikes done that day to launch the—you know, the war, the campaign. And my concern was that the thousand sites were picked by AI. And was there a human operator in the military who checked each one of those? You know, interesting comment about difference in imagery and not taking mapping that might have been commercially available into that. But do you think it’s a premise that they would have a human military member or civilian employee check every designated site for a large campaign like that? As you said, probably very little time to put together.
SHANAHAN: I do. I think it’s the going-in argument for any targeting team in any of the combatant commands. Absolutely, positively, must be. Now, Steve probably wants to segue onto what I’ll end with, which is go back to the discussion we had earlier. The question is, how much time? How much time is considered sufficient to review and approve that? If I’m the three-star component commander, I am relying on my team of professionals to say, why are we hitting these targets? So, those may be very brief conversations, except for extremely sensitive targets that will require potentially four-star approval, maybe even all the way to the secretary of defense or to the White House, depending on how sensitive they are. So what is considered a deliberate target, in this case being an IRGC facility, or at least thought it was, probably was approved very quickly because it was part of the standing target deck. That’s my assumption of this, but I’ll leave it at that, Steve.
FELDSTEIN: Yeah, no, I mean, I agree with what Jack said. I mean, I think essentially what we’re looking at—especially if you’re looking at a thousand strikes in the course of a day, is approximately seventy seconds to make decisions, if you’re kind of cycling through, like, rapidly one after the other. And, you know, the question I think there is, whether it was AI or whether it was human error, either way there was—there was information that was presented, and there was a—I think, a bias to sort of accept that as it stood, without kind of having the time to kind of dig a little further and say, wait a second, how old is the information that’s here? Are we sure that what looked like an IRGC facility back then, when the information was sort of compiled, is still one today ten or twelve years later, as we’re about to authorize a strike? Was there either enough time to sort of dig beneath and ask that question? And was there a predisposition to question what was coming in? Say, wait, something doesn’t quite feel right? And I don’t know how that kind of works in the split seconds or the few seconds needed in terms of making that determination. Either way, you know, a tragic error occurred.
RYAN: Let’s take our next question.
OPERATOR: We will take the next question from Lawrence Wright.
Q: Lawrence Wright with the New Yorker.
I’m interested in, you know, the use of drones, possibly in a domestic sense. Naturally, I look at all this technology we’re developing and wondering what Osama bin Laden would do with such capabilities. And I look at our cities in America. They’re totally undefended for such an assault. And our focus has always been abroad with these weapons, but how do we protect ourselves domestically from such drone strikes?
SHANAHAN: I’ll start, because I could not agree more with the premise. I am genuinely surprised, I’ve even used the word shocked, that we have not been hit yet in this country with the equivalent of what the Ukrainians did with Operation Spider Web, which is go deep into Russia. For all its fancy technology, I’m not convinced Golden Dome will solve this problem. There has to be a lot of local, very local, measures put in place to detect the potential drones to counter the drones. This is going to be a massive national coordination issue. If you go—remember back to when drones showed up in New Jersey, they showed up on Langley Air Force Base.
I’ve talked to the four-stars both at Air Combat Command, Northern Command at the time. They flagged this. I think, both in retirement, they were on 60 Minutes saying they’re uncomfortable with the state of the ability for Northern Command to prepare for this, because there’s so much that has to be done. We have been so focused on the outside piece of this, we’re not nearly as prepared for the inside threat of this. So a lot more has to be done, but it really does need to integrate very local level, state level, federal level, National Guard, Northern Command, all these different commands. Going to be a coordination issue.
MCAULIFFE: I’d just make one overall point, which is you’ve pointed out an important phenomenon which is the commercial, off-the-shelf aspects of many of these drones. And, for instance, with the Shahed-136 and -131s that have crashed in Ukraine, when they’ve been exploited, what’s been found? Commercial, off-the-shelf parts from Europe, from China, from the United States. So one can buy these piece parts and, with some level of technical knowledge, assemble them. So it’s an important consideration.
FELDSTEIN: And let me, just to build off what Amy said, you don’t even need to have a Shahed drone. You can do it with a much smaller quadcopter. You can retrofit that with some type of explosive device. You can send out ten, fifteen, twenty of them. It doesn’t cost much money. And that’s where so much of the risk is. It’s this dual purpose, consumer grade, cheap, off-the-shelf devices that can be retrofitted pretty easily with home shops, 3-D printers, and so forth. And I think that’s a huge vulnerability.
RYAN: And that’s actually arguably how Ukraine kept itself alive until it got its military-grade drone industry online—the quadcopters with, you know, grenades attached to them. We have another question. Let’s go ahead with that.
OPERATOR: We will take the next question from Cynthia Roberts.
Q: Thanks very much for doing this.
I wonder if you could say more about the Russian, Chinese involvement in Iran in terms of helping Iran, supplying intelligence. Of course, the Kremlin has said if we would stop doing that for Ukraine they would stop doing it in Iran. That’s not an acceptable answer for the U.S. Do you see it accelerating, expanding? And what can we do to counter it? Thank you.
MCAULIFFE: Yeah. I’ll jump in here. Very important point. I raised one aspect, which is the kind of technology sharing between and among Iran and Russia. It’s not just limited to drones. There’s been missile-related cooperation. There’s, for a long time, been civilian nuclear cooperation. If Iran were to restart its nuclear program, I think we should be concerned about the potential for Russian assistance there. That is a relationship, I think, that has proven useful, you know, to both parties. And I think a main case in point is the drone-related cooperation.
In terms of China, there definitely are valid concerns about the Chinese providing technology to the Russians for Russia’s own drones that they use in the Ukraine conflict. China just tends to do a better job, whether it’s government-sanctioned or it’s some degree of plausible, or not so plausible, deniability, staying a few steps away from some of the commercial, off-the-shelf, and even maybe military transfers that are going to countries such as Russia and Iran. And the last thing I’d add, and my colleagues should jump in here, I mean, I view as very credible the reports in the press that the Russians have also provided imagery and targeting-related data about U.S. forces and U.S. allies in the Gulf to Iran.
FELDSTEIN: Maybe to make one point in addition, which is that when it comes to Chinese involvement what I think is interesting is that, especially when you look at the components that they supply, it’s not just to the Russians. It’s also to the Ukrainians. In fact, there’s all sorts of reporting out there that shows that there are factories in Shenzhen that are essentially rotating out different delegations of Ukrainians and Russians who need to buy, you know, different navigation, you know, units, other sorts of aspects that are essential to drones, purchasing those, and then bringing them back and incorporating them into their—into their factories.
So, China, in some respects, is kind of becoming a factory to the world when it comes to these essential component parts, and is currently selling to both sides in that conflict, but also to the Iranians and many others as well. I mean, to me, this supply chain issue, particularly if we’re concerned about, you know, potential prospective conflict between the U.S. and China in the future, is another kind of big vulnerability that we haven’t gotten our hands around. And, you know, the idea that we can sort of simply substitute it out, or that we can just sort of rely on more expensive components, I think, misses the bigger picture in terms of what that threat vulnerability looks like.
SHANAHAN: And, to break out the cliché of the enemy of my enemy is my friend, there’s no doubt China and Russia are gleefully accepting an opportunity to try to bring the United States down a peg or two, while at the very senior national level saying, not us. We’re not doing it. But the reality is, they’re helping. The question is, how deep does that assistance really go?
RYAN: I’m going to ask a quick question before we move on to our next audience question. How much do you all think the regulatory—the kind of light regulatory or non-regulatory environment that the Trump administration has embraced up until, you know, just last week they had this new AI order, and then the rhetorical environment—you know, Pete Hegseth talking about, you know, gloves off, and, you know, warriors are not going to—you’re going to have lethality—maximum lethality, not tepid legality. How much does that environment matter for technologies that are really being—at this moment, where you’re having these kind of test cases, test uses for these new technologies in a major conflict?
SHANAHAN: It plays into it, with no ifs, ands, or buts about it. What it gets at, though, to me, there’s this broader global concern about racing dynamics. When you hear over and over again how fast the U.S. military is moving with AI—I’ve heard from my counterparts at these track two dialogues on the Chinese side especially, so international non-government dialogue. All they’re hearing is about how much AI the U.S. is using against Iran. They’re now—their assumption is you’re going to do it everywhere, up to and including—this is where it gets alarming—in the nuclear. When we’re saying, no, we’re not—we’re not going to do that, but the headlines are the headlines.
And the idea of go fast, don’t worry as much about the testing and evaluation piece of it, it’s all about lethality, that is a global message that transmits not just down to the corporal on the ground saying, OK, I have top cover to go do something I might otherwise want to do, but internationally with other countries, saying we have no choice. We have to keep up with this. And it brings in what I call some global instability dynamics that are only going to get worse unless we start to put something in place, what Amy was getting at earlier, about whether it’s nonproliferation, guardrails, international agreements. They’re going to be hard to get, but you need something.
RYAN: Let’s have the next question. Oh, go ahead. OK, go to our—maybe—this is probably the last question or second to last question.
OPERATOR: We will take the next question from Audrey Kurth Cronin.
Q: Hello. This is Audrey Kurth Cronin at Carnegie Mellon University.
I want to say, first, that this conversation is much more advanced and interesting than many that I attend. And I want to thank all of you. I appreciate the sophistication. But let me just say that, beyond human-computer interaction and automation bias, I really do feel that there are very few people, at all levels of command within DOD, who truly understand the brittleness of these systems, how they work, what their flaws are. And the answers to their attitudes toward the systems—and that’s not just DOD, but at other levels of our government—have more to do with ideology and kind of nationalism, very related to the question you just asked, Missy, than they do with actual technical understanding, even at a basic level of how they work. So can you comment further on how to get us out of this ideological approach and into a much wiser approach faster than we are right now?
SHANAHAN: Audrey, I’ve read your books. And I really appreciate your take on this. In fact, I use dual use, but I think you’ve advocated for something like multiuse for a different term, but I accept it. It’s a challenge right now. One of the things—I’ll say that when I was there, before I retired—that doesn’t get headlines but I’m very proud of, is the DOD AI Ethics Principles were signed out by the secretary of defense. We cared about responsible AI. One of the things that troubled me, as much as I liked about 80 percent of the DOD AI strategy memo that came out in January, was something in there that, one, conflated DEI with responsible AI—two things that have nothing to do with each other—but also this idea of going fast at all costs.
And a line in there that really angered me is, “we will only use frontier models that provide the objective truth.” Whose objective truth? There is no such thing as objective truth with a frontier model. So that tells me there’s a misunderstanding of what the technology is even capable of doing. What I think we need are some senior people that have a better understanding of the technology, limitations in addition to what they think it can do, and to be somewhat forceful in inserting themselves into the conversation at the right point to ensure the test and evaluation, the benchmarking, all those things that Jane Pinelis, when she was running test and evaluation for Maven in the JAIC, was very focused on. So I don’t have a good answer for you, other than it’s a problem right now. We need to take a more objective look at the entire process, rather than relying on what seems to be a pretty hefty ideological component.
FELDSTEIN: And, just to jump in—and thanks for the question, Audrey. I, too, am a huge fan of your research and work, including your book, Power to the People. But I think we’re facing, from a broader perspective, you know, norm erosion across the board when it comes to the principles that we have upheld related to the international humanitarian law, the Geneva Conventions, and what we look upon as rules of engagement when it comes to collateral damage. And I think we have both an ideological perspective that is pushing a certain version of war that I think is very reckless and dangerous, and then we also have powerful technologies, many of which are untested, which are becoming increasingly disaggregated from human control. And we’re putting those out in places with potentially very damaging effects as a result.
And I think that combination is toxic and worrisome. And I think that’s something that we—you know, whether it’s the U.S. and whether we can do something about it at this moment, or whether it’s other countries that sort of say, look, these are norms we need to uphold and disregarding what the U.S. is doing, we will still push forward these principles, either way I think that’s something that we need to take very seriously as an international community and think hard about.
RYAN: OK. We are two minutes out, so I’m actually just going to ask one final question. I don’t think we have time to go for another audience question. And maybe I will direct this to you, Amy. Do you have any thoughts or takeaways from the skirmish between the Defense Department and Anthropic, in terms of, you know, what does this tell us about the future fault lines between the private sector, the commercial sector, and the government’s increasing reliance on these new technologies?
MCAULIFFE: Yeah, I’ll make two points. During my time in government, I would say it was maybe about 2015, when at least on the government side of it, I’ll say, the intelligence community and DOD really realized the need for increased public-private partnership. It’s been in fits and starts. Sometimes the communications are good, sometimes we talk past each other. But future national security involves the private sector. And so I think some of these decisions have to be made deliberately and jointly.
And I would just beat the drum that I’ve been beating, and my colleagues have been the whole time here. War involves human beings. And it involves precious human life. And we can, and we should, try to compete with China and Russia. AI will be part of the battlefield. But I think it has to be done responsibly. We have to make it a priority to have these confidence-building discussions with the Chinese, and the Russians, and us. And I think ethics, ethical frameworks, human beings are going to die, have to somehow be part of the discussion.
RYAN: All right. I think that’s a great note to end it on. I want to thank my fellow panelists and thank the audience for being here today. Everyone, have a great day.
MCAULIFFE: Thank you.
SHANAHAN: Thank you.
FELDSTEIN: Thank you.
(END)
This is an uncorrected transcript.