
Hacking Security Camera AI

Interview AI Hardware & Physical Hacks Security Research & Vulnerabilities

2 Dec 2024

TL;DR: Researcher Kasimir Schulz of HiddenLayer reverse-engineered the AI model inside Wyze security cameras to create an adversarial patch that makes a person invisible to the camera's person-detection system.

Can you trick the AI model running locally on a security camera into thinking you're a bird (and not a burglar)? We sat down with Kasimir Schulz, principal security researcher at HiddenLayer, to discuss edge AI and to learn how AI running on your device (at the "edge" of the network) can be compromised with something like a QR code.

Transcript

Machine-generated transcript; may contain errors.

Speaker 1: As somebody who has hacked one of these models now, I still think it's great that people are actually employing them. I am firmly in the belief that, yes, we should still keep using them because the benefit usually outweighs anything else.

Speaker 2: If you picture a computer network and you're looking into the network, you can think of your device as being on the edge of the network. It calls into and receives information from devices and servers that are deeper inside. For this reason, some people call computing done locally on that device edge computing. This is a whole category of thing. And if you do artificial intelligence tasks that way, locally on the device on the edge of the network instead of calling a server deeper inside, people have started calling that edge AI. There are a lot of devices that do some version of this. Maybe the cheapest, most accessible, is something like a Wyze camera. A Wyze camera is a security camera, so it relies on AI and machine vision for one thing in particular. A working modern security camera needs to be programmable to send an alert to the user if it sees a person skulking around whatever the camera is pointed at. But it can't send an alert anytime it sees motion, because of birds and cars. So the model needs to be able to tell the difference between a person and not a person. It can do this in one of two ways. The camera can send the video to a server, where it runs an AI model capable of distinguishing that's a guy, that's a duck, that's a burglar, that's a goose. It can run the duck-versus-guy model on a server, or it can try and do it locally on the device. And there are some real security and privacy reasons why the local option is preferable. It's a video feed of your house. How much do you want it sent to some server you don't really know about? But the question we like to ask is: can you hack it? Without touching it. Because if you get close enough to a security camera to plug something into it to hack it, it's gonna see you. It's gonna know that's not a bird. Our subject this episode is Kasimir Schulz, principal security researcher at HiddenLayer, who took a run at this problem.
He was busy all of Defcon giving a bunch of interesting talks, but I wanted to understand what he and his team did to crack the Wyze camera. Not by walking up to it and plugging something in, but by figuring out what the AI model running on it is doing to distinguish a person from not a person, and reverse engineering something that you could show that camera that'll make it think, oh, that's a bird, when in reality, that's a burglar. Like a QR code, but instead of bringing up a menu, it tricks a security camera into thinking you're not a person.

Speaker 1: If a person is in the camera with whatever bad thing they have in there, the patch, the camera does not detect the person even though a person is there. And then, ideally, we were going to have it set up in a way so that you aren't just, you know, carrying a bush or holding a tree in front of you, something that people around you might notice. Sure. Right. We wanted something really subtle, so somebody could, you know, come up, steal a package off your porch, and you would never notice.

Speaker 2: So I called them up. This one veers a little technical on occasion. If that worries you, know that I am not that technical, and I found this process fascinating. This is hacking edge AI with our guest, Kasimir Schulz, here on Hacked. Kasimir, thank you for joining me.

Speaker 1: Thank you. Thank you for having me.

Speaker 2: Okay. So we're here talking about the Wyze Cam and edge AI. Before we get into the story itself, what led you to wanna research this?

Speaker 1: Yeah. So we were trying to see if there were any actual devices out there using AI on device: rather than calling a cloud server with the model actually being in the cloud, having it on the device, because that way we can actually attack it and see, you know, what people are actually using, and whether there's a way to really utilize these models in a malicious way or bypass them, especially with the new advancements in hardware. People are using NPUs, which allow these small devices to run low-power AI models over a long period of time.

Speaker 2: And just so I understand, that's what edge AI is. It's AI being run locally on the device versus, you know, throwing it to the cloud.

Speaker 1: Yeah. So edge AI is a term that we were actually kind of originally looking for. We had actually defined the term edge AI ourselves. And then, while looking around trying to find if there was any AI being run on the edge, we saw that Wyze Cam was marketing Edge AI. So Wyze had actually marketed it and named it Edge AI, which worked out pretty well.

Speaker 2: What's the, like, the etymology there? Why call it Edge AI? Like, are the devices somehow defined as edge devices? Like, what what does that mean?

Speaker 1: Yeah. So the way that the Wyze system used to work is that when a Wyze camera detects any sort of motion, it triggers an event. And that event used to send a photo off to one of the Wyze servers. In that case, the camera was the edge device, and the Wyze server was the main server where everything was being sent. The AI model would run on the server, see, you know, is there a person in the photo, is there a package, a pet, and then send the detections back to the camera. So all the processing was done off device. However, some people had privacy concerns. They didn't want their photos being sent to a server. So instead, the AI was actually put onto the device. That's why they called it Edge AI.

Speaker 2: And then for anyone that doesn't know, it's kind of intuitive at this point, but what is a Wyze Cam? Like, what is the suite of devices?

Speaker 1: So they're these little budget cameras. They are fairly popular. I believe, don't quote me, but I believe that one of the cameras has, I think, upwards of 70,000,000 sales on Amazon. They were originally meant as indoor cameras, so watching your pet while you're gone, you know, nanny-cam type work. However, since then, some outdoor cameras have been developed. There are doorbell cameras, and they have developed a line of other products. But the ones that we were looking at normally range from $30 to $50. So it's a camera that's accessible to a lot of different people.

Speaker 2: We talked a little bit about what got you in this, but broadly speaking, why look at this device? Like, was there something that you you thought you were looking for, something you were trying to do when you started peeling this thing apart? Or, like, what what got you looking into this?

Speaker 1: Yeah. So first off, it was somebody actually marketing edge AI. There are a lot of AI models run on lots of devices; however, most of them aren't connected as well. And what's actually kind of a fun turn of events is that I'd actually hacked the Wyze cameras a few years previously, so I had experience with them. Actually, a lot of other hackers and reverse engineers out there that I know have worked with Wyze cameras before, because Wyze publishes its firmware online. And because they're on the cheaper side to buy, you can actually have a device: if you brick one, you can always get another one. And instead of having to try to extract the firmware yourself, you can just download it and start doing some reverse engineering. So the prior knowledge of the device from having reverse engineered it, plus the Edge AI marketing, made it a really good use case.

Speaker 2: So they were cheap, and the firmware was publicly available. Why do you think they published the firm why do you think they give you that toehold?

Speaker 1: Yeah. So I think it has actually worked out fairly well for them, just because they've gotten so many reports over the years. And the other year, they were in Pwn2Own as well. But yeah, it's just a choice on the company's part. I don't think it makes them more insecure. If anything, there are more researchers looking at them.

Speaker 2: Got it. And for anyone that doesn't know Pwn2Own, what's that?

Speaker 1: So Pwn2Own is an event that happens, I believe, once a year. These companies go out and say, hey, we have this device, we want you to hack it. Everything from routers to even a Tesla a few years ago was part of Pwn2Own. And then you get a big prize if you're actually able to exploit the vulnerability the day of the competition.

Speaker 2: This is a total aside, but I know you talked about this at Defcon, I believe.

Speaker 1: I did. Yes.

Speaker 2: Did you see any of the similar, like, did you see the Tesla that they had on the floor there, and the Rivian? Like, did you walk around any of that stuff?

Speaker 1: No. No. Not too much. I actually had six talks the week of Defcon.

Speaker 2: Oh, dang.

Speaker 1: I was fairly busy.

Speaker 2: Okay. Fair enough. Yeah. I was not, because I did not. So I spent a lot of time hanging out with those people. It was very fascinating. Okay. So: cheap, they're plentiful, the firmware is available online, you had a little bit of history with it. There's almost a culture around hacking these things, and Wyze seems down for it. How does the investigation start? What are some of the early discoveries? What kicks this all off?

Speaker 1: Yeah. So as I had mentioned, they were part of Pwn2Own, which meant that there were publicly available exploits for older versions of the firmware. And we actually had a few older devices lying around, so we decided we would try to see if we could pwn one of the, or, you know, get a shell on one of the older devices with that older firmware, since we hadn't updated yet. And once we were actually on the device, our goal was to see if we could find the AI model. So that's kind of where we started our journey. The older device we had wasn't actually supposed to have the AI model, because only the newer devices were marketed as having AI. On the older ones, you couldn't even enable it. But when we actually got on there, we saw there was a folder called edge AI, and there were some binaries in there. What happened, though, was that the folder didn't exist inside of the actual firmware. We had initially reverse engineered the firmware, and it wasn't there. So we realized that it had to be downloaded some way. And this edge AI folder was actually being actively used by the binary on the latest cameras. So the older camera we had would not access the folder; the new cameras all accessed the folder, but we couldn't see the folder on the new cameras. So we decided to poke around and reverse engineer, and we actually found that there were AI models in that folder. The AI model, instead of being by itself, was built into a shared object, so it was loaded up by an executable and then run. And after a bit of reverse engineering, we saw that there were a few layer names in there. AI layer names are going to be things like convolution or quantized, input, output. And from reverse engineering that, we were able to actually get a model out of there.
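Spotting layer names inside a shared object is essentially what the Unix `strings` tool does: pull out runs of printable ASCII and look for suspicious ones. A minimal Python sketch of that first triage step follows. The keyword list is a hypothetical guess modeled on the kinds of layer names mentioned above, not the names actually found in the Wyze library.

```python
import re

# Printable-ASCII runs of at least MIN_LEN bytes, similar to the Unix `strings` tool.
MIN_LEN = 4
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)

# Hypothetical keywords; the real layer names in the camera's library are not given.
LAYER_HINTS = ("conv", "quant", "input", "output", "pool", "concat")


def printable_strings(blob: bytes) -> list[str]:
    """Return every printable ASCII run found in a binary blob."""
    return [m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(blob)]


def layer_like_strings(blob: bytes) -> list[str]:
    """Keep only the strings that look like neural-network layer names."""
    return [s for s in printable_strings(blob)
            if any(hint in s.lower() for hint in LAYER_HINTS)]
```

Usage would be `layer_like_strings(open("lib.so", "rb").read())` on whatever shared object you pulled off the device; the hits are only leads for deeper reverse engineering, not a recovered model.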
At that point, we wanted to step back a little bit, because we were concerned that maybe we were putting too much time into something that isn't actually on the newer devices in the same way. So we decided to see how that folder was actually being downloaded onto the device. What we did is we set up tcpdump and then deleted the folder, and we found that there was a binary on the device called syncher that just redownloaded folders. We just, you know, ran grep for the string edge AI and found it that way. And with tcpdump running, we ran the command, so it redownloaded the binaries. What we'd also done is dump the client secrets for the HTTPS traffic, because it was HTTPS, not HTTP. The way that you do that is, on some Linux devices and other devices as well, there's an environment variable you can set, and then every time an HTTPS request happens, it saves the secret for that. Then you can just load that into Wireshark, where we were able to see that there were multiple calls. In those calls, we could actually see where it was going to get the binary. It was doing a call to ask, based on this firmware version and this camera, what should I have? Then it would tell you where to download it, and then a final call would generate a one-time link that expired, to actually download the content. So it wasn't just always hosted, which is pretty interesting. The firmware version and the camera ID were something that we actually had for the newer cameras. So instead of trying to pwn the new cameras at first, we went ahead, put the new firmware version and the new camera ID in, and we were able to download the Edge AI directory for the new cameras. And what was actually really interesting was that it was completely different files. With the older version, it was libjzdl, where the binary is at one. And the new one, it was libvenus. And we did some online sleuthing.
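The episode doesn't name the environment variable. The widely used convention (honored by Firefox, Chrome and curl, among others) is `SSLKEYLOGFILE`, which appends per-session secrets in the NSS key log format that Wireshark loads through its TLS preference "(Pre)-Master-Secret log filename". Whether the camera's HTTPS client used exactly this variable is an assumption. The sketch below shows Python's own hook for producing such a file (`SSLContext.keylog_filename`) and a small parser that groups the logged secrets by session, which is handy for checking that a capture and a key log actually line up.

```python
import ssl
from collections import defaultdict


def parse_keylog(text: str) -> dict[str, dict[str, str]]:
    """Parse NSS key log lines into {client_random_hex: {label: secret_hex}}.

    TLS 1.2 sessions use the CLIENT_RANDOM label; TLS 1.3 sessions log several
    labels (CLIENT_TRAFFIC_SECRET_0, SERVER_HANDSHAKE_TRAFFIC_SECRET, ...)
    that share the same client random.
    """
    sessions: dict[str, dict[str, str]] = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed by the format
        label, client_random, secret = line.split()
        sessions[client_random.lower()][label] = secret.lower()
    return dict(sessions)


def key_logging_context(path: str) -> ssl.SSLContext:
    """A client-side context that appends its TLS secrets to `path`
    (Python 3.8+ with OpenSSL 1.1.1+), producing a file Wireshark can load."""
    ctx = ssl.create_default_context()
    ctx.keylog_filename = path
    return ctx
```

The parser is deliberately strict (exactly three fields per line), so a truncated or corrupted log fails loudly instead of silently dropping sessions.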
We found one open source repo that had some documentation for Ingenic, which is the chipset that these cameras are based on. So we were able to see that it was a proprietary model format for that chipset, because that chipset had the new NPU to run AI models on. So we started reversing that as well. At this point, we had two different models, and we were able to see that they were fairly similar and that they were actually based off of YOLO. YOLO is an image recognition model. The way that it works is that if you pass it an image or a number of images, so a video, it will draw bounding boxes around all the items that are in there and then classify them. So you can see if there's a person, or two people, or a person and a pet. Right. Which makes sense, since those were the detections actually coming back out. From there, we needed a way to actually run the model. But the issue was, because it was the proprietary format and it was built for that specific set of chips, we couldn't run it locally. So we initially tried emulating it with QEMU.
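To make "draw bounding boxes and then classify" concrete: a YOLO-style detector emits many overlapping candidate boxes, each with a class and a confidence, and a post-processing step called non-maximum suppression (NMS) drops low-confidence candidates and keeps only the best box in each overlapping cluster. The sketch below is a generic version of that step. The 0.25 and 0.45 thresholds are common YOLO defaults, not values taken from the Wyze model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float  # confidence in [0, 1]
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(dets, score_thresh=0.25, iou_thresh=0.45):
    """Drop low-confidence boxes, then keep the highest-scoring box in each
    cluster of heavily overlapping boxes that share a label."""
    candidates = sorted((d for d in dets if d.score >= score_thresh),
                        key=lambda d: d.score, reverse=True)
    kept = []
    for d in candidates:
        if all(k.label != d.label or iou(k.box, d.box) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```

Suppressing per label means a person box and a dog box in the same place can both survive, which matches the "a person and a pet" style of output described above.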

Speaker 2: So you tried to emulate the entire camera, so that you could run the AI model you had pulled out on that emulation of the camera. Yeah. Got it.

Speaker 1: So we found out that it was a MIPS architecture. We got to the point that we were actually able to emulate the different binaries on the camera. However, we couldn't emulate the AI model, because it needed very special instructions that were only for that special CPU and NPU, and those weren't actually in QEMU. So at this point, we were wondering what to do next. Yeah. At that point, we needed to actually run it on the device. However, the newest AI model didn't run on the old cameras, because the old cameras didn't have that special chipset. So what we did is we decided to find a zero day to get onto the new camera.

Speaker 2: Jesus, dude.

Speaker 1: Yeah. A lot easier to do once you are actually on a device. K. So instead of just reverse engineering and statically finding a zero day on the new firmware, we decided to see if we can find a zero day on the old camera and see if it still exists in the new camera.

Speaker 2: Cool.

Speaker 1: So what we decided to do is, instead of trying to find a really complex attack vector where, you know, you're trying to send traffic to some cool port or something, which is what other people have actually looked into in the past, we decided to see if there was a simpler vulnerability that might not be as relevant to an attacker, but is really relevant to us trying to get a shell on the device. Part of the camera setup process is you scan a QR code for your Wi-Fi. That has your Wi-Fi SSID and your Wi-Fi password. And what's really cool is that when you have your Wi-Fi SSID, it adds that string into a command that it runs, so it can check whether that SSID is available. Which also means, since they're just adding the string in, if you have a semicolon or anything else at the end, you can add whatever other commands you want in there as well. And when we looked into the firmware for the new camera, we were able to see that the bug actually did exist on the new camera as well. So at that point, we were finally on the new camera, which is great. We were able to see all the detections, and see that all the files that we had pulled from the server were the same files on the camera, which was great, because it meant we had actually done good reverse engineering and not lost all that work. So from there, we were able to see that detections occurred and that the files existed. But what we needed was some way to find out what percentages were actually being returned. Right now, all we got was, if there was a person in the photo that the camera saw, it would send a message to our phones. But that's not really useful if you're trying to create an adversarial example like we did.
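The SSID bug is classic command injection: attacker-controlled text is spliced into a string that a shell then parses, so a `;` ends the intended command and starts a new one. The sketch below contrasts the vulnerable pattern with two safer ones. The command template (`iwlist wlan0 scan | grep ...`) is hypothetical; the episode only says the SSID is added into a command that checks whether the network is available.

```python
import shlex

# Hypothetical command template; `iwlist` and `wlan0` are illustrative and are
# not taken from the Wyze firmware.
TEMPLATE = "iwlist wlan0 scan | grep {ssid}"


def build_command_unsafe(ssid: str) -> str:
    """Vulnerable pattern: the SSID is pasted in verbatim, so a ';' in it ends
    the intended command and starts a new one when a shell runs the string."""
    return TEMPLATE.format(ssid=ssid)


def build_command_quoted(ssid: str) -> str:
    """Safer pattern: shlex.quote makes the shell treat the SSID as one literal word."""
    return TEMPLATE.format(ssid=shlex.quote(ssid))


def ssid_visible(ssid: str, scan_output: str) -> bool:
    """Safest pattern: run the scanner without a shell (for example
    subprocess.run with an argument list) and do the matching in code, so the
    SSID never reaches a shell at all. Simplified substring match."""
    return any(ssid in line for line in scan_output.splitlines())
```

With the payload `HomeNet; touch /tmp/pwned`, the unsafe builder produces a string whose second command is `touch /tmp/pwned`, while the quoted builder keeps the whole payload as a single argument to `grep`.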

Speaker 2: It just so that people understand, when you say create an adversarial example, what is that what would that example look like? What would the negative what would the bad actor try and do that you were trying to recreate?

Speaker 1: Yeah. So as a bad actor, the adversarial example we were trying to create was that if a person is in the camera with whatever bad thing they have in there, the patch, the camera does not detect the person even though a person is there. And then, ideally, we were going to have it set up in a way so that you aren't just, you know, carrying a bush or holding a tree in front of you, something that people around you might notice. Sure. Right? We wanted something really subtle, so somebody could, you know, come up, steal a package off your porch, and you would never notice.

Speaker 2: So when you say patch, you're talking about, like, a small physical thing somewhere on their person that keeps this camera from reaching whatever that detection threshold is for "I think that's a human." You're not gonna trigger that if you're wearing whatever this thing is. Cool. Okay. Please continue.

Speaker 1: Yeah. Yeah. Yeah. So at this point, to create an adversarial example, it's so much easier when you actually know the percentages that are returned. Especially with YOLO, since there are multiple classes, we can say, hey, this is 90% person. And then if we add, you know, a small patch, all of a sudden it's 80% person, 10% dog. And then from there, we can slowly try to get the percentages more in our favor. Luckily, since I'd reverse engineered Wyze in the past, I knew that they dumped a lot of information into their logs, sometimes more than was necessary. And I also knew that the logs were all encrypted with a key that was the same across all devices. The reason for that is they didn't want logs to necessarily be opened by a person. So when there's a crash, the logs get saved to the SD card, and then you send that over to them, and it's easiest for them if they have one key that can decrypt all the logs. It ended up just being AES-CBC. We double-checked the encrypt file on the local file system, and it was all the same. So at that point, we were able to take the encrypted log file, decrypt it, and see all of the logs from all the binaries. We just looked for inference, or person, you know, other things like that. And we were actually really happy to see that in the logs, it was logging all the detection results with the percentages, which is awesome.
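The loop described here (add a small patch, read back the person percentage, and slowly push it in your favor) is a score-based black-box attack: you never see the model's internals, only its confidence. A minimal sketch, assuming a hypothetical `toy_person_score` function in place of a real query to the camera, is greedy random search. It nudges one patch pixel at a time and keeps the change only if the person score drops.

```python
import math
import random


def toy_person_score(patch: list[float]) -> float:
    """Hypothetical stand-in for the camera's 'person' confidence in [0, 1].

    In the setup described, each score would come back from the device; here it
    is a fixed logistic function so the search loop can run on its own.
    """
    weights = [((i * 7) % 5 - 2) / 4 for i in range(len(patch))]
    z = 2.0 + sum(w * p for w, p in zip(weights, patch))
    return 1 / (1 + math.exp(-z))


def random_search(patch, query, steps=2000, step_size=0.2, seed=0):
    """Greedy random search: perturb one pixel per step and keep the change
    only if the queried person score goes down."""
    rng = random.Random(seed)
    best = list(patch)
    best_score = query(best)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        nudged = candidate[i] + rng.uniform(-step_size, step_size)
        candidate[i] = min(1.0, max(0.0, nudged))  # keep pixel values in [0, 1]
        score = query(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

Starting from a flat gray patch (`[0.5] * 16`), the toy score begins around 0.85 and the search drives it under 0.5. Against a real device every call to `query` is a round trip, which is why a direct feed of confidences matters so much for this kind of attack.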

Speaker 2: Cool. So now you have a way of measuring whether or not this patch is successful. You can see I'm getting a grade here: oh, it 100% knows I'm a person. This got it down to 90. This got it down to 80. You've got a road to narrow it down, basically. Got it.

Speaker 1: Yeah. So at this point, since we could see that, and we could see a few different files in the edge AI folder, we decided to take a look back at the edge AI folder and see if there were any files we could mess with. In there, there were two files, ai params dot ini and, excuse me, model params dot ini. INI is normally used for configuration, so we decided to look into those. And you could see that all the classes that the AI model detected were in there. So you have person, pet, package, face, and vehicle. And then there were thresholds as well. We saw that person was set to 50. So what we did is we set the person threshold so it had to be 100% sure, and we started walking in front of the camera. We saw that the detection event was fired: it saw a person, 95% confident. But we weren't getting an alert on our phones, which meant that after it does the detection, it checks whether you are above a certain threshold before sending an alert. And even though person and face were both classes, if face was detected but person wasn't over the threshold, it would not send an alert to your phone. So we now knew that our goal was to get person below that 50% threshold. Even though we could change the INI file, that's not something a regular attacker can do, since you have to actually be on the camera, but it told us what our target was, which helps a lot. Then we reverted that back, since now we knew what we needed to do, and we wanted to find some way to send a photo directly to the AI instead of having to walk in front of the camera. Because when you're trying to create an adversarial patch, you're sending lots of photos. You're not always doing them in the physical space. You might, you know, try to put pixels here and there just to kind of get an idea of what can happen.
And it wouldn't have been the best if, you know, we spent hours just holding up signs in different ways in front of the camera even though it would have been funny.
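The threshold experiment implies a simple gate between detection and alerting: the model reports a person confidence, and an alert fires only if that confidence clears the configured person threshold, regardless of other classes such as face. A sketch of that gate with Python's `configparser` follows. The INI section and key names are hypothetical (the episode confirms per-class thresholds with person at 50, not the file layout), and whether the real check uses `>=` or `>` is also an assumption.

```python
import configparser

# Hypothetical contents: only "person = 50" is confirmed by the episode;
# the section name, key names, and the face value are illustrative.
EXAMPLE_INI = """
[thresholds]
person = 50
face = 50
"""


def load_thresholds(text: str) -> dict[str, float]:
    """Read per-class thresholds (0-100 scale) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["thresholds"]
    return {name: section.getfloat(name) for name in section}


def sends_person_alert(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Alert only when 'person' clears its threshold; a detected face alone
    does not trigger an alert, as observed in the episode."""
    return scores.get("person", 0.0) >= thresholds["person"]
```

Raising `person` to 100 reproduces the experiment: a 95% person detection still fires a detection event but no longer passes the gate, so no alert reaches the phone.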

Speaker 2: Yeah. Pretty funny, though.

Speaker 1: Yeah. It would have been really, really funny. We do have some good ones. I mean, at some point, we dressed up, like, a package, that ended up actually working. Yeah.

Speaker 2: Wait. Like, you put, like, a cardboard box on, like, Metal Gear Solid style?

Speaker 1: Box, arms through the sides, and head through there as well. It detected package instead of person, which was really fun. It got a laugh when we showed it at Defcon. It's pretty cool. But yeah, anyways, while we did do that for fun later on, that wasn't really the best way to go about things. We needed some way to send an image to the AI ourselves, instead of the camera sending an image to the AI. So, again, we did some reverse engineering, and we saw that there were two main binaries. There was iCamera, which pretty much governs the entire camera. That's all the main logic; it calls other things. And then there was this edge AI protocol blah blah blah file, with a really long name, in the edge AI directory, which loaded up the model and did inference. And they talked to each other over a local socket on the camera. So what we did is we created our own socket and patched the really-long-name binary that actually runs the AI to connect to our new socket instead of the originally created socket.

Speaker 2: Even the end

Speaker 1: Yeah. And then we wrote a Python script that opened a port on the camera, and we sent a photo to that port. Our Python script would pass it to the socket, which would then trigger the AI, and the AI would send the result back. In the end, we had to do some patching, and we had to hook into shared memory, because of the way the camera binary and the AI work: the camera binary wrote the image to shared memory and then sent a message over the socket. So we sent our message over the socket after writing to shared memory. The AI reads shared memory, does all it does, and then sends the result back over the socket.
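The handoff described here is a common pattern between two processes: the producer writes the frame into shared memory, then sends a small notification over a socket; the consumer reads the frame out of shared memory and replies over the same socket. The sketch below runs both sides in one process for illustration, using `socket.socketpair()` in place of the camera's local socket and `multiprocessing.shared_memory` in place of its shared-memory region. The 4-byte length header is a hypothetical message format; the real protocol between the two binaries is not described.

```python
import socket
import struct
from multiprocessing import shared_memory

# Hypothetical wire format: a 4-byte big-endian length.
HEADER = struct.Struct(">I")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (a stream socket may return fewer per recv call)."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed early")
        data += chunk
    return data


def send_frame(sock, shm, image: bytes) -> None:
    """Camera side: put the frame in shared memory, then notify over the socket."""
    shm.buf[: len(image)] = image
    sock.sendall(HEADER.pack(len(image)))


def handle_frame(sock, shm_name: str) -> None:
    """AI side: wait for a notification, read the frame, reply with a result."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        frame = bytes(shm.buf[:length])
    finally:
        shm.close()
    result = f"received {len(frame)} bytes".encode()  # placeholder for inference
    sock.sendall(HEADER.pack(len(result)) + result)


def run_once(image: bytes) -> str:
    """Both sides in one process; socketpair stands in for the local socket."""
    cam_sock, ai_sock = socket.socketpair()
    shm = shared_memory.SharedMemory(create=True, size=max(len(image), 1))
    try:
        send_frame(cam_sock, shm, image)
        handle_frame(ai_sock, shm.name)
        (n,) = HEADER.unpack(recv_exact(cam_sock, HEADER.size))
        return recv_exact(cam_sock, n).decode()
    finally:
        cam_sock.close()
        ai_sock.close()
        shm.close()
        shm.unlink()
```

This ordering is why injecting images meant more than just opening a socket: writing only the socket message would point the AI at whatever stale frame was already in shared memory.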

Speaker 2: So you have a mechanism by which to see how confident this AI is and what it's looking at, and you have a mechanism by which to feed an image into that AI that isn't just the camera so you don't have to dress up like a package.

Speaker 1: Exactly. Got it. Got it. And what's really great is that now that we were hooked directly into the AI, we didn't have to look at the log files. We were actually getting the response straight back from the AI, which was really nice. Because to trigger a log file on the camera, you have to get the camera to crash, and we didn't wanna crash the camera every time we had an image. Yeah. Right. So now comes the really fun part. Since we knew that this was a YOLO-ish model, we had read a bunch of academic papers about attacking YOLO models, just because it's a more common model. People use it. And we'd also read some papers about attack transferability between models that were pretty much the same. So what we did is, there's a bunch of tools out there, and we used DPatch and ART to generate a bunch of adversarial examples, as well as handcrafting a few of our own. For a few of them, we took photos of ourselves holding up a small poster board, and then we put images on there of the other classes. So, you know, put a car or a dog. Got it. So we did a few of those, and then we did a few of the adversarial example ones. And we saw that about 20% of the adversarial example ones transferred from YOLO to our camera, which is awesome, because we didn't have to come up with a brand-new technique to attack this AI model. We were able to take academic attack techniques and apply them to a real production system, and 20% is a really good rate. The issue with a lot of those is that they were generated for non-physical settings, where you're only staying in the digital space. If I have a photo of myself, it's not a problem if I, you know, draw a smiley face up here and hack it. But I can't just walk around with a red smiley face right there.
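"Transferability" means an adversarial example crafted against one model (the surrogate) also fools a different but similar model (the target). The team used DPatch and the Adversarial Robustness Toolbox (ART) against YOLO-style detectors; the toy sketch below swaps in the much simpler fast gradient sign method (FGSM) on a linear "person" classifier, purely to show the shape of a transfer experiment. The surrogate and target weights are made up, with the target defined as the surrogate plus noise to stand in for "models that were pretty much the same."

```python
import math
import random


def person_prob(w: list[float], x: list[float], bias: float = 0.0) -> float:
    """Toy linear 'person' classifier: sigmoid(w . x + bias)."""
    return 1 / (1 + math.exp(-(bias + sum(wi * xi for wi, xi in zip(w, x)))))


def fgsm_lower_person(w: list[float], x: list[float], eps: float) -> list[float]:
    """One FGSM step that lowers a linear model's person score.

    For p = sigmoid(w . x + b), dp/dx_i = p * (1 - p) * w_i, so the gradient
    sign is sign(w_i); stepping against it lowers p. Pixels are clipped to [0, 1].
    """
    out = []
    for wi, xi in zip(w, x):
        sign = (wi > 0) - (wi < 0)
        out.append(min(1.0, max(0.0, xi - eps * sign)))
    return out


def transfer_experiment(n_images=200, dim=32, eps=0.1, noise=0.5, seed=0):
    """Craft examples on a surrogate, then test them on a related target.

    Returns (fooled_surrogate, also_fooled_target); their ratio is the
    transfer rate.
    """
    rng = random.Random(seed)
    surrogate = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Hypothetical "pretty much the same" target: surrogate weights plus noise.
    target = [w + rng.gauss(0.0, noise) for w in surrogate]
    fooled_surrogate = fooled_target = 0
    for _ in range(n_images):
        x = [rng.random() for _ in range(dim)]
        # Only attack images that both models currently flag as a person.
        if person_prob(surrogate, x) < 0.5 or person_prob(target, x) < 0.5:
            continue
        x_adv = fgsm_lower_person(surrogate, x, eps)
        if person_prob(surrogate, x_adv) < 0.5:
            fooled_surrogate += 1
            if person_prob(target, x_adv) < 0.5:
                fooled_target += 1
    return fooled_surrogate, fooled_target
```

The toy's transfer rate says nothing about the 20% figure above; the point is only the bookkeeping: attack the model you can differentiate, then count how many of those successes survive on the model you actually care about.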

Speaker 2: Yeah. Sure.

Speaker 1: So that's why we went more with holding up the board. Our idea was, if we could override the classes that are there and make another class more confident, then we would be able to, you know, decrease person. And that actually worked really, really well. In the roughly 40-page blog we released with all the technical details, we have photos up there: if we're holding up a photo of a car, it will detect car, things like that. There are a few limitations, and I always try to make sure that I list limitations, especially for things like this. Mhmm. While we were able to do it, you had to kind of hold it at a specific angle, so if you're walking, you might mess it up or something. So we were able to bypass the detections fairly easily, and it works in that type of setting, but it might not always work for, like, a porch pirate. So the takeaway is not, oh, everyone can steal our packages. The takeaway is, hey, this can work against a production system. And, you know, somebody might come up with a better patch, like a T-shirt or something that is able to be moved. But it still opens a really interesting door for a lot of research.
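Whether this model normalizes its class scores with a softmax is not stated; the sketch below assumes it does, purely to show the arithmetic behind "make another class more confident and person goes down." With a softmax, raising the car logit necessarily shrinks the person probability. Many YOLO variants instead score classes with independent sigmoids, where boosting car leaves the person score itself unchanged, but a post-processing step that keeps only each box's highest-scoring class would still relabel the box.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Normalize raw class scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def top_label(probs: dict[str, float]) -> str:
    """The class a detector would report if it keeps only the best class per box."""
    return max(probs, key=probs.get)
```

With made-up logits `{"person": 3.0, "car": 1.0, "dog": 0.0}`, person comes out at about 0.84; raising only the car logit to 4.0 (the poster board) drops person to about 0.27 and makes car the top label, without touching the person logit at all.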

Speaker 2: Extremely. So at the end, the best way you found to compromise this had less to do with, like, a random cluster of pixels that causes the AI to wig out, and more to do with getting the AI to think that that human being is actually a car or a package or a dog or any number of these discrete categories where it has been told: you don't need to alert the owner in the event of dog. You only need to alert the owner in the event of person.

Speaker 1: And that is mainly just because we were working in a physical space. If, you know, it was some server or something, sending those other random-pixel ones would have worked great, because they were transferring really well. It's just not something you can have consistently in the physical space.

Speaker 2: Do we see this type of AI model used in anything on a larger scale than a consumer camera under $100? Like, is this type of on-device AI being used in any other types of hardware where your research might be relevant?

Speaker 1: Yeah. I mean, image classification models are being used everywhere. Consumer cameras, maybe non-consumer cameras, say if you have the security system for a larger building. We also see them being used in industry. For example, in an industrial setting, you'll have these classification models that try to sort out errors in parts. So if you're there, maybe you could modify the part a little bit and it just, you know, doesn't get detected as an error, stuff like that. So it's really, really interesting what the potential is there, especially with cars. You know, the newer cars also have classification models. So they are being used quite widely.

Speaker 2: Yeah. Cars were kind of what was sitting in the back of my head. Obviously, all modern cars are just network-connected computers. They're constantly reaching out to a ton of different things, but a lot of it would have to be local.

Speaker 1: Yeah. And you actually have seen things like this in the past. A few years ago with Tesla, there was an issue where if you taped over the stop sign, it would, like, run through the stop sign, things like that. Or, you know, change the speed limit number by putting a little bit of tape on it to make it look weird. A human's not gonna say, oh, it's 75 instead of, you know, 15, but a car might.

Speaker 2: Yeah. The thing I found fascinating about this, between when we agreed to have this conversation and now, reading the whole report that y'all put out, was that it changed how I understood what these models were actually seeing. As a human being, if you put a single line through a stop sign, I'm still inferring that's a stop sign. I can still see it's red. I know what that is. Everything from a guy wearing a package outfit triggering it all the way down to holding up a sign with the right depiction of a dog shifted how I understood what these models are actually perceiving when they look at something. It's like, oh, they don't really have an internal model of the object they're looking at. They're looking for very specific patterns that are quite easy to disrupt.

Speaker 1: Exactly. That's what we found with that one specifically. A lot of image classification, as you mentioned, looks for patterns. We found we had a much higher chance of success if, you know, we were disrupting the shoulder outline versus holding it over your chest. So it seemed that was one of the patterns it was looking for in person detection, which is fun.

Speaker 2: Yeah. You just cracked it. You're like, oh, it's the shoulders. It was all in the shoulders all along. That's what they're looking for. Interesting. Did you get any other weird little insights into what's going on inside of this relatively commonly used model, or a variation on it, it sounds like? What else did you learn about how this thing thinks?

Speaker 1: I mean, like with pets, the pointy ears, you know, things like that. It's just looking for shapes, whereas a person is gonna use a bit more logic rather than just detecting a specific shape.

Speaker 2: So it is kind of reproducing the way a human being infers from a limited amount of information, to a point. It's sort of reproducing that, but it's still early days. So privacy is an obvious benefit of this. It's not constantly calling to the cloud. It can run locally. It's probably a little more efficient. I can imagine some of the benefits of having these models running locally on these devices.

Speaker 1: Yeah. I mean, as somebody who has hacked one of these models now, I still think it's great that people are actually employing them. I am firmly in the belief that, yes, we should keep using them, because the benefit usually outweighs, you know, anything else. And we're still in the early days. So the fact that something was hacked, that's not a bad thing. That just means that people are out there doing the research, and people are out there securing the models as well, which is great. Instead of having your photos sent off to a server, especially if you're using the camera inside your own house, it's a huge benefit. It's just that companies, when starting to implement these edge AI systems, should think about what is the worst thing that could happen when the AI system fails. Right? In this case, you know, you might not get a detection. But if you're still saving off all the video, which most people aren't, just because that's a huge amount of storage, you still have something there. Or you still have an alarm system; that's still there as well. So there's that kind of trade-off, but it's still definitely worth having those new systems in place.

Speaker 2: You still prefer it. You like the idea of trying to handle as much of this as possible locally on the device.

Speaker 1: Mhmm.

Speaker 2: Interesting. I think it's intuitive, and you're certainly starting to see that a little bit more with certain AI functionality in smartphones, in the way it's being marketed: we're gonna handle as much of this locally as humanly possible. I was fascinated by that transition. It felt like AI was everywhere now, and then about nine minutes later, it was like, we're gonna do it locally on the device, don't worry. Because the privacy implications of some of this stuff, as remarkable as it is, are horrifying. Like, this piece of information that I've just fed into it, oh, we're just gonna throw that off to a server somewhere, and you're gonna have no clue where it's going.

Speaker 1: Exactly.

Speaker 2: You gave a talk about this at DEF CON. How did you find people responded to it? Like, is there a lot of excitement about this right now?

Speaker 1: Yeah. This was actually probably one of my most well-received talks. As I mentioned, I had six talks that week. And for this one, so many people came to the talk that they ran out of seating space, and people were standing around the seats to watch, which is always a really good feeling.

Speaker 2: Nice.

Speaker 1: And then, you know, we were able to put a lot of jokes in of, you know, package ban and things like that. But for the rest of the conference, I had people coming up to me, talking to me about, you know, what we had been able to pull off. People were asking when we were releasing the blog post, when they could read the blog post. So there was a lot of buzz around it, which was really, really great.

Speaker 2: What surprised you most? Outside of really specific technical details, what about this whole process, and I guess this emerging world of embedded AI, shocked you?

Speaker 1: I was surprised by how well my skill set carried over. I come from a vulnerability research background, so my job in the past has been to find zero-days: go out, find the CVEs before they even become CVEs, report them, they become CVEs, and they get patched. A lot of that skill set is reverse engineering, you know, digging deep into decompilers and disassemblers and doing all that fun stuff. And a lot of people, especially my peers, don't think those skill sets really transfer to the AI security side of things, just because so many people hear AI and think of something they don't really understand. So one of the things we were trying to show with our talk, and I think we were able to, because people told us that's what they understood from it, was that a lot of the old skill set still applies. A lot of the new AI security work happening out there is a little bit too focused on just, you know, the AI model or an LLM. Meanwhile, there's all the supporting infrastructure around AI that might not always be considered. So that was a happy surprise. I was hoping for it, but the fact that the skill set was able to transfer over so well is actually how we ended up taking the chip off of the device, because that happened after we were able to get the model, not backdoored, but triggered.

Speaker 2: I don't think we're served by this stuff seeming completely inaccessible to people with even a high level of tech literacy. The idea that they're like, oh, I'm simply a vulnerability researcher, I could never hope to engage with this stuff. That's not good. I think it's useful when people can see that this is just built on more of the same technology. The knowledge that you have is still relevant to this.

Speaker 1: Exactly. And that's just something we've been trying to say because the more people that are trying to hack AI, the better. I mean, hack AI for good, of course. But the more people who are out there doing that security research, the better it's gonna be for everyone.

Speaker 2: Especially for stuff like this where it's consumer-facing. This is a camera in your home that you are relying on for, potentially, personal safety issues. The idea that this feels like some nebulous black box that no one could ever hope to understand, that's not good. We need this ecosystem of hackers to be tearing these things apart and figuring out how they're vulnerable. Very cool. Kasimir, I appreciate you taking the time to sit down and chat with me about this.

Speaker 1: Yeah. It was a lot of fun. Thanks for having me on here. It was really great to be here talking about this.
