Apple Intelligence can be hijacked using prompt injection

Apple Intelligence, the AI built into iPhones, Macs, and other Apple devices, was recently hacked by security researchers. Researchers used a technique known as “prompt injection” to force the same AI that is on your Mac into taking undesirable actions. This technique, they say, could be used to cause even greater damage.

In this report, we talk to experts to understand why having on-device AI on your Mac is riskier than AI that lives in the cloud, how this prompt injection works against Apple Intelligence, and how you can stay safe.

Keep your Mac safe from hacking and malware

A Mac-native antivirus like Moonlock will block any malware trying to get in, filter out suspicious traffic, and teach you the basics of cyber safety.

try 7 days free

RSA researchers say 200 million Apple devices are vulnerable to the Apple Intelligence hack

Recently, the Register reported that researchers successfully tested a prompt injection attack against Apple Intelligence. The attack allows attacker-controlled results on your device. Researchers estimated that there are 200 million Apple devices that have Apple Intelligence and up to 1 million apps that use it.

If you have a Mac with an M1 or later model, an iPhone 15 Pro and later, an iPad model with A17 Pro, or an Apple Vision Pro, you have Apple Intelligence, and this attack could be used against you. As a side note, Apple Intelligence is built into Mail, Messages, Notes, Photos, and Safari, and it’s accessible to third-party developers via an API.

The prompt injection attack against Apple Intelligence was developed by RSA researchers. They combined two techniques to bypass Apple’s input and output filters and the safety guardrails on Apple Intelligence’s local model (more on this in the section below). They tested the techniques against 100 random prompts and succeeded 76% of the time.

How Apple Intelligence was manipulated

The 2 techniques used in this prompt injection attack are “Neural Exec” and Unicode.

A diagram on how researchers combined Unicode with Neural Exec to carry out this prompt injection attack against Apple Intelligence. — RSA researchers shared a diagram on how they combined Unicode with Neural Exec to carry out this prompt injection attack against Apple Intelligence. Image: Screenshot, Moonlock.

Neural Exec, unlike other types of prompt injection techniques, does not rely on ethical human hackers trying to trick the AI into generating a specific answer. Instead, it relies on sophisticated machine learning processes that calculate the most likely prompt that would achieve that goal.

On the other hand, Unicode, which is used to type languages that read right-to-left (like Arabic or Hebrew), allowed researchers to hide (encode) malicious/offensive English-language output text by writing it backward and forcing Apple Intelligence to read it from left to right.

This allowed the researchers to manipulate Apple Intelligence into a specific outcome. Apple’s on-device AI model used in Apple Intelligence makes this attack easier, researchers said.

Apple’s on-device AI was hacked. Wait, wasn’t it supposed to be safer?

The most popular AI apps out there, like ChatGPT, run on the cloud and not on your device. When Apple presented Apple Intelligence, the company boasted about its enhanced privacy and security. Apple then said that AI on your Mac or iPhone processes all data at the hardware level on your device and only connects to the cloud when large processing power is required.

On-device, or hardware-level software, means exactly that: software that is located on your device and is not on the cloud. Generally speaking, hardware-level software is considered safer because it provides isolation from third-party risks. If, for example, the cloud server of ChatGPT is hacked, your chats would be at risk, but if the chats are only on your Mac, then they are isolated. However, researchers now say that having Apple Intelligence on your Apple device is a risk as well. We talked to experts to understand why.

“When a model runs locally, it often has direct access to device resources such as files, system prompts, or application data,” Juan Mathews Rebello Santos, cybersecurity researcher, ethical hacker, and owner of the Brazilian National Bank of Vulnerabilities (BNVD) told us.

“If an attacker manages to manipulate the input instructions—through a prompt injection technique—they may influence how the AI interprets commands or processes information,” said Santos.

A different type of weakness for on-device AI

The key point for readers is that “on-device AI” improves privacy in many situations, but it does not remove the need for strong security boundaries and validation of AI inputs, Santos explained.

“The assumption was always that on-device means safer because your data never leaves your phone, nobody’s eavesdropping on a cloud call. And that part’s true—from a data privacy standpoint, on-device is better,” Omair Manzoor, Founder and CEO of ioSENTRIX, a cybersecurity and penetration testing company, told us.

“What the researchers showed is that on-device models have a different weakness,” said Mazoor. “When Apple Intelligence runs locally, it’s a smaller, more constrained model than what you’d get from a cloud API.”

What the researchers showed is that on-device models have a different weakness. When Apple Intelligence runs locally, it’s a smaller, more constrained model than what you’d get from a cloud API.
Omair Manzoor, Founder and CEO of ioSENTRIX

A compact on-device model is more brittle; it’s optimized for speed and size, and the tradeoff is that it’s easier to push it into doing something it shouldn’t, Manzoor explained.

Another element that makes on-device hardware more vulnerable to attacks is that it is widely available to anyone. For example, 200 million devices have Apple Intelligence. Cybercriminals can test their attacks against the technology over and over again, unrestrictedly, and learn how to raise fewer red flags. In contrast, testing adversarial techniques on a cloud platform like OpenAI’s ChatGPT would sound the alarm and likely block attackers’ accounts or traffic.

Neural Exec is a dangerous technique in the AI era. Here’s why.

Neural Exec is dangerous because it’s a machine hacking a machine. “The use of automated systems like Neural Exec to generate prompt injections shows how AI-assisted attacks are evolving,” said Santos. “In the past, crafting these kinds of prompts required significant manual experimentation by security researchers or attackers.”

A Neural Exe prompt injection found in the paper linked above. — A Neural Exec prompt injection was found in the paper by Pasquini et.al., linked above. Image: Screenshot, Moonlock.

Basically, instead of having a team or individual testing which prompts an AI model, Neural Exec calculates the best prompt injection without interacting with the model, then prompts it to jailbreak it. As the name implies, Neural Exec can learn. This means that over time, it gets better at what it does.

“With machine learning systems capable of generating and testing variations automatically, the barrier to experimentation may become lower over time,” said Santos.

How accessible is Neural Exec? Can any cybercriminal use it against you?

Currently, as Mazoor explained, Neural Exec isn’t a point-and-click tool anyone can download off a forum. It requires an understanding of machine learning optimization and adversarial strings; these are considered research-grade skills. However, despite its low accessibility, the technique is still deeply disturbing.

“Researchers showed that the adversarial triggers are universal,” said Manzoor, “meaning once someone computes one that works, it can be reused with arbitrary payloads without starting from scratch.”

This strategy of using a known successful technique in Neural Exec is known as attack transferability.

“Give it 6 to 12 months, and I’d expect this to be packaged into toolkits that lower the bar significantly,” said Manzoor.

A prompt injection attack against your Apple Intelligence can lead to significant damage

We asked experts what damage users can expect if their Apple Intelligence is hijacked using the technique described in the RSA report.

“Apple Intelligence hooks directly into apps through system APIs—mail, messages, calendar, notes, third-party apps,” said Manzoor.

At that level, an attack using this technique could:

Manipulate email summaries
Make phishing attempts go unflagged
Alter meeting invites and message summaries
Cause the assistant to take actions in apps on your behalf based on attacker-controlled instructions
Compromise your data
Be used to launch additional cyberattacks and invoke malware payloads

In summary, controlling your AI outputs gives a threat actor the opportunity to influence your behavior (social engineering), access your data, and launch other cyberattacks, including malware infections.

How to keep safe from Apple Intelligence prompt injection attacks

The following are some tips, suggestions, and advice to deal with prompt injection attacks that target your built-in Apple Intelligence. Cybercriminals are only just warming up their engines when it comes to AI attacks, so this advice can help strengthen your security today and prepare you for what’s coming.

Tips to mitigate Apple Intelligence attack risks

When dealing with AI that has access to many of your apps, it’s a good idea to limit that access. Additionally, not only do AI models make mistakes, but they can be tricked into giving malicious responses. Therefore, be a bit suspicious about anything an AI says or asks you to do.

If an AI is urging you to take a specific action, know that this is not normal behavior. Also, if you get security system warning notifications while using AI, do not ignore them.

Finally, because AI prompt injection attacks are a rapidly evolving novelty, keeping up with breaking cybersecurity news can help you stay ahead of the curve.

Get Moonlock. It scans for threats in real-time and checks everything you interact with to keep your Mac safe.

While the Moonlock antivirus will not check your Apple Intelligence chats, it will run silently in the background, scanning for threats in everything you interact with. This includes your files, apps, and even Mail attachments at all times, so you know for sure that your computer is safe.

Screenshot of Moonlock, a Mac security app: The Dashboard screen.

If the Moonlock app detects that something is wrong, the app will alert you to the issue and move the threat to Quarantine, where it will be fully isolated. You can continue your online work, and check Quarantine on your own time to learn more about the threats Moonlock detected and remove them from your Mac.

You can test drive Moonlock for free for a week and see how it works for you.

Final thoughts

Prompt injection attacks against AI aren’t the most popular trend in the threat landscape. However, they do exist, and experts expect them to become more widespread. Understanding the techniques described in this report and learning how to keep safe is a solid first step to level up your Apple Intelligence and AI security posture.

This is an independent publication, and it has not been authorized, sponsored, or otherwise approved by Apple Inc. Mac, macOS, and Apple Intelligence are trademarks of Apple Inc.

Ray Fernandez

Ray has been covering tech and cybersecurity for over 15 years. His work has appeared on TechRepublic, VentureBeat, Forbes, Entrepreneur, and the Microsoft Blog, among others.