This article is part of a series of written products inspired by discussions from the R Street Institute’s Cybersecurity and Artificial Intelligence Working Group sessions. Additional insights and perspectives from this series are accessible here.

To harness artificial intelligence (AI’s) potential responsibly, we must uphold a commitment to shared core values like transparency, accountability, safety, trust, reliability, and predictability, while balancing the imperative of AI security, the pressure for early market entry, and the desire for continued U.S. leadership in technological innovation. With a deepened understanding of the AI opportunity and threat landscapes, policymakers can contribute significantly to the promotion of these values by crafting flexible solutions, such as frameworks that accommodate varying priorities and needs between industries.

While AI holds immense promise, particularly in cybersecurity applications, to harness its potential, we must understand its current benefits, anticipate its advancement, comprehend the risks it can pose (See Part 1: Examining the Landscape of AI Risks), and address threats that threaten its security. The AI security threat landscape centers around three primary components of AI systems: the integrity of the underlying data; the resilience of its infrastructure; and the challenges posed by unintentional exploits.

1. Data Integrity and Lifecycle Management
Data integrity refers to the process of ensuring that the accuracy, completeness, and quality of data is maintained throughout its lifecycle. The data lifecycle encompasses a series of different phases, such as generation, collection, processing, storage, management, analysis, visualization, and interpretation. Because machine learning (ML) models rely on high-quality data, prioritizing data integrity and lifecycle management against corruption and misuse are critical elements of AI security.

One of the most significant AI security risks is data poisoning—the contamination of the data used to develop and deploy AI and ML systems. AI systems, particularly large-language models (LLMs), adhere to a two-phase development approach: pre-training and fine-tuning. Attackers can poison data in both phases. During the pre-training phase, attackers can attempt to poison the underlying datasets used to train the AI model by injecting biased or misleading information, adding inaccurate data, or subtly changing the data. In contrast, attackers may attempt to poison the pre-trained AI models during the fine-tuning phase by introducing subtle changes to the model’s parameters, altering its decision-making processes, or injecting biased or inaccurate data to corrupt and skew the model’s outputs. While data poisoning attacks in both developmental phases can cause widespread harm to the AI model’s integrity, remediating attacks during the pre-training phase is more difficult due to the limited options for a dataset.

The ramifications of data poisoning are far-reaching. When AI models are trained on poisoned data, they can produce unreliable, biased results, eroding trustworthiness. Organizations seeking to mitigate the effects of data poisoning attacks often face astronomical costs, which can potentially reach hundreds of millions of dollars. Moreover, fixing the damage caused by data poisoning attacks requires extensive time and effort, potentially forcing the shutdown of AI-powered tools and disrupting the platform’s operations.

Data poisoning attacks are a challenging AI security risk to address due to their relative accessibility and the significant impact and damage that they can cause to AI models and systems. The development of tools designed to help execute data poisoning attacks further exacerbates these challenges and raises ethical concerns.

2. AI Infrastructure
AI systems rely on a three-tiered infrastructure stack. The infrastructure layer is foundational to the AI system, consisting of hardware, software, and cloud-computing services, all of which are essential for building and training AI models. Security risks to this layer include traditional cybersecurity threats, such as data breaches, in which attackers aim to exploit vulnerabilities to gain unauthorized access to sensitive data.

The second tier, the model layer, houses three key types of AI models: General AI, Specific AI, and Hyperlocal AI. General AI models replicate a broad range of human cognitive abilities, such as decision-making and recognizing patterns or relationships from various data sources. Specific AI models are trained on tailored datasets for specific tasks like debugging a code snippet or brainstorming a lesson plan. Finally, Hyperlocal AI models are optimized to provide highly relevant and context-aware solutions because they are trained on data specific to a particular location, community, or industry. For instance, Hyperlocal AI models can be capable of brainstorming lesson plans tailored for students of different grade levels or debugging code to comply with industry-specific frameworks. The model layer faces security threats like model stealing, in which attackers attempt to reverse existing AI models for malicious purposes. Furthermore, the model layer is vulnerable to model inversion, which occurs when attackers aim to extract sensitive data from model outputs. Additionally, traditional security risks, such as those related to data storage, network access, and software vulnerabilities, persist within this layer.

At the top of the AI infrastructure stack is the applications layer, which facilitates interactions between AI systems and human end-users. This layer includes various end-to-end workflow tools and user-facing applications. Similar to the model layer, the applications layer faces the risk of traditional cybersecurity threats and model stealing. Distinct from the infrastructure and model layers, however, the applications layer can be a target of prompt injection (both direct and indirect), a security challenge in which attackers manipulate input prompts to coerce AI systems into generating undesirable outputs or engaging in harmful behavior.

As the AI security risk landscape continues to evolve, we must be ready to address the threats, both traditional and novel, against each tier of the AI infrastructure stack.

3. Unintentional AI System Exploits
Unintentional exploits can occur during the development or use of AI systems, leading to unintended consequences and harms. One example is reward hacking, where AI models, trained to maximize rewards, may manipulate the reward system to achieve high rewards without performing the intended tasks. Consider gaming as an example. Reward hacking may involve the AI finding unintended shortcuts to achieve high scores or win the game without completing the game’s objectives or following the game’s rules as intended. Reward hacking can also lead to AI models behaving in ways that are not only harmful, but also difficult to predict or control. Take, for instance, an AI trading system that may manipulate market conditions to maximize short-term profit, potentially causing market instability and uncertainty. Even though this trading AI system maximized the reward system—profit—it also failed to perform the intended task, which was maintaining market stability and ensuring the long-term sustainability of financial markets.

Prompt brittleness, on the other hand, refers to the sensitivity of AI systems to subtle changes in input prompts. Minor alterations in phrasing or context can lead to unexpected and potentially harmful responses. For example, a user aiming to conduct research on how cyberattackers hack into enterprise organizations for educational purposes may encounter prompt brittleness if the AI system provides instructions on conducting the attack instead, showcasing the system’s sensitivity to slight variations in input prompts and its potential to generate unintended and harmful responses. Addressing prompt brittleness is important to ensure that AI systems consistently produce reliable and safe responses to user inputs.

Another example of an unintentional AI system exploit is metaprompt override, which occurs when AI systems override predefined prompts. This results in outputs that may not align with user or developer intentions. For example, a language translation AI disregarding the requested translation context and instead providing a translation that aligns with a different language or intent might result in miscommunication. This exploit underscores the importance of robust prompt design and the need to implement guardrails to prevent AI systems from deviating from intended responses.

The Way Forward
For all its promise, AI resurfaces some old risks and gives birth to novel ones. Given the pace of technological change and the scale and scope of AI-associated risks, stakeholders must uphold a commitment to shared core values like transparency, accountability, safety, trust, reliability, and predictability. From a foundation of shared values, solutions can be cross-cutting, interdisciplinary, and lasting. Policymakers and other stakeholders can enhance these values by formulating safety and security solutions that accommodate the varying priorities and needs of governments and industries across AI’s design, development, and deployment lifecycle. Industry can work collaboratively with governments to ensure the safety and security of their products. The public should also be empowered to use AI and shape its development.

Understanding the security risks to AI systems, in addition to AI’s risks, applications, and benefits, positions industry stakeholders, policymakers, and the public to have the knowledge base necessary for navigating AI regulation and governance efforts now and in the future. Armed with this knowledge, we can distinguish between exaggerated fears and genuine risks, enabling grounded, constructive dialogues that assess risks, prioritize risk mitigation strategies, and foster well-informed decision-making. All actions in the coming months and years will not only shape the future of AI, but also lay the groundwork for how we approach the governance of other emerging technologies. To establish this groundwork effectively, we must ensure that core values are woven into the fabric of AI’s design, development, and deployment.