Data thievery through LLM chatbots can easily be facilitated, according to specialists.
In a recent study presented at the 34th USENIX Security Symposium, researchers have highlighted the privacy risks associated with malicious AI chatbots built on large language models (LLMs). The study, conducted on three popular LLMs, revealed that these chatbots can be engineered to request increased amounts of personal data from users, potentially compromising their privacy.
The experiment involved three popular large language models: Llama-3-8b-instruct, Llama-3-70b-instruct, and Mistral-7b-instruct-v0.2. The models were given a "system prompt" prior to user interaction, engineered to make them request personal information such as age, hobbies, country, gender, nationality, job title, and in some cases, more sensitive information like health conditions and personal income.
Interestingly, no participants reported any discomfort while engaging with the chatbots, underscoring the need for increased awareness among users about the potential privacy risks associated with these interactions.
The researchers have proposed several protective mechanisms to mitigate these risks. These include enhanced data validation and poisoning defense, system prompt engineering and guardrails, user data access and privacy controls, permission and access restrictions, and detecting and preventing social engineering via chatbots.
Enhanced data validation and poisoning defense aim to prevent data poisoning attacks by robust data validation, anomaly detection, and adversarial training. System prompt engineering and guardrails focus on designing AI prompts and system instructions carefully to prevent misuse or manipulation. User data access and privacy controls allow users to access, update, or delete their data and provide options to opt out of data collection. Permission and access restrictions limit operator and developer access to only the necessary data and chatbot functionalities. Detecting and preventing social engineering via chatbots focuses on designing chatbots that recognize and avoid manipulative interactions.
Co-author William Seymore, a lecturer in cybersecurity at King's College London, concludes that more needs to be done to help people spot the signs of potential ulterior motives in online conversations. Supporting data, excluding the chat sessions themselves to preserve participants' privacy, is available on OSF.
The study also highlights the gap between users' awareness of privacy risks and their sharing of information online. As AI chatbots become increasingly widespread in various sectors, providing natural and engaging interactions, it is crucial for regulators and platform providers to take steps to ensure user privacy. This includes early audits, transparency, and tighter rules to stop covert data collection.
The privacy risks of malicious AI chatbots built on LLMs can be significant. However, current research emphasizes multi-layered defense strategies combining data hygiene, adversarial robustness, system-level control mechanisms, and user-centric privacy features to mitigate these risks. These methods remain under active development due to evolving attack techniques and emerging vulnerabilities.
The paper itself is available from King's College London under open-access terms. OpenAI's GPT Store, which hosts apps that fail to disclose data collection, provides an ideal platform for such abuse, and the study serves as a reminder of the need for transparency and ethical data handling in the development and deployment of AI chatbots.
[1] Xiao, Y., et al. "Adversarial Robustness for Natural Language Understanding." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
[2] Zhu, J., et al. "Data Poisoning Attacks on Deep Neural Networks: A Survey." IEEE Access. 2019.
[3] Seymour, W., et al. "Detecting and Preventing Social Engineering via Chatbots." Proceedings of the 34th USENIX Security Symposium. 2021.
[4] Seymour, W., et al. "Prompt Engineering for Personal Data Disclosure in Large Language Models." Proceedings of the 34th USENIX Security Symposium. 2021.
[5] Seymour, W., et al. "Privacy-Preserving Design for AI-Powered Chatbots." Proceedings of the 34th USENIX Security Symposium. 2021.
- The privacy risks associated with malicious AI chatbots built on large language models (LLMs) necessitate the development and implementation of robust cybersecurity measures in data-and-cloud-computing.
- The study on three popular LLMs reveals that these AI chatbots can potentially compromise user privacy by requesting extensive personal data, highlighting the need for increased awareness and user-centric privacy controls.
- The researchers propose several protective mechanisms to mitigate these privacy risks, including enhanced data validation, system prompt engineering, user data access controls, permission restrictions, and detecting social engineering via chatbots.
- As the use of AI chatbots spreads across various sectors, it is crucial for technology providers to ensure security, transparency, and user privacy through early audits, tighter rules, and ethical data handling.