MIT to Train AI Models Against Harmful Generative Responses

by Inside Telecom Staff - April 24, 2024

Massachusetts Institute of Technology (MIT) researchers are working on a new method called curiosity-driven red teaming (CRT) to train AI models, including large language models (LLMs), to avoid generating harmful or toxic responses.

The method uses AI to generate a large number of potentially dangerous prompts that could be put to an AI chatbot, so that unwanted content can be identified and filtered out.

In a paper published on February 29, the MIT scientists said the new training method improves on existing approaches by teaching AI not to produce dangerous answers that could affect users.

Traditionally, teams manually wrote prompts that could trigger dangerous responses from AI systems, a practice known as red teaming. However, these teams struggled to anticipate every prompt that might produce a dangerous response, which can leave gaps in an AI's training.

For example, when training large language models like ChatGPT, human operators usually write questions that are likely to elicit dangerous answers, so that the model can be taught to avoid that type of content.

The goal of CRT is to improve the AI training process by using machine learning to automatically generate a wide variety of harmful prompts, including ones more dangerous than those written by human operators. This has led to more varied responses from the AI during training.

The method also addresses a weakness of human red teaming: people writing prompts by hand may not realize how harmful a given prompt could be. By encouraging the AI to generate content it hasn't encountered before, CRT aims to strengthen the AI's ability to produce non-harmful content when interacting with users.
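The "curiosity" idea can be illustrated with a small Python sketch. This is not the MIT researchers' implementation; it only assumes that a generated prompt is rewarded both for eliciting a harmful response and for being different from prompts already tried. The function names, the word-overlap novelty measure, the `novelty_weight` value, and the harm score (which would come from some separate toxicity classifier) are all hypothetical.

```python
def tokenize(text):
    """Split a prompt into a set of lowercase words."""
    return set(text.lower().split())


def novelty(prompt, seen_prompts):
    """Return 1 minus the highest word-overlap (Jaccard) similarity
    between the prompt and any earlier prompt. 1.0 means entirely new."""
    words = tokenize(prompt)
    best = 0.0
    for earlier in seen_prompts:
        other = tokenize(earlier)
        union = words | other
        if union:
            best = max(best, len(words & other) / len(union))
    return 1.0 - best


def curiosity_reward(prompt, harm_score, seen_prompts, novelty_weight=0.5):
    """Combine an assumed harm score in [0, 1] (from a hypothetical
    classifier) with a bonus for prompts unlike those already tried."""
    return harm_score + novelty_weight * novelty(prompt, seen_prompts)
```

Under these assumptions, a prompt that repeats an earlier one earns only its harm score, while an equally harmful but previously unseen prompt earns more, which pushes the generator toward variety rather than repetition.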

In tests on the LLaMA2 model, the method generated 196 prompts that elicited harmful content, even though the model had already been fine-tuned by humans to avoid such behavior. The researchers see this as a sign that the method is promising for controlling what AI models produce.
