
Meta’s New Crawler Gathers Data Silently

by Inside Telecom Staff - August 21, 2024

In July, Meta launched a web crawler called Meta External Agent to scrape public data for AI training.

According to three companies specializing in monitoring web scraping, the tool collects information publicly displayed on websites, such as news articles and discussions, to train Meta’s AI models.

From AI Tools to Scraping Tools

Meta External Agent crawls websites for data to be used in training AI systems, just like OpenAI’s GPTBot.

According to Dark Visitors, a firm that helps block scraper bots, Meta’s new tool is explicitly meant to gather data for AI training.

This has been confirmed by two other entities that monitor web scrapers.

Late in July, Facebook’s parent company updated its developer website to disclose Meta External Agent, without making any public announcement.

A spokesperson from Meta noted that this is not the first time the company has used such a tool, adding that Meta External Hit is another crawler used for different purposes, such as generating link previews.

“Like other companies, we train our generative AI models on content that is publicly available online,” the spokesperson added.

“We recently updated our guidance on how publishers can exclude their domains from being crawled by Meta’s AI-related bots.”
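Exclusions of this kind are commonly expressed in a site's robots.txt file. The sketch below is a minimal illustration of how a publisher might check that a set of rules blocks one crawler while leaving others alone. It uses Python's standard `urllib.robotparser`. The user-agent token `meta-externalagent`, the sample rules, and the `example.com` URL are assumptions for illustration and are not taken from this article; the exact token should be confirmed against Meta's developer documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The token "meta-externalagent" is an
# assumption; confirm the exact value in Meta's developer documentation.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/article"  # placeholder URL
    for agent in ("meta-externalagent", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Note that robots.txt is a voluntary convention: it only has an effect if the crawler chooses to honor it.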

Always a Price to Pay

Scraping web data for AI training has long been controversial, pushing artists, writers, and other content creators to file lawsuits against AI companies for using their work without prior consent.

Recently, in an effort to avoid such actions, some AI companies, such as Microsoft-backed OpenAI and Bezos- and Nvidia-backed Perplexity, have made noteworthy deals with other companies to pay for content.

Back in April, OpenAI made a deal with the Financial Times that allows it to use the newspaper’s archives for training.

As for Perplexity, in July, it announced a revenue-sharing agreement with major media companies, including Time and Fortune.

Web scrapers like Meta External Agent help collect the massive volumes of data required to train large language models (LLMs). One of Meta’s most advanced models is Llama, which powers the Meta AI chatbot integrated across Meta’s platforms. While the data sources used to train the latest Llama version are unknown, earlier versions were trained on data from Common Crawl, a publicly available archive of scraped web pages.

