In a two-for-one move, OpenAI has announced ChatGPT updates and quietly launched GPTBot, a web crawling bot that has ignited discussions on data ethics and usage as website owners seek to prevent data scraping.
- OpenAI introduced a support page allowing website owners to block GPTBot’s access using their site’s robots.txt file.
- OpenAI is unveiling new ChatGPT features as it seeks to keep pace with Bing Chat’s regular updates.
Because the ChatGPT status updates are never enough on their own, OpenAI quietly introduced its new web crawling bot, GPTBot, designed to scan websites for content to train its large language models (LLMs), triggering concerns from website owners and discussions about data usage ethics.
If the term “web crawler” evokes the image of a creepy spider, you are not very far off. *Shudder* A web crawler, also known as a spider or web spider, is a computer program that systematically browses the World Wide Web, typically to index websites for later retrieval by a search engine. In practice, crawlers are the authors of the internet’s yellow pages.
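To make "systematically browses" concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not how GPTBot or any search engine actually works: the `PAGES` dictionary is a hypothetical in-memory stand-in for real websites, and the `crawl` and `LinkExtractor` names are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical three-page "web" so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/news">News</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/news": '<a href="/about">About</a>',
}


def crawl(start_url, fetch):
    """Visit pages breadth-first, following links and skipping repeats."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)  # returns None for pages that cannot be fetched
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


visited_pages = crawl("https://example.com/", PAGES.get)
```

A real crawler would fetch pages over HTTP, respect each site's robots.txt, and throttle its requests; the sketch only shows the follow-the-links loop at the heart of the process.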
OpenAI’s initiative, however, triggered a backlash from website owners and creators, who swiftly sought ways to prevent GPTBot from scraping their data. The company responded by introducing a support page for GPTBot, enabling website owners to block the bot’s access via a modification to their site’s robots.txt file. Several prominent web outlets, including The Verge, have proactively blocked GPTBot from scraping their content.
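For site owners wondering what such a block looks like, the sketch below embeds a sample robots.txt that disallows the `GPTBot` user agent while leaving other crawlers alone, then checks it with Python's standard `urllib.robotparser`. The `example.com` URLs and the `is_allowed` helper are placeholders for illustration, and the rules are one plausible configuration rather than a recommendation from OpenAI or any outlet mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; any path on the site gives the same answer here.
gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/news/story")
other_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/news/story")
```

Keep in mind that robots.txt is a voluntary convention: it only works when the crawler chooses to honor it, which is part of why the concerns described next persist.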
Despite these efforts, concerns remain over whether blocking GPTBot alone is enough to keep content out of LLM training data. An OpenAI spokesperson said in an email that the company “periodically collect public data from the internet which may be used to improve the capabilities, accuracy, and safety of future models.” The spokesperson added that “Web pages are filtered to remove sources that have paywalls, are known to gather personally identifiable information (PII) or have text that violates our policies.”
Accompanying the launch, the AI company unveiled a grant and partnership with New York University’s Arthur L. Carter Journalism Institute, which aims to foster ethical and responsible AI utilization in journalism.
The launch has raised questions about the fairness, legality, and consent of web scraping. While scraping publicly accessible data was recently upheld as legal by the U.S. Ninth Circuit Court of Appeals, the practice has faced criticism and has led to lawsuits against OpenAI and other technology companies.
And now, for what you have been waiting for: the ChatGPT status update. The company is also debuting new ChatGPT features this week as it tries to catch up with Bing Chat, which has been receiving regular updates. Logan Kilpatrick, a member of OpenAI’s developer relations team, shared a list of upcoming ChatGPT enhancements that promise to streamline interactions and improve output quality. Among the slated features are example prompts, suggested replies, default integration with GPT-4, the ability to upload multiple files into the Code Interpreter for beta users, persistent login sessions, and keyboard shortcuts.