OpenAI's GPTBot: Charting the Web, Chasing the Future

By Jennifer Torres
OpenAI has launched GPTBot, a web crawler aimed at enhancing the future of online content.

The Gist

  • Tech triumph. OpenAI introduces GPTBot to supercharge AI systems.
  • Ethical enigma. Debates rise over data usage and copyright concerns.
  • Web wonder. Potential to expand ChatGPT knowledge past September 2021.

The ever-innovative minds at OpenAI have just unveiled GPTBot, a web crawler that could significantly boost the performance of future AI models, including GPT-4 and the much-anticipated GPT-5.

ChatGPT has some work to do to move past its current knowledge cutoff of September 2021. If you think of the internet as a vast library, GPTBot is there to scour the collection, picking up information that can help AI systems give more accurate, relevant and timely responses.

This latest news comes on the heels of OpenAI's debut of enhanced conversational capabilities, including prompt examples and suggested replies, just last week.

How Does OpenAI’s GPTBot Work?

Basically, GPTBot is on a mission to find information on the internet that can make AI systems smarter, more capable and safer.

The data it collects can help these AI systems in several ways.

  • Accuracy. The data can help the AI systems give more correct answers or predictions.
  • Capabilities. The data can help the AI systems learn to do more things.
  • Safety. The data can help the AI systems understand better how to avoid doing things that could cause problems or harm.

Is There a Downside to OpenAI’s New GPTBot?

It appears that OpenAI is inching its way back toward a better connection with the web after pulling its browser plugin over concerns related to paywall access. With GPTBot, the aim is to ensure that ChatGPT doesn't provide information that's restricted behind paywalls. OpenAI says GPTBot is selective about which web pages it looks at, avoiding sites that require payment to access (paywalls) as well as sites that collect personal information about people (like names, addresses or phone numbers). It is also designed to sidestep sites whose content bumps up against OpenAI's rules.

But this latest announcement from OpenAI has sparked debate in web forums about whether it's ethical or legal to use information from the web to train AI systems that make money. Some think that if OpenAI profits from this data, it should share those profits. Others worry that copyrighted material (like text, images, videos, music) could be used without proper credit.

On the other hand, many think it's fine for OpenAI to use public information from the web — just like anyone else can.

How to Opt In to OpenAI’s GPTBot

Web crawlers typically scan and index publicly available web pages across the internet, and unless you opt out, GPTBot could access your site automatically. However, website owners who want GPTBot to look at only certain parts of their site can specify in the site's robots.txt file which parts are okay and which are not, like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
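
To see how rules like these are read, here is a minimal sketch that feeds them to the robots.txt parser in Python's standard library. It shows how that parser interprets the rules, not how GPTBot itself evaluates them. The helper name gptbot_may_fetch and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the article, as they would appear in a robots.txt file.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_may_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if these rules let the GPTBot token fetch `url`.

    Hypothetical helper: it reflects Python's parser, which may differ
    in edge cases from the parser a real crawler uses.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


# example.com is a placeholder domain, not a site from the article.
for path in ("/directory-1/page.html", "/directory-2/page.html"):
    print(path, gptbot_may_fetch("https://example.com" + path))
```

Under these rules, the first path is reported as allowed and the second as blocked.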

OpenAI says GPTBot's requests come from specific IP address ranges that it publishes, which lets website owners identify the crawler's visits.
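
A site owner could compare a visitor's address against those ranges with a few lines of Python. The sketch below is only an illustration: the ranges are placeholder documentation blocks, not OpenAI's real ones, and the function name ip_in_ranges is an assumption. The actual list would have to come from OpenAI.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation blocks), NOT OpenAI's real ones.
# Replace them with the ranges OpenAI actually publishes.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip: str, cidr_ranges=PLACEHOLDER_RANGES) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


print(ip_in_ranges("192.0.2.17"))   # inside the first placeholder range
print(ip_in_ranges("203.0.113.5"))  # outside both placeholder ranges
```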

Once again, if you imagine the internet as a massive library, the "user agent token" is the short name a crawler goes by, the one website owners use in robots.txt to address it. The full "user-agent string" is the longer identifier the crawler sends with each request it makes.

For GPTBot, they look like this:

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
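
For site owners who want to spot these visits in their server logs, a minimal sketch of matching the token inside a user-agent string might look like this. The function name gptbot_version is an assumption. Because any client can send any user-agent header, a match is a hint rather than proof.

```python
import re

# Matches the GPTBot product token and captures its version, e.g. "GPTBot/1.0".
GPTBOT_PATTERN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str):
    """Return the GPTBot version found in `user_agent`, or None if absent."""
    match = GPTBOT_PATTERN.search(user_agent)
    return match.group(1) if match else None


# The full user-agent string quoted in the article.
UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.0; +https://openai.com/gptbot)")
print(gptbot_version(UA))
```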

How to Opt Out of OpenAI’s GPTBot

Website owners who don't want GPTBot on their site can use a file called "robots.txt" to tell it to stay away. System administrators can opt out by adding a GPTBot entry to the site's robots.txt, like this:

User-agent: GPTBot
Disallow: /
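
For sites that already have a robots.txt file, the opt-out entry can simply be appended to it. The sketch below shows one way to do that without adding the entry twice. The function names and the sample file contents are assumptions for illustration.

```python
# The opt-out entry from the article.
OPT_OUT_GROUP = "User-agent: GPTBot\nDisallow: /\n"


def has_gptbot_group(robots_txt: str) -> bool:
    """Return True if any User-agent line already names GPTBot."""
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip().lower() == "gptbot":
            return True
    return False


def add_gptbot_opt_out(robots_txt: str) -> str:
    """Append the opt-out entry unless a GPTBot entry is already present."""
    if has_gptbot_group(robots_txt):
        return robots_txt  # existing GPTBot rules are left untouched
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    if robots_txt:
        robots_txt += "\n"  # blank line before the new entry, for readability
    return robots_txt + OPT_OUT_GROUP


# Hypothetical existing file contents.
existing = "User-agent: *\nDisallow: /private/\n"
print(add_gptbot_opt_out(existing))
```

Note that if the file already contains a GPTBot entry with different rules, this sketch leaves it as is rather than rewriting it.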

GPTBot Dives In: Where Tech Marvel Meets Digital Dilemma

OpenAI's unveiling of GPTBot has raised questions about who owns the information on the internet, what is fair use of this information, and what incentives there are for people who create content for the web. It also underscores the rapid advancements in AI technology.

As this tool dives deep into the internet's vast reservoir of knowledge, it holds the promise of supercharging AI systems, making them more knowledgeable, accurate and safe. However, with progress comes scrutiny. Ethical debates about data usage, copyright concerns and questions of profit-sharing illuminate the delicate balance between innovation and responsibility. As the digital age propels us forward, these discussions become paramount, reminding us that in our quest for technological evolution, transparency and ethics should never be left behind.

About the Author
Jennifer Torres

Jennifer Torres is a Florida-based journalist with more than two decades of experience covering a wide range of topics. Jennifer formerly served as a staff reporter at CMSWire, where she tackled subjects ranging from artificial intelligence and customer service & support to customer experience and user experience design. Jennifer is also the esteemed author of a collection of 10 mystery and suspense novels, and formerly held the position of marketing officer at the prestigious Florida Institute of Technology.