In response to growing concerns about the lack of security for the data provided to ChatGPT, OpenAI announced in late April it would allow users to turn off the chat history feature for its chatbot. But the response isn't enough, according to some critics.
The “history disabled” feature means that conversations started while it is turned on will not be used to train OpenAI’s underlying models and will not appear in the history sidebar. They will still be stored on the company’s servers but will only be reviewed on an as-needed basis to monitor for abuse — and will be deleted after 30 days.
The decision was part of a wider initiative to try to convince enterprises that their data is safe with ChatGPT and that it would not be used without their consent to train AI models.
The San Francisco-based company also said that it was developing a new ChatGPT Business subscription for organizations that want better control over their data. The new subscription will follow the existing API’s data usage policies, which means that end users’ data will not be used to train models by default.
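To illustrate the distinction between the consumer chatbot and the API, here is a minimal sketch of submitting a prompt through the OpenAI API, where submitted content is not used for model training by default. The model name, prompt and environment variable are placeholders, and the Python client interface has changed across library versions, so treat the specifics as illustrative rather than definitive.

```python
# Minimal sketch: querying the model through the OpenAI API rather than the
# consumer ChatGPT interface. Under the API's data usage policy, content
# submitted this way is not used for model training by default.
# Model name and prompt are illustrative placeholders; the client interface
# has changed across library versions, so check the current OpenAI docs.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # never hard-code credentials

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this internal memo: ..."},
    ],
)

print(response["choices"][0]["message"]["content"])
```

Routing employee usage through the API, or through an internal tool built on it, is one way an organization could rely on the default opt-out rather than on each user remembering to toggle a setting in the chatbot.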
But will this be enough to convince enterprise leaders to entrust their data to the model?
Who's Responsible for Protecting Data?
It's a step in the right direction, said Vance Reavie, CEO and founder of SaaS company Junction AI, but it leaves a variety of issues unaddressed.
Ultimately, it does not change the fundamentals: the data has still been submitted in the first place, and the back-and-forth between the user and the system still informs the OpenAI service. While this suggests the company won’t use the data after the user completes a session, the activity itself still happened.
The fact that “history disabled” is not yet the default setting is also a concern. "Enterprise users will need to ensure their staff utilize [it],” Reavie said. “Unfortunately, as we all know all too well with these options, be it social media or otherwise, most users do not take the time and effort to select a more secure option, as they tend to be largely unaware of the need or simply don’t bother.”
Then again, training employees to use the technology according to company guidelines is not the responsibility of OpenAI either. It is up to organizations to inform their staff on how to use these tools and manage the risks involved with sharing data externally.
The problem, Reavie said, is that many companies do not have such policies in place or are not managing them properly. Putting the burden on OpenAI to regulate access may not be fair, but ultimately each company will have to decide whether the changes to the tool give it enough peace of mind to share data through ChatGPT.
To do so, enterprise leaders may want to consider:
- Do OpenAI’s policies on this new setting meet internal risk management practices?
- Does the enterprise have a strategy for using the OpenAI API or another private LLM/GPT approach for proper enterprise knowledge management?
- What training program is in place to help staff and management understand these tools and use them safely?
- What are the company’s policies for dealing with and mitigating issues caused by usage, whether a privacy breach or the use of inaccurate generated content?
“It’s easy to say OpenAI is causing data privacy problems, but the issue is down to the user ultimately,” Reavie said. “OpenAI is and will no doubt address these and improve. I think we will see a lot more options to address this, as they need to serve the vast enterprise market.”
Data Ownership and Use in the AI Era
ChatGPT, large language models (LLMs) and generative AI can be powerful tools for organizations. But there is growing concern that these tools are a double-edged sword and that by embracing them, users give up some of the agency of their own data, said Daniel Geater, VP of AI delivery at Qualitest, a company that develops AI-driven engineering services.
He points out that when users create content with these tools, the ownership of that content becomes unclear. And for now, AI developers, policymakers, content creators and consumers are still trying to sort this all out.
Meanwhile, he said, the option to opt out of having your data used in training is a good start to addressing concerns over IP and content rights, but it fails to address other leading concerns surrounding user privacy. For instance, there are questions around what protections are provided against accidental disclosure of information.
Similarly, users may not realize that unless they disable their history, their data may be used in training, in which case the use of the data by OpenAI may be compliant but it is nonetheless unknown to the user. There are also users who do not consider the data they upload to be sensitive, while their employer or others referenced within the data may believe it is. How do we defend against these kinds of disclosures or detect them if they occur?
Beyond privacy at the user level, there are many other questions that will also need to be answered. If data is shared accidentally or maliciously and is used in training a generative AI, what steps can be taken to remove it from the trained model? What safeguards can be put in place against the use of that data to protect individuals and organizations from exploitation, loss of IP or reputational damage based on the use of their information in an LLM, generated content or model hallucinations?
“There needs to be more understanding of how proprietary LLMs are trained, what data is used and the original purpose under which it was given,” Geater said. “How do we deal with specific concerns like the right to use up-to-date data or the right to be forgotten in the cases of personal information?”
He added that while there are regulatory and legal changes in progress that address AI and its effect on our daily lives, they are in the early stages of development and will take time to enforce. AI, on the other hand, is moving at an incredibly rapid pace.
“We are making small steps to address privacy, security and IP rights surrounding large AI models, but there is still a long way to go with government, regulatory, industry and technological levels,” he said.
Get to Know the Data-Handling Practices
There is also a considerable issue with transparency around how data sets are handled when using these models. To understand whether your data is safe, you first need to understand how it is handled and protected by the specific AI system or platform in question, said Oliver Goodwin, founder and CEO of the UK-based virtual media platform Synthesys.
In the case of GPT-4 and ChatGPT, the details of data safety would depend on the practices and policies implemented by OpenAI for these particular versions. However, he said, it is important to review the specific data-handling practices, terms of service and privacy policies of OpenAI to gain a comprehensive understanding of how your data is treated. This includes understanding aspects such as data storage, access controls, encryption and retention periods.
OpenAI's commitment to data safety and privacy should also be evaluated based on its track record and its dedication to implementing best practices and industry standards. Transparency about its data-handling processes and regular updates on privacy measures can also contribute to building trust.
“Remember that ensuring data safety is a shared responsibility,” Goodwin said. “As a user, it's crucial to be mindful of the information you provide, understand the privacy settings and options available, and regularly review and manage your data within the platform's guidelines."