
Making Self-Service Generative AI Data Safer

By Myles Suer
How can companies protect data in gen AI applications? It starts with understanding vector databases and the chat interface.

Without question, generative AI is big. While attending Constellation Research’s “Connected Enterprise,” experts I talked to compared gen AI to the internet for its business impact. But at the same time, the business risks are also clear. Regulations and standards are coming from the European Union, the National Institute of Standards and Technology (NIST) and the Open Worldwide Application Security Project (OWASP), and the White House recently released its AI Executive Order.

In this piece, I explain how generative AI will transform self-service business intelligence (BI), how generative AI works, what the relevant business risks are and what approaches can make gen AI safer to deploy.

What is Self-Service BI?

Self-service BI has been driven by organizations needing to increase their agility in supporting a growing hybrid workforce. The goal was to reduce decision-making cycle times by enabling employees to make more decisions whenever they were needed. This has demanded greater data accessibility: quick, frictionless access to the right data at the right time.

Self-service BI also aimed to build a culture where people seek to understand data and its context. CIOs say their goal has been to reduce the cycle time for decision-making and, secondarily, to make data and analytics teams less of a bottleneck for organizations responding at the speed of today’s business. Miami University CIO David Seidl suggests, in a gen AI chat on X, “As a user, a highly usable data portal or access tool including data discovery and contextualization is critical for more casual, non-power users. I think that's the real destination of self-service BI in the long-term, with a stop at power users along the way.”

Generative AI Transforms Self-Service BI

In many respects, generative AI represents the next wave of self-service BI. Instead of searching for data and then asking for access, users request data and receive answers at prompt speed. This drives data culture and accelerates business agility. But making the keys to the kingdom accessible in this manner requires that businesses deal with several concrete business risks, including those in the image below:

[Image: concrete business risks of generative AI]

In a recent generative AI security webinar hosted by the Association of Enterprise Architects (AEA), I engaged with hundreds of enterprise architects and asked them about their apprehensions regarding generative AI. The predominant answer was “all of these,” followed by privacy, security, transparency and third-party risk.

[Image: risks in a typical gen AI conversation flow]

So let’s look at how risks occur in a typical gen AI conversation flow, as shown in the image above. First, there is the risk of abuse, because foundation models can be used for non-business purposes or by malicious actors. Second, the outputs of gen AI applications can be inconsistent and pose security and privacy risks. Finally, there is the risk of inappropriate release of sensitive data, which clearly includes personally identifiable information (PII) and corporate IP. The risk is that this sensitive data can be unintentionally provided to unauthorized users.

Related Article: 5 Generative AI Issues in the Digital Workplace

How Have We Protected Data, and Why Does Gen AI Need Something New?

Historically, databases have been organized in rows and columns. To protect data, one scans column names for sensitive metadata.

[Image: scanning column metadata for sensitive data]

Once sensitive data or PII is discovered, policies and protection methods are applied to it. For example, in the image above, you could mask or encrypt chicken brands. This could be accomplished at the database level or through separate data security software.
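As an illustration, the traditional column-scan-then-mask approach can be sketched in a few lines of Python. The column names, patterns and masking rule here are hypothetical; real data security tools use far richer classifiers:

```python
import re

# Hypothetical scanner: flag columns whose names match common
# sensitive-metadata patterns, then mask their values.
SENSITIVE_PATTERNS = re.compile(r"(ssn|email|phone|salary|dob)", re.IGNORECASE)

def mask_row(row: dict) -> dict:
    """Return a copy of the row with values in sensitive columns masked."""
    return {
        col: ("****" if SENSITIVE_PATTERNS.search(col) else value)
        for col, value in row.items()
    }

row = {"name": "Acme Farms", "contact_email": "buyer@example.com", "brand": "Golden Hen"}
print(mask_row(row))
```

Because the sensitivity lives in the column name, the scan only has to look at metadata, never at every cell, which is what makes this approach cheap for row-and-column databases.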

Gen AI, however, stores data in a fundamentally different way. Chicken brands are compared mathematically, as shown in the image below, and the results are stored.

[Image: chicken brands compared as vectors]
Image by The Open Group.

So how does this work? Each value is represented as a vector with magnitude and direction. Gen AI large language models (LLMs) then compare the angles between vectors. Applying a cosine function to the angle produces a similarity score: highly correlated or related items have values close to 1, while uncorrelated or unrelated items have values close to 0. The vector representations created by LLMs are called embeddings, and these are stored in a specialized database called a vector database. There are no rows and columns to protect. To be completely fair, an LLM does not compare just three dimensions; the largest GPT-2 model, for example, uses 1,600 dimensions. All of this matters because it means new methods are needed to protect sensitive data, including IP or PII.
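A minimal sketch of that comparison, using toy three-dimensional vectors in Python (the brand vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: ~1 for related, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for three products.
free_range = [0.9, 0.8, 0.1]
organic = [0.85, 0.75, 0.2]
lawnmower = [0.05, 0.1, 0.95]

print(cosine_similarity(free_range, organic))    # close to 1: related products
print(cosine_similarity(free_range, lawnmower))  # close to 0: unrelated products
```

A vector database answers queries by finding the stored embeddings with the highest similarity to the query's embedding, which is why there is no column name to scan for sensitivity.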

Related Article: Thinking of Building an LLM? You Might Need a Vector Database

Techniques for Dealing With the Risks

The risks for gen AI applications span training data, prompts, prompt replies and stored vectors. Issues with training data can create model bias. For prompts, the issues include unauthorized retrieval, compliance, PII handling and inadvertent IP release. To deal with the latter, a data protection approach must have the three capabilities in the image below:

[Image: three capabilities for protecting gen AI prompts and responses]

First is the ability to scan prompts for sensitive and irrelevant data. Here, the solution should allow or deny requests and redact or mask those that are inappropriate. On the response side, the same checks need to occur. And lastly, the process needs to be auditable. All of this requires new technology to succeed, but with it, it is truly possible to make generative AI safer for self-service business intelligence.
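A toy sketch of those capabilities (scan, allow/deny, redact and an audit trail) might look like the following Python. The patterns, blocked topics and log format are all assumptions for illustration; production systems use far more sophisticated detection:

```python
import re
from datetime import datetime, timezone

# Hypothetical prompt filter: deny blocked topics, redact PII-shaped
# patterns (here just emails and SSN-like numbers), and log every decision.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_TOPICS = ("payroll", "merger")

audit_log = []  # each entry: (timestamp, user, decision)

def filter_prompt(user: str, prompt: str) -> tuple:
    """Return (decision, text); decision is 'deny', 'redact' or 'allow'."""
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        decision, text = "deny", ""
    else:
        redacted = SSN.sub("[REDACTED]", EMAIL.sub("[REDACTED]", prompt))
        decision = "redact" if redacted != prompt else "allow"
        text = redacted
    audit_log.append((datetime.now(timezone.utc).isoformat(), user, decision))
    return decision, text

print(filter_prompt("analyst1", "Summarize Q3 sales for jane@corp.com"))
```

The same filter would run on model responses before they reach the user, and the audit log is what makes the whole flow reviewable after the fact.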

 


Parting Words


The advent of generative AI is truly transformative. Those that succeed with it will gain advantage in the markets they serve. An important use case is transforming how people acquire data and make decisions. Gen AI brings to the forefront the potential many conceived with the advent of personal digital assistants. To work, however, gen AI must be engineered to protect sensitive data and IP, and to do so in a way that ensures continuing compliance.

 


About the Author
Myles Suer

Myles Suer is an industry analyst, tech journalist and top CIO influencer (Leadtail). He is the emeritus leader of #CIOChat and a research director at Dresner Advisory Services.

Main image: Photo by Federico Di Dio Photography