Finish and start pattern line on a race track
Editorial

AI Risks Grow as Companies Prioritize Speed Over Safety

3 minute read
Chris Sheehan avatar
By
SAVED
AI companies are racing to release faster than ever, but at what cost? Growing risks and weak safeguards could turn innovation into disaster.


When OpenAI’s ChatGPT was released toward the end of 2022, it quickly became the fastest-adopted technology product ever, reaching 100 million monthly active users just two months after launch.

ChatGPT reaches one million users.

Machine learning (ML) and AI had been around for years prior to this, but as I noted in my first blog for VKTR, ChatGPT helped move them "from the realm of sci-fi and tech into public consciousness.” It’s been a hyped-up and crazy technology cycle that got even wilder at the end of January when Chinese AI startup DeepSeek rolled out the newest version of its app, which quickly surpassed ChatGPT as the highest-rated free app on the App Store in the United States.

There were numerous articles written about DeepSeek’s impact on the global AI market, and then seemingly just minutes later, Alibaba claimed that its new AI model can perform better than anything existing on the market. In February, AI players like Baidu and others began to offer their services for free in response to DeepSeek’s popularity.

How all of this plays out in terms of market dominance is anyone’s guess, but the news gives us an opportunity to talk (again) (and more) about the risks inherent in AI, how this will only increase the pressure on companies to compete and release faster and why, as AI reasoning gets better and new applications and use cases explode, testing becomes even more critical.

Generative AI Is Risky Business

Due to its speed, range and general unknowns, generative AI compounds existing risks in applications and introduces new ones. These range from biased, toxic, inaccurate and inconsistent responses to misuse from bad actors, legal and security risks and regulatory compliance with local, regional and national requirements — and more may emerge. Given the complexity and probabilistic nature of the systems powering genAI, and countless use cases, you’re not sure what kind of responses you'll get in many cases unless you define them very, very clearly.

For organizations looking to optimize their genAI training and testing, it’s important to start with a risk framework. This needs to be specific to your organization, and regularly reviewed and updated. You should: 

  1. Define the risks that are of highest concern and how you plan to assess them.
  2. Next, decide where these testing procedures fit into your software development life cycle (SDLC).
  3. Finally, make sure that you are conducting ongoing testing to uncover not just problems, but feedback on how the system is performing in the real world.

There is always going to be some level of risk with generative AI. I’ve talked in past blogs about the importance of red teaming, a technique designed to identify points of failure that can be tough to surface through automated tests alone. Developers can then use the resulting information to retrain models or develop “guardrails” — that is, rules to mitigate risk. This systematic, adversarial approach is employed by human testers and reduces issues in AI models and solutions by focusing on common problems related to security, safety, accuracy, functionality and performance.

Related Article: Beyond Regulation: How to Prepare for Ethical and Legal AI Use

Red-Teaming Best Practices

Microsoft had one of the first red teams in the industry and regularly uses the technique as part of its genAI product development. It recently released a white paper outlining key findings from red teaming over 100 AI products.

Its three key takeaways from the experience mirror what my own company has discovered in our AI testing and training engagements. These include:

  1. “Generative AI systems amplify existing security risks and introduce new ones.”
  2. “Defense in depth is key for keeping AI systems safe.”
  3. “Humans are at the center of improving and securing AI.”

Microsoft cites subject-matter expertise, cultural competence and emotional intelligence as key reasons humans are important to keeping AI safe and secure. I agree.

The most effective testing for any application is to leverage real-world users to explore the “mights” — how might a person in X country, with XYZ devices, interact with this? How might a user respond to this answer?

Learning Opportunities

In addition to red teaming, keeping humans in the loop can also help ensure that high-quality training data gathered from various cultural and linguistic backgrounds is part of your AI model. Subject-matter experts can also fine-tune large language models for specific tasks and evaluate quality metrics like accuracy, coherence and tone in a more effective way than automated testing.

Testing the Unknown

The number of gen AI apps and features released into the market in just the past WEEK demonstrates how quickly this industry is moving, and how existing pressures to release quickly are unlikely to let up.

In the face of many unknowns and expanding risks, red teaming, quality assurance and testing with experts and end users are just some of the techniques organizations can use to source the right data, optimize models and ensure that AI features are safe, reliable and being used as intended. 

fa-solid fa-hand-paper Learn how you can join our contributor community.

About the Author
Chris Sheehan

As Applause's SVP and GM of strategic accounts, Chris Sheehan enables the success of Applause’s strategic account business, including strategy, sales and operations to ensure continued growth and customer success of its largest customers. Since joining Applause in 2015, Sheehan has held roles on multiple teams, including software delivery, product strategy and customer success. Connect with Chris Sheehan:

Main image: robsonphoto on Adobe Stock
Tags
Featured Research