A lot of writing and effort goes into evaluating models, comparing them and picking the "best" one. I am going to argue that, for most businesses, this is not the best use of resources if the goal is to deliver mind-boggling value.
What matters instead is picking use cases that are both deliverable today and high value to the business. It’s the 10x augmentation that will make a major difference and prove AI is the powerhouse it is expected to be.
Table of Contents
- Pick a Model But Do Not Lock On It
- Think Big, Yet Deliverable
- High-Value Deliverable Use Cases
- How to Deliver? Get Cycles, Iterate Fast
- Think Go Live: Architecture Matters
- The Way Forward
Pick a Model But Do Not Lock On It
But before we start on use cases, you still have to pick a model …
High-quality models (Claude 3.5, GPT-4, Gemini 1.5) have largely converged today and offer similar capabilities for most use cases. Plus, they regularly leapfrog one another, depending on their release cycle, training period, etc. The models are expected to keep converging, as the research community is focusing on the same problems and the models are all based on the same architecture and similar training data.
A good way to go about selection is to base it on hard facts and constraints:
- How easy is the model to access given your company’s policies, data privacy rules, etc.? The cloud provider you already use the most is probably a good place to start. Runs on AWS? Let’s go with Claude. On GCP? Let’s start with Gemini. Or if on Azure, then GPT-4 it is.
- Are the languages you need supported by the model? This can be an important driver, as not all models are trained on the same set of languages.
- What is the expected input context you need? Models have various capabilities in how much they can take in for a single task. Based on the amount of data you will need to deliver on your business case, this can be a driver for choice.
And last but not least, make sure you do not lock on a model: it must be a run-time decision. To benefit from the rapidly evolving model landscape, it is critical that you design your project to switch and upgrade easily, so your processing can be moved onto higher-quality models as they come online without disrupting your service.
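As a minimal sketch of what "a run-time decision" can look like, the example below resolves the model from an environment variable instead of hard-coding it. Everything here is an assumption made for illustration: the `LLM_MODEL` variable, the registry keys and the stub backends, which stand in for real provider SDK calls.

```python
import os
from typing import Callable, Dict

# A completion function takes a prompt and returns the model's text.
CompletionFn = Callable[[str], str]


def _stub_backend(name: str) -> CompletionFn:
    """Stand-in for a real provider call (hypothetical, for illustration only)."""
    def complete(prompt: str) -> str:
        return f"[{name}] {prompt[:40]}"
    return complete


# Hypothetical registry: the keys are labels chosen for this sketch.
MODEL_REGISTRY: Dict[str, CompletionFn] = {
    "claude": _stub_backend("claude"),
    "gpt": _stub_backend("gpt"),
    "gemini": _stub_backend("gemini"),
}


def get_model(default: str = "claude") -> CompletionFn:
    """Resolve the model at run time from the LLM_MODEL environment variable."""
    name = os.environ.get("LLM_MODEL", default)
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model {name!r}; known: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]
```

The rest of the project only ever calls `get_model()`, so moving to a newer model becomes a configuration change rather than a code change.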
Think Big, Yet Deliverable
The more important task, as you are starting or advancing your AI journey, is to select the right use cases to deliver value to the business and convince the business of the value of AI. Too many AI initiatives fail to deliver because the use case was either not deliverable, hence producing mediocre results, or not ambitious enough, hence delivering low value to the business.
As with any project, it is important to understand the business domain, the nature of the processes, how teams work and interact, and how information is pulled into the processes.
There are typical characteristics of what constitutes a good use case:
Compression vs. Expansion
LLMs are great compression machines for language and meaning. They can take in a large amount of content and compress it into a small output based on your instructions (constraints). They typically work best with clear constraints rather than a request for a general summary. Therefore, use cases where you can feed in a lot of context and give clear, structured instructions will perform very well.
Structured Output
Counterintuitively, we’ve observed LLMs deliver much better results when the output is more formal and structured, be that a good JSON data structure or a formal document such as a review document or a scorecard. Use cases involving structured output tend to do better than free-form content.
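One practical benefit of structured output is that it can be checked mechanically. The sketch below parses a model reply into a small scorecard and fails loudly when the shape is wrong. The `Scorecard` fields and the 1 to 5 score range are assumptions chosen for illustration, not a prescribed schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Scorecard:
    title: str
    score: int          # assumed scale: 1 (poor) to 5 (excellent)
    issues: List[str]


def parse_scorecard(raw: str) -> Scorecard:
    """Parse a model reply; raise ValueError if it does not match the expected shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"title", "score", "issues"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    score = data["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer from 1 to 5")
    issues = data["issues"]
    if not isinstance(issues, list) or not all(isinstance(i, str) for i in issues):
        raise ValueError("issues must be a list of strings")
    return Scorecard(title=str(data["title"]), score=score, issues=issues)
```

A reply that fails validation can be retried or routed to a person, which is much harder to do with free-form text.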
Short-Form Content
LLMs have a small maximum output in a single run, typically eight to 16 pages of standard English text (4,000 to 8,000 tokens, with a more recent model going up to 16,000). But regardless of the maximum output, we are observing that the longer the content, the lower the quality of the output. This circles back to the first characteristic: compression vs. expansion. There are techniques to deliver long-form content, but they require more sophisticated approaches and systems. They are entirely suitable down the road, but maybe not early in your journey.
A Simple Framework to Evaluate
To help businesses evaluate potential AI use cases, here's a framework of questions to consider:
Alignment with business objectives
- Does this use case directly support our core business goals?
- How much would efficiency or opportunity increase if we improved it?
Measurable impact
- Can we clearly define success metrics for this use case?
- How will we measure the ROI of this LLM implementation?
Data availability and quality
- Do we have the necessary content to feed the model?
- Is it easily available?
Technical feasibility
- Is it a compression task rather than an expansion task?
- How does it score on the above criteria?
Scalability
- Can this solution be scaled across the organization if successful?
- How will it integrate with our existing systems and processes?
Stakeholder buy-in
- Who are the key stakeholders for this use case?
- How can we ensure their support and engagement throughout the process?
- What do they need to see to benefit from the project?
By systematically working through these questions, you can evaluate potential AI use cases more effectively. This framework helps ensure you're not just chasing the latest AI trend, but focusing on implementations that will deliver real value to your business.
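For teams that want to compare several candidates side by side, the questions above can be turned into a simple weighted scorecard. This is only a sketch: the criterion names mirror the six headings of the framework, while the weights and the 1 to 5 rating scale are illustrative assumptions you would tune to your own priorities.

```python
from typing import Dict

# Illustrative weights (an assumption of this sketch); they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "alignment": 0.25,
    "measurable_impact": 0.20,
    "data_availability": 0.15,
    "technical_feasibility": 0.20,
    "scalability": 0.10,
    "stakeholder_buy_in": 0.10,
}


def score_use_case(ratings: Dict[str, int]) -> float:
    """Weighted average of 1-5 ratings, one rating per framework criterion."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError(f"ratings must cover exactly: {sorted(WEIGHTS)}")
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be rated from 1 to 5")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def rank(candidates: Dict[str, Dict[str, int]]) -> list:
    """Return use case names ordered from highest to lowest score."""
    return sorted(candidates, key=lambda n: score_use_case(candidates[n]), reverse=True)
```

The number itself matters less than the conversation it forces: a use case that rates low on data availability or stakeholder buy-in is visible before any work starts.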
High-Value Deliverable Use Cases
Based on the previous criteria, we can broadly define several no-brainer use cases that can deliver value to any business. Here are some high-value use cases you can readily explore for your business.
Information Extraction
Take unstructured content and transform it into structured data to enable insights. All businesses have unstructured content that isn’t leveraged and that could massively improve the business if it were. Think of all the files stored on SharePoint, Google Drive, Box or locked up within corporate software systems. Examples include licensing contracts, maintenance reports, HR interviews, performance reports, field visits, write-ups, support tickets, customer reviews, postmortems, etc.
By using this information-rich content and turning it automatically into structured data, you can start generating insights for the business at a depth that has never been available, often unlocking new opportunities for growth or efficiency.
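To make the idea concrete, here is a small sketch of an extraction pipeline over support tickets: each ticket is turned into a structured record, and the records are then counted to surface a pattern. The prompt wording, the `product` and `severity` fields and the `fake_model` function are all assumptions for illustration; `fake_model` uses keyword rules in place of a real LLM call so the example runs on its own.

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical prompt template; doubled braces are literal braces for str.format.
PROMPT = (
    "Extract the following fields from the support ticket below and reply "
    'with JSON only: {{"product": string, "severity": "low"|"medium"|"high"}}\n\n'
    "Ticket:\n{ticket}"
)


def extract(tickets: Iterable[str], complete: Callable[[str], str]) -> List[Dict[str, str]]:
    """Turn each unstructured ticket into a structured record."""
    records = []
    for ticket in tickets:
        reply = json.loads(complete(PROMPT.format(ticket=ticket)))
        records.append({"product": reply["product"], "severity": reply["severity"]})
    return records


def severity_by_product(records: List[Dict[str, str]]) -> Counter:
    """Count records per (product, severity) pair."""
    return Counter((r["product"], r["severity"]) for r in records)


def fake_model(prompt: str) -> str:
    """Keyword rules standing in for an LLM call (assumption for this sketch)."""
    ticket = prompt.split("Ticket:\n", 1)[1].lower()
    product = "billing" if "invoice" in ticket else "app"
    severity = "high" if "crash" in ticket or "urgent" in ticket else "low"
    return json.dumps({"product": product, "severity": severity})
```

Once the records exist, ordinary data tooling takes over: `severity_by_product(extract(tickets, fake_model))` is the kind of aggregate that was not available while the tickets were only free text.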
Content Review
Another wide field of application is content review: take unstructured content in, apply some guideline or formalized knowledge, decide whether the content is compliant, and flag issues and areas of improvement. It’s a wide category of use cases that is typically present in all businesses and is a core part of key business processes: contract review, licensing approval, overage billing approval, documentation review based on product specs/code, application review, code review, completeness verification, etc.
There are thousands of different business-specific use cases that are about content review. The key is identifying the tasks that are highly similar, where there are clear and documented guidelines on how to review the content and where the output is deterministic based on input.
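A review of this kind can be sketched as a prompt built from the documented guidelines plus a strict reading of the verdicts that come back. The prompt wording, the `Finding` structure and the reply format below are assumptions for illustration; `complete` is any function that sends a prompt to a model and returns its text.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    guideline: str
    passed: bool
    note: str


def build_prompt(document: str, guidelines: List[str]) -> str:
    """Number the guidelines so each verdict can be matched back to its rule."""
    numbered = "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, 1))
    return (
        "Review the document against each guideline. Reply with a JSON list, "
        'one object per guideline: {"id": int, "passed": bool, "note": string}\n\n'
        f"Guidelines:\n{numbered}\n\nDocument:\n{document}"
    )


def review(document: str, guidelines: List[str],
           complete: Callable[[str], str]) -> List[Finding]:
    """Return one finding per guideline, in guideline order."""
    reply = json.loads(complete(build_prompt(document, guidelines)))
    by_id = {item["id"]: item for item in reply}
    findings = []
    for i, guideline in enumerate(guidelines, 1):
        item = by_id.get(i)
        if item is None:
            # A missing verdict counts as a failure rather than a silent pass.
            findings.append(Finding(guideline, False, "no verdict returned"))
        else:
            findings.append(Finding(guideline, bool(item["passed"]), str(item.get("note", ""))))
    return findings


def is_compliant(findings: List[Finding]) -> bool:
    return all(f.passed for f in findings)
```

Treating a missing verdict as a failure is a deliberate design choice here: for review tasks, an unflagged gap is usually worse than a false alarm.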
Content Repurposing
This use case category is similar to content generation, but it relies on existing information. It is not pure content creation (like this article), but content generation based on a large input context, reference data or even unstructured content (meeting notes, design specs, campaign briefs, Slack conversations, etc.). Product release assets are good examples of this use case, such as product documentation, how-tos and introduction guides.
How to Deliver? Get Cycles, Iterate Fast
A major contributor to success is being in a position to iterate fast on the project. Avoid spending time on low-level details of the LLM, and instead get to a point where iterating on the prompts, data model and input context is easy and quick. The easier it is, the more iteration cycles you get; the more cycles you get, the more options you test and the faster you converge on the best outcome. Too many projects today are bogged down by technical details, because the LLM stack is still rapidly evolving. But there are solutions and vendors to help with that.
A big part of the success and pace to value will lie in how many iterations your team is able to run to deliver what matters to the business.
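One way to make iteration cheap is a tiny harness that runs every prompt variant against the same handful of test cases and reports how often each one is right. The sketch below assumes exact-match scoring and a `{input}` placeholder in each prompt template; both are simplifications chosen for illustration, and `complete` is any function that sends a prompt to a model and returns its text.

```python
from typing import Callable, Dict, List, Tuple


def evaluate(prompts: Dict[str, str],
             cases: List[Tuple[str, str]],
             complete: Callable[[str], str]) -> Dict[str, float]:
    """Return, for each prompt variant, the share of test cases answered correctly."""
    if not cases:
        raise ValueError("at least one test case is required")
    results = {}
    for name, template in prompts.items():
        hits = sum(
            # Exact match after trimming and lowercasing (a deliberately simple metric).
            complete(template.format(input=text)).strip().lower() == expected
            for text, expected in cases
        )
        results[name] = hits / len(cases)
    return results
```

With a harness like this, trying a new prompt wording or a new model is a matter of adding one entry and rerunning, which is what keeps the cycle count high.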
Think Go Live: Architecture Matters
And the last major point to consider is how to go live. Too many AI initiatives are stopped before they can go live, mired in endless scripts, experiments and unscalable models. AI or not, data privacy matters, IT security matters and data flows matter. Have a plan to go live from day one and leverage solutions that enable buy-in from your IT security team.
Whatever the approach, make sure you have a plan to go live, so that after convincing the business of the value, you can deliver this value in production.
The Way Forward
In conclusion, the key to unlocking LLM potential for your business isn't about chasing the latest and greatest model or knowing all the quirks they come with. It's about identifying high-value, deliverable use cases that can demonstrate GenAI's power to transform your operations. By focusing on practical applications, maintaining flexibility in your model selection and prioritizing rapid iteration, you can sidestep the pitfalls of endless evaluation and comparison.
Remember, the true measure of AI's success isn't found in benchmark scores or model rankings. It's in the tangible benefits it brings to your business — the insights uncovered, the processes streamlined and the value delivered to your customers. So stop obsessing over model evaluations and start asking the real questions:
- What are our most important business bottlenecks today?
- Where are people spending their time on repetitive cognitive tasks?
- What would we need to accelerate those processes?
By shifting your focus from model comparisons to use case implementation, you'll not only accelerate your AI journey, but also position your organization to reap the rewards of this transformative technology. The future of AI in business belongs to those who can identify and solve real-world problems, not those who endlessly debate model specifications.
It's time to move beyond the hype and start delivering results. Your competitive edge in the AI era doesn't depend on having the "best" model — it depends on how effectively you can leverage AI to solve your unique business challenges. So what are you waiting for? The next big breakthrough for your business might be just one well-chosen use case away.