Generative AI is rapidly gaining a reputation as an enabler of consumer and business applications alike. Its ability to process huge swaths of data, automate complex processes and boost efficiency gets much of the attention, but it’s the technology’s power to improve usability that may end up making the case for its adoption.
Consider the recent product announcements by OpenAI and Google.
In May, OpenAI launched GPT-4o, a faster model for ChatGPT that can accept queries and respond in real time using a mix of audio, visuals and text. The next day, Google announced dozens of AI capabilities at its I/O conference. Among them was PaliGemma, a vision-language model that provides image and video captioning.
Besides adding multiple media capabilities, both products offer faster processing times and conversational voice interfaces that go a long way toward putting the “natural” in “natural language processing.” In its reporting of the announcements, TechRadar wrote: “[Both] seem to be fast enough for a truly natural conversation where you can interrupt the AI mid-flow.”
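For developers, the multimodal piece is already exposed through ordinary API calls. Below is a minimal sketch of a mixed text-and-image request to GPT-4o using OpenAI’s Python client; the image URL and prompt are placeholders, so treat it as a shape-of-the-request example rather than production code.

```python
# Minimal sketch: sending a mixed text + image query to GPT-4o via the
# OpenAI Python client. The image URL is a placeholder, and an API key
# is expected in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this photo?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

# The model answers in plain text, drawing on both the prompt and the image.
print(response.choices[0].message.content)
```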
These features come at an opportune time. While technology companies have for years talked about simpler ways to interact with advanced products and solutions, the reality has run far behind the promise. Alexa can’t compete with 2001’s HAL 9000, let alone the “more human than human” Nexus series of Blade Runner or Commander Data of Star Trek: The Next Generation.
So, where does that leave us?
Designers’ Whims and Fancies
We’ve come a long way from the command-line interface. When was the last time you typed a PRINT command at a C:\> prompt to generate a hard copy? Most of us haven’t had to do that in decades — if ever. There’s a “Print” button for that now.
But after that first leap to the graphical user interface, human nature led designers to make the simple GUI approach more complicated.
“The File menu, with options like New, Open, Save and Exit, became commonplace. Dialog boxes had OK and Cancel buttons, and all of these things did what they were expected to do,” observed Nick Hodges in a piece he wrote for InfoWorld. But, “everywhere you go on the web these days, you can see that these very useful, helpful notions are being lost.”
For instance, in some applications, developers have eliminated the “OK” button and replaced it with an X in the corner of the dialog box. When you close that box, does the product instruct the system to save your changes? What if you didn’t want them saved? How do you restore the earlier version, which you wanted to keep?
Another example: It has become increasingly difficult to move windows around a screen because finding a spot to position the pointer and drag is such a challenge. And there was a time when the color gray indicated that a feature was unavailable; today, it’s a common design element for active screens and controls.
Basically, Hodges said, “‘looking cool’ seems to have become preferable to ‘useful and usable’… I should be able to do the things that I want to do without having to struggle or wonder, ‘What just happened?’”
No surprise, then, that declining usability amid all this great technology is frustrating users.
Financial Impact of Usability
Fact: Usability affects performance. And for GenAI’s NLP capabilities to gain traction in our daily lives, they must appeal to users. In other words, the technology needs to be intuitive to use.
Research by Knoa Software found that the great majority of enterprise software errors — an astounding 91% — are user- or process-related. Only 8% are system-generated. When an interface adheres to consistency and standards, users complete each task more efficiently and require less assistance from the support desk or colleagues.
This is where the value of true natural language comes in.
Like the command lines before them, GUIs require users to understand their rules of engagement: this button prints a document, that button closes a window, this other control looks up a definition on the internet.
But the rules governing even simple tasks often vary. To close a window, the Mac’s operating system requires clicking a red dot in the window’s top-left corner; Windows users click an X in the top right. To capture a screenshot, Mac users press Command-Shift-3, while Windows users press Windows key + Shift + S.
By introducing more sophisticated voice interfaces, OpenAI and Google — along with all the other developers facing the same issue — are moving toward an environment where each user can develop their own consistency. Rather than say “resume” when they want to pick up a task they suspended, they might say, “OK, keep going.” Rather than say “set a timer for 45 minutes,” they might instruct the system to “wake me at 2:45.”
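In practice, that flexibility comes from an interpretation layer that maps whatever phrasing a user prefers onto the same underlying action. The sketch below assumes a hypothetical call_llm() helper standing in for any language-model call; the intent names and prompt are illustrative, not any vendor’s actual API.

```python
# Sketch of a natural-language command layer: varied phrasings map to the
# same structured intent, so each user can settle into their own wording.
# `call_llm` is a hypothetical stand-in for any LLM completion call.
import json

INTENT_PROMPT = """Map the user's utterance to one of these intents and
return JSON: resume_task, pause_task, set_alarm (with a "time" field),
set_timer (with a "minutes" field).

Utterance: {utterance}
JSON:"""


def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in a real model call (OpenAI, Gemini, etc.).
    raise NotImplementedError


def interpret(utterance: str) -> dict:
    """Turn free-form speech into a structured command the app can execute."""
    raw = call_llm(INTENT_PROMPT.format(utterance=utterance))
    return json.loads(raw)

# "Resume," "OK, keep going" and "pick up where we left off" should all come
# back as {"intent": "resume_task"}; "wake me at 2:45" as
# {"intent": "set_alarm", "time": "14:45"}.
```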
This is an emphasis on what the user’s doing vs. what the system can do. It’s not about AI so much as it’s about UI. Software engineers and developers may have a different perspective, of course, but users will see this as a simpler, more flexible and more natural approach to using advanced technology.
As Ben Wodecki wrote in AI Business, the product’s focus shifts to “consumer integrations rather than the loftier goal of developing artificial general intelligence (AGI) or an AI that thinks like a human.”
Ease of use is just one result of these advances. A new perception of AI is another. The increasing use of simple, more human-like interaction is one reason OpenAI Chief Technology Officer Mira Murati describes generative AI solutions as “collaborators.” She told Bloomberg she expects adoption to increase as AI-powered tools become easier to use.