Beware of these three generative AI “gotchas”
The hype around generative AI shows no signs of cooling, so in the coming months expect more executives to ask their teams, “What are we doing with generative AI?” And expect more teams to scramble to propose use cases for the technology.
All of which is well and good, so long as the potential pitfalls of generative AI in any given application are well understood and addressed.
Three of those potential pitfalls are especially relevant in market research and competitive intelligence settings: plagiarism, hallucination, and copyright violation.
Plagiarism – “the practice of taking someone else’s work or ideas and passing them off as one’s own,” according to Oxford Dictionaries – is the most straightforward of the three issues to address. All it requires is citing the source of any non-original material you are using, as we did above. Well-constructed generative AI applications – Microsoft’s Bing search engine, for example – include citations automatically.
The second potential problem associated with generative AI is what has come to be known in the industry as hallucination: “a confident response by an AI that does not seem to be justified by its training data.” In other words, the software just makes stuff up, because it lacks information upon which to base an accurate response. This wouldn’t be good in any application, and it’s certainly not good in market research and competitive intelligence, where accuracy is foundational to effective analysis and, ultimately, sound decision-making.
The cure for this generative AI malady is to ensure the application is drawing upon only authoritative content from known, reliable sources, rather than the general internet. For enterprise-class market research and competitive intelligence, sources should include business and industry news outlets that produce high-quality original journalism (not sites that just republish press releases), reputable industry analysts and consulting firms, government databases, and other similarly vetted sources.
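One way to picture the "known, reliable sources only" approach is an allowlist check applied to documents before they ever reach the generative model. The snippet below is a minimal sketch, not a description of any particular product: the domain names, the `is_trusted` and `filter_documents` helpers, and the `{"url": ..., "text": ...}` document shape are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of vetted sources (placeholder domains, not real outlets).
TRUSTED_DOMAINS = {
    "news.example.com",      # original business journalism
    "analyst.example.org",   # industry analyst firm
    "data.example.gov",      # government database
}


def is_trusted(url: str) -> bool:
    """Return True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def filter_documents(docs: list[dict]) -> list[dict]:
    """Keep only documents whose 'url' field points at a trusted source."""
    return [doc for doc in docs if is_trusted(doc["url"])]


docs = [
    {"url": "https://news.example.com/markets/story-1", "text": "..."},
    {"url": "https://random-blog.example.net/post", "text": "..."},
    {"url": "https://www.data.example.gov/dataset/42", "text": "..."},
]
vetted = filter_documents(docs)
```

Here only the first and third documents survive the filter; the second comes from a domain that is not on the list, so it would never be offered to the model as source material.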
Now let’s turn to the challenge of copyright compliance for third-party content, like articles from news sites. If a generative AI application repurposes copyrighted material and creates a new document that reproduces so much of the original that readers have virtually (or actually) no reason to consult the source, you have a problem. It’s essentially the “fair use” issue in a new, high(er) tech context.
“Fair use” is the principle that governs the legality of indexing, excerpting and summarization products. Whether the use of copyrighted content is deemed legally “fair” hinges on several factors, including whether it’s for a commercial purpose, whether the copied version is so complete as to substitute for the original, and whether the new product is “transformative” (truly new and different).
When this topic was adjudicated over a decade ago in a case involving an online news aggregator, a key metric applied to determine whether the use of copyrighted material was “fair” was the click-through rate from the aggregator’s version of an article to the linked original article posted by the publisher. While there’s no bright-line rule, the court held that a click-through rate of less than 1% indicated the use was not “fair,” whereas a click-through rate above 50% (for Google News) supported a finding of “fair use.” (By the way, Northern Light’s click-through rate for our enterprise application’s use of news articles is 108%!)
We can assume similar guidelines will be applied to generative AI-based summaries derived from copyrighted content: the higher the click-through rate, the better (and legally safer).
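To make the click-through arithmetic concrete, here is a small sketch that computes a click-through rate and compares it with the two reference points cited above (below 1% and above 50%). It assumes the rate is simply clicks to the original divided by views of the summary; the function names and the "indeterminate" middle band are illustrative assumptions, and the thresholds are data points from one case as described here, not a legal test.

```python
# Reference points taken from the case as described in this article.
LOW_THRESHOLD = 0.01   # below this, the use was held not to be "fair"
HIGH_THRESHOLD = 0.50  # above this, the rate supported a finding of "fair use"


def click_through_rate(clicks: int, views: int) -> float:
    """Clicks through to the original article divided by views of the summary."""
    if views <= 0:
        raise ValueError("views must be a positive number")
    return clicks / views


def fair_use_signal(rate: float) -> str:
    """Map a click-through rate onto the two reference points (illustrative only)."""
    if rate < LOW_THRESHOLD:
        return "weighs against fair use"
    if rate > HIGH_THRESHOLD:
        return "supports fair use"
    return "indeterminate"


# Hypothetical traffic numbers for two summary products.
print(fair_use_signal(click_through_rate(clicks=5, views=1000)))
print(fair_use_signal(click_through_rate(clicks=600, views=1000)))
```

Anything that lands between the two reference points is labeled "indeterminate" here because the case, as summarized above, offers no guidance for that range.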
So go ahead and build and deploy those generative AI-based applications within your enterprise. Just be sure they address the potential pitfalls and meet known standards for “fair” play.