Concerns are rampant about how new generative artificial intelligence (AI) programs, and the large language models on which they are based, use information found on the internet. A recent article in The New York Times reports, “Fed up with A.I. companies consuming online content without consent, fan fiction writers, actors, social media companies and news organizations are among those rebelling… In recent months, social media companies such as Reddit and Twitter, news organizations including The New York Times and NBC News, authors such as Paul Tremblay and the actress Sarah Silverman have all taken a position against A.I. sucking up their data without permission.”
The root of the issue is copyright compliance, or lack thereof. While all users of generative AI ought to be sensitive to copyright issues, it is especially vital that large enterprises toe the copyright line, as those organizations represent ripe targets for large lawsuits by the rightful owners of intellectual property.
Now, chances are that global technology, pharma and financial services companies won’t be building AI “training sets” based on the work of fan fiction authors or comedians, so those artists can rest easy when it comes to corporate use of generative AI. And in enterprise applications that leverage premium licensed content (many market research and competitive intelligence systems, for example), copyright issues are not relevant, because the content license controls what is and is not permitted and supersedes any general question of copyright compliance.
“Fair Use” of copyrighted content
However, for publicly available web-based content, such as news feeds, copyright is a consideration, and there are long-standing best practices for addressing it that can be applied to generative AI. Consider that search engines have been providing search results based on copyrighted text for decades, including short summaries of the documents presented. This practice has been well litigated in the United States, and courts have found it to be copyright compliant, declaring it a “fair use” of the web content on several grounds.
Generative AI does not change the copyright picture for web-based content that is indexed for search and presented to the user with text summaries. The same classic tests for “fair use” will apply. The key is whether the answers and summaries provided by generative AI are so complete as to replace the need for the original document. If the answers and summaries are complete enough that the original document need not be consumed, the search application may not qualify as “fair use.”
Include citations and links in generative AI summaries
Another approach to avoiding copyright violation issues (and plagiarism) with generative AI is to use citations and links. Citations remove any worry about plagiarism, because the original source is overtly credited, and they give the user the ability to vet the source of each answer or summary item. Answers should be kept deliberately brief so that they are not a substitute for consuming the original material from the publisher. And answers and summary items should be hot-linked to the source documents to make clickthrough easy.
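As a rough illustration of that practice, here is a minimal Python sketch of how an application might assemble a brief, cited and linked answer. Everything in it is an assumption made for illustration: the `Source` fields, the `shorten` and `render_answer` helpers, and the 40-word cap are hypothetical, and no particular word limit is known to satisfy any legal test.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A cited document. All fields are hypothetical placeholders."""
    title: str
    publisher: str
    url: str


def shorten(text: str, max_words: int = 40) -> str:
    """Cap the answer at max_words so it points to the source rather than replacing it."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


def render_answer(answer: str, sources: list[Source], max_words: int = 40) -> str:
    """Return a brief answer followed by numbered citations with links."""
    if not sources:
        # An uncited answer defeats the purpose, so refuse to render it.
        raise ValueError("an answer needs at least one source")
    # One marker per source, e.g. "[1][2]", appended to the shortened answer.
    markers = "".join(f"[{i}]" for i in range(1, len(sources) + 1))
    lines = [f"{shorten(answer, max_words)} {markers}", "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src.title}, {src.publisher}: {src.url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder source, not a real article.
    story = Source("Example headline", "Example News", "https://example.com/story")
    print(render_answer("A short generated summary of the story goes here.", [story]))
```

In a real search interface the numbered sources would be rendered as clickable links; plain text is used here only to keep the sketch self-contained.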
Generative AI in search applications is another leap in the ability of search engines to get the best results to users and to get them excited about drilling into the content behind each answer or summary item. Excellent generative AI-produced answers and summaries, containing both citations and hot links, are poised to advance the dissemination of knowledge within large enterprises rather than compromise intellectual property and copyrighted content.