CHATGPT TRAINING: How New York Times Can Succeed With Generative AI

December 15, 2023 | by Mutiu Iyanda

As debate over the effect of generative artificial intelligence on journalism practice and the revenue of news media organisations rages on across the world, our analyst examines one of the recent legal moves of the New York Times, which has been widely reported, from a political economy perspective. Out of the numerous articles on the subject, our analyst selected one from National Public Radio (NPR).

NPR is best known as the public radio network of the United States. Every day, it connects with millions of Americans on air, online, and in person to explore the news, ideas, and what it means to be human. Although it has "public" in its name, available information indicates that it is an independent, non-profit media organisation.

Despite being headquartered in Washington, NPR reports on issues of local, national, and global importance with the aim of creating a more informed public. Reporting on artificial intelligence in the news industry therefore fits its mission. Because NPR is an independent medium, our analyst also expects some level of neutrality and objectivity in the article, "New York Times considers legal action against OpenAI as copyright tensions swirl" [available here], which was written by Bobby Allyn with David Folkenflik as a contributor.

The Subject and the Arguments of the Writer

Lawyers for the Times are considering legal action against OpenAI to protect the intellectual property rights associated with its reporting. The Times and the ChatGPT maker have been in intense negotiations over a licensing deal in which OpenAI would pay the Times for incorporating its stories in the tech company's AI tools. The discussions have become so contentious that the paper is now considering legal action. A lawsuit from the Times against OpenAI would set up what could be the most high-profile legal tussle yet over copyright protection in the age of generative AI. A top concern for the Times is that ChatGPT is becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper's staff.

Federal copyright law also carries stiff financial penalties, with violators facing fines of up to $150,000 for each infringement "committed willfully." If a federal judge finds that OpenAI illegally copied the Times' articles to train its AI model, the court could order the company to destroy ChatGPT's dataset, forcing the company to recreate it using only work that it is authorised to use. If OpenAI is found to have violated any copyrights in this process, federal law allows for the infringing articles to be destroyed at the end of the case. The Times' talks with OpenAI follow reports that the paper will not join other media organisations in attempting to negotiate with tech companies over the use of content in AI models.

Legal experts say AI companies are likely to invoke a defense citing the "fair use doctrine," which allows for the use of a work without permission in certain instances, including teaching, criticism, research, and news reporting. Among the legal precedents that will likely play a part in the pending AI copyright disputes is a 2015 federal appeals court ruling that found that Google's digital scanning of millions of books for its Google Books library was a legally permissible "fair use," and not copyright infringement. Lawyers for the Times believe OpenAI's use of the paper's articles to spit out descriptions of news events should not be protected by fair use, arguing that it risks becoming something of a replacement for the paper's coverage.

Our Critical Comment

The New York Times has to maintain its revenue from subscriptions and other products, so it has the right to safeguard its content from generative AI. We also concur that ChatGPT is a direct rival to newspapers, especially when one takes into account how quickly the tool can analyse newspaper articles for users with a variety of inquiries. Also, since the New York Times' position is protected by copyright law in the United States, OpenAI is expected to be aware that the articles are being used unlawfully. Furthermore, requesting that OpenAI remove the articles it has used is one thing; establishing for certain that they have been permanently erased is another matter. As such, removal would not be the best way to resolve the problem.

While the author cited cases won on the basis of the US fair use doctrine, particularly Google's scanning of books for its Google Books library, it is important to note that Google still allows the authors of the books to benefit from their work by stating their names and publishers, which gives potential buyers a means to approach the publishers or buy from physical and online bookstores. By contrast, OpenAI's use of newspaper stories does not give newspapers the possibility of gaining more readers and advertisers, because ChatGPT users do not need to approach the newspaper as long as their need for condensed information is met.

The essence of the subject lies in the fact that advanced emerging technologies are eroding the amount of value conventional news media should be capturing. Big technology companies are overshadowing traditional news media organisations by leveraging the capacity to gather, scale, and transform the big data generated by people and organisations. In the long run, OpenAI and others will be more valuable than news media organisations. In our view, now that OpenAI wants to crawl online news media to train its large language models and sell products built on them to potential users, the idea of recommodifying the data of newsmakers or sources, similar to platforms' user data, is evolving.

Conventional news outlets, like the New York Times, already obtain information from sources without paying them. This information is then turned into news and sold to subscribers, with advertisers incentivised to purchase specific space for their ads based on the number of views and reads. In this regard, we consider the New York Times' legal action against OpenAI a major contradiction in its drive to defend journalism in the age of generative AI.

There is no doubt that the struggle for control over content, financial considerations, and the evolving relationship between traditional media and AI technology are all intensifying. The dispute reflects broader tensions in the digital age, where established media organisations confront challenges posed by emerging technologies and their impact on content creation, distribution, and revenue streams.

As we previously explained, traditional media gathers data by obtaining information from newsmakers and other relevant sources and converting it into commodities (editorial content) without compensating the newsmakers and other sources. These commodities are sold to readers or subscribers in the form of pre-commodities and intermediate commodities, that is, advertising spaces. In this context, news media organisations often have constant capital in the form of the news sources and others they employ, whereas OpenAI is using the same material in its products, through large language models, without legal permission.

Legal interventions, in our view, should shift from preserving competition and extracting surplus value to defending the public interest, creating public wealth, and fostering social, technological, political, and economic alternatives to data commodification. If LLMs can greatly increase the rate of surplus value, OpenAI should be aware that the New York Times, with its adoption of technologies for production and distribution, is a vital stakeholder in its effort to train ChatGPT. As a result, rethinking the concept of surplus and exchange value in the age of technological revolution (particularly the redistribution of surplus wealth) is critical.

Another approach, which we believe would fix the problem and create a win-win situation, is for OpenAI to adopt the value creation and sharing formula used by Google News, Google Shopping, Google Scholar, and Google Play, among others. In other words, the New York Times should look for ways to persuade OpenAI to pay for the information it gathered from the newspaper's website.

In addition, the newspaper and others should begin working with OpenAI on a vertical integration plan comparable to content syndication. In this case, OpenAI and others would be required to cite, in the outputs users receive, the newspapers that were used to train ChatGPT. This would go a long way toward reinforcing the news industry's sustainability in the age of generative AI while also enhancing the credibility of ChatGPT.
