In case you missed the hype so far, ChatGPT and Bing Chat are large language models that mimic human conversation by training on vast amounts of text data available online. You can read more about them in this post.

When it comes to legal issues surrounding AI models, there has already been a spark of activity, with Getty Images suing Stability AI, creator of the popular AI art tool Stable Diffusion, over alleged copyright violation. As reported by The Verge, Getty Images claims that Stability AI “unlawfully copied and processed millions of images protected by copyright” to train its Stable Diffusion model.

According to ChatGPT, it “was trained on a diverse corpus of text that includes web pages, books, articles, and other documents from a wide range of subjects and domains. This includes news articles, academic papers, literature, social media posts, and more.” So you might naturally ask: did they have legitimate access to all those news articles and social media posts to build an AI language model and commercialise that data? It seems similar lawsuits are inevitable in the following areas:

  • Intellectual Property: This is the most obvious one. Developed using a large amount of text data, these models have almost certainly ingested copyrighted material. It is only a matter of time before news outlets and other publishers follow in the footsteps of Getty Images to stop their content from being scraped and used as training data. Companies like OpenAI, which regard the work of journalists, writers and bloggers as high-quality data, will probably have to compensate them somehow as they make millions. What is truly “free” content on the internet is most likely toxic, low-quality text that an AI company would not want. The irony of it all is that ChatGPT can create convincing textual responses precisely because it has been trained on text generated by humans. As Nick Cave put it, “a grotesque mockery”.

  • Plagiarism: When language models generate responses, they rely on their training data and, in the case of Bing Chat, the search results it retrieves. Although Bing Chat attempts to give references, they are not always accurate, which may lead to it outright plagiarising: the act of presenting someone else’s work or ideas as your own. We are not concerned only with exact word-for-word copies, since the generated responses often vary in diction, but more with the ideas, arguments and suggestions they reproduce (see the first sketch after this list). When it comes to writing essays and articles, there is no guarantee that the model is not paraphrasing an existing article somewhere in its training corpus.

  • Privacy: Through their training corpus or, in the case of Bing Chat, live access to the internet, language models can process personal information that you never consented to their using. There is nothing stopping them from processing, storing and then sharing that personal information with their users. Think about search engines and the right to be forgotten, under which you can ask search engines to remove results about you. Good luck doing the same with ChatGPT, as there is no established method for un-training a neural network (see the second sketch after this list). Sure, Bing Chat can remove search results, but that does not remove what the language model has absorbed from its own training corpus. If a case similar to Getty Images arises in which there are provable instances of personal information in the model, and that person wants their information removed from ChatGPT, it will surely open up a legal firestorm.

  • Bias and discrimination: ChatGPT may learn biases or stereotypes from the data it is trained on, which could lead to discriminatory or unfair responses. This automatically transfers to Bing Chat. When it summarises search results, it might focus on certain parts and leave out what you might deem important. That is subjective, and from a legal perspective potentially discriminatory. Imagine if Bing Chat summarised an article about minority groups in a way that removed critical points, or responded to users differently based on their background. This is already an active area of research that not only independent researchers but also OpenAI and Microsoft are trying to address.

  • Liability: Any tool based on generative AI will most likely come with lengthy terms of use ruling out any liability on its part. For example, Bing Chat’s terms of service include “No Guarantees; No Representations or Warranties; Indemnification by You.” But saying you are not liable does not mean no one can actually sue you. This has long been the debate around social media networks, whose platforms claimed they were not liable for the content their users post, and whose terms of service said so, while lawmakers disagreed.
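
On the plagiarism point above, here is a minimal sketch in Python (my own illustration; the two sentences are invented, not real model output) of why exact-match comparison is a poor test: a paraphrase can share little surface wording with its source while carrying the same idea.

```python
from difflib import SequenceMatcher

# A hypothetical source sentence and a paraphrase a language model might produce.
source = "The startup unlawfully copied millions of copyrighted images to train its model."
paraphrase = "Millions of protected pictures were allegedly used without permission as training data."

# SequenceMatcher scores surface-level (character) overlap, not shared ideas,
# so a faithful paraphrase scores well below a near-duplicate match.
ratio = SequenceMatcher(None, source, paraphrase).ratio()
print(f"Surface similarity: {ratio:.2f}")  # far from 1.0, yet the claim is the same
```

Semantic-similarity methods (embedding comparisons, for instance) fare better, which is exactly why paraphrase-level plagiarism is so much harder to police than verbatim copying.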
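
And on the privacy point, a toy contrast (again my own sketch, not a description of how ChatGPT actually stores anything) shows why un-training is hard: a database keeps records individually and can delete one, while a trained model folds every example into shared weights with no per-record handle to remove.

```python
# Forgetting data in a database vs. in a trained model (toy illustration).

# A database keeps records individually, so removal is a single operation.
records = {"user_42": "name, address, and other personal details"}
del records["user_42"]  # the record is gone

# A toy model fits y = w * x by averaging over all examples at once.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # imagine one pair encodes personal data
w = sum(y / x for x, y in data) / len(data)

# There is no equivalent of `del` for the third example: its influence is
# smeared across the single weight w, and producing weights "as if that
# example were never seen" generally means retraining on the remaining data.
```

Machine-unlearning research tries to approximate that retraining cheaply, but for models of ChatGPT’s size it remains an open problem.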

It is worth mentioning Google’s work-in-progress AI search, Bard. Google seems to be treading far more carefully in this domain, perhaps partly because a recent mistake by its AI search bot Bard wiped $100bn off its shares. We will undoubtedly see more of these language models enter our lives in the future, given what they have to offer. But how our legal system will keep up with them, only time will tell.