Tuesday, January 2, 2024

Artificial Intelligence Ripoff


   
 "Artificial Intelligence is shamelessly ripping off authors and publishers. Some are starting to fight back."

               Tom Sancton


We are at the dawning of the age of Artificial Intelligence. 


I liken this moment to that period 30 years ago when Netscape was new, when AOL was cutting edge, and when internet access came from a dial-up connection. Artificial Intelligence is still new enough that we may be able to shape its use so that it is a tremendous benefit to humankind. If we don't, it may become a monster that destroys us.


Tom Sancton
Tom Sancton is an early victim of AI. He shows what can go very wrong. He is a college classmate, and a graduate of Harvard and Oxford. He spent two decades as a writer and correspondent for TIME and is the author of nine works of fiction and nonfiction. He has taught writing at the American University of Paris and Tulane University. He currently lives in France.


Guest Post by Tom Sancton

Amazon

          If you look up my 2017 nonfiction book The Bettencourt Affair on Amazon, you will find another “book” listed right under it: “Summary of Tom Sancton’s Bettencourt Affair.” I checked it out and saw that it was a 50-page, chapter-by-chapter resumé of my book, done by AI, and published by an outfit nobody ever heard of called Everest Media. It sells for $3.99.

Ripoff
 

It obviously has no literary merit, but it paraphrases and summarizes the results of my five years of reporting and research and copies my structure. Whoever published it used my copyrighted work without permission or recompense to create what in fact is a competing product. Anybody who just wants to get a quick fix on this story of the world’s richest woman and the boyfriend who took her for a billion dollars can buy this cheap summary instead of my book. (BTW, the Bettencourt story is currently featured on Netflix under the title “The Billionaire, the Butler, and the Boyfriend.”) I brought the AI ripoff to the attention of my agent and publisher and was told that there was not much we could do unless the summary lifted passages verbatim from my book, which it apparently did not.

But if I have no clear recourse, others are taking legal action. My son Julian Sancton, author of Madhouse at the End of the Earth: The Belgica’s Voyage into the Antarctic Night, has become the lead plaintiff in a class action suit against OpenAI and Microsoft, claiming they used his book without permission or recompense to train their systems.


The suit has since been joined by dozens of other nonfiction authors. A similar suit on behalf of fiction authors, including Jonathan Franzen and John Grisham, has also been filed. Recently, the New York Times filed suit against OpenAI and Microsoft, claiming they sucked up millions of its copyrighted articles to train their systems and used the Times’ reporting to compete with the Gray Lady as a source of online news.

What will come of these legal actions, apart from perhaps winning some kind of monetary settlement? Are OpenAI, Microsoft and other AI companies really likely to change their ways? Perhaps the horse is already out of the barn and all creative people (authors, journalists, screenwriters, actors, musicians, and artists) will inevitably have their work ripped off, repackaged, and regurgitated in countless ways by AI systems that no one really controls or regulates.

But that pessimistic view, which I originally tended to share, may not tell the final story. Commenting on the social network formerly known as Twitter, my former Time Magazine colleague Walter Isaacson writes: “These will be the most important cases for journalism and publishing in our lifetime. If AI companies have to cut deals with news organizations and publishers to license their content feeds for use as AI training data, that could save local journalism as well as magazines and publishing. It would provide a business model that supports people who report things, and it would place a financial premium on accurate, high-value journalism. AI systems will compete for which has the most valuable, reliable training data. Kudos to Axel Springer, Mathias Döpfner, and the AP for leading the way and the New York Times for making the legal case.”



 



[Note: To get daily delivery of this blog to your email, go to https://petersage.substack.com and subscribe. The blog is free and always will be.]



4 comments:

Michael Trigoboff said...

AI yi yi. 😀

Speaking as a former AI researcher, I hope the authors beat the AI companies like gongs.

But if the AI companies lose, they will just retrain their neural network using solely public domain materials (e.g. Wikipedia). There may be enough of that available on the Internet to produce equivalent AI capabilities.

Perhaps a better solution would be for AI companies to have to pay royalties to anyone whose material was scraped for the training, but that might put them out of business altogether.

It’s not just authors; there is a ton of open source computer code on places like GitHub, and computer programmers may be in the process of putting themselves out of business. I had a great career doing that, and I am currently training the next generation, but I am not so sure things are going to go well for them.

Mike Steely said...

It would be nice to think that as we develop AI, the better angels of our nature will prevail – unlike when we unleashed the power of the atom. I realize it was just a movie, but for some reason the more we hear about AI, the more it conjures images of The Terminator.

M2inFLA said...

This too will not end well.

Say I write a report for a class about a book, short story, or a newspaper or magazine article I've read. It gets graded, but do I owe the authors or publishers anything, since I used the information for personal gain, i.e., a grade in a class I am taking? Perhaps that grade will let me graduate, maybe even with honors.

Change that to a thesis that is required for a graduate degree. That thesis may be published by the college or university, and perhaps even result in fame and fortune in the future.

On paper, yes, it seems that the AI digesting efforts are quick, but no one has set any rules on what compensation the use of this information requires.

All of us have learned using a variety of source materials. We may have paid for subscriptions or purchased a publication, but do any of those original publishers have any rights for how I use what I have learned? Of course, a simple footnote might be a minimum, but my brain may not recall how I learned everything I now know.

What constitutes fair use for an individual? Another business?

On the one hand, all of us hopefully learn from what we hear, watch, or read, but it can be time consuming. AI consolidators seemingly shorten that learning time for the user of AI. How should those AI consolidators be compensated, and how should the original authors of the base information be compensated?

Don't forget, all those original sources that AI used may be the result of personal digesting of even more original information that may have been copyrighted and/or published.

At a slower rate, we all have used a library to research. The authors and publishers of printed, audio, or visual media are not paid anything by those researchers; not by you or me. We consume and use for free.

Google has figured out part of a solution. They evolved from Google's original homepage that was a white screen with a dialog box in the middle of it, along with a Search button to press. Each user got a summary and links to webpages that might answer that search request. I, along with others, probably asked, "How is Google going to make money with this?" Well, somehow, they figured out how, as they are a multi-billion dollar company who made their money answering simple search inquiries. In those early days, Google only had to pay their Internet provider, and invest in a few servers that crawled the web.

The AI engines of today have simply continued that crawling of just about anything that was made available via the internet and a browser.

Yes, there probably is something in some of the terms and conditions about using that information that was made available to the browser. I don't recall seeing anything that limited me as to how much I could consume each day.

Those AI engines simply can do it a lot faster than I can.

Mike said...

The concerns over AI go far beyond reimbursing the sources it learns from. They include surveillance, identity theft, the spread of disinformation, deepfake technology, autonomous weapons and the list goes on. Thus, the need for robust regulation, ethical guidelines and security measures.

Unfortunately, the legislators responsible can’t even agree that trying to overthrow the government is bad – illegal even – and a disqualifier for one who wants to be president. Considering the amount of money - I mean, "free speech" - tech companies have available for bribery - I mean, campaign contributions - don't expect consumer protections to keep up with demand any time soon.