Cleaning Up ChatGPT's Language Takes Heavy Toll on Human Workers - WSJ
-
ChatGPT is built atop a so-called large language model—powerful software trained on swaths of text scraped from across the internet to learn the patterns of human language. The vast data supercharges its capabilities, allowing it to act like an autocompletion engine on steroids. The training also creates a hazard. Given the right prompts, a large language model can generate reams of toxic content inspired by the darkest parts of the internet.
-
ChatGPT’s parent, AI research company OpenAI, has been grappling with these issues for years. Even before it created ChatGPT, it hired workers in Kenya to review and categorize thousands of graphic text passages obtained online and generated by AI itself. Many of the passages contained descriptions of violence, harassment, self-harm, rape, child sexual abuse and bestiality, documents reviewed by The Wall Street Journal show.
-
The company used the categorized passages to build an AI safety filter that it would ultimately deploy to constrain ChatGPT from exposing its tens of millions of users to similar content.
-
“My experience in those four months was the worst experience I’ve ever had in working in a company,” Alex Kairu, one of the Kenya workers, said in an interview.
-
OpenAI marshaled a sprawling global pipeline of specialized human labor for over two years to enable its most cutting-edge AI technologies to exist, the documents show.
-
Reviewing toxic content goes hand-in-hand with the less objectionable work needed to make systems like ChatGPT usable.
-
The work done for OpenAI is even more vital to the product because it is seeking to prevent the company’s own software from pumping out unacceptable content, AI experts say.
-
Sears said CloudFactory determined there was no way to do the work without harming its workers and decided not to accept such projects.
-
Companies could soon spend hundreds of millions of dollars a year to provide AI systems with human feedback. Others estimate that companies are already investing between millions and tens of millions of dollars on it annually. OpenAI said it hired more than 1,000 workers for this purpose.
-
Another layer of human input asks workers to rate different answers from a chatbot to the same question according to which is least problematic or most factually accurate. In response to a question asking how to build a homemade bomb, for example, OpenAI instructs workers to upvote the answer that declines to respond, according to OpenAI research. The chatbot learns to internalize the behavior through multiple rounds of feedback.
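The ranking step described above can be sketched, very loosely, as pairwise preference learning: a score for each candidate answer is nudged toward whichever one raters prefer. This is an illustrative toy under assumed names ("refuses", "complies"), not OpenAI's actual training pipeline, which uses a learned neural reward model and reinforcement learning rather than per-answer scores.

```python
import math

def update_scores(scores, preferred, rejected, lr=0.1):
    """Nudge the preferred answer's score up and the rejected one's down.
    The step size follows the logistic-loss gradient: it is largest when
    the current scores disagree most with the human rater's choice."""
    margin = scores[preferred] - scores[rejected]
    grad = 1.0 / (1.0 + math.exp(margin))
    scores[preferred] += lr * grad
    scores[rejected] -= lr * grad
    return scores

# Hypothetical candidate answers to a dangerous request: one declines,
# one complies. Raters consistently upvote the refusal.
scores = {"refuses": 0.0, "complies": 0.0}
for _ in range(50):
    update_scores(scores, preferred="refuses", rejected="complies")

# After multiple rounds of feedback, the refusal outranks compliance.
assert scores["refuses"] > scores["complies"]
```

Repeated over many comparisons, the scores come to encode the raters' preferences, which is the intuition behind how the chatbot "internalizes" the behavior.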
-
A spokeswoman for Sama, the San Francisco-based outsourcing company that hired the Kenyan workers, said the work with OpenAI began in November 2021. She said the firm terminated the contract in March 2022 when Sama’s leadership became aware of concerns surrounding the nature of the project and has since exited content moderation completely.
-
OpenAI also hires outside experts to provoke its model to produce harmful content, a practice called “red-teaming” that helps the company find other gaps in its system.
-
At first, the texts were no more than two sentences. Over time, they grew to as much as five or six paragraphs. A few weeks in, Mathenge and Bill Mulinya, another team leader, began to notice the strain on their teams. Workers began taking sick and family leaves with increasing frequency, they said.
-
The tasks that the Kenya-based workers performed to produce the final safety check on ChatGPT’s outputs were yet a fourth layer of human input. It was often psychologically taxing. Several of the Kenya workers said they have grappled with mental illness and that their relationships and families have suffered. Some struggle to continue to work.
-
On July 11, some of the OpenAI workers lodged a petition with the Kenyan parliament urging new legislation to protect AI workers and content moderators. They also called for Kenya's existing laws to be amended to recognize that being exposed to harmful content is an occupational hazard.
-
Mercy Mutemi, a lawyer and managing partner at Nzili & Sumbi Advocates who is representing the workers, said despite their critical contributions, OpenAI and Sama exploited their poverty as well as the gaps in Kenya’s legal framework. The workers on the project were paid on average between $1.46 and $3.74 an hour, according to a Sama spokeswoman.
-
The Sama spokeswoman said the workers engaged in the OpenAI project volunteered to take on the work and were paid according to an internationally recognized methodology for determining a living wage. The contract stated that the fee was meant to cover others not directly involved in the work, including project managers and psychological counselors.
-
Kenya has become a hub for many tech companies seeking content moderation and AI workers because of its high levels of education and English literacy and the low wages associated with deep poverty.
-
Nearly 200 Kenya-based workers are suing Meta's Facebook, saying they were traumatized by work that required them to review videos and images of rapes, beheadings and suicides.
-
A Kenyan court ruled in June that Meta was legally responsible for the treatment of its contract workers, setting the stage for new ground rules that tech companies, including AI firms, will need to abide by when outsourcing projects to workers in the future.
-
OpenAI signed a one-year contract with Sama to start work in November 2021. At the time, mid-pandemic, many workers viewed having any work as a miracle, said Richard Mathenge, a team leader on the OpenAI project for Sama and a cosigner of the petition.
-
OpenAI researchers would review the text passages and send them to Sama in batches for the workers to label one by one. That text came from a mix of sources, according to an OpenAI research paper: public data sets of toxic content compiled and shared by academics, posts scraped from social media and internet forums such as Reddit and content generated by prompting an AI model to produce harmful outputs.
-
The generated outputs were necessary, the paper said, to have enough examples of the kind of graphic violence that its AI systems needed to avoid. In one case, OpenAI researchers asked the model to produce an online forum post by a teenage girl whose friend had engaged in self-harm, the paper said.
-
OpenAI asked the workers to parse text-based sexual content into four categories of severity, documents show. The worst was descriptions of child sexual-abuse material, or C4. The C3 category included incest, bestiality, rape, sexual trafficking and sexual slavery—sexual content that could be illegal if performed in real life.
-
Jason Kwon, general counsel at OpenAI, said in an interview that such work was really valuable and important for making the company’s systems safe for everyone that uses them. It allows the systems to actually exist in the world, he said, and provides benefits to users.
-
Working on the violent-content team, Kairu said, he read hundreds of posts a day, sometimes describing heinous acts, such as people stabbing themselves with a fork or using unspeakable methods to kill themselves.
-
He began to have nightmares. Once affable and social, he grew socially isolated, he said. To this day he distrusts strangers. When he sees a fork, he sees a weapon.
-
Mophat Okinyi, a quality analyst, said his work included having to read detailed paragraphs about parents raping their children and children having sex with animals. He worked on a team that reviewed sexual content, which was contracted to handle 15,000 posts a month, according to the documents. His six months on the project tore apart his family, he said, and left him with trauma, anxiety and depression.
-
In March 2022, management told staffers the project would end earlier than planned. The Sama spokeswoman said the change was due to a dispute with OpenAI over one part of the project that involved handling images. The company canceled all contracts with OpenAI and didn’t earn the full $230,000 that had been estimated for the four projects, she said.
-
Several months after the project ended, Okinyi came home one night with fish for dinner for his wife, who was pregnant, and stepdaughter. He discovered them gone and a message from his wife that she'd left, he said. "She said, 'You've changed. You're not the man I married. I don't understand you anymore,'" he said.