Contents contributed and discussions participated in by Ed Webb
AI Mind Map Generator Online | Taskade
Google Researchers' Attack Prompts ChatGPT to Reveal Its Training Data
- Researchers showed that there are large amounts of personally identifiable information (PII) in OpenAI’s large language models. They also showed that, on a public version of ChatGPT, the chatbot spat out long passages of text scraped verbatim from elsewhere on the internet.
- ChatGPT’s “alignment techniques do not eliminate memorization,” meaning that it sometimes spits out training data verbatim. The regurgitated data included PII, entire poems, “cryptographically-random identifiers” such as Bitcoin addresses, passages from copyrighted scientific research papers, website addresses, and much more.
- The researchers wrote that they spent $200 to extract “over 10,000 unique examples” of training data, totaling “several megabytes.” They suggest that, with a bigger budget, the same attack could have extracted gigabytes of training data. The full size of OpenAI’s training corpus is unknown, but estimates for GPT-3 range from several hundred gigabytes to a few dozen terabytes of text (a rough cost extrapolation is sketched after this entry).
- ...1 more annotation...
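A back-of-the-envelope extrapolation of the cost figures above, as a minimal Python sketch. Reading “several megabytes” as 5 MB and assuming cost scales linearly with data extracted are my assumptions, not numbers from the paper:

```python
# Hypothetical extrapolation of the attack's cost (assumptions: "several
# megabytes" taken as 5 MB for $200, and cost scales linearly).
BUDGET_USD = 200
EXTRACTED_MB = 5  # assumed reading of "several megabytes"

usd_per_mb = BUDGET_USD / EXTRACTED_MB
usd_per_gb = usd_per_mb * 1024

print(f"~${usd_per_mb:.0f}/MB, so roughly ${usd_per_gb:,.0f} to extract 1 GB")
# -> ~$40/MB, roughly $40,960 per gigabyte: expensive, but well within reach
#    of a funded adversary, which is the researchers' point.
```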
The Misinformation Susceptibility Test
I unintentionally created a biased AI algorithm 25 years ago - tech companies are still...
- How and why do well-educated, well-intentioned scientists produce biased AI systems? Sociological theories of privilege provide one useful lens.
- Scientists also face a nasty subconscious dilemma when incorporating diversity into machine learning models: Diverse, inclusive models perform worse than narrow models.
- Fairness can still be the victim of competitive pressures in academia and industry. The flawed Bard and Bing chatbots from Google and Microsoft are recent evidence of this grim reality: the commercial necessity of building market share led to their premature release.
- ...3 more annotations...
ChatGPT Is Nothing Like a Human, Says Linguist Emily Bender
- Please do not conflate word form and meaning. Mind your own credulity.
- We’ve learned to make “machines that can mindlessly generate text,” Bender told me when we met this winter. “But we haven’t learned how to stop imagining the mind behind it.”
- A handful of companies control what PricewaterhouseCoopers called a “$15.7 trillion game changer of an industry.” Those companies employ or finance the work of a huge chunk of the academics who understand how to make LLMs. This leaves few people with the expertise and authority to say, “Wait, why are these companies blurring the distinction between what is human and what’s a language model? Is this what we want?”
- ...16 more annotations...
'There is no standard': investigation finds AI algorithms objectify women's bodies | Ar...
- AI algorithms tag photos of women in everyday situations as sexually suggestive. They also rate pictures of women as more “racy” or sexually suggestive than comparable pictures of men.
- This has suppressed the reach of countless images featuring women’s bodies and hurt female-led businesses, further amplifying societal disparities.
- “Objectification of women seems deeply embedded in the system.”
- ...7 more annotations...
The Generative AI Race Has a Dirty Secret | WIRED
- The race to build high-performance, AI-powered search engines is likely to require a dramatic rise in computing power, and with it a massive increase in the amount of energy that tech companies require and the amount of carbon they emit.
- Every time we see a step change in online processing, we see significant increases in the power and cooling resources required by large processing centres.
- A third-party analysis by researchers estimates that training GPT-3, which ChatGPT is partly based on, consumed 1,287 MWh and led to emissions of more than 550 tons of carbon dioxide equivalent, the same amount as a single person taking 550 round trips between New York and San Francisco (a quick sanity check of these figures follows this entry).
- ...3 more annotations...
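The two numbers in that estimate can be cross-checked against each other. A minimal Python sketch; the figure of roughly one ton of CO2e per passenger for a New York–San Francisco round trip is my assumption, not a number from the article:

```python
# Cross-check the quoted GPT-3 training figures (sketch; assumes both numbers
# describe the same training run).
ENERGY_MWH = 1287        # estimated training energy
EMISSIONS_TONS = 550     # estimated emissions, tons CO2e
ROUNDTRIP_TONS = 1.0     # assumed per-passenger CO2e for one NY-SF round trip

intensity_kg_per_kwh = (EMISSIONS_TONS * 1000) / (ENERGY_MWH * 1000)
roundtrips = EMISSIONS_TONS / ROUNDTRIP_TONS

print(f"Implied grid intensity: {intensity_kg_per_kwh:.2f} kg CO2e/kWh")
print(f"Equivalent NY-SF round trips: {roundtrips:.0f}")
# -> ~0.43 kg CO2e/kWh (a plausible, fossil-heavy grid mix) and 550 round
#    trips, consistent with the comparison quoted above.
```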
ChatGPT Is a Blurry JPEG of the Web | The New Yorker
- Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry JPEG, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
- A way to understand the “hallucinations,” or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone. These hallucinations are compression artifacts, but, like the incorrect labels generated by the Xerox photocopier, they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated (a toy demonstration of this lossiness appears after this entry).
- ChatGPT is so good at this form of interpolation that people find it entertaining: they’ve discovered a “blur” tool for paragraphs instead of photos, and are having a blast playing with it.
- ...9 more annotations...
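Chiang’s analogy can be reproduced literally. A minimal sketch (assumes the Pillow imaging library is installed; the synthetic test image is arbitrary): heavy JPEG compression keeps the overall picture but cannot return the exact original bits, just as the model cannot return exact training text:

```python
# Toy demonstration of lossy compression as approximation (requires Pillow).
from io import BytesIO
from PIL import Image

# Build an arbitrary 64x64 test image with gradients and some texture.
original = Image.new("RGB", (64, 64))
for x in range(64):
    for y in range(64):
        original.putpixel((x, y), (x * 4, y * 4, (x ^ y) * 4))

# Compress aggressively, then decode the "blurry JPEG" back out.
buffer = BytesIO()
original.save(buffer, format="JPEG", quality=10)
blurry = Image.open(BytesIO(buffer.getvalue()))

# Count how many pixels survive the round trip bit-exactly.
exact = sum(a == b for a, b in zip(original.getdata(), blurry.getdata()))
print(f"{exact / (64 * 64):.1%} of pixels are bit-exact after compression")
# Typically only a small fraction: the image still "looks right," but the
# exact sequence of bits is gone -- Chiang's point about ChatGPT and the Web.
```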