How We Can Control AI - WSJ - 0 views
-
What’s still difficult is to encode human values
-
That currently requires an extra step known as Reinforcement Learning from Human Feedback, in which programmers use their own responses to train the model to be helpful and accurate. Meanwhile, so-called “red teams” provoke the program in order to uncover any possible harmful outputs
-
This combination of human adjustments and guardrails is designed to ensure alignment of AI with human values and overall safety. So far, this seems to have worked reasonably well.
- ...22 more annotations...
-
At some point they will be able to, for example, suggest recipes for novel cyberattacks or biological attacks—all based on publicly available knowledge.
-
But as models become more sophisticated, this approach may prove insufficient. Some models are beginning to exhibit polymathic behavior: They appear to know more than just what is in their training data and can link concepts across fields, languages, and geographies.
-
We need to adopt new approaches to AI safety that track the complexity and innovation speed of the core models themselves.
-
What’s much harder to test for is what’s known as “capability overhang”—meaning not just the model’s current knowledge, but the derived knowledge it could potentially generate on its own.
-
Red teams have so far shown some promise in predicting models’ capabilities, but upcoming technologies could break our current approach to safety in AI. For one, “recursive self-improvement” is a feature that allows AI systems to collect data and get feedback on their own and incorporate it to update their own parameters, thus enabling the models to train themselves
-
This could result in, say, an AI that can build complex system applications (e.g., a simple search engine or a new game) from scratch. But, the full scope of the potential new capabilities that could be enabled by recursive self-improvement is not known.
-
Another example would be “multi-agent systems,” where multiple independent AI systems are able to coordinate with each other to build something new.
-
This so-called “combinatorial innovation,” where systems are merged to build something new, will be a threat simply because the number of combinations will quickly exceed the capacity of human oversight.
-
Short of pulling the plug on the computers doing this work, it will likely be very difficult to monitor such technologies once these breakthroughs occur
-
Current regulatory approaches are based on individual model size and training effort, and are based on passing increasingly rigorous tests, but these techniques will break down as the systems become orders of magnitude more powerful and potentially elusive
-
AI regulatory approaches will need to evolve to identify and govern the new emergent capabilities and the scaling of those capabilities.
-
But the AI Act has already fallen behind the frontier of innovation, as open-source AI models—which are largely exempt from the legislation—expand in scope and number
-
both Biden’s order and Europe’s AI Act lack intrinsic mechanisms to rapidly adapt to an AI landscape that will continue to change quickly and often.
-
a gathering in Palo Alto organized by the Rand Corp. and the Carnegie Endowment for International Peace, where key technical leaders in AI converged on an idea: The best way to solve these problems is to create a new set of testing companies that will be incentivized to out-innovate each other—in short, a robust economy of testing
-
To check the most powerful AI systems, their testers will also themselves have to be powerful AI systems, precisely trained and refined to excel at the single task of identifying safety concerns and problem areas in the world’s most advanced models.
-
To be trustworthy and yet agile, these testing companies should be checked and certified by government regulators but developed and funded in the private market, with possible support by philanthropy organizations
-
The field is moving too quickly and the stakes are too high for exclusive reliance on typical government processes and timeframes.
-
One way this can unfold is for government regulators to require AI models exceeding a certain level of capability to be evaluated by government-certified private testing companies (from startups to university labs to nonprofit research organizations), with model builders paying for this testing and certification so as to meet safety requirements.
-
As AI models proliferate, growing demand for testing would create a big enough market. Testing companies could specialize in certifying submitted models across different safety regimes, such as the ability to self-proliferate, create new bio or cyber weapons, or manipulate or deceive their human creators
-
Much ink has been spilled over presumed threats of AI. Advanced AI systems could end up misaligned with human values and interests, able to cause chaos and catastrophe either deliberately or (often) despite efforts to make them safe. And as they advance, the threats we face today will only expand as new systems learn to self-improve, collaborate and potentially resist human oversight.
-
If we can bring about an ecosystem of nimble, sophisticated, independent testing companies who continuously develop and improve their skill evaluating AI testing, we can help bring about a future in which society benefits from the incredible power of AI tools while maintaining meaningful safeguards against destructive outcomes.