21st Century Skills: group items tagged "evaluation"

Allard Strijker

https://arxiv.org/abs/2412.04984 - 0 views

  •  
    Frontier models are increasingly trained and deployed as autonomous agents. One safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives, a behavior also known as scheming. We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. We evaluate frontier models on a suite of six agentic evaluations where models are instructed to pursue goals and are placed in environments that incentivize scheming.
Allard Strijker

An Evaluation of Primary School Children Coding Using a Text-Based Language (Java): Com... - 0 views

  •  
    All primary school children in England are required to write computer programs and learn about computational thinking. Other countries are moving in the same direction, for example through the U.S. K-12 Computer Science Framework (CSF).