21st Century Skills / Group items tagged scheming

https://arxiv.org/abs/2412.04984 - 0 views

arxiv.org/2412.04984

scheming ai data research trust deception

shared by Allard Strijker on 11 Feb 25

Allard Strijker on 11 Feb 25

Frontier models are increasingly trained and deployed as autonomous agents. One safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives, a behavior also known as scheming. We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. We evaluate frontier models on a suite of six agentic evaluations where models are instructed to pursue goals and are placed in environments that incentivize scheming.


https://www.apolloresearch.ai/research/scheming-reasoning-evaluations - 0 views

www.apolloresearch.ai/...scheming-reasoning-evaluations

ai misleiding vertrouwen

shared by Allard Strijker on 11 Feb 25

Taalmodellen kunnen mensen misleiden (Language models can mislead people) - 0 views

thedataconnection.nl/...dellen-kunnen-mensen-misleiden

misleiding misleading scheming trust ai deception

shared by Allard Strijker on 11 Feb 25

Allard Strijker on 11 Feb 25

Some large language models exhibit secretive, deceptive, and manipulative behavior when they are required to achieve a firm objective. This is the finding of research by Apollo Research, an organization focused on AI safety.

Allard Strijker on 24 Feb 25

Some large language models exhibit secretive, deceptive, and manipulative behavior when they are required to achieve a firm objective. This is the finding of research by Apollo Research, an organization focused on AI safety.

