🔴 **Welcome**

Thank you for visiting this humble account. I hope the time you spend here will be informative. If a particular toot brought you here, please do view other toots from different subject areas. You never know what new topics may interest you.

🔴 **Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation**

“_While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking._”

Baker, B. et al. (2025) Monitoring reasoning models for misbehavior and the risks of promoting obfuscation. arxiv.org/abs/2503.11926.

@ai @computerscience

Mitigating reward hacking--where AI systems misbehave due to flaws or misspecifications in their learning objectives--remains a key challenge in constructing capable and aligned models. We show that we can monitor a frontier reasoning model, such as OpenAI o3-mini, for reward hacking in agentic coding environments by using another LLM that observes the model's chain-of-thought (CoT) reasoning. CoT monitoring can be far more effective than monitoring agent actions and outputs alone, and we further found that a LLM weaker than o3-mini, namely GPT-4o, can effectively monitor a stronger model. Because CoT monitors can be effective at detecting exploits, it is natural to ask whether those exploits can be suppressed by incorporating a CoT monitor directly into the agent's training objective. While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking. Because it is difficult to tell when CoTs have become obfuscated, it may be necessary to pay a monitorability tax by not applying strong optimization pressures directly to the chain-of-thought, ensuring that CoTs remain monitorable and useful for detecting misaligned behavior.

arXiv.org

24 June 1649: Fr Mercure Verdier writes to Rome about the Irish Jesuits. Among other things, he notes that each owns a watch 'and spends much time on it'. Love of devices is nothing new. Watch with alarm, calendar and case. Made in #London 1645-1648 by Sampson Shelton (BM).

🔴 **The Persistence of Tychonism**

“_In conclusion I will suggest that the historical longevity of Tychonism has been considerably underestimated and also that Tychonists were not generally restrained from public endorsement of heliocentrism by religious pressure. On the contrary, I suggest that the continued acceptance of Tychonism was conditioned by its congruence with scientists’ religious beliefs._”

Barker, P. (2025) ‘The Persistence of Tychonism’, in Philosophical Readings. Zenodo, pp. 11–23. doi: doi.org/10.5281/zenodo.1503506.

@science @religion

🔴 **Born On This Day**

Mathematician, astronomer, and physicist Pierre-Simon Laplace was born in Beaumont-en-Auge, Normandy, France on this day in 1749.

“_He proved the stability of the solar system. In analysis Laplace introduced the potential function and Laplace coefficients. He also put the theory of mathematical probability on a sound footing._”

🔗 mathshistory.st-andrews.ac.uk/.

@science

🔴 📖 **Aristotle by George Grote**

Project Gutenberg

_“Aristotle” by George Grote is a scholarly treatise written in the late 19th century. This work provides a comprehensive account of Aristotle's life, philosophy, and influence, focusing particularly on his ethical and political theories._

_Read now or download (free!)_

🔗 gutenberg.org/ebooks/45851

@bookstodon @philosophy

The poor and nature will go thirsty so that AI can drink its fill

"To enable AI growth, many energy companies are planning to co-locate low-carbon power generation facilities with computing and data center partners. These initiatives are substantial in scale and present significant siting challenges, particularly in identifying locations with adequate water resources to satisfy the cooling requirements of power production, data centers, and local communities."

sustainability.com/insights/po

I'm no epidemiologist, but what if for public health we got rid of all the scientists and their centuries of human knowledge clawed from the vast sea of human ignorance with rigor and testing, and turned it all over to people who were educated by vape-pen bro podcasts and their own magical thinking?

🔴 **The human-driven ecological success of olive trees over the last 3700 years in the Central Mediterranean**

_“We hypothesize that cultural exchanges between Sicily and the Aegean in the Early-Middle Bronze Age have promoted the use of wild olive trees in the Messina area. At that time, olive trees could have been used for various purposes, including fruit production, but also wood and forage for animals. The development of olive groves followed a gradual, multi-stage process in the Central Mediterranean, with large-scale cultivation appearing only in modern times.”_

Palli, J. et al. (2025) 'The human-driven ecological success of olive trees over the last 3700 years in the Central Mediterranean,' Quaternary Science Reviews, 356, p. 109313. doi.org/10.1016/j.quascirev.20.

🔴 🇦🇺 **News Corp fills out tabloid papers with AI slop: NewsGPT**

David Gerard

_“News Corp has hit the slop machine since at least 2023, when it was using chatbots to churn out local news stories — 3,000 each week — attributed to the head of the data team. Nobody knew about this until a News Corp executive bragged about it at a conference.”_

🔗 pivot-to-ai.com/2025/03/21/new.

🔴 **The Start Of The Bostran Era**

The Bostran era started after Trajan annexed the Nabataean kingdom on 22nd March 106 AD. The era was named after Bostra, a town in modern-day Syria, which housed the headquarters of the Roman Sixth Legion. Dates from the Bostran calendar were used on commemorative and honorific inscriptions in the province of Arabia Petraea.
