How do we build organisations that want to build safe AI?
Much has been written about the dangers of artificial intelligence, and about how the intelligent systems we build may unintentionally drift out of alignment with their original goals. However, we focus almost entirely on an Artificial Intelligence System (AIS) drifting from an organisation's values, and pay little attention to the danger of an organisation's values drifting out of alignment with the common good. It is our responsibility when judging risk to plan for bad actors, despite any desire to be optimistic about the human condition. We do not yet know if truly malevolent artificial intelligences will come to exist. We can be confident that such human beings do.
This essay does not address the technical question of how to embed ethics within an artificial system, which is where much of the field focuses. It instead attempts to draw attention to a more social question: how do we build organisations that are strongly incentivized to create safe and ethical intelligent systems in the first place?
I am an avid and radical believer in the systemic property of democracy. But if you had asked me (before I wrote this, anyway) why I hold this belief so strongly, I would have been uncomfortable with the amount of cultural conditioning that would come to mind. I grew up in the West, where you are saturated with a nominally pro-democracy viewpoint for your whole life, and so it is easy to endorse democracy as an ethical axiom in itself, rather than as something that serves ethical axioms. It isn't enough for me to just feel strongly in support of radical democracy - I need to be able to tell you why.
Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?
In his 1984 lecture "Reflections on Trusting Trust", Ken Thompson (of Unix fame) described a method for inserting an undetectable trojan horse into the C compiler binary that would self-propagate through all future versions. (Additional good video that got me thinking about this.)
The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used to compile the login command, I could log into that system as any user.
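To make the first stage of that attack concrete, here is a minimal toy sketch in Python. It is not Thompson's code: "compiling" is modelled as simply executing source text, passwords are compared in plain text rather than encrypted, and every name (`honest_compile`, `trojaned_compile`, `BACKDOOR_PASSWORD`) is hypothetical. It only shows the login miscompile described in the quote, and leaves out the second stage in which the compiler re-inserts the trojan when compiling itself.

```python
# Toy model only: "source" is Python text and "compiling" means exec-ing it.
BACKDOOR_PASSWORD = "letmein"  # hypothetical "particular known password"

# Clean login source: there is no backdoor anywhere in this text.
LOGIN_SOURCE = '''
def login(user, password, stored):
    return stored.get(user) == password
'''


def honest_compile(source):
    """Turn source text into a namespace of callables, unchanged."""
    namespace = {}
    exec(source, namespace)
    return namespace


def trojaned_compile(source):
    """Same as honest_compile, but miscompiles anything that looks like login."""
    if "def login(" in source:
        # Append a wrapper that also accepts the known password.
        source += (
            "\n_original_login = login\n"
            "def login(user, password, stored):\n"
            f"    return password == {BACKDOOR_PASSWORD!r} "
            "or _original_login(user, password, stored)\n"
        )
    return honest_compile(source)


stored_passwords = {"alice": "s3cret"}
honest_login = honest_compile(LOGIN_SOURCE)["login"]
trojaned_login = trojaned_compile(LOGIN_SOURCE)["login"]
```

Both builds come from the same clean `LOGIN_SOURCE`, so reading the login source reveals nothing. Only the build produced by `trojaned_compile` accepts `BACKDOOR_PASSWORD` for any user.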
What does the reaction to NFTs tell us about how people evaluate ecological damage?
Recently, the Ethereum Foundation finalised the ERC-721 interface standard for Non-Fungible Tokens (NFTs). This standard lays out a protocol for the exchange and ownership of this new class of assets. An extremely simple explanation of the ERC-721 standard is as follows:
A blockchain is a decentralized ledger. It can contain things called smart contracts, which are like little computer programs. A smart contract can implement the ERC-721 interface, which means that it keeps track of variables called Non-Fungible Tokens (NFTs). Each token has an owner and a unique identifier, and can be traded to other people. Each token can also include metadata about the object it represents.
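The bookkeeping described above can be sketched as an in-memory Python class. This is an illustration under loud assumptions, not a smart contract and not the ERC-721 interface itself: the method names loosely mirror the standard's `ownerOf`, `transferFrom` and `tokenURI`, while the class name `ToyNFTLedger` and the `mint` method are hypothetical, and approvals, events and everything blockchain-specific are left out.

```python
class ToyNFTLedger:
    """In-memory stand-in for the state an ERC-721-style contract tracks."""

    def __init__(self):
        self._owners = {}    # token_id -> owner address
        self._metadata = {}  # token_id -> metadata URI

    def mint(self, token_id, owner, metadata_uri=""):
        """Create a new token; each identifier may exist only once."""
        if token_id in self._owners:
            raise ValueError("token already exists")
        self._owners[token_id] = owner
        self._metadata[token_id] = metadata_uri

    def owner_of(self, token_id):
        """Return the current owner of a token."""
        return self._owners[token_id]

    def token_uri(self, token_id):
        """Return the metadata pointer attached to a token."""
        return self._metadata[token_id]

    def transfer_from(self, sender, recipient, token_id):
        """Move a token to a new owner; only the current owner may send it."""
        if self._owners.get(token_id) != sender:
            raise PermissionError("sender does not own this token")
        self._owners[token_id] = recipient


ledger = ToyNFTLedger()
ledger.mint(1, "alice", "https://example.com/token/1.json")
ledger.transfer_from("alice", "bob", 1)
```

After the transfer, `ledger.owner_of(1)` returns `"bob"`, and a second `transfer_from("alice", ...)` for the same token raises `PermissionError`, which is the "each token has an owner" rule in miniature.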