How do we build organisations that want to build safe AI?
Much has been written about the dangers of artificial intelligence, and how the intelligent systems we build may sometimes unintentionally drift in alignment from their original goals. However, we seem to focus almost entirely upon an Artificial Intelligence System (AIS) drifting from an organisation's values, and yet little attention is paid to the danger of an organisation's values drifting from alignment with the common good. It is our responsibility when judging risk to plan for bad actors, despite any desire to be optimistic about the human condition. We do not yet know if truly malevolent artificial intelligences will come to exist. We can be confident in the existence of such human beings.
This essay does not address the technical question of how you embed ethics within an artificial system, as much of the field focuses on. It instead attempts to draw attention to a more social question: how do we build organisations that are strongly incentivized to create safe and ethical intelligent systems in the first place?
An idea I've been having a lot of fun playing around with is this idea of little generative algorithms to build mapping functions. When we normally think about a neuron within a deep neural network, we think about this point within a hyperdimensional space. The dimensionality of this space is defined by the number of neurons in the next layer, and the position within that space is defined by the values of those weights and biases.
If we think about what this neuron is actually doing, it is forming a mapping between an input and an output. We store this mapping naively as a very large vector of weights. When we want to see what the weight is, we just look up its index within that big vector. But imagine if you were a young coding student, and you were given the task to write a function that maps some input to some expected output. For instance, mapping an input to it's square. Would you really implement your function like:
Egos Are Fractally Complex
"All models are wrong, but some are useful" -- George Box
As our ability to model and simulate systems grows, we exert more and more computation on simulating ourselves - human agents, society, the systems which make up our existence. We simulate economic systems, weather systems, transport systems, power grids, ocean currents, the movements of crowds, and a million other models of our real world that attempt to predict some spread of possibilities from a given state.[^1] However, a model can only simulate the abstractable, and there is one object which remains resolutely unabstractable - the agency of egos.[^2] We may find ourselves immensely frustrated at this property in the future, and I believe it to be an insurmountable task. Here's why.
On 'Some Moral and Technical Consequences of Automation'
In 1960, Norbert Wiener - widely considered the originator of the concept of cybernetics - published a short essay entitled "Some Moral and Technical Consequences of Automation". Here's the article that got me there, which is mostly about social media and an abstracted reapplication of these concepts, but they tie in the article a bit.
I find myself facing a public which has formed its attitude toward the machine on the basis of an imperfect understanding of the structure and mode of operation of modern machines.
I am an avid and radical believer in the systemic property of democracy. But if you had asked me (before I wrote this, anyway) why I hold such a strong and deeply-held belief, I would have been uncomfortable with the amount of cultural conditioning that would come to mind. I grew up in in the West where you are saturated with a nominally pro-democracy viewpoint for your whole life, and so it is easy to endorse it as an ethical axiom, as opposed to in support of ethical axioms. It isn't enough for me to just feel strongly in support of radical democracy - I need to be able to tell you why.