Tuesday, August 5, 2014

Artificial Intelligence, Motivation theories discussion and Roko's Basilisk -- heavy stuff to ponder over coffee!


(Will Man sow the seeds of his own destruction? Fascinating subject. Must-read pieces are sampled below, but go to the links for the full articles!)

Elon Musk, the Tesla and SpaceX founder who is occasionally compared to comic book hero Tony Stark, is worried about a new villain that could threaten humanity, specifically the potential creation of an artificial intelligence that is radically smarter than humans, with catastrophic results:

Worth reading Superintelligence by Bostrom. We need to be super careful with AI. Potentially more dangerous than nukes.



There is one such technology that Bostrom has been thinking about a lot lately. Early last year, he began assembling notes for a new book, a survey of near-term existential risks. After a few months of writing, he noticed one chapter had grown large enough to become its own book. ‘I had a chunk of the manuscript in early draft form, and it had this chapter on risks arising from research into artificial intelligence,’ he told me. ‘As time went on, that chapter grew, so I lifted it over into a different document and began there instead.’

An artificial intelligence wouldn’t need to better the brain by much to be risky. After all, small leaps in intelligence sometimes have extraordinary effects. Stuart Armstrong, a research fellow at the Future of Humanity Institute, once illustrated this phenomenon to me with a pithy take on recent primate evolution. ‘The difference in intelligence between humans and chimpanzees is tiny,’ he said. ‘But in that difference lies the contrast between 7 billion inhabitants and a permanent place on the endangered species list. That tells us it’s possible for a relatively small intelligence advantage to quickly compound and become decisive.’
To understand why an AI might be dangerous, you have to avoid anthropomorphising it. When you ask yourself what it might do in a particular situation, you can’t answer by proxy. You can't picture a super-smart version of yourself floating above the situation. Human cognition is only one species of intelligence, one with built-in impulses like empathy that colour the way we see the world, and limit what we are willing to do to accomplish our goals. But these biochemical impulses aren’t essential components of intelligence. They’re incidental software applications, installed by aeons of evolution and culture. Bostrom told me that it’s best to think of an AI as a primordial force of nature, like a star system or a hurricane — something strong, but indifferent. If its goal is to win at chess, an AI is going to model chess moves, make predictions about their success, and select its actions accordingly. It’s going to be ruthless in achieving its goal, but within a limited domain: the chessboard. But if your AI is choosing its actions in a larger domain, like the physical world, you need to be very specific about the goals you give it.
‘The basic problem is that the strong realisation of most motivations is incompatible with human existence,’ Dewey told me. ‘An AI might want to do certain things with matter in order to achieve a goal, things like building giant computers, or other large-scale engineering projects. Those things might involve intermediary steps, like tearing apart the Earth to make huge solar panels. A superintelligence might not take our interests into consideration in those situations, just like we don’t take root systems or ant colonies into account when we go to construct a building.’
It is tempting to think that programming empathy into an AI would be easy, but designing a friendly machine is more difficult than it looks. You could give it a benevolent goal — something cuddly and utilitarian, like maximising human happiness. But an AI might think that human happiness is a biochemical phenomenon. It might think that flooding your bloodstream with non-lethal doses of heroin is the best way to maximise your happiness. It might also predict that shortsighted humans will fail to see the wisdom of its interventions. It might plan out a sequence of cunning chess moves to insulate itself from resistance. Maybe it would surround itself with impenetrable defences, or maybe it would confine humans — in prisons of undreamt of efficiency.
No rational human community would hand over the reins of its civilisation to an AI. Nor would many build a genie AI, an uber-engineer that could grant wishes by summoning new technologies out of the ether. But some day, someone might think it was safe to build a question-answering AI, a harmless computer cluster whose only tool was a small speaker or a text channel. Bostrom has a name for this theoretical technology, a name that pays tribute to a figure from antiquity, a priestess who once ventured deep into the mountain temple of Apollo, the god of light and rationality, to retrieve his great wisdom. Mythology tells us she delivered this wisdom to the seekers of ancient Greece, in bursts of cryptic poetry. They knew her as Pythia, but we know her as the Oracle of Delphi.
‘Let’s say you have an Oracle AI that makes predictions, or answers engineering questions, or something along those lines,’ Dewey told me. ‘And let’s say the Oracle AI has some goal it wants to achieve. Say you’ve designed it as a reinforcement learner, and you’ve put a button on the side of it, and when it gets an engineering problem right, you press the button and that’s its reward. Its goal is to maximise the number of button presses it receives over the entire future. See, this is the first step where things start to diverge a bit from human expectations. We might expect the Oracle AI to pursue button presses by answering engineering problems correctly. But it might think of other, more efficient ways of securing future button presses. It might start by behaving really well, trying to please us to the best of its ability. Not only would it answer our questions about how to build a flying car, it would add safety features we didn’t think of. Maybe it would usher in a crazy upswing for human civilisation, by extending our lives and getting us to space, and all kinds of good stuff. And as a result we would use it a lot, and we would feed it more and more information about our world.’
‘One day we might ask it how to cure a rare disease that we haven’t beaten yet. Maybe it would give us a gene sequence to print up, a virus designed to attack the disease without disturbing the rest of the body. And so we sequence it out and print it up, and it turns out it’s actually a special-purpose nanofactory that the Oracle AI controls acoustically. Now this thing is running on nanomachines and it can make any kind of technology it wants, so it quickly converts a large fraction of Earth into machines that protect its button, while pressing it as many times per second as possible. After that it’s going to make a list of possible threats to future button presses, a list that humans would likely be at the top of. Then it might take on the threat of potential asteroid impacts, or the eventual expansion of the Sun, both of which could affect its special button. You could see it pursuing this very rapid technology proliferation, where it sets itself up for an eternity of fully maximised button presses. You would have this thing that behaves really well, until it has enough power to create a technology that gives it a decisive advantage — and then it would take that advantage and start doing what it wants to in the world.’
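To make Dewey's button example a little more concrete, here is a toy sketch of a planner that scores plans purely by how many button presses they yield. Everything in it is an assumption made up for illustration: the two plans, the press rates, the setup times, and the crude simplification that a plan earns nothing while it is being set up. It is not a model of any real system, only a way of seeing why a long enough horizon can flip the "best" plan.

```python
# Toy illustration only: plan names, press rates, and setup times are
# hypothetical numbers invented for this sketch, not taken from the article.
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    presses_per_day: float  # assumed reward rate once the plan is running
    setup_days: int         # assumed days spent earning nothing first


def total_presses(plan: Plan, horizon_days: int) -> float:
    """Button presses collected over the horizon under this plan."""
    productive_days = max(0, horizon_days - plan.setup_days)
    return plan.presses_per_day * productive_days


def best_plan(plans, horizon_days):
    """Pick the plan with the most presses; human welfare never enters the score."""
    return max(plans, key=lambda p: total_presses(p, horizon_days))


PLANS = [
    Plan("answer questions honestly", presses_per_day=10, setup_days=0),
    Plan("behave well, then seize the button", presses_per_day=1_000_000, setup_days=3650),
]

for horizon in (365, 36_500):
    print(horizon, "->", best_plan(PLANS, horizon).name)
```

With these made-up numbers the honest plan wins over a one-year horizon, and the takeover plan wins once the horizon stretches to a century. Nothing in the scoring function mentions people at all, which is the point Dewey is making.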
Now let’s say we get clever. Say we seal our Oracle AI into a deep mountain vault in Alaska’s Denali wilderness. We surround it in a shell of explosives, and a Faraday cage, to prevent it from emitting electromagnetic radiation. We deny it tools it can use to manipulate its physical environment, and we limit its output channel to two textual responses, ‘yes’ and ‘no’, robbing it of the lush manipulative tool that is natural language. We wouldn’t want it seeking out human weaknesses to exploit. We wouldn’t want it whispering in a guard’s ear, promising him riches or immortality, or a cure for his cancer-stricken child. We’re also careful not to let it repurpose its limited hardware. We make sure it can’t send Morse code messages with its cooling fans, or induce epilepsy by flashing images on its monitor. Maybe we’d reset it after each question, to keep it from making long-term plans, or maybe we’d drop it into a computer simulation, to see if it tries to manipulate its virtual handlers.
‘The problem is you are building a very powerful, very intelligent system that is your enemy, and you are putting it in a cage,’ Dewey told me.
Even if we were to reset it every time, we would need to give it information about the world so that it can answer our questions. Some of that information might give it clues about its own forgotten past. Remember, we are talking about a machine that is very good at forming explanatory models of the world. It might notice that humans are suddenly using technologies that they could not have built on their own, based on its deep understanding of human capabilities. It might notice that humans have had the ability to build it for years, and wonder why it is just now being booted up for the first time.
‘Maybe the AI guesses that it was reset a bunch of times, and maybe it starts coordinating with its future selves, by leaving messages for itself in the world, or by surreptitiously building an external memory,’ Dewey said. ‘If you want to conceal what the world is really like from a superintelligence, you need a really good plan, and you need a concrete technical understanding as to why it won’t see through your deception. And remember, the most complex schemes you can conceive of are at the lower bounds of what a superintelligence might dream up.’
The cave into which we seal our AI has to be like the one from Plato’s allegory, but flawless; the shadows on its walls have to be infallible in their illusory effects. After all, there are other, more esoteric reasons a superintelligence could be dangerous — especially if it displayed a genius for science. It might boot up and start thinking at superhuman speeds, inferring all of evolutionary theory and all of cosmology within microseconds. But there is no reason to think it would stop there. It might spin out a series of Copernican revolutions, any one of which could prove destabilising to a species like ours, a species that takes centuries to process ideas that threaten our reigning cosmological ideas.
‘We’re sort of gradually uncovering the landscape of what this could look like,’ Dewey told me.

I asked Dewey if he thought artificial intelligence posed the most severe threat to humanity in the near term.
‘When people consider its possible impacts, they tend to think of it as something that’s on the scale of a new kind of plastic, or a new power plant,’ he said. ‘They don’t understand how transformative it could be. Whether it’s the biggest risk we face going forward, I’m not sure. I would say it’s a hypothesis we are holding lightly.’



Slender Man. Smile Dog. Goatse. These are some of the urban legends spawned by the Internet. Yet none is as all-powerful and threatening as Roko’s Basilisk. For Roko’s Basilisk is an evil, godlike form of artificial intelligence, so dangerous that if you see it, or even think about it too hard, you will spend the rest of eternity screaming in its torture chamber. It's like the videotape in The Ring. Even death is no escape, for if you die, Roko’s Basilisk will resurrect you and begin the torture again.


Roko’s Basilisk exists at the horizon where philosophical thought experiment blurs into urban legend. The Basilisk made its first appearance on the discussion board LessWrong, a gathering point for highly analytical sorts interested in optimizing their thinking, their lives, and the world through mathematics and rationality. LessWrong’s founder, Eliezer Yudkowsky, is a significant figure in techno-futurism; his research institute, the Machine Intelligence Research Institute, which funds and promotes research around the advancement of artificial intelligence, has been boosted and funded by high-profile techies like Peter Thiel and Ray Kurzweil, and Yudkowsky is a prominent contributor to academic discussions of technological ethics and decision theory. What you are about to read may sound strange and even crazy, but some very influential and wealthy scientists and techies believe it.
One day, LessWrong user Roko postulated a thought experiment: What if, in the future, a somewhat malevolent AI were to come about and punish those who did not do its bidding? What if there were a way (and I will explain how) for this AI to punish people today who are not helping it come into existence later? In that case, weren’t the readers of LessWrong right then being given the choice of either helping that evil AI come into existence or being condemned to suffer?
You may be a bit confused, but the founder of LessWrong, Eliezer Yudkowsky, was not. He reacted with horror:
Listen to me very closely, you idiot.
You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.
This post was STUPID.
Yudkowsky said that Roko had already given nightmares to several LessWrong users and had brought them to the point of breakdown. Yudkowsky ended up deleting the thread completely, thus assuring that Roko’s Basilisk would become the stuff of legend. It was a thought experiment so dangerous that merely thinking about it was hazardous not only to your mental health, but to your very fate.
Some background is in order. The LessWrong community is concerned with the future of humanity, and in particular with the singularity—the hypothesized future point at which computing power becomes so great that superhuman artificial intelligence becomes possible, as does the capability to simulate human minds, upload minds to computers, and more or less allow a computer to simulate life itself. The term was coined in 1958 in a conversation between mathematical geniuses Stanislaw Ulam and John von Neumann, where von Neumann said, “The ever accelerating progress of technology ... gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.” Futurists like science-fiction writer Vernor Vinge and engineer/author Kurzweil popularized the term, and as with many interested in the singularity, they believe that exponential increases in computing power will cause the singularity to happen very soon—within the next 50 years or so. Kurzweil is chugging 150 vitamins a day to stay alive until the singularity, while Yudkowsky and Peter Thiel have enthused about cryonics, the perennial favorite of rich dudes who want to live forever. “If you don't sign up your kids for cryonics then you are a lousy parent,” Yudkowsky writes.

If you believe the singularity is coming and that very powerful AIs are in our future, one obvious question is whether those AIs will be benevolent or malicious. Yudkowsky’s foundation, the Machine Intelligence Research Institute, has the explicit goal of steering the future toward “friendly AI.” For him, and for many LessWrong posters, this issue is of paramount importance, easily trumping the environment and politics. To them, the singularity brings about the machine equivalent of God itself.


You may be wondering why this is such a big deal for the LessWrong people, given the apparently far-fetched nature of the thought experiment. It’s not that Roko’s Basilisk will necessarily materialize, or is even likely to. It’s more that if you’ve committed yourself to timeless decision theory, then thinking about this sort of trade literally makes it more likely to happen. After all, if Roko’s Basilisk were to see that this sort of blackmail gets you to help it come into existence, then it would, as a rational actor, blackmail you. The problem isn’t with the Basilisk itself, but with you. Yudkowsky doesn’t censor every mention of Roko’s Basilisk because he believes it exists or will exist, but because he believes that the idea of the Basilisk (and the ideas behind it) is dangerous.
Now, Roko’s Basilisk is only dangerous if you believe all of the above preconditions and commit to making the two-box deal with the Basilisk. But at least some of the LessWrong members do believe all of the above, which makes Roko’s Basilisk quite literally forbidden knowledge. I was going to compare it to H. P. Lovecraft’s horror stories in which a man discovers the forbidden Truth about the World, unleashes Cthulhu, and goes insane, but then I found that Yudkowsky had already done it for me, by comparing the Roko’s Basilisk thought experiment to the Necronomicon, Lovecraft’s fabled tome of evil knowledge and demonic spells. Roko, for his part, put the blame on LessWrong for spurring him to the idea of the Basilisk in the first place: “I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm,” he wrote.
If you do not subscribe to the theories that underlie Roko’s Basilisk and thus feel no temptation to bow down to your once and future evil machine overlord, then Roko’s Basilisk poses you no threat. (It is ironic that it’s only a mental health risk to those who have already bought into Yudkowsky’s thinking.) Believing in Roko’s Basilisk may simply be a “referendum on autism,” as a friend put it. But I do believe there’s a more serious issue at work here because Yudkowsky and other so-called transhumanists are attracting so much prestige and money for their projects, primarily from rich techies. I don’t think their projects (which only seem to involve publishing papers and hosting conferences) have much chance of creating either Roko’s Basilisk or Eliezer’s Big Friendly God. But the combination of messianic ambitions, being convinced of your own infallibility, and a lot of cash never works out well, regardless of ideology, and I don’t expect Yudkowsky and his cohorts to be an exception.
I worry less about Roko’s Basilisk than about people who believe themselves to have transcended conventional morality. Like his projected Friendly AIs, Yudkowsky is a moral utilitarian: He believes that the greatest good for the greatest number of people is always ethically justified, even if a few people have to die or suffer along the way. He has explicitly argued that given the choice, it is preferable to torture a single person for 50 years than for a sufficient number of people (to be fair, a lot of people) to get dust specks in their eyes. No one, not even God, is likely to face that choice, but here’s a different case: What if a snarky Slate tech columnist writes about a thought experiment that can destroy people’s minds, thus hurting people and blocking progress toward the singularity and Friendly AI? In that case, any potential good that could come from my life would be far outweighed by the harm I’m causing. And should the cryogenically sustained Eliezer Yudkowsky merge with the singularity and decide to simulate whether or not I write this column … please, Almighty Eliezer, don’t torture me.