For the past 13 days, however, Libratus has been facing off against four world-champion poker players in a Pittsburgh casino. If it can beat them like it beat Sandholm, it would be an enormous breakthrough.
So far, after 67,000 hands, Libratus has won $701,242 worth of chips after starting from a balance of zero. That means, of course, that the champions have lost that same amount, $701,242. (They’re not playing with real money but rather for a lump-sum prize of $200,000 that will be divided at the end of the tournament.)
There are 53,000 hands left to play, and if this trend continues, it will be the first time that AI has beaten humans at poker.
That would be a huge achievement. Poker is not like other games, such as chess, where AI has emerged victorious thanks to advanced algorithms. Poker is much harder for AI. As the MIT Technology Review explained:
Poker requires reasoning and intelligence that has proven difficult for machines to imitate. It is fundamentally different from checkers, chess, or Go, because an opponent’s hand remains hidden from view during play. In games of “imperfect information,” it is enormously complicated to figure out the ideal strategy given every possible approach your opponent may be taking. And no-limit Texas Hold’em is especially challenging because an opponent could essentially bet any amount.
“Libratus has had the lead since the outset,” Sandholm said.
On Monday, at the tail end of Day 13, four poker players, Jimmy Chou, Dong Kim, Jason Les and Daniel McAulay, sat in the dim blue light of computer screens in Pittsburgh’s Rivers Casino, playing a virtual hand of cards against a virtual opponent.
For Sandholm, a computer scientist with a 126-page C.V., this is the culmination of 12 years of research. Starting in 2004 at Carnegie Mellon University, Sandholm began studying abstract algorithms for sequential imperfect information games. A “perfect” information game is one like chess, for example, where both players see the board and are in a good position to anticipate the opponent’s next possible move. An “imperfect information” game is one in which on each player’s turn they don’t know all the information available in the game — such as the other person’s cards.
Poker is an “imperfect information” game because players hide their hands, limiting the capacity of the opponent to calculate what their next move should be, thus allowing players to bluff.
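To give a rough sense of how much is hidden, the short Python sketch below counts the two-card hands an opponent could be holding from one player's point of view. It assumes a standard 52-card deck and two private cards per player, as in Texas Hold’em. The card encoding, the function name `possible_opponent_hands` and the sample cards are made up for illustration and have nothing to do with how Libratus itself works.

```python
from itertools import combinations

# Illustrative card encoding: rank followed by suit, e.g. "As" = ace of spades.
RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def possible_opponent_hands(my_cards, board=()):
    """List every two-card hand the opponent could hold, given the cards we can see."""
    seen = set(my_cards) | set(board)
    unseen = [card for card in DECK if card not in seen]
    return list(combinations(unseen, 2))


# Before any shared cards are dealt, only our own two cards are known.
preflop = possible_opponent_hands(("As", "Kd"))

# After three shared cards appear on the table, slightly fewer hands remain possible.
after_flop = possible_opponent_hands(("As", "Kd"), ("2c", "7h", "Jd"))

print(len(preflop), len(after_flop))
```

Even with three shared cards on the table, a player still has to reason about more than a thousand hands the opponent might hold, and any of them could be behind a given bet.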
The applications of the research go far beyond poker. War and cyberwar are both areas in which the same kind of strategic reasoning could be useful.
Sandholm settled on No Limit Texas Hold’em poker as a model that could be extrapolated to real-life “imperfect” situations like cybersecurity or military strategy. He wanted a general purpose algorithm that would excel in strategic reasoning.
In the course of his research, time after time, his algorithms failed against humans in the game. Even as late as May 2015, when Sandholm organized a similar poker competition at Rivers Casino pitting AI program “Claudico” against four champion poker players, Claudico lost by $732,713 in chips.
“Where a human might place a bet worth half or three-quarters of the pot, Claudico would sometimes bet a miserly 10 percent or an over-the-top 1,000 percent,” Carnegie Mellon explained in a 2015 news release. As Doug Polk, a player against the program, explained at the time to CMU, “Betting $19,000 to win a $700 pot just isn’t something that a person would do.”
However, Sandholm’s team did win the Annual Computer Poker Competition against other AI research teams twice in a row.
“Different research builds on results,” he explains. None of the teams had succeeded — until Libratus.
Now, in the current competition in Pittsburgh, “AI is making moves humans would never make. AI is a Martian playing poker,” says Sandholm. Libratus, concocting a strategy based on its knowledge of the rules of No Limit Texas Hold’em and the moves you can make in the game, began beating even the two champion players who had played Sandholm’s prior AI program, Claudico.
It went like this:
27,000 hands in, Libratus had a $50,513 lead.
67,000 hands in, Libratus had grown that lead nearly 14-fold, to $701,242 in chips.
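As a quick check on those two figures, the Python sketch below works out how much the lead grew and what it averages per hand. The variable names are illustrative, and the only inputs are the chip counts and hand counts quoted above.

```python
# Chip leads and hand counts quoted in the article.
lead_at_27k_hands = 50_513
lead_at_67k_hands = 701_242

# How many times larger the lead was at the second checkpoint.
growth_factor = lead_at_67k_hands / lead_at_27k_hands

# Average chips won per hand over all 67,000 hands played so far.
chips_per_hand = lead_at_67k_hands / 67_000

# Average chips won per hand over the 40,000 hands between the two checkpoints.
chips_per_hand_recent = (lead_at_67k_hands - lead_at_27k_hands) / (67_000 - 27_000)

print(f"lead grew by a factor of {growth_factor:.1f}")
print(f"average per hand overall: {chips_per_hand:.2f} chips")
print(f"average per hand since hand 27,000: {chips_per_hand_recent:.2f} chips")
```

On these numbers, the AI's average winnings per hand were higher between the two checkpoints than over the match as a whole.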
The challenge for Libratus was that while the AI program remained constant, the human players were constantly studying, learning and improving. They also had extra motivation to win: prize money and social pressure. On Day 9, a man said to Les, “Hey, you’re letting us down!”
Right now the AI is in first place. Sandholm has begun receiving, as he describes them, “a lot of nice emails” from other AI researchers about Libratus’s success. Meanwhile, the human poker players are streaming their games on Twitch and live-tweeting their results: “Humans end up winning $93k for the day. #BrainsVsAI,” Les tweeted on Jan. 23.
The competition lasts seven more days, unless they add on an extra day to account for the human poker players’ relative lack of speed. Sandholm won’t be popping any champagne yet, but by the end of the month that may no longer hold true.