“Even if you played 60 million hands of poker for 70 years, 12 hours a day, and never made any mistakes, you still wouldn't be able to say with statistical confidence you were better than this program,” said study author and computer scientist Michael Bowling.
A perfect solution would mean a program would never lose money against an opponent in the long run. Bowling and his colleagues can't say that about Cepheus, but they know the computer can stay ahead for at least as long as any human can live. The team has even set up a Web site where visitors can try their hand against the program.
“We're not quite perfect, but we're so close that even after a lifetime of playing against it, you wouldn't know it wasn't perfect,” Bowling said. The study was published online Thursday in the journal Science.
Perfect solutions have been found for the games checkers and Connect Four, but unlike poker, these are characterized as perfect-information games. In other words, players know everything that has happened in the game before making their move.
“The other games that have been solved in the past — like checkers — are games of perfect information,” said software engineer Eric Jackson, a computer poker hobbyist who was not involved in the research. “Games of imperfect information are harder, and I think it may be surprising to some that that they can be solved by computers.”
Heads-up, limit hold'em marks the first imperfect-information game competitively played by humans that has been essentially solved, according to the research. Poker, with its core strategy of bluffing, thoroughly embodies a situation in which players hide information to their advantage.
As with any human player, Cepheus learns through experience, except its only opponent is itself. At first, Cepheus's strategy is random, and it takes every action with equal probability. Then it starts learning from its mistakes, asking itself, “Could I have done better if I raised here, or folded here?”
“We can start to measure how much it regrets actions it didn't take, and it uses this regret to update its strategy,” said Bowling, who has been working on the bot since 2003.
At the same time, because Cepheus is its own opponent, the game just keeps getting tougher to win. The strategy keeps adapting based on these regrets; mathematically the regrets start to go toward zero, and a near-perfect solution emerges.
After playing billions of hands against itself for over two months, Cepheus has improved its strategy so dramatically that any human opponent would likely be toast. It hasn't been pitted against the best human players yet, but earlier incarnations have trounced top online poker professionals in man-machine poker competitions.
“Our program is much, much better now than it was then, so I suspect that humans really hold no chance,” Bowling said. “They wouldn't be even able to break even. They would lose quite rapidly.”
When it comes to playing games, computers have historically dominated the best competitors humankind has to offer. The checkers-playing program Chinook in 1994 became the first computer to win a world championship title in a competition against humans. IBM's Deep Blue bested chess champion Garry Kasparov in 1997. Fourteen years later, "Jeopardy!" TV game show champions Ken Jennings and Brad Rutter were beaten by Watson, the company's question-answering AI system.
Even if humans are unable to win against Cepheus, the opportunity to learn from the best is certainly there. For instance, the program proves that the dealer does hold a distinct advantage in the game. Also, raising as the first move as opposed to calling is favorable in the vast majority of situations, since it can cause the opponent to immediately fold.
And surprisingly, Cepheus will choose to stay in play for a wide variety of hands, even conventionally “bad” ones that human players would be tempted to fold.
Although Bowling and his colleagues admittedly create programs like Cepheus for fun, the game can be seen as a stand-in for the imperfect-information stand-offs encountered in the real world. One party versus another, each with conflicting goals, unaware of whether their opponent will strike or stand down.
“You can view the conflict between terrorists on the one hand and homeland security on the other hand as a 'game' and use the tools of game theory to try and find strategies for security that are optimal,” Jackson said.
Bowling has even collaborated with diabetes physicians to come up with an algorithm to best treat patients, with the disease being the “opponent” in such a game.
“The real applications we target are things like negotiation, cybersecurity, military games and medical applications,” said computer poker researcher Tuomas Sandholm at Carnegie Mellon University, who was not involved in the study. “That's what we do, and poker is really just a convenient benchmark that we can compare year after year, and group to group.”
Sandholm calls the study a “landmark result.” His own group has been working on a much harder problem — no-limit hold'em, where the sizes of bets and raises are variable. However, solving this game, which is the most popular version of hold'em poker worldwide, might just be impossible.
“This form of poker is both harder and more popular than heads-up, limit hold'em,” said Jackson, who switched his focus from limit to no-limit hold'em after 2012. “I don't think it will be solved (not even approximately) in our lifetimes, so there are still plenty of challenges!”
Correction: "Jeopardy!" TV show champions Jennings and Rutter were beaten by the IBM computer Watson in 2011 -- not 2001. This article has been revised to reflect that fact.