Creating a Chess Engine

Update 07/01/20

I have created a separate version of my engine that implements the Universal Chess Interface (UCI) protocol, allowing my engine to communicate with external GUIs. I then implemented the Lichess Bot API and registered a bot account so that my engine can play on their servers. Unfortunately, the engine still runs on my laptop, so I can't keep it running 24/7. If you want to play the engine, contact me at abhathal@berkeley.edu and I will gladly activate it. Click here to see the profile.

Update 06/28/20

I have packaged all my code into a .jar binary and then converted that into a Windows executable so that the engine can be played from the command line without the need to clone all my source files. Both the .jar (if you are using a Mac) and the .exe are available here.

Introduction

In the past month or so, I've gotten into chess quite a bit on websites like Chess.com and Lichess.org. Usually, after blundering for an entire game, I visit the analysis section to see where exactly I went wrong, with the hope of learning from my mistakes. This rarely ever works, but the past few times I've wandered into the analysis section, something else has piqued my interest. Both Chess.com and Lichess.org use incredibly strong chess engines for their analysis (the former uses Stockfish 10 by default while the latter uses the newer Stockfish 11). For some background, ever since Deep Blue beat World Champion Garry Kasparov in a six-game match in 1997, computers have been considered superior to even the greatest grandmasters. Realizing there was no hope for me as a chess player, I took inspiration from Kevin Durant and thought, "if you can't beat them, might as well join them." So I took on the challenge of building my own chess engine. Obviously, there is no way I will reach the level of Stockfish (due to both the hardware constraints of running everything off of a 4-year-old Dell XPS and the sheer difference in knowledge and experience). Still, I want to build something that I can be proud of and learn something along the way. I also intend to make this a work in progress: I'll keep refining my engine and posting updates here as I do.

Tools

Before going over the AI itself, there are some other requirements. At the most basic level, there's the language that I will code in. While C++ is the obvious choice for speed, I'm most familiar with Java, so I chose to write the engine in Java. Next, we need some way to represent the game of chess (the alternation of turns, the chess pieces, the board itself). There also must be a way of representing moves and extracting a list of legal moves for any given position. Finally, there have to be some standardizations, such as reading/writing Forsyth-Edwards Notation (FEN) and Portable Game Notation (PGN). If this were a game like checkers or tic-tac-toe, I would be inclined to code this myself, but moves in chess are a programming nightmare. For starters, there are 6 different types of pieces which move in 6 different ways. Next, there are exceptions to those moves, such as pawns capturing diagonally, knights being able to skip over pieces, castling, en passant, and pawn promotion. There are also rules such as the 50-move rule, insufficient material, stalemate, and threefold repetition that only add to the complexity. I wanted to focus the majority of my effort on the engine itself, so I decided to use Chesspresso, an open-source Java library. Unfortunately, there's not much documentation available and there aren't many implementations of Chesspresso on the internet, so I spent quite a few hours reading through the relevant classes myself. Chesspresso fulfills all the above requirements and does so quite efficiently. Leading chess engines all use some sort of bitboard, which is a representation of the chess board, pieces, and moves as bits. This significantly speeds up the computation of valid moves.
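To illustrate the idea (this is a toy example, not Chesspresso's internal representation), a bitboard packs the 64 squares of the board into a single 64-bit long, and whole-board operations become single bitwise instructions:

```java
public class BitboardDemo {
    public static void main(String[] args) {
        // Bit i represents square i, with a1 = 0 and h8 = 63.
        long whitePawns = 0x000000000000FF00L; // all eight white pawns on rank 2
        long occupied   = 0xFFFF00000000FFFFL; // both sides' starting pieces

        // Every legal single pawn push is computed with one shift and one
        // mask, instead of looping over all 64 squares of the board.
        long singlePushes = (whitePawns << 8) & ~occupied;

        System.out.println(Long.toBinaryString(singlePushes));
    }
}
```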

Minimax and Alpha-Beta Pruning

Now comes the more interesting part. For two-player zero-sum games, game trees are the most successful abstraction for AIs. Neural nets have also been shown to work in some cases, but the leading chess engines all use some sort of game tree as the backbone of their program. Within game trees, the most commonly used algorithm is Minimax. Minimax is a type-A search algorithm, which evaluates all positions up to a certain depth level. Type-A simply means that every single (useful) node is searched up to a certain depth, after which no node is searched. The reason I say "useful" will soon become apparent. More specifically, minimax evaluates all leaf nodes using a heuristic function that assigns a score to each position. The greater the score, the better the position for one player and the worse for the other, and the converse is true as well. Beginning at the maximum depth d, either the maximum or minimum score is chosen for each node at depth d - 1. This repeats at each subsequent depth level, flipping between maximum and minimum each time, such that at d = 0, if it is white's turn to play the maximizing node will be chosen, while if it's black's turn the minimizing node will be chosen. This is rather confusing to convey in words, so see here for visualizations.
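Here is a minimal sketch of the algorithm. The Board and Move types are hypothetical placeholders (Chesspresso's real API looks different); the later sketches in this post reuse them:

```java
import java.util.List;

// Hypothetical abstractions used by the sketches in this post;
// the names are placeholders, not Chesspresso's actual API.
interface Move {}
interface Board {
    List<Move> legalMoves();
    void make(Move m);
    void unmake(Move m);
    boolean isGameOver();
    int evaluate(); // heuristic score: positive favors white
}

class Minimax {
    // Score of the position searched to `depth` plies, with white
    // maximizing and black minimizing.
    static int minimax(Board board, int depth, boolean whiteToMove) {
        if (depth == 0 || board.isGameOver()) {
            return board.evaluate(); // leaf node: apply the heuristic
        }
        int best = whiteToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Move m : board.legalMoves()) {
            board.make(m);
            int score = minimax(board, depth - 1, !whiteToMove);
            board.unmake(m);
            best = whiteToMove ? Math.max(best, score) : Math.min(best, score);
        }
        return best;
    }
}
```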

I say that every useful node will be searched since we will be implementing alpha-beta pruning, an optimization of Minimax. In short, alpha-beta pruning takes advantage of the fact that when searching for a minimizing or maximizing node, there are situations where, no matter what the state of a nephew node is, its parent will never be chosen assuming optimal play. Then, we can "prune" that nephew, its children, grandchildren, and so on by ignoring them in our search. For a better description and visualization of this algorithm, see here. The power of alpha-beta pruning is immense. For reference, in one of my implementations, without alpha-beta pruning I searched 243,000 leaf nodes at depth 4, whereas with alpha-beta pruning I searched just 65,000 leaf nodes at depth 5.
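Extending the minimax sketch above with an [alpha, beta] window gives the pruned version. Alpha is the score the maximizer is already guaranteed and beta the score the minimizer is guaranteed; once they cross, the remaining siblings can never be chosen and are skipped:

```java
class AlphaBeta {
    // Same search as the minimax sketch, with an [alpha, beta] window.
    // Initial call: search(board, depth, Integer.MIN_VALUE, Integer.MAX_VALUE, true).
    static int search(Board board, int depth, int alpha, int beta, boolean whiteToMove) {
        if (depth == 0 || board.isGameOver()) {
            return board.evaluate();
        }
        for (Move m : board.legalMoves()) {
            board.make(m);
            int score = search(board, depth - 1, alpha, beta, !whiteToMove);
            board.unmake(m);
            if (whiteToMove) alpha = Math.max(alpha, score);
            else             beta  = Math.min(beta, score);
            if (alpha >= beta) break; // cutoff: prune the remaining moves
        }
        return whiteToMove ? alpha : beta;
    }
}
```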

Heuristic Function

The implementation of minimax is roughly the same universally, so the most important part of the algorithm is likely the evaluation heuristic. Ultimately, I wanted to make the engine 'mine', so while there is plenty of literature online about chess heuristics, I chose to create my own. The benefit of a type-A search algorithm is that my engine can guarantee forced wins (meaning a checkmate in fewer than n moves no matter what the opponent plays) whenever the number of moves is less than the search depth. In the vast majority of chess positions, however, there are no forced wins. Thus, the heuristic function is a sort of guide for the direction in which the game should develop for my side.

Unfortunately, I wasn't able to access the bitboard directly from the Chesspresso .jar file, so in computing my heuristic I have to iterate over the entire board, making my implementation less than ideal. Without going into too much depth about the specific numbers I chose, I want to explain the main aspects of my heuristic. Most basically, I assigned each piece on the board a value: 100 for pawns, 300 for knights, 325 for bishops, 500 for rooks, and 900 for queens. The ratios between these numbers are fairly standard and agreed upon in the chess community. Next, I modified these values slightly based on positional parameters. For pawns, I increased the value of those closer to the center and further along the board. For knights, I valued those closer to the center of the board higher, which I calculated using Euclidean distance. Knights in the central 4x4 grid of the chessboard have a maximum of 8 possible moves, while those on edges or corners have fewer. For bishops, I did the same, since bishops closer to the center have more diagonals and possible moves. For rooks, I found that there isn't much of an advantage in placing a rook in the center of the board, since it's often the king's primary defender and is used in combination with the other rook and the queen to launch 'stacked' attacks. Hence, I left the positional score of the rook the same everywhere on the board. Likewise, I found that the queen's movement is quite varied in chess games. While a central position would increase attacking threats, the queen is also much more valuable than the minor pieces and would thus be more vulnerable to enemy attacks in the center. Due to this complexity, I chose to leave the queen's score constant. Finally, for the king, I rewarded safety, providing a bonus of 100 points for a kingside castle and 50 points for a queenside castle.
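As a flavor of how this looks in code, here is the knight's centralization bonus computed with Euclidean distance (the material values are the ones above; the scale factor is illustrative, not the exact number in my engine):

```java
class Evaluator {
    // Standard material values, in centipawns.
    static final int PAWN = 100, KNIGHT = 300, BISHOP = 325, ROOK = 500, QUEEN = 900;

    // Centralization bonus for a knight: reward small Euclidean distance
    // from the center of the board. A corner knight gets ~0, a knight on
    // one of the four central squares gets ~85.
    static int knightBonus(int file, int rank) { // file and rank in 0..7
        double df = file - 3.5, dr = rank - 3.5;
        double distance = Math.sqrt(df * df + dr * dr);
        return (int) Math.round(20 * (4.95 - distance)); // 4.95 ≈ distance of a corner
    }
}
```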

In addition to positional parameters, there are special bonuses that I added. I added a significant bonus for passed pawns due to the promotion threat. For bishops, I added a bonus for the openness of their main diagonal, defined by the number of pawns blocking their path: the more open the diagonal, the more squares the bishop controls. Likewise, for rooks I added an open-lane bonus that is given if there are no pawns blocking a rook's file. Finally, for the king, I added a protection bonus if there are 3 or fewer unoccupied squares around it. With my heuristic and algorithm complete, it's time to run our first game of our engine playing itself.

I wanted to make my engine as interactive as possible for readers, so I tried hosting it on this website. Then I realized that GitHub Pages, where this website is hosted, only supports static pages, meaning I had to run everything client-side. I then tried translating my engine into JavaScript using the chess.js library for move generation. After implementing minimax, I ran the engine only to find out that my maximum search depth was 2; at search depth 3, minimax takes >10 seconds. At search depth 4, the browser crashed, and so did my hopes. And this was before implementing a proper heuristic function (at the time, I was generating heuristic values randomly). I'm still not sure why my implementation was so slow, but after 2 days of fiddling with all parts of my code, I decided to stick with my old engine. I compromised on user interactivity and decided to import the PGN from my engine into the chess.js library and use the chessboard.js library to visualize my games in JavaScript. By the way, I must say that the interface and documentation for chessboard.js are incredibly easy to use and well-written.

Here is the game I promised to show you two paragraphs ago:

It doesn't take more than a few seconds of cursory analysis to see that this game was bad. That being said, the moves made are definitely better than random, and the engine seems to understand piece value to a certain degree. The opening played is very unusual and technically unsound according to Stockfish. There are also random sacrifices that lose material and don't seem to provide any positional advantage. Simple two-move tactics, such as 13...Qc3+ by black to fork the rook with check, are missed, and the endgame is unnecessarily drawn out. If I were to guess the engine's Elo, I would guess around 600-700.

Iterative Deepening and Shallow Move Ordering

The first improvement we can make to our existing minimax structure is to implement a different search strategy called iterative deepening. Normally, minimax is implemented recursively as a DFS. There are, however, many benefits to a BFS. For one, knowing the 'value' of all moves at depth d - 1 could provide useful feedback for further search, such as where/how to prune to maximize efficiency. The implementation of BFS, however, would require the use of a queue, and since the branching factor of our current game tree is massive (~30), BFS would be very memory-intensive. The in-between of DFS and BFS is iterative deepening. Quite simply, iterative deepening means running the normal DFS at increasing levels of depth. Initially, such an idea seems counterintuitive. Calling the minimax algorithm at incrementing depths is redundant, since d = 1 will be searched n times, d = 2 will be searched n - 1 times, and so on. This is in fact true, but since the number of nodes at each depth forms a geometric series with ratio b (the branching factor), each successive search costs roughly b times as much as the previous one, so all the shallower searches combined add only about a 1/(b - 1) fraction of overhead on top of the deepest search. There is also another benefit to this algorithm. At various stages of the game and in different positions, the branching factor varies (in closed positions and certain king-and-pawn endgames the branching factor can be low, ~10, whereas in very open positions where the majority of material is still on the board, I've seen the branching factor be close to 70). Thus, it would be unwise to use a constant depth. Iterative deepening allows the program to store the best move at depth d - 1 in case going one ply deeper would be too computationally intensive.
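A sketch of the driver loop, again on the hypothetical Board interface, here with a simple wall-clock budget standing in for "too computationally intensive" (a real implementation would also abort mid-search when time runs out):

```java
class IterativeDeepening {
    // Search at depth 1, 2, 3, ... and always keep the best move from the
    // last fully completed depth.
    static Move think(Board board, boolean whiteToMove, long timeLimitMillis) {
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        Move best = null;
        for (int depth = 1; System.currentTimeMillis() < deadline; depth++) {
            Move candidate = searchRoot(board, depth, whiteToMove);
            if (candidate != null) best = candidate;
        }
        return best;
    }

    // Full-width alpha-beta over the root moves at a fixed depth.
    static Move searchRoot(Board board, int depth, boolean whiteToMove) {
        Move best = null;
        int bestScore = whiteToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Move m : board.legalMoves()) {
            board.make(m);
            int score = AlphaBeta.search(board, depth - 1,
                    Integer.MIN_VALUE, Integer.MAX_VALUE, !whiteToMove);
            board.unmake(m);
            if (whiteToMove ? score > bestScore : score < bestScore) {
                bestScore = score;
                best = m;
            }
        }
        return best;
    }
}
```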

Once iterative deepening is implemented, we can look to take advantage of it by implementing shallow move ordering. Beginning at depth = 1, we use minimax to evaluate each move for the current player. Normally, such information would go to waste, as we only care about the best move; however, we can take advantage of this extra information by ordering the moves from best to worst for the next depth level. Once again, the moves are ordered according to their new evaluations and the process is repeated. The benefit of ordering moves has to do with how alpha-beta pruning works: alpha and beta converge faster when the better moves appear earlier, meaning that more branches are pruned. I tested move ordering by counting the number of nodes searched when the moves were ordered and then when the ordering was reversed (worst to best), and found that the number of nodes searched with reverse ordering was more than double.
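A sketch of the ordering step: score each move with a shallow search, then sort so the strongest candidates are tried first at the next depth:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MoveOrdering {
    // Score every move with a shallow minimax, then sort best-first so
    // that alpha and beta converge as early as possible at the next depth.
    static void order(Board board, List<Move> moves, int shallowDepth, boolean whiteToMove) {
        Map<Move, Integer> score = new HashMap<>();
        for (Move m : moves) {
            board.make(m);
            score.put(m, Minimax.minimax(board, shallowDepth, !whiteToMove));
            board.unmake(m);
        }
        Comparator<Move> byScore = Comparator.comparingInt(score::get);
        // White wants high scores first, black wants low scores first.
        moves.sort(whiteToMove ? byScore.reversed() : byScore);
    }
}
```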

Quiescence Search

Implementing iterative deepening and shallow move ordering doesn't change how positions are evaluated; it simply makes search faster. In chess, as in many games, there is a horizon effect at play with depth-limited searches. Since engines can only see to a certain depth, they will choose the best situation at that depth even if such a position is horrible just one ply further. Take, for instance, a simple example. Suppose our engine can only see to depth 1. Then the best move might be queen takes bishop. In the engine's figurative mind, a bishop has been won, a significant gain from our current point of view. Suppose, though, that the bishop is defended by a pawn; now such a move is disastrous, as the opponent can simply play pawn takes queen, and the opponent comes out ahead instead. Thus, we have to have some sort of secondary search that explores nodes beyond our maximum depth. We define nodes at depth d as either stable or not. A stable node is one created by a non-capturing move, while a non-stable node is one created by a capturing move. We want to explore all non-stable nodes until they reach a stable state, meaning we apply a secondary search (called quiescence search) to all positions produced by a capturing move. This secondary search is quite simple: for all non-stable leaf nodes, we continue our tree search by exploring only their capturing children until there are no more capturing moves, at which point we evaluate the board position according to our heuristic. Of course, quiescence search isn't perfect. It assumes that all unstable nodes exist at the bottom of the minimax search tree and that an unstable node can only be created by a capturing move. We could design a more rigorous quiescence search that takes into account checks, hanging pieces, forks, skewers, etc., but doing so would not be very valuable unless we already had a very strong engine, which we don't.
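A sketch of the secondary search, assuming the hypothetical Board interface is extended with a capturingMoves() helper that lists only capture moves:

```java
import java.util.List;

// Hypothetical extension of Board that can list capture moves separately.
interface CaptureAware extends Board {
    List<Move> capturingMoves();
}

class Quiescence {
    // Follow capture chains until the position is quiet, then score it.
    static int quiesce(CaptureAware board, boolean whiteToMove) {
        List<Move> captures = board.capturingMoves();
        if (captures.isEmpty()) {
            return board.evaluate(); // stable position: apply the heuristic
        }
        int best = whiteToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Move m : captures) {
            board.make(m);
            int score = quiesce(board, !whiteToMove);
            board.unmake(m);
            best = whiteToMove ? Math.max(best, score) : Math.min(best, score);
        }
        return best;
    }
}
```

With iterative deepening, quiescence search, and shallow move ordering implemented, let's see another game of our engine against itself.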

There is a bit of improvement from both sides: we see castling, and the engine recognized a mate in two at the end. The game is still quite bad, however, and I hypothesize that the implementation of quiescence search didn't help but actually hurt our engine. While before we were searching to around depth 4 or 5 for most moves, with quiescence search our depth has decreased to 3 on average, and occasionally 2 during open middlegame positions. I went ahead and counted the number of nodes searched in our minimax tree versus the number searched in our quiescence tree, and found that the quiescence tree accounted for over 90% of the nodes, meaning that most of our computational power was being spent on our secondary search rather than our primary one, decreasing the depth of our primary search. I also found a bug in my minimax implementation for black (I was at times prematurely returning moves for the minimizing player), which led to worse play for black.

Quiescence Pruning

Just like we pruned the minimax tree, we can also prune the quiescence tree. Quiescence search is prone to combinatorial explosion when a mobile piece such as a queen infiltrates the other side and begins capturing an opponent piece every turn. In reality, such an event never occurs, since the queen is quickly recaptured by an enemy piece. In order to mimic this idea, we store the value of a position and access that value an even number of plies later to see whether the search should be continued or pruned. The maximizing player prunes any subtree where the value of the position is less than the previous evaluation. Likewise, the minimizing player prunes any subtree where the value of the position is greater than the previous evaluation. Every time the evaluation of a node improves on its previous evaluation, we reassign the variable to the new evaluation. It might seem like we are missing many possible plays by pruning like this, and this is in fact true. Quiescence search extends so many plies beyond our nominal depth, however, that we can afford to be more aggressive with our pruning, and our main goal should be to reduce horizon mistakes, which this method achieves. In addition to this exchange evaluation, we can apply our regular alpha-beta pruning to this tree.
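In code, this is close in spirit to the standard "stand pat" formulation (not necessarily identical to my even-ply comparison): the static evaluation of the current position acts as the stored value that later captures must improve on, combined with the usual alpha-beta window:

```java
class PrunedQuiescence {
    // Quiescence with pruning: the static evaluation ("stand pat") bounds
    // the search, and the alpha-beta window prunes exchanges that can no
    // longer matter.
    static int quiesce(CaptureAware board, int alpha, int beta, boolean whiteToMove) {
        int standPat = board.evaluate();
        if (whiteToMove) {
            if (standPat >= beta) return beta; // already too good: opponent avoids this line
            alpha = Math.max(alpha, standPat);
        } else {
            if (standPat <= alpha) return alpha;
            beta = Math.min(beta, standPat);
        }
        for (Move m : board.capturingMoves()) {
            board.make(m);
            int score = quiesce(board, alpha, beta, !whiteToMove);
            board.unmake(m);
            if (whiteToMove) alpha = Math.max(alpha, score);
            else             beta  = Math.min(beta, score);
            if (alpha >= beta) break; // prune the remaining captures
        }
        return whiteToMove ? alpha : beta;
    }
}
```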

Transposition Tables

There is another efficiency improvement we can make. By storing previously evaluated nodes in a transposition table, we can save our engine the work of evaluating the same position over and over. In addition to the position, we want to store the score, the depth of the previous search, and the best move found. A consequence of iterative deepening is that the same position is often evaluated more than once. When this is the case, we look at the maximum depth at which this position was previously evaluated, and if that depth equals or exceeds the current depth at which we are searching, we return the best move and score of the stored position. The reason we only do this when the stored depth is equal to or higher than our current depth is that shallower searches are of lower quality and would reduce the quality of our search. Meanwhile, say we are at depth 4 of 5 and arrive at a position that was previously searched at depth 2 of 5. If we continue searching, the value of this position will be based on a 1-ply search, whereas if we use the stored evaluation, it will be based on a 3-ply search, yielding a better analysis. We implement such a structure as a hashtable keyed by a Zobrist key, which gives an effectively unique value for every possible position. The Zobrist key is particularly important because it can be generated efficiently using bitboards, while other hashing functions might be slower and reduce the speed of the get() and put() methods. We can flush the transposition table after every move to ensure that it doesn't become too large and contain nodes that are no longer useful to us.
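A sketch of the table and the key generation. This is simplified: a full Zobrist scheme also XORs in keys for the side to move, castling rights, and en passant squares, and real engines bound the table's size rather than using a plain HashMap:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class TranspositionTable {
    static class Entry {
        final int score;
        final int depth; // depth the position was searched to
        final Move best;
        Entry(int score, int depth, Move best) {
            this.score = score; this.depth = depth; this.best = best;
        }
    }

    final Map<Long, Entry> table = new HashMap<>();

    // Only trust a stored result if it was searched at least as deep as
    // the search we are about to do.
    Entry probe(long zobristKey, int requiredDepth) {
        Entry e = table.get(zobristKey);
        return (e != null && e.depth >= requiredDepth) ? e : null;
    }

    void store(long zobristKey, int score, int depth, Move best) {
        table.put(zobristKey, new Entry(score, depth, best));
    }
}

class Zobrist {
    // One random 64-bit number per (piece type, color, square). The key of
    // a position is the XOR of the numbers for every piece on the board,
    // and it can be updated incrementally as moves are made and unmade.
    static final long[][][] PIECE = new long[6][2][64];
    static {
        Random rng = new Random(2020);
        for (long[][] byColor : PIECE)
            for (long[] bySquare : byColor)
                for (int sq = 0; sq < 64; sq++)
                    bySquare[sq] = rng.nextLong();
    }
}
```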

Principal Variation Node Ordering

Principal variation nodes, PV-nodes for short, are nodes considered to be the best from previous searches in the iterative deepening framework. Say, for instance, that we complete a minimax search at depth 2. Then, for every non-leaf node, we store the best move that we found. When iterative deepening advances to depth 3, we already have the best moves stored for the first two depths, and we can order our move lists for those depths by checking that best move first and then all the other moves. By checking what we believe to be the best move first, we hope to cause an earlier cutoff in our alpha-beta pruning. Fortunately, there is no need to implement a separate structure to keep track of PV-nodes, since we already have our transposition table. To recap, here is our ordering algorithm so far, with a sketch of the PV step after the list:

  1. Check the transposition table to see if the same position has been searched before. If it has:
    1. If the depth of the previous search is equal to or greater than the current search, return the move and value of the stored search
    2. If the depth of the previous search is less than the current search, order the move list by searching the stored best move first
  2. If this is the root node, order all moves using the shallow move ordering algorithm described earlier
  3. If this is a leaf node, first evaluate all capturing moves using the quiescence search algorithm described above, then evaluate all non-capturing moves using our heuristic function
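Here is a sketch of the PV step (step 1.2 above), reusing the transposition table from the previous section:

```java
import java.util.List;

class PvOrdering {
    // If the transposition table has a best move for this position from a
    // shallower search, try it first to encourage an early cutoff.
    static void pvMoveFirst(List<Move> moves, long zobristKey, TranspositionTable tt) {
        TranspositionTable.Entry entry = tt.table.get(zobristKey);
        if (entry != null && entry.best != null && moves.remove(entry.best)) {
            moves.add(0, entry.best); // PV move goes to the front of the list
        }
    }
}
```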

With all this implemented, I tried another game of my engine playing itself.

The game starts with the Nimzowitsch Defense. Black quickly moves off-book on move 3 with Bd6, but we see fairly normal development from both sides. To my surprise, there are no blunders in this game, and around half of the moves played in the middlegame and endgame were Stockfish's top recommendations. The material on the board was completely even until move 18, which was a pleasant sight. To my bigger surprise, black makes a seemingly bad move on move 27, sacrificing a knight when the minor pieces are even for both sides. Surprisingly, this is Stockfish's recommended move: had black instead moved the knight to a different square, white could have trapped black's queen with Bf5. The game finishes off quite nicely with the engine recognizing a forced mate in 6, which I was not expecting since the engine was operating at a 6-ply depth. Engine versus engine is one thing, but I wanted to benchmark my engine by playing against it myself. I play as white here.

I played that game pretty poorly, but I still was not expecting to get crushed so badly. The engine quickly developed a massive pawn advantage that it used to force my pieces into bad positions, and it mated me in 32 moves. Granted, I'm a pretty bad chess player, but I'm still impressed that my engine defeated me, and did so in such fashion. Since the first game on this page, there has been significant improvement in the engine's performance. I kept track of the search depth throughout this game and found that the engine averaged a depth of 7, compared to the depth of 3 I was accustomed to seeing in earlier iterations. I believe there are two bottlenecks as of right now. First, openings. Since openings are quite theoretical, we can't expect our engine to formulate opening lines off-the-cuff. What many engines do is create a database called an 'opening book' that dictates which opening lines to play. While this would surely improve performance, I want to focus on the algorithmic aspects of the engine and come back to the opening book if I have time. The second bottleneck is depth. In the middlegame, the branching factor is so high that the engine averages a depth as low as 5. It is imperative that an engine be able to see further, to create more threats and recognize enemy threats. Aside from increasing computational power, the only way to increase depth is to decrease the effective branching factor. The best engines have an effective branching factor of less than 2, which seems insane given that the average chess position has over 30 legal moves. They achieve this with aggressive pruning strategies. As of right now, we have an effective branching factor of around 8. If we achieve a branching factor of 2, then at our current computational power we would be able to reach an average depth of around 20 (since 8 = 2^3, the same node budget stretches to roughly three times the depth), which would be an amazing improvement.