Computation as the Eleventh Muse
The fact that artificial neural networks can predict human perception of creativity in chess is evidence that some aspects of creativity are quantifiable
In ancient Greek mythology, the Muses were said to be the source of the creative spark. There was some dispute about how exactly that worked and how many Muses there were, but a consensus seems to have emerged: there are nine.
In truth, we have barely more understanding of how human cognition works than the ancient Greeks did, though we now think cognition probably doesn't involve any daughters of Zeus at all. But we do have evidence that some aspects of human creativity can be understood using computation. In a way, computation is the eleventh Muse[1].
Kamron Zaidi and I collected an annotated dataset of chess games and trained a neural network that accurately predicts whether humans would annotate a particular move as brilliant[2]. As input to the neural network, we used the game trees. The reason the neural network could tell which moves would be perceived as brilliant is that brilliant moves can be characterized by the game trees they come from. I'll argue that this result probably generalizes: in creative endeavors that are constrained by a formal system, like chess, but also like formal poetry and classical music, an important part of perceiving a creative spark is how far ahead the creator had to look to come up with a plan, and how precise they had to be with the planning.
Game Trees and Perception of Brilliance in Chess
A “brilliant” move in chess is not the same as a strong move. A quick checkmate is obviously strong, but it wouldn't be perceived as brilliant or creative.
The prototypical brilliant move in chess is a sacrifice. Here is one way to look at it: a useful heuristic in chess is that having more pieces than your opponent is good. Making a sacrifice means that if you look just a couple of moves ahead, it looks like your opponent can capture the piece you are sacrificing, so you're worse off. Making a good — or brilliant — sacrifice means that if you look many moves ahead, it turns out that you weren't worse off after all.
A more general way of thinking about this is game trees — a representation of your opponent's possible responses, your responses to their responses, and so on. In terms of game trees, a brilliant sacrifice means that if your opponent takes the bait and you choose the right responses, you can force a path through the game tree that eventually leads to victory.
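To make that concrete, here is a minimal sketch (the tree and the evaluation numbers are invented for illustration; this is not our system) of a toy minimax search: searched shallowly, a sacrifice looks like a material loss; searched deeply, it turns out fine.

```python
# Toy illustration (invented values): a sacrifice that looks bad under
# shallow search and fine under deep search.

from dataclasses import dataclass, field

@dataclass
class Node:
    value: float                    # static evaluation, e.g. material balance
    children: list = field(default_factory=list)

def minimax(node, depth, maximizing):
    """Best achievable value looking `depth` plies ahead."""
    if depth == 0 or not node.children:
        return node.value
    vals = [minimax(c, depth - 1, not maximizing) for c in node.children]
    return max(vals) if maximizing else min(vals)

# Position after our sacrifice; the opponent moves next (minimizing).
# Taking the piece looks good for them (-3 for us) but walks into a
# forced sequence that ends at +9 for us; declining leaves us at +1.
take = Node(-3, [Node(-3, [Node(-3, [Node(+9)])])])
decline = Node(+1)
after_sacrifice = Node(-3, [take, decline])

print(minimax(after_sacrifice, 1, maximizing=False))  # shallow: -3, "a blunder"
print(minimax(after_sacrifice, 5, maximizing=False))  # deep: +1, "brilliant"
```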
But making a sacrifice is just one way to be brilliant. In our paper, we show that you can input the game tree into a black-box neural network and predict human perception of brilliance well: you can correctly guess whether a move would be labelled by humans as brilliant 79% of the time, on a test set where half the moves are brilliant.
To input a game tree into a neural network, we compute features of the tree — numbers that represent local properties of its shape. (The shape of the tree varies with the position, because we only include moves that make at least a little bit of sense.)
Informally, we'd hypothesize that "long" and "thin" paths to victory in the tree correspond to moves people would perceive as brilliant. But those are informal notions, and we don't compute them explicitly. Instead, we just compute the features — a numerical description of the shape — and let the artificial neural network figure out whether the shape suggests a perception of brilliance.
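As an illustration of the kind of shape statistic one could compute (a sketch of the general idea, not the feature set from our paper; it assumes the python-chess library and a Stockfish binary on your PATH), here is one local property: how many of the moves in a position "make sense" — few reasonable moves suggests a "thin" tree.

```python
# A sketch of one tree-shape feature, not our paper's exact feature set.
# Assumes python-chess and a Stockfish binary on the PATH.

import chess
import chess.engine

def num_reasonable_moves(board, engine, depth=12, multipv=5, slack=50):
    """How 'wide' the tree is at this node: moves within `slack`
    centipawns of the engine's best move count as reasonable."""
    infos = engine.analyse(board, chess.engine.Limit(depth=depth),
                           multipv=multipv)
    scores = [info["score"].relative.score(mate_score=10000) for info in infos]
    best = max(scores)
    return sum(1 for s in scores if best - s <= slack)

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()  # or any position along a game
print(num_reasonable_moves(board, engine))
engine.quit()
```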
Our Experiments
We train a neural network on human-annotated games from lichess.org, where people annotate brilliant moves with the "!!" notation. Our neural network takes in the numerical description of the game tree arising from each move and tries to predict whether a lichess.org user annotated the move with "!!".
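For concreteness, here is a minimal sketch of how such labels can be collected with the python-chess library (the file name is hypothetical): in PGN, "!!" is stored as Numeric Annotation Glyph 3.

```python
# Minimal sketch of collecting "!!" labels from annotated PGN games.
# Assumes python-chess; "annotated_games.pgn" is a hypothetical file.

import chess.pgn

def labelled_moves(pgn_path):
    examples = []  # (position FEN, move, was it annotated "!!"?)
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for node in game.mainline():
                brilliant = chess.pgn.NAG_BRILLIANT_MOVE in node.nags
                examples.append((board.fen(), node.move.uci(), brilliant))
                board.push(node.move)
    return examples

data = labelled_moves("annotated_games.pgn")
```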
Another thing we noticed: we actually build the trees using two chess engines, one rated as stronger and one as weaker. The neural network, among other things, tends to label a move as brilliant if the stronger engine rates the move as good and the weaker engine rates it as worse.
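Here is a sketch of that disagreement signal (illustrative only: we use two different engines, while here, for simplicity, the "weak" engine is simulated by giving the same engine a shallow search depth).

```python
# Illustrative disagreement feature: how much more a deep ("strong")
# search likes a move than a shallow ("weak") one. Our paper uses two
# distinct engines; search depth is a stand-in here.

import chess
import chess.engine

def disagreement(board, move, engine, strong_depth=18, weak_depth=4):
    def mover_score(depth):
        board.push(move)
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        board.pop()
        # After the move, the score is from the opponent's point of
        # view, so negate it to get the mover's point of view.
        return -info["score"].relative.score(mate_score=10000)
    return mover_score(strong_depth) - mover_score(weak_depth)
```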
Chess Brilliance is Not Entirely Ineffable
The aesthetic perception of move brilliance in chess is not ineffable — at least not usually. You can compute the game tree, look at its shape, and have a pretty good idea of whether the move would be perceived as brilliant.
From Brilliant Chess Plays to Brilliant Poetry
Fixed verse is a little bit like chess: the game has rules, and one thing that can make a poem be perceived as brilliant is that the author followed the rules while also conveying something to the reader.
Just like in chess, you might wonder how a brilliant author came up with the first word in just such a way that a word two stanzas later fell right into place, and think about how that "plan" could have gone wrong if a slightly "imprecise" word had been chosen along the way.
We conjecture that gaming out the possible word choices, word by word, as the author writes a poem would detect the perception of brilliance in poetry much the way it does in chess.
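To make the conjecture concrete, here is a conceptual toy, not a working poetry system: a beam search over words where a formal constraint (a line of exactly eight syllables) plays the role of the rules of the game. The vocabulary, syllable counts, and scoring function are all invented; a real system would use a language model.

```python
# A conceptual toy: search word-by-word for a line of exactly TARGET
# syllables. Vocabulary, syllable counts, and scoring are invented.

TARGET = 8
VOCAB = {"amid": 2, "the": 1, "quiet": 2, "evening": 3,
         "snow": 1, "falls": 1, "slowly": 2, "down": 1}

def fluency(words):
    # Stand-in for a language model: prefer ~5 words, penalize repeats.
    return -abs(len(words) - 5) - (len(words) - len(set(words)))

beam, finished = [([], 0)], []
while beam:
    candidates = []
    for words, syl in beam:
        for w, s in VOCAB.items():
            if syl + s == TARGET:
                finished.append(words + [w])     # the line is complete
            elif syl + s < TARGET:
                candidates.append((words + [w], syl + s))
    candidates.sort(key=lambda c: fluency(c[0]), reverse=True)
    beam = candidates[:3]  # keep the 3 most promising partial lines

print(" ".join(max(finished, key=fluency)))
```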
In a similar way, it's hard to get away from the impression that J.S. Bach saw far ahead when selecting the first note in a piece (classical music also has "rules"). Even the scripts of the classic 90s sitcom Seinfeld had a rule that the plot lines must converge at the end of the 23-minute episode.
Georges Perec famously took this to the extreme in Life: A User's Manual: the novel, meticulously written over ten years, obeys an extremely intricate structure. Among the many constraints Perec chose to satisfy is that the rooms of the Parisian apartment block described in the novel are mentioned in the order of a knight's tour of the building.
Of course, Georges Perec is an extreme example. Nevertheless, whether you write poetry, prose, or sitcom scripts, you are constrained by at least aiming for "the best words in the best order."
Is Computation All There Is to It?
We haven't shown that. First of all, just because you can predict the perception of something with math doesn't mean that the math is what underlies the process. We also haven't accounted for the 21% of moves we don't predict correctly.
But if you're a physicalist (someone who believes we are bodies that are governed ultimately by the laws of physics), indulge me and consider a thought experiment. You sit in a room with the door closed, and I send you good and bad poems. You reply with "brilliant", "OK", or "terrible". I collect the data. If, after a while, my computer can figure out which poems you'd see as "brilliant", it might be argued that it captured something — perhaps not everything — about your notion of brilliance.
(It might be that my system figured out that you look at the author's name and just say that Shakespeare is the brilliant one. But that objection is harder to raise for our chess experiments: we send the system only the moves, nothing else, and, for chess, that's enough.)
And certainly my argument is harder to make for painting, where there are few or no formal rules of the “game.”
Can LLMs be Creative?
Traditional chess engines rely on building and traversing game trees. Large Language Models (LLMs) like ChatGPT do not. It isn't exactly the case that ChatGPT builds its output word-by-word without looking ahead at all — it does look ahead a little bit — but it doesn't look very far ahead, and it cannot account for the exponential branching of possibilities you see in a chess game tree.
But there is a challenge to the view here: when a system built like GPT plays chess by just building up a response one move at a time, it plays really well, and when a similar system plays Go, it's amazing[3]. Recent research from Berkeley indicates that, internally (as opposed to when generating step-by-step using beam search, where it's obvious there's search going on), GPT does at least a modest lookahead of two or possibly more moves.
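Here is a toy illustration of the difference (all probabilities are invented): a greedy, one-token-at-a-time generator picks the locally most likely word and misses a continuation that is better overall, which a search with lookahead finds.

```python
# Toy illustration, with invented probabilities: generating one token at
# a time (greedy) misses a sequence that lookahead search would find.

import itertools

P = {  # P(next token | previous token)
    "<s>":       {"nice": 0.5, "brilliant": 0.4, "dull": 0.1},
    "nice":      {"enough": 0.3, "try": 0.3, "<e>": 0.4},
    "brilliant": {"sacrifice": 0.9, "<e>": 0.1},
    "enough":    {"<e>": 1.0}, "try": {"<e>": 1.0},
    "sacrifice": {"<e>": 1.0}, "dull": {"<e>": 1.0},
}

def seq_prob(seq):
    p, prev = 1.0, "<s>"
    for tok in seq + ["<e>"]:
        p, prev = p * P[prev].get(tok, 0.0), tok
    return p

def greedy():
    seq, tok = [], "<s>"
    while True:
        tok = max(P[tok], key=P[tok].get)  # locally best next token
        if tok == "<e>":
            return seq
        seq.append(tok)

def exhaustive(max_len=2):  # stand-in for search with lookahead
    vocab = [t for t in P if t != "<s>"]
    seqs = [list(s) for n in range(1, max_len + 1)
            for s in itertools.product(vocab, repeat=n)]
    return max(seqs, key=seq_prob)

print(greedy(), seq_prob(greedy()))          # ['nice'] 0.2
print(exhaustive(), seq_prob(exhaustive()))  # ['brilliant', 'sacrifice'] 0.36
```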
At the same time, I think that when people look at ChatGPT prose and perceive a kind of lack of creativity, what they are seeing is that, mathematically, ChatGPT could not have looked very far ahead before it started talking. The eleventh Muse turned away from it.
Can the lack of creativity in GPT prose be fixed with current architectures and approaches? I'm not sure. Conceptually, it is possible, but the cost might be computationally prohibitive, and a new kind of model might need to be developed. It's also possible (many would argue likely) that there are fundamental differences between human cognition and LLMs that mean current technology can at best approximate, but never achieve, creativity.
I wouldn't go so far as to say that AI understands creativity, let alone that it can be creative. But at the same time, the fact that human aesthetic perceptions can be predicted using computational methods means that computation — the eleventh Muse — is part of what makes for creativity.
Notes

[2] Kamron Zaidi and Michael Guerzhoy, "Predicting User Perception of Move Brilliance in Chess," in Proc. International Conference on Computational Creativity (ICCC) 2024, Jönköping, Sweden. Paper: https://arxiv.org/abs/2406.11895
[3] AlphaZero plays really well even without the MCTS component.
Comments

It seems that chess moves arising from precise long-term planning — detected by looking at the game tree — account for only a subset of the chess moves perceived to be brilliant.
I find that certain "brilliant" chess moves are actually mathematically mediocre, but become brilliant when the metagame (e.g., psychology, timing, time control, player history) is taken into account. These factors aren't necessarily reflected in the game tree.
In general, there seem to be different levels of perceivability of creativity. In other words, some creative actions are easier to detect than others.
In poker, 99% of players will call a hand "brilliant" if a lot of money is involved. In other words, for them, the brilliance of a poker hand is simply a function of the pot size. Obviously, better poker players have a much more nuanced understanding of brilliance in poker actions, but, again, that is very hard to calculate.
Similarly, a "brilliance" of a comedy show is probably just a number of lines after which an average viewer laughs. "Brilliance" is much harder to quantify for shows like documentaries, cable news, etc.
Personally, I think creativity is much simpler than precise long-term planning. It just comes down to detecting deviation from nature and seeing whether this deviation aligns with the aesthetic value system of the judge.