Auroratide

Coding + Storytelling

Can AI invent a puzzle?

Artificial Intelligence

Topics
  • ai
  • creativity
  • experiment
  • claude code
  • gemini
  • puzzles

This is Part 1(?) of a weekly series where I'll be probing Artificial Creativity. Weekly is pretty ambitious of me, but I think I can do it if I'm willing to think of my posts as documenting a journey rather than a destination.

This week, I randomly asked Gemini and Claude to invent an entirely new category of puzzle, and that sent me down a rabbit hole of experimentation where I realized I might be able to probe AI's propensity for creativity. I studied creativity in college, so besides the head knowledge that I have on the subject, I've also got some pent-up curiosity. Is AI currently creative? In what ways is it different from human creativity? This is what I'll be investigating in the coming weeks.

After playing a bit, my personal takeaway was this:

Creativity is a process, not a revelation.

Let's look at Gemini's bad puzzle, Claude's much better puzzle, and how this demonstrates the stark difference in quality when just a tiny bit of process is added. Or... skip directly to the good puzzle (:

So, Gemini invented something. Whether it is truly novel is hard to validate practically, but I bet it's more novel than not. Why? Because the puzzle wasn't fun (:

But we'll discuss "funness" later. For now, behold Fulcrum Drift:

  • You are given a balancing beam with a certain number of slots.
  • You must place into the slots some letters (twice each), and a Fulcrum Δ.
  • For each letter, you are given a number called "Drift".
  • Drift roughly represents how out-of-balance the letter is on the beam. If each position on the beam is labeled starting at 1, then mathematically Drift equals $|(P_1 - F) + (P_2 - F)|$, where $P_1$ and $P_2$ are the positions of the letter's two copies, and $F$ the position of the fulcrum.
  • Using the Drift values, you must ascertain where on the beam the letters are, along with the Fulcrum.

Here's an example. You have a size-7 beam, and must place two of each of the letters A, B, and C, as well as a fulcrum, onto the beam.

[ _ ] [ _ ] [ _ ] [ _ ] [ _ ] [ _ ] [ _ ]
  1     2     3     4     5     6     7

The drift values are as follows:

  1. The Drift of A is 1.
  2. The Drift of B is 5.
  3. The Drift of C is 1.

Find where the letters A, A, B, B, C, C, and the Fulcrum Δ go.

This is my best way to describe how to solve the puzzle without devolving into a series of equations. Basically, think of Drift values as "puzzle pieces", then try to fit them together onto the beam.


[ _ ] [ _ ] [ _ ] [ _ ] [ _ ] [ _ ] [ _ ]
  1     2     3     4     5     6     7
		

Drift of A = 1 • Drift of B = 5 • Drift of C = 1

Let's start by considering what shapes are possible for particular Drift values. We'll use N for the letter and Δ for the fulcrum. A * represents any number of blanks, as long as it is the same number of blanks on each side of the Fulcrum.

Drift 1: N * Δ * _ N

Drift 5: Δ * N _ _ N OR N * Δ * _ _ _ _ _ N

It is not possible to fit the second configuration of Drift 5, since there are not enough slots.
Therefore, we need to fit two of N * Δ * _ N and one of Δ * N _ _ N onto the beam.

[ _ ] [ _ ] [ Δ ] [ _ ] [ _ ] [ _ ] [ _ ]
  1     2     3     4     5     6     7
		


This allows us to deduce that the Fulcrum is at slot 3. If it was at slot 4, then we couldn't fit the pattern for drift 5. If it was at slot 1 or 2, then we couldn't ever fit both patterns of drift 1.

[ A ] [ C ] [ Δ ] [ B ] [ C ] [ A ] [ B ]
  1     2     3     4     5     6     7
		


The Drift 5 pattern is forced. And the other two letters just slot in.
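
As a sanity check on that final beam, here is the arithmetic written out. It assumes Drift is the absolute value of the two copies' signed offsets from the fulcrum added together, which is the reading of the rule that reproduces all three given values. The fulcrum sits at slot 3.

```latex
\begin{align*}
\text{Drift}(A) &= |(1 - 3) + (6 - 3)| = |-2 + 3| = 1 \\
\text{Drift}(B) &= |(4 - 3) + (7 - 3)| = |1 + 4| = 5 \\
\text{Drift}(C) &= |(2 - 3) + (5 - 3)| = |-1 + 2| = 1
\end{align*}
```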

Technically this is an opinion, but I do believe there are certain qualities that make puzzles in general "good" and others "bad".

  • Solution Uniqueness: A good logic puzzle should have exactly one unique, unambiguous solution. Every single Fulcrum Drift puzzle fails this. Simply mirror the solution and you get a second solution. And when two Drift values are the same, then those letters are effectively interchangeable in the solution.
  • Logic Diversity: Solving the puzzle should involve multiple kinds of deductions. Deploying the right kind of deduction at the right moment is what makes a puzzle fun. Fulcrum Drift is just a math equation. Drift values constrain where each pair of letters can sit relative to the fulcrum, then you just sorta slot them onto the beam until everything fits.
  • Incremental Progress: Good puzzles are solved a piece at a time, sort of like discovering bits of the solution until you have the whole picture. When you need the whole picture all at once, then it feels less like "solving" and more like "finding" or "stumbling upon". In Fulcrum Drift, it's hard to know definitively whether a pair of letters is correctly placed without also trying all the other letters. In other words, it's glorified guess and check.
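
The uniqueness complaint is easy to check by brute force. Below is a minimal Python sketch, not anything Gemini produced: `drift` and `solve` are hypothetical helper names, and Drift is assumed to be the absolute value of the two copies' summed signed offsets from the fulcrum, which matches all three values in the example.

```python
from itertools import combinations


def drift(p1, p2, fulcrum):
    """Assumed Drift rule: absolute value of the summed signed offsets."""
    return abs((p1 - fulcrum) + (p2 - fulcrum))


def solve(size, drifts):
    """Return every beam (as a string) that matches the given Drift values."""
    letters = sorted(drifts)
    solutions = []

    def place(index, free, beam, fulcrum):
        # All letters placed: record the finished beam.
        if index == len(letters):
            solutions.append("".join(beam))
            return
        letter = letters[index]
        # Try every pair of free slots that yields this letter's Drift.
        for p1, p2 in combinations(sorted(free), 2):
            if drift(p1, p2, fulcrum) == drifts[letter]:
                beam[p1 - 1] = beam[p2 - 1] = letter
                place(index + 1, free - {p1, p2}, beam, fulcrum)
                beam[p1 - 1] = beam[p2 - 1] = "_"

    for fulcrum in range(1, size + 1):
        beam = ["_"] * size
        beam[fulcrum - 1] = "Δ"
        place(0, set(range(1, size + 1)) - {fulcrum}, beam, fulcrum)
    return solutions


solutions = solve(7, {"A": 1, "B": 5, "C": 1})
for beam in solutions:
    print(beam)
```

Under that assumed rule, the size-7 example has four valid beams rather than one: the arrangement from the walkthrough, the same arrangement with A and C swapped, and the mirror image of each.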

So why did AI produce such a bad puzzle? Is it because Gemini was not creative enough? Perhaps too creative? Or was it something else?

Fulcrum Drift is what AI is able to accomplish with a one-shot prompt. That is, I asked it to create a puzzle, and then it gave me a puzzle. Maybe it did a little bit of thinking in the background, but ultimately it got one single try, one single idea.

It would be like me telling you to do the same thing in 5 minutes. Exceedingly few people in the world could do that task well. It's therefore no surprise that Gemini's puzzle is lackluster.

Creativity is a process, not a revelation.

In order to do better than "lackluster" though, I decided to try the same thing, but with a bit more process. This is where I turned to Claude for its agentic harness.

Claude Code is a lot more powerful than the free browser version of Gemini. It engages in thinking loops, writes and executes code, and spins up subagents.

Given that power, Claude came up with what it calls Luminary.

  • You are given a square grid.
  • Your goal is to find the locations and values of Stars on the grid.
  • Some numbers are placed in the grid as clues. These numbers represent Readings. The value of a Reading is the sum of the values of all the Stars it sees in its row and column.
  • Clues outside the grid indicate how many stars can be found in that row or column.
  • Two stars cannot be placed adjacent to one another, either orthogonally or diagonally.
  • Each star has a unique value that goes from 1 to N, where N is the number of stars in the puzzle.

For example, here is a completed puzzle.

4x4 grid (★ marks a Star and its value; bare numbers are Readings):

[ ★1 ] [ _  ] [ _  ] [ _  ]
[ 4  ] [ 5  ] [ _  ] [ ★3 ]
[ 3  ] [ ★2 ] [ _  ] [ _  ]
[ _  ] [ _  ] [ 0  ] [ _  ]

See how the 5-Reading is made by summing the 2-Star and 3-Star.
  • The "5" reading in row 2, column 2 sees a value-2 star and a value-3 star, which is why that cell's value is 5.
  • Row 4 and Column 3 each have a clue of 0, meaning no stars in their respective row and column.
  • Stars are not adjacent to each other; they are sufficiently far apart.
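
To make the rules concrete, here is a small Python sketch that re-checks the completed example. The helper names `reading` and `stars_are_legal` are made up for illustration. Two assumptions are baked in: a Reading sees every star in its row and column with nothing blocking the view, and the 0 in row 4, column 3 is treated as a Reading, taking the grid description literally. Cells are written as (row, column), counting from 1.

```python
def reading(cell, stars):
    """Sum the values of every star sharing a row or column with `cell`."""
    row, col = cell
    return sum(value for (r, c), value in stars.items() if r == row or c == col)


def stars_are_legal(stars):
    """Check the two star rules: no touching stars, and values are exactly 1..N."""
    cells = list(stars)
    no_touching = all(
        max(abs(r1 - r2), abs(c1 - c2)) > 1  # rules out orthogonal and diagonal neighbours
        for i, (r1, c1) in enumerate(cells)
        for (r2, c2) in cells[i + 1:]
    )
    values_ok = sorted(stars.values()) == list(range(1, len(stars) + 1))
    return no_touching and values_ok


# The completed 4x4 example: star positions with values, and the Readings.
stars = {(1, 1): 1, (2, 4): 3, (3, 2): 2}
clues = {(2, 1): 4, (2, 2): 5, (3, 1): 3, (4, 3): 0}

# Any clue whose computed Reading disagrees with the printed number.
mismatches = {cell: reading(cell, stars) for cell, total in clues.items()
              if reading(cell, stars) != total}
print(stars_are_legal(stars), mismatches)
```

Every Reading in the example agrees with the stars it sees, so `mismatches` comes out empty.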

Here's an example puzzle, constructed by the AI as a soft introduction. You can interact with it and fill it out if you want to try solving it. If you deduce a square is a star, you can click it and put in its value. If you deduce a square is NOT a star, you can type an "x" instead to help you visualize.

The example puzzle is pretty easy, but demonstrates the core of the puzzle decently enough. Follow the slides below to see what logic is used to solve it.

Row 1 Column 2 marked as a star. Row 5 Column 3 marked as a star.
Let's start by finding where the stars are. In row 1 and column 3, there is only one possible location for each star respectively, due to the 0-clues.
Row 3 Column 4 marked as a star.
The star in row 5 eliminates the possibility of a star in row 5 column 4, so we now can deduce the final star's location as row 3 column 4.
The row 1 star is annotated with '1,2'. The row 5 star is annotated the same way.
When a 3-reading sees two stars, then one star must have a value of 1, and the other a value of 2. That's the only way to add to 3. We don't know which is which yet, but we will mark them.
The '2' is crossed out of the row 1 star slot.
Look at the 4-reading. It sees two stars. If one of the stars it sees is a 2, then the other must also be a 2. But that is not possible! There can only be a single 2-star in the puzzle. Therefore, we can identify the row 1 star as the one whose value is 1.
Star values deduced. 1 is in row 1, 3 is in row 3, and 2 is in row 5.
That means the row 5 star has value 2. And the final star must therefore be the 3-star. Puzzle solved!
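
The sum reasoning in those slides can be sketched in a few lines of Python. `value_sets` is a hypothetical helper that lists which distinct star values can add up to a Reading. The sketch assumes the puzzle has three stars and that the row 1 star is the one seen by both the 3-reading and the 4-reading, as the slides imply.

```python
from itertools import combinations


def value_sets(total, star_count, n):
    """All sets of `star_count` distinct values from 1..n that sum to `total`."""
    return [combo for combo in combinations(range(1, n + 1), star_count)
            if sum(combo) == total]


three_reading = value_sets(3, 2, 3)  # only 1 + 2 works
four_reading = value_sets(4, 2, 3)   # 2 + 2 is excluded because values are distinct

# The row 1 star must appear in a valid set for both readings.
row_1_candidates = set(three_reading[0]) & set(four_reading[0])
print(three_reading, four_reading, row_1_candidates)
```

Because star values are unique, `combinations` never produces 2 + 2, which is the same observation that pins the row 1 star to a value of 1.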

So, Luminary actually has some interesting logic, and to showcase that, here's a puzzle I made myself.

I composed this puzzle myself with the intent of making a tiny puzzle as hard as I could, but while having a relatively clean way to arrive at the solution. See, it's not enough to make a puzzle "hard". The logic must also be discoverable and satisfying. That's what makes a puzzle good.

And yes, I made it significantly harder by obscuring how many stars each row/column contains. Good luck C:<

Making slideshows is a lot of work. Instead, here's a series of hints. They're meant to be used in order, and if you found a different line of logic, the hints may no longer apply. Let me know in the comments at the end of the post whether you solved it!

Hint 1

Look at the two 6-Readings. Consider whether it is ever possible for Row 4 Column 1 and Row 5 Column 3 to be stars.

Hint 2

You can deduce where the star in Row 2 goes, even if you do not know its value. Consider the 4-Reading and what you learned in Hint 1.

Hint 3

You can now deduce where the star in Row 4 is. For what it's worth, there must be at least one star in row 4 since it's otherwise impossible to get to a 6-Reading without overflowing the 4-Reading. But now we know it's exactly one star, because we know where the star in Column 1 is, thanks to what we learned in Hints 1 and 2.

Hint 4

Turns out we can now deduce the value of the star in row 2. This is done by asking whether a star exists in Column 3 or not, because if it does, then the star in Row 2 cannot be a 4.

Hint 5

Once you have that the star in Row 2 Column 5 is a 3-Star, then the rest of the star values can be determined by doing sums.

Unlike Gemini, I gave Claude Code two different feedback mechanisms.

  1. Validation: Each time Claude had an idea, it was told to search for similar puzzles online to gauge novelty, and to write code that validates puzzles for solution uniqueness and logical deductions.
  2. Revisions: Once Claude had a promising puzzle, it was told to brainstorm improvements and implement combinations of them as code, electing one combination as the winner.
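
For a sense of what a uniqueness check might look like, here is a brute-force sketch in Python. It is not Claude's actual code, which I haven't reproduced here. `find_solutions` is a hypothetical name, and the rules are encoded under a few assumptions: Readings see every star in their row and column, clue cells never hold stars, and the row and column counts for the 4x4 example are read off its finished grid. Cells are (row, column), counting from 1.

```python
from itertools import combinations, permutations


def find_solutions(size, readings, row_counts, col_counts):
    """Return every star layout (cell -> value) consistent with all clues."""
    free = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)
            if (r, c) not in readings]  # assumption: clue cells cannot hold stars
    n = sum(row_counts)
    found = []
    for spots in combinations(free, n):
        # Outside clues: number of stars in each row and column.
        if any(sum(r == i + 1 for r, _ in spots) != k for i, k in enumerate(row_counts)):
            continue
        if any(sum(c == i + 1 for _, c in spots) != k for i, k in enumerate(col_counts)):
            continue
        # Stars may not touch, even diagonally.
        if any(max(abs(r1 - r2), abs(c1 - c2)) <= 1
               for (r1, c1), (r2, c2) in combinations(spots, 2)):
            continue
        # Try every assignment of the unique values 1..n to the chosen cells.
        for values in permutations(range(1, n + 1)):
            stars = dict(zip(spots, values))
            if all(sum(v for (sr, sc), v in stars.items() if sr == r or sc == c) == total
                   for (r, c), total in readings.items()):
                found.append(stars)
    return found


readings = {(2, 1): 4, (2, 2): 5, (3, 1): 3, (4, 3): 0}
solutions = find_solutions(4, readings, row_counts=[1, 1, 1, 0], col_counts=[1, 1, 0, 1])
print(len(solutions), solutions)
```

A puzzle passes the uniqueness test when the returned list has exactly one entry. Exhaustive search like this is only practical for small grids, but that is enough to screen out ideas that can never have a single solution.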

Claude tried four different ideas before landing on Luminary as a concept, and then investigated 20 different variations to come up with what we have. To be frank, its first version of Luminary was an inelegant number slop mess.

A grid with stars and numbers. The only blank squares are the ones marked as stars.
Star locations were given, and the goal was to figure out what values the stars had.

But, because one of the improvement ideas was "players should deduce star locations", that led it to the puzzle I'm showing here, a puzzle that is interesting even if imperfect.

The lesson here is this:

Creativity is a process, not a revelation.

Don't try to one-shot solutions. No hard problem worth its salt can be solved in 5 minutes, whether using an organic brain or a silicon one.

Let's consider the creativity of this kind of puzzle on two dimensions: novelty and purpose.

  • Novelty: How rare are puzzles like Luminary?
  • Purpose: How well does Luminary serve as a puzzle?

On the one hand, the exact combination of rules that make up Luminary seems to be rare. On the other hand, the rules clearly borrow from several well-known puzzles:

  • Star Battle: Star locations must be deduced and cannot be adjacent.
  • Tents & Trees: Clues outside the grid tell you how many stars are in that row or column.
  • Akari: Stars light up cells in horizontal and vertical directions.
  • Kakuro: Sums are clues to deduce where numbers go.

To be fair, many classic puzzles borrow from each other already, and it takes creative effort to combine things in just the right way (see what I've written on creativity as building bridges). So this is still impressive.

For a puzzle, purpose really just comes down to whether solving a Luminary puzzle is fun. Going by our criteria for what made Fulcrum Drift not fun:

  • Solution Uniqueness: Yep, puzzles can have unique solutions.
  • Logic Diversity: Solving my puzzle required using star adjacency to rule out spaces, deducing how many stars are in a row when it isn't given, using facts about star uniqueness to determine their values, and sum combinations. I mean, unless I'm small-brained and didn't see a more obvious way to solve my own puzzle...
  • Incremental Progress: You deduce the locations of stars and their values one at a time by narrowing down possibilities. You don't need to know the whole picture to solve the puzzle.

So at a minimum, the puzzle has promise.

Overall, Luminary is a pretty creative puzzle. That said, it was only possible when AI was given a lot of direction on how to think creatively. And even then, my instructions were far from perfect because, at the time I was experimenting, I wasn't thinking about creativity explicitly.

Now that I am, I want to see what happens when I start injecting elements of the Creative Problem Solving process directly.

So next week I'll provide an update on my further experiments!

And here's a couple more puzzles the AI made. They're not as intentional, but are fun to solve nonetheless.

