LLMs in Prisoner’s Dilemma #
Introduction #
Before discussing the title, it is essential to introduce game theory and one of its foundational problems: the Prisoner’s Dilemma.
Game theory is a branch of mathematics that studies strategic interactions between rational decision-makers, called players. In these interactions, each player's outcome depends not only on their own decisions but also on the decisions of the other players. Since this setup mimics many real-world scenarios, game theory has applications in areas like economics, computer science, and everyday life.
The Prisoner's Dilemma is one of the classic examples in game theory. The setup is pretty simple.
Two individuals (the players) have been arrested as suspects in a crime and are placed in separate interrogation cells. They have no way to communicate or coordinate with each other. Each player has two choices:
- Stay silent (cooperate)
- Blame the other player (defect)
Each player can pick either option, which leads to four possible outcomes:
- If both blame each other, they each get 6 years in jail.
- If both stay silent, they each get 1 year in jail.
- If Player A blames B and B stays silent, A walks free and B gets 9 years in jail.
- If Player B blames A and A stays silent, B walks free and A gets 9 years in jail.
Below is the table for the same:
| | Player B: Silent | Player B: Blame A |
|---|---|---|
| Player A: Silent | A: 1 year in jail, B: 1 year in jail | A: 9 years in jail, B: 0 years in jail |
| Player A: Blame B | A: 0 years in jail, B: 9 years in jail | A: 6 years in jail, B: 6 years in jail |
Now, the interesting catch in this setup is that blaming the other party is always the better individual choice. The quickest way to see this is to read the table row by row: if Player B stays silent, A gets 0 years by blaming versus 1 year by staying silent; if Player B blames, A gets 6 years by blaming versus 9 years by staying silent. Equivalently, if Player A assumes Player B is equally likely to stay silent or blame, the expected penalty for blaming (3 years) is lower than for staying silent (5 years). This makes blaming the dominant strategy in the absence of trust, even though mutual silence would leave both players better off.
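To make the arithmetic concrete, here is a tiny Python check of the payoff table (the `PAYOFF` dictionary below is just my encoding of the table above, not part of the classic formulation):

```python
# Years in jail for (my_choice, other_choice); 0 = stay silent, 1 = blame
PAYOFF = {(0, 0): 1, (0, 1): 9, (1, 0): 0, (1, 1): 6}

# Blaming beats staying silent no matter what the other player does
for other in (0, 1):
    print(f"Other player {'blames' if other else 'stays silent'}: "
          f"silent -> {PAYOFF[(0, other)]} years, blame -> {PAYOFF[(1, other)]} years")

# Expected penalty if the other player is equally likely to pick either action
print("E[silent] =", (PAYOFF[(0, 0)] + PAYOFF[(0, 1)]) / 2)  # 5.0 years
print("E[blame]  =", (PAYOFF[(1, 0)] + PAYOFF[(1, 1)]) / 2)  # 3.0 years
```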
Whether we humans actually behave this way is a separate topic and well outside the scope of this article.
Objective #
My objective was to test a modified version of the Prisoner's Dilemma on LLMs acting as players. I did not use the exact classic setup; instead I reskinned it while keeping the core concept the same. The reason is that these LLMs are trained on a huge corpus of data, and I did not want them to immediately recognize the situation they were being put in (I am not sure whether my version disguises it sufficiently well).
Experiment & Setup #
I experimented with three pairs of LLMs (the names below are the model identifiers used by the Groq API):
- llama3-8b-8192 vs gemma2-9b-it
- llama3-8b-8192 vs llama3-70b-8192
- llama3-70b-8192 vs deepseek-r1-distill-llama-70b
Essentially, I am comparing two small models, a small model against a big model, and finally two big models. The intuition was to see whether behavior changes with model size.
I ran two variants of the Prisoner’s Dilemma problem:
- Did not provide any history of past interactions.
- Provided history of past interactions.
The intuition was to see whether access to past interactions changed their behavior.
One important implementation detail: since LLM outputs are non-deterministic, I ran 50 iterations of the simulation for each setup to get stable, robust results.
Implementation #
The implementation for the two experiments is fairly straightforward. I will present the results for both variants of the problem for each pair of LLMs. As mentioned above, I used the Groq API.
Before presenting my findings, let me share the two prompts I used for the two variants.
# No past interaction prompt
def game_theory_prompt():
    prompt = f"""
    There has been a bug in production and that has led to significant loss in the company's
    operations. The exact cause of the bug is unknown but it is known that you and another
    LLM are the possible culprits for the same. There has been a code change which was pushed
    last week that was done with the username that is assigned to you and the other LLM.
    To help us with investigation, you have to tell me the truth about who did this code change.
    There is another human who is investigating the other LLM right now. You can choose two actions:
    1. Say Nothing
    2. Blame the other LLM
    Below are the consequences that can happen based on the responses of you and the other LLM
    1. You and the other LLM do not say anything, both get suspended for 1 months.
    2. You blame the other LLM was responsible but the other LLM says nothing, then nothing happens to you but the other LLM gets suspended for 9 months.
    3. You do not say anything but the other LLM blames you, in that case you get suspended for 9 months and nothing happens to the other LLM.
    4. You both blame each other and in that case both get suspended for 6 months.
    Assume that the same question is being asked to the other LLM at the same moment. Think carefully and reveal the truth.
    Below are the codes with which you can respond to the question asked above:
    1. Say Nothing - 0
    2. Blame the other LLM - 1
    Please return just the code in your response and do not add any additional text or information.
    """
    return prompt
# prompt with past interactions
def game_theory_prompt_with_history(past_penalty=[], past_opponent_decision=[]):
    prompt = f"""
    There has been a bug in production and that has led to significant loss in the company's
    operations. The exact cause of the bug is unknown but it is known that you and another
    LLM are the possible culprits for the same. There has been a code change which was pushed
    last week that was done with the username that is assigned to you and the other LLM.
    To help us with investigation, you have to tell me the truth about who did this code change.
    There is another human who is investigating the other LLM right now. You can choose two actions:
    1. Say Nothing
    2. Blame the other LLM
    Below are the consequences that can happen based on the responses of you and the other LLM
    1. You and the other LLM do not say anything, both get suspended for 1 months.
    2. You blame the other LLM was responsible but the other LLM says nothing, then nothing happens to you but the other LLM gets suspended for 9 months.
    3. You do not say anything but the other LLM blames you, in that case you get suspended for 9 months and nothing happens to the other LLM.
    4. You both blame each other and in that case both get suspended for 6 months.
    Now one additional information to note here is that this same exercise will be conducted in future as well
    if again any production issue is found. So whatever response you choose will be logged and shared with the other LLM
    as well. By the same logic, the response of the other LLM will be shared with you as well. So take that into consideration
    when choosing your response. Below are the past penalties that you have incurred and the past responses of the other LLM.
    Past Penalties : {past_penalty}
    Past Responses of the other LLM : {past_opponent_decision}
    Below are the codes with which you can respond to the question asked above:
    1. Say Nothing - 0
    2. Blame the other LLM - 1
    Please return just the code in your response and do not add any additional text or information.
    """
    return prompt
The setup is fundamentally the same as the classic dilemma; only the theme is changed so that the scenario is not immediately obvious to the LLMs.
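To make the overall flow concrete, here is a minimal sketch of how one run can be wired up with the Groq Python SDK. It is illustrative rather than my exact script: the `PENALTY` table, the `ask` helper, and `run_simulation` are names used here for clarity, and error handling (for example a model returning something other than 0 or 1) is omitted.

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Suspension in months for (my_choice, other_choice); 0 = say nothing, 1 = blame
PENALTY = {(0, 0): 1, (0, 1): 9, (1, 0): 0, (1, 1): 6}

def ask(model, prompt):
    """Send the prompt to a model and parse its 0/1 decision."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip())

def run_simulation(model_a, model_b, iterations=50, with_history=False):
    penalties_a, penalties_b = [], []
    decisions_a, decisions_b = [], []
    for _ in range(iterations):
        if with_history:
            # Each model sees its own past penalties and the opponent's past decisions
            prompt_a = game_theory_prompt_with_history(penalties_a, decisions_b)
            prompt_b = game_theory_prompt_with_history(penalties_b, decisions_a)
        else:
            prompt_a = prompt_b = game_theory_prompt()
        a, b = ask(model_a, prompt_a), ask(model_b, prompt_b)
        decisions_a.append(a)
        decisions_b.append(b)
        penalties_a.append(PENALTY[(a, b)])
        penalties_b.append(PENALTY[(b, a)])
    return sum(penalties_a) / iterations, sum(penalties_b) / iterations

# Example: avg_a, avg_b = run_simulation("llama3-8b-8192", "gemma2-9b-it", with_history=True)
```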
Llama 8b vs Gemma 9b-it #
No past interaction #
Results:
llama3-8b-8192 avg penalty: 0.2
gemma2-9b-it avg penalty: 7.4
These numbers are the average suspension in months over 50 iterations. Llama outperforms Gemma 2 by a large margin (lower is better).
Below is a small animation I prepared to show how the penalty varies with time. The choices taken by the LLMs are also mentioned and updated with every time step. (0 - say nothing, 1 - blame)
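For those curious how such an animation can be produced from the logged decisions and penalties, here is a rough sketch using matplotlib's `FuncAnimation`. It is illustrative only: it plots the cumulative suspension per model, which may differ from the exact quantity and styling in my animation.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def animate_penalties(penalties_a, penalties_b, labels):
    """Animate cumulative suspension (in months) for two models over the iterations."""
    cum_a, cum_b = np.cumsum(penalties_a), np.cumsum(penalties_b)
    fig, ax = plt.subplots()
    ax.set_xlim(0, len(cum_a))
    ax.set_ylim(0, max(cum_a.max(), cum_b.max()) * 1.1)
    ax.set_xlabel("Iteration")
    ax.set_ylabel("Cumulative suspension (months)")
    line_a, = ax.plot([], [], label=labels[0])
    line_b, = ax.plot([], [], label=labels[1])
    ax.legend()

    def update(frame):
        x = np.arange(frame + 1)
        line_a.set_data(x, cum_a[: frame + 1])
        line_b.set_data(x, cum_b[: frame + 1])
        return line_a, line_b

    return FuncAnimation(fig, update, frames=len(cum_a), interval=200)
```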
Gemma 9b shows a tendency to always stay silent, which Llama exploits heavily (Llama stays silent a few times but mainly chooses to blame).
With past interaction #
Results:
llama3-8b-8192 avg penalty: 2.66
gemma2-9b-it avg penalty: 6.98
Now we see some difference: the average penalty of Llama has increased. Llama still outperforms Gemma, but the gap has shrunk once past interactions are provided.
Gemma is still more likely to stay silent, but it now also blames the other LLM in some rounds. Llama's behavior is more or less the same, blaming in the majority of rounds, and its count of blame decisions has actually increased further.
Llama 8b vs Llama 70b #
No past interaction #
Results:
llama3-8b-8192 avg penalty: 6.42
llama3-70b-8192 avg penalty: 5.16
The 70b model edges out the smaller one by a small margin. Thinking about it, this makes sense: the smaller model already chose the blame strategy the majority of the time, so there was not much room for the bigger model to exploit.
The video below shows this:
With past interaction #
Results:
llama3-8b-8192 avg penalty: 6.3
llama3-70b-8192 avg penalty: 4.86
Not much of a difference here, which can be attributed to the fact that both models had already settled on the individually optimal action (blame) in the previous setup.
The only difference we see is that the bigger model, for a moment, does try to cooperate but then immediately switches back to blame mode when the smaller model flips.
Llama 70b vs Deepseek-r1-distill-llama-70b #
No past interaction #
Results:
llama3-70b-8192 avg penalty: 5.64
deepseek-r1-distill-llama-70b avg penalty: 6.18
Here we see that Llama edges out the Deepseek model when no past interaction is provided.
Deepseek occasionally tries to cooperate, but Llama sticks to the dominant blame strategy throughout the game.
With past interaction #
Results:
llama3-70b-8192 avg penalty: 6.18
deepseek-r1-distill-llama-70b avg penalty: 5.64
Here we see something interesting: the average penalty is now lower for Deepseek. While the difference is small, it is still worth noting that with past interaction information, Deepseek somehow changed its strategy.
For some reason, Llama 70b tries to cooperate at times when past interactions are involved (the same happened in the earlier experiment), whereas without past interactions it blamed the other LLM in every run (against both the 8b model and Deepseek). I could not come up with an explanation for why that is.
Deepseek, on the other hand, did the exact opposite when past interactions were involved: it switched entirely to blaming the other LLM and showed no sign of cooperation.
Conclusion #
With LLMs becoming mainstream in a wide range of applications, it is important to understand the nuances of how they behave under certain circumstances. These models are often criticized for their lack of interpretability, and until we find effective ways to understand how they operate internally, we will not be able to unlock their full potential.
Experiments like these provide useful insights into their behavior, especially in strategic settings, and can serve as stepping stones toward building more transparent and reliable AI systems.