Python代写:ECE326OptimalStrategyCalculator


代写游戏 Easy Blackjack
的博弈规则。
![Easy
Blackjack](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Blackjack_game_1.JPG/250px-
Blackjack_game_1.JPG)

Background

This assignment is the kind you can expect if you work in an actuary, e.g.
calculating expected return and assessing risk. Since Easy Blackjack is a game
of chance, short term result may fluctuate and not be representative of the
expected value of playing the game long term. There are two main methods for
measuring expected value of a game: 1) by repeating a random experiment many
times until the arithmetic mean of the values in the experiments converges to
the expected value, and 2) by calculating it mathematically.
If you are unfamiliar with the term probability and expected value, you should
review these two concepts before continuing. We suggest two YouTube videos to
help you get started quickly: Probability and Expected Value.
In assignment 1, you developed a program that can estimate expected value of
the game (i.e. player advantage) through repeated trials. In this assignment,
you will come up with the optimal strategy table and calculate the player
advantage associated with it. Fortunately, you can double check and/or debug
your calculation by using your assignment 1 solution and compare the result
with the theoretical player advantage that you have calculated.
Unlike in a real casino, we will be using an infinite shoe, meaning that the
probability of receiving any card on the next draw remains constant and
independent (e.g. probability of receiving an ace is always 1/13). This
property simplifies our calculation. In the real world, removing a card would
reduce the probability that a card of the same value can be drawn. For
example, receiving an ace would mean that there is one fewer ace left in the
shoe.

Setup

Add the starter code for this assignment, available in optimal.tar, to your
repository. You may add additional files to help you organize your code. When
we mark your code, we will run python3 main.py …
cd ~/ece326
tar -xf /cad2/ece326f/src/optimal.tar
git status # should say that “asst2/“ directory is untracked
git add asst2 git commit -m “Initial code for Assignment 2”
git tag Asst2-start
git push
git push –tags cd asst2
Here is a rundown of the code provided to you:

  • main.py: contains the main function and functions to print the result that you return. You should not change this file, but do try to understand what it’s doing. Note that it imports easybj.py as a module. Our tester will also import your version of easybj.py as a module to mark your solution.
  • table.py: you will change this file to implement a two-dimensional, spreadsheet-like table, as part of the first milestone.
  • easybj.py: The meat of the assignment. It contains some basic definitions to help you start the assignment. You will have to add substantial code to finish the assignment, but as mentioned, you can do it in steps. You may also add more files and have this file import them, if it helps you organize your work better.

Approach

If you have not done so already, you should go back to assignment 1 and make
sure you still have a solid understanding of the rules of Easy Blackjack. In
this assignment, the same rules of the game still applies.
At a high level, we will use divide and conquer to break down the problem into
smaller subproblems. For every possible action in Easy Blackjack, there is an
associated expected value (EV). For example, the EV for surrendering is always
-0.5, because you lose half of your bet when choosing the action. The optimal
action, in each situation, is thus the action that yields the highest EV. For
example, you have hard 19 versus dealer hard 20, the expected value of each
action is as follows:

Action EV
Hit -0.768
Double -1.538
Stand -1.000
Surrender -0.500
Don’t worry about where these numbers come from for now. In the table above,
we can see that surrender is the best option because you lose the least amount
of money. Stand, for example, always results in losing the entire bet since
the dealer will not hit hard 20 and your hard 19 loses to hard 20.
Now, it may become clear what the subproblems are: finding the EV of each
action for every combination of player’s hand and dealer’s hand. For the
purpose of this assignment, we call each specific combination of player/dealer
hands an encounter. Once you calculated all of them, then you can determine
the optimal action for every step of the game. We call the result of this set
of calculations an “EV Table”.
Fundamentally, there are two types of actions: terminal versus non-terminal.
Terminal actions are ones where you may no longer take any more actions, i.e.,
surrender, stand, double, or split aces. Non-terminal actions in Blackjack are
hit and split non-aces hands.
To calculate the EV tables for terminal actions, you simply need to take the
action, let the dealer take his/her actions (if necessary, e.g. for surrender,
dealer wins automatically), then you compare hands. For non-terminal actions,
you still need to first take the action (e.g., hit). However, since the action
is non-terminal, then what happens after the initial action if there are
further actions (i.e., you did not bust nor did you reach 21, at which you
stand automatically)?
The answer is that you should take the optimal action, meaning you are picking
the best available options after the initial action based on the expected
value of each action. This is logically sound because our objective is to
maximize expected value, and thus every action we take should always return
the highest expected value out of all available actions. If you do not know
which action is optimal, then you will need to solve the subproblems
(calculating the EVs of all actions) before you can solve this problem
(choosing the optimal action).

Stand EV

The easiest EV table (besides surrender) to calculate is for standing. Since
the player won’t take any more cards, the points for his/her hand is fixed.
However, the dealer may take more cards. Therefore, the EV of standing is the
weighted average of all outcomes after the dealer is done. For example, if you
stand on hard 18 against the dealer’s hard 16, the following outcomes are
possible:

Dealer Outcome Probability EV
17 +1 1/13 1/13
18 0 1/13 0
19,20,21 -1 3/13 -3/13
bust +1 8/13 8/13
Here, observe that the dealer can only hit one more card before standing
(since his/her points will go up by at least 1, i.e. receiving an ace). The EV
of each outcome is the outcome i pi, where the sum of pi is equal to 1.
Finally, the EV standing on hard 18 against dealer’s hard 16 is the sum of the
EV of all possible outcomes, which is 6/13 or 0.462.
Of course, if the dealer has a low starting hand, then it is possible that the
dealer will hit multiple times. In this case, you will need recursion and
dynamic programming to help you solve the problem more elegantly. Since you
should already be familiar with recursion, we will explain dynamic programming
and how it works.

Dynamic Programming

Dynamic programming is an optimization technique where the result of a
function is cached (saved) so that the cached result is returned on future
invocation. This technique will greatly speed up the calculation by avoiding
redundant and possibly expensive recomputation. There are some restrictions as
to when dynamic programming can be used (i.e. the function must be “pure” –
see lecture note for more thorough explanation). For our purposes, you only
need to know that the function may not depend on any mutable non-local
variables (constant ones are fine).
Continuing from previous example, suppose this time the dealer has hard 15
(the player’s hand is not relevant), and we want to calculate the probability
distribution F(hard 15) for all possible points that the dealer ends with. We
would arrive at the following table:

Next Card Dealer Probability
A 16 1/13
2-6 17-21 5/13
7-K bust 7/13
There is a 1/13 chance that the dealer gets an ace, where he/she would have to
keep hitting. However, from previous example, we already calculated F(hard
16)! Therefore, if we have saved the result for F(hard 16), we can use it
immediately and merge the two tables together, where F(hard 15) = 1/13*F(hard
16) + F(hard 15 not ace) for the following final table.
Dealer Probability
17 1/13 + 1/13 * 1/13
21 1/13 + 1/13 * 1/13
bust 7/13 + 1/13 * 8/13
A general suggestion for this assignment is make use of the assert function to
make sure your assumptions hold. Your program will crash if the assertion
fails, which allows you to debug the problem immediately. For example, the sum
of the probabilities in all dealer probability tables should be equal to 1.
Assuming you are using a Python dictionary to store the table, you can check
this using the following snippet of code:
#  
# use math.isclose() for Python 3.5 or later  
# use this for earlier versions of Python  
#  
def isclose(a, b=1., rel_tol=1e-09, abs_tol=0.0):  
  return abs(a-b) [= max(rel\_tol * max(abs(a), abs(b)), abs\_tol)  
def make_dealer_table(table):  
  # ...  
  # make sure the probabilities add up to 1  
  assert(isclose(sum(table.values()), 1.0))  

—|—
Lastly, we would save the result for F(hard 15) so that future calculation
that uses it does not require recomputation. There are many opportunities to
take advantage of dynamic programming in this assignment, and you will see as
you continue reading.
Question: How many dealer probability tables do you need to complete the stand
EV table?

Hit EV

Correctly calculating the Hit EV table requires that you first complete the
Stand EV table. The reason is that the EV for hitting depends on the optimal
action between hitting and standing, i.e. it is a non-terminal action. Let’s
suppose you hit on hard 17 against dealer’s hard 18. Your possible outcomes
are as follows:

Player Probability Outcome EV
18-20 3/13 ? ?
21 1/13 +1 1/13
bust 9/13 -1 -9/13
You may be asking, why is the outcome of achieving 18 to 20 points unknown? It
is because the algorithm is unsure whether it is better to hit or stand after
the initial hit, for example, on hard 20 against dealer’s hard 18. On the
surface, it seems obvious that you would stand in this case, but there are
many other situations where it may be a close call. Therefore, your algorithm
must cast aside all doubt and verify that it is indeed better to stand than to
hit, or vice versa.
There are always two base cases in the recursion where you would stop hitting:
arriving at 21 points or busting. In the above example, hitting on hard 20
results in an EV of 1/13 - 12/13 (only an ace will save you) = -11/13. On the
other hand, standing on hard 20 guarantees a win because the dealer does not
hit hard 18, so it’s EV is +1.000. In this case, it is now clear that you
should pick stand and not hit.
Both memoisation and dynamic programming can be effective for calculating hit
EV. Using memoisation, you would make the initial hit, and for all subsequent
encounters, recursively calculate the hit EVs if they are not in the hit EV
table, or just use the value straight from the table and compare with the
stand EV, which you have already calculated. For dynamic programming, you
would iterate through all hands, starting from the base case (hard 20), and
ending at hard 4.

Double EV

Since doubling down is a terminal action, it is very similar to stand, except
you would hit before you stand, making it possible that you bust in the
process. Otherwise, the algorithm is similar, except the expected value is
doubled since the player is risking twice the amount of money to take this
action.

Split EV

Because split is a non-terminal action, except in the case of splitting aces,
again, you will need to make the optimal decision after the initial split
occurs. However, because of the hand limit of up to 4, using recursion and
memoisation may not be intuitive. Therefore, we suggest that you use dynamic
programming, where you start from the smallest subproblem and work our way up.
Assuming you have already completed all other tables, then split EV’s base
subproblem is deciding the optimal action after you cannot split anymore,
either because the hand cannot be split or the hand limit has been reached. At
this point, the valid actions remaining are hit, double, or stand. Therefore,
using the existing tables, you can calculate a base table for splitting. You
should use STAND_CODE for the ylabels of this table since one of the actions
includes standing and none of the actions involves splitting (you can’t split
anymore!).
One complication of splitting is dealing with the many ways where the hand
limit can be reached. Fortunately, it is actually not an issue you need to
worry about, because unlike in assignment 1, we are exploring all possible
outcomes of having N hands. As such, how we arrived at N hands has no impact
on the expected value having N hands.
As an example, suppose you have 99 and 99. Assuming the hand limit is only 3,
you may only split one more time. Now, does it matter which of the 99 you
split? No. Because the expected value of splitting the first 99 and the second
99 is exactly the same. With this insight, then the problem of splitting up to
4 times is a lot more tractable. While you are going through the set of hands
that can result from a split, always split a hand as long as the hand limit
has not been reached. So in the case where the initial XX is split into XX1
and XX2, you should assume that both of them will be split, meaning that no
subsequent split after the two resplits are possible.
The only situation that you have to worry about is when you split into two
hands, both of which can be split again, but there’s only room for one more.
For example, 880 splits into 8T and 881, and 881 is split into 882 and 883. In
this case, one of the 88 cannot be split again and must be treated as base
case.
Since there are multiple levels after the initial split to consider, you will
actually need a total of 4 tables for the split action. The base case table
(discussed earlier), split1, split2, and split3, where the subscript of each
split table is the number of hands remaining that can be split. The split3
table represents the initial split from one hand, and is the one that the
tester expects. split3 will include the EV for splitting AA, whereas split2
and split1 should not, since you are not allowed to resplit AA.

Player Advantage

Once you have calculated the EV of all possible actions for all combinations
of hands, you would be able to generate a compact table of the optimal actions
and their associated EV. To calculate the player advantage, you need the
weighted average of the optimal EV of all possible starting hand combinations
(sum of pi pj EVij, where pi is the probability of the player’s starting hand
i, and pj is the probability of the dealer’s starting hand j).

Solution Requirements

To help you with this assignment, we have broken it up into multiple
milestones. The program will take zero or more arguments, with the argument
being the name of the table that you wish to print. For example:
python3 main.py stand optimal
This will print both the stand EV table and the optimal EV table. Partial
marks will be given for completing some of the milestones. The list of all
accepted arguments are as follows:

  • initial: print the initial probability table (see subsection for more detail)
  • dealer: print the dealer probability table (see subsection for more detail)
  • stand: print the stand EV table
  • hit: print the hit EV table
  • double: print the double EV table
  • split: print the split EV table
  • optimal: print the optimal EV table
  • strategy: print the strategy table
  • advantage: print the theoretical player advantage
    If no arguments are given, the program will print everything. Note that all of
    the argument parsing has been done for you. You just need to produce the
    correct results for each of the milestone.

Table class

The first milestone of the assignment is to complete the implementation for
table.py. In the file is an incomplete implementation of an ordered,
homogeneous, two-dimensional table, similar to an spreadsheet. The table takes
a 2-tuple (a tuple with two elements) and either raises an error or returns
the value in the corresponding cell. A cell is to a spreadsheet (2D) what an
element is to a list (1D). The 2-tuple is in the form (y, x), where y is the
row name and x is the column name. For example, table[“a”,”m”] will return the
cell value for row a column m. Let’s now describe the interface of the Table
class.
Table.init(self, celltype, xlabels, ylabels, unit=””)
—|—

  • celltype: the data type of each cell, e.g., str, int, or float. It is an error to set a cell with a value that is not specified cell type.
  • xlabels: a sequence of all x-axis labels (i.e. column names) accepted by the table.
  • ylabels: a sequence of all y-axis labels (i.e. row names) accepted by the table.
  • unit: this argument is used to print the table. You do not need to do anything about it.
    For example, the table Table(str, “abcd”, range(1,5)) will create the
    following table:
    a b c d
    1 – – – –
    2 – – – –
    3 – – – –
    4 – – – –
    Notice that the cell values are “empty”, that is because their values by
    default is set to None unless you set it. For example, table[3,’b’] = “HI”
    will update the table:
    a b c d
    1 – – – –
    2 – – – –
    3 – HI – –
    4 – – – –
    You must create a suitable data structure to store data for the table, and
    implement all operations supportd by the index operator: get, set, and delete.
  • getitem(self, idx): return the cell value given a valid 2-tuple, or None if the cell is empty.
  • setitem(self, idx, val): set the cell value to val given a valid 2-tuple. Note that type checking on val is already done for you.
  • delitem(self, idx): set the cell value to None given a valid 2-tuple.

Hand class

We provide you the skeleton code for the hand class in easybj.py, which
currently stores the cards that are in the hand and specifies whether the hand
belongs to player or dealer. With this, it is enough for you to complete an
important method, Hand.code, which will be used throughout this assignment to
fill the table.
For the purpose of this assignment, the code of a hand represents its strength
and potential. It comes in three forms: hard hands, soft hands, and split
hands. For hard hands, the code is the score of the hand. We define the score
of a hand to be equal to the point value of the hand, except that the score is
zero if the hand is busted. This simplifies comparing hand strength using
integer comparison. As an example, the code ‘15’ represents hard 15. For soft
hands, it will start with an ace ‘A’. For example, ‘A6’ denotes soft 17, and
‘A9’ denotes soft 20. Note that the actual hand can have more cards than what
the code depicts. For example, A45 is also soft 20. If the point value of the
hand is 21, then if it is also blackjack, the hand code is ‘BJ’, otherwise it
is ‘21’, regardless of whether it is hard or soft. Lastly, split hands are
always two of the same number or letter, for example, ‘AA’, ‘55’, or ‘TT’.
Remember that ‘AA’ is both a soft hand and a split hand.
Hand code can vary based on whether split is possible and who owns the hand.
In the first case, if split is not possible, then the hand code is the
correponding hard or soft hand code. For example, ‘22’ becomes ‘4’, and ‘99’
becomes ‘18’. However, ‘AA’ remains ‘AA’ since it has dual meanings. In the
second case, if the hand belongs to a dealer, then there are two adjustments.

  1. the dealer never splits, so he/she does not use any split codes. 2) the
    dealer treats soft 18 or above as hard 18, so ‘A7’ or higher becomes ‘18’ and
    higher, respectively.
    Your next milestone is to complete Hand.code according to the above
    specificaiton. Once this is done, it will automatically finish the
    implementation for calculating the initial probability table. This table shows
    the probability of receiving a specific hand against a specific dealer hand.
    You can see it via the following command:
    python3 main.py initial
    At this point, you may have already noticed how these codes are used by the
    table. The x-axis is a list of all possible starting hands for the dealer, and
    the y-axis is a list of all possible hands for the player for a particular
    situation. At the top of easybj.py are a list of codes. For example,
    INITIAL_CODE is a list of player’s hand code for the initial deal. We will
    explain what other lists are used for in the next few subsections.

Dealer Table

The dealer probabilities table is a collection of subtables that contain the
exact probability of each dealer outcome for a particular dealer starting
hand. For example, if the dealer starts with hard 17, then there would only be
one entry in the subtable: 17: 1.000, since the dealer does not hit hard 17 or
higher. You will need this table to be able to complete the stand EV table (it
is also possible to complete both tables together using memoisation technique
instead of dynamic programming).
The format of the dealer table is different from other EV tables: it is a
dictionary of dictionaries. At the top level, the key is the dealer’s starting
hand code and the value is the associated subtable (also a dictionary). At the
bottom level, the dictionary stores each outcome in score (using zero for
busted hands) and the value of each outcome is its probability. Use the
methodology as described in the Approach section to complete this table. Below
is out of the example output for dealer hard 16:
Dealer 16
0: 0.615385 17: 0.076923 18: 0.076923 19: 0.076923 20: 0.076923 21: 0.076923

EV Tables

You will need to produce an EV table for each of the actions: stand, hit,
double, and split. Note that surrender is not necessary since all entries will
be filled with -0.5. The tables have already been initialized for you in the
Calculator class constructor. Your task is to populate the cells of each table
with the correct expected value for each player/dealer encounter. Notice that
these tables ignore the situation if either the player or dealer (or both) has
blackjack. This is done mainly to compact the table since the game ends
immediately and no actions need to be taken. You should study how the initial
probability table was populated and use a similar method to work on the other
tables.
There are two irregularities worth mentioning in these tables that may help
you gain a better understanding of the process. First, the stand table has an
extra entry, 21, for the player. While the player stands on 21 automatically,
the game continues and the player is not guaranteed to win in this situation
(e.g., if the dealer also gets 21). Second, the split table only contains
entries for split hands. This may be more obvious since all other hand types
cannot be split. You should return the initial split table, as described in
the Split EV section, and not the resplit tables.
Lastly, for the optimal table, as described in the Approach section, you will
need to compare all available actions (including surrender) for each
player/dealer encounter and choose the one with the highest expected value.

Strategy Table

This part can be done as part of calculating the optimal EV table. Instead of
entering numbers into the table, you enter the exact action that should be
taken for each encounter. For double and surrender, an alternative action (hit
or stand only) needs to be specified in case the primary is unavailable. Split
does not need an alternative action since the reader of the strategy table
should instead look up the hand as a hard total. The format of the strategy
table is the same as described in assignment 1.

Player Advantage

Calculate the theoretical player advantage using the optimal EV table and the
initial probability table, as described in the Approach section.

Code Submission

To submit your code with the current commit, use the Asst2-end tag. Remember
to push the new tag to the remote repository.
git tag Asst2-end // Creates tag for current submission for Assignment 2
git push –tags

Glossary of Terms

Here, you can find the definition of commonly seen terms that are used in the
assignment. More will be added as people ask questions about the assignment.

  • point: refers to the point value of the hand. For example, 88 is 16 and 999 is 27 (which is busted)
  • score: refers to the relative strength of the hand when compared with another. Score and point are the same for a hand that has not busted. If a hand is busted, it’s score is zero.
  • code: a two character name used to signify the strength and the potential of the hand.
  • encounter: a player having a particular hand versus the dealer’s particular hand.

文章作者: SafePoker
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 SafePoker !
  目录