- This program simulates two strategies for the Two-Armed Bandit problem: play-the-better-machine and play-the-winner. The variables p(m1) and p(m2) are the respective probabilities that machines 1 and 2 pay off a dollar on a single play. You can choose either the play-the-better-machine strategy or the play-the-winner strategy. Ten plays are carried out. For each play, the program prints which machine was used and the result of the play (i.e., whether it was won or lost). Finally, the program plots the experimentally determined densities for p(m1) and p(m2), drawing that of p(m1) with a red line and that of p(m2) with a black one.
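The description above can be sketched in Python. This is a minimal illustration, not the program's actual source, and it rests on several assumptions the description does not confirm: each machine starts with a uniform prior, so the density after w wins and l losses is Beta(w + 1, l + 1); play-the-better-machine picks the machine with the larger posterior mean; ties and the first play go to machine 1; and play-the-winner stays on a machine after a win and switches after a loss. The names `simulate` and `beta_density` are hypothetical.

```python
import math
import random


def simulate(p, strategy, plays=10, seed=None):
    """Simulate the two-armed bandit.

    p        -- (p_m1, p_m2), payoff probabilities of machines 1 and 2
    strategy -- "winner" (play-the-winner) or "better" (play-the-better-machine)
    Returns (history, wins, losses); history holds (machine_number, won) pairs.
    """
    rng = random.Random(seed)
    wins, losses = [0, 0], [0, 0]
    history = []
    machine = 0  # assumption: the first play uses machine 1
    for _ in range(plays):
        if strategy == "better":
            # Posterior mean under an assumed uniform prior: (w + 1) / (w + l + 2).
            means = [(wins[i] + 1) / (wins[i] + losses[i] + 2) for i in (0, 1)]
            machine = 1 if means[1] > means[0] else 0  # assumption: ties go to machine 1
        won = rng.random() < p[machine]
        if won:
            wins[machine] += 1
        else:
            losses[machine] += 1
        history.append((machine + 1, won))
        if strategy == "winner" and not won:
            machine = 1 - machine  # switch machines after a loss
    return history, wins, losses


def beta_density(x, wins, losses):
    """Beta(wins + 1, losses + 1) density at x, for 0 < x < 1."""
    a, b = wins + 1, losses + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


if __name__ == "__main__":
    history, wins, losses = simulate((0.6, 0.7), "winner", seed=1)
    for machine, won in history:
        print(f"machine {machine}: {'won' if won else 'lost'}")
    # Evaluate each density on a coarse grid instead of plotting it.
    for i in (0, 1):
        values = [round(beta_density(x / 10, wins[i], losses[i]), 3) for x in range(1, 10)]
        print(f"density for p(m{i + 1}):", values)
```

To reproduce the plot, evaluate `beta_density` on a finer grid and draw the curve for p(m1) in red and the curve for p(m2) in black with a plotting library of your choice.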


The source.