CUDA Assignment: CS475 CUDA Monte Carlo


Implement a Monte Carlo simulation algorithm using CUDA.
![CUDA](https://upload.wikimedia.org/wikipedia/en/thumb/b/b9/Nvidia_CUDA_Logo.jpg/300px-Nvidia_CUDA_Logo.jpg)

Note

The flip machines do not have GPU cards in them, so CUDA will not run there.
If your own system has a GPU, you can use that. You can also use the DGX
machine, but please be good about sharing it.

Introduction

Monte Carlo simulation is used to determine the range of outcomes for a series
of parameters, each of which has a probability distribution showing how likely
each option is to happen. In this project, you will take a scenario and
develop a Monte Carlo simulation of it, determining how likely a particular
output is to happen.

The Scenario

A laser is pointed at a circle. The circle is defined by a center point
(xc, yc) and a radius (r). The beam comes out at a 30° angle and bounces
off the circle. Underneath, level with the laser origin, is an infinite
plate. Given all of this, does the beam hit the plate?
Normally this would be a pretty straightforward geometric calculation, but the
circle is randomly changing location and size. So now, the laser beam might
hit the plate or it might not, depending on the values of (xc, yc, r). OK,
since it is not certain, what is the probability that it hits the plate? This
is a job for GPU Monte Carlo simulation!
Because of the variability, the beam could miss the circle entirely (A). The
circle might totally engulf the laser pointer (B). It could bounce off the
circle and miss the plate entirely (C). Or, it could bounce off the circle and
actually hit the plate (D).
So, now the question is “What is the probability that the beam hits the
plate?”.
In My Opinion, Here Is How To Make Your Life Way, Way, Way Easier
IMHO, use Linux for this project. The compilation is orders of magnitude
simpler, and you can try this out on OSU’s new DGX system, which will produce
dazzling performance.
Also, before you use the DGX, do your development on the rabbit system (Slide
#3 of the DGX noteset). It is a little friendlier because you don’t have to
run your program through a batch submission. But, don’t take any final
performance numbers from rabbit, just get your program running there.
But, if you decide to use Visual Studio on your own machine, you must first
install the CUDA Toolkit! It is available here:
https://developer.nvidia.com/cuda-downloads

Requirements

| Variable | Range     |
|----------|-----------|
| xc       | 0.0 - 2.0 |
| yc       | 0.0 - 2.0 |
| r        | 0.5 - 2.0 |
  1. The ranges are above.
    Note: these are not the same numbers as we used before!
  2. Run this for four BLOCKSIZEs (i.e., the number of threads per block) of 16, 32, 64, and 128, combined with NUMTRIALS sizes of 16K, 32K, 64K, 128K, 256K, 512K, and 1M.
  3. Be sure the NUMTRIALS are in multiples of 1024, that is, for example, use 32,768, not 32,000.
  4. Record timing for each combination. For performance, use some appropriate units like MegaTrials/Second or GigaTrials/Second.
  5. For this one, use CUDA timing, not OpenMP timing.
  6. Do a table and two graphs:
    1. Performance vs. NUMTRIALS with multiple curves of BLOCKSIZE
    2. Performance vs. BLOCKSIZE with multiple curves of NUMTRIALS
  7. Like before, fill the Xcs, Ycs, and Rs arrays ahead of time. Send them to the GPU where they can be used as look-up tables.
  8. A template of what the code could look like can be found in the montecarloTemplate.cu file.
  9. You will also need six .h files:
    * helper_functions.h
    * helper_cuda.h
    * helper_image.h
    * helper_string.h
    * helper_timer.h
    * exception.h
  10. Your commentary PDF should:
    1. Tell what machine you ran this on
    2. Show the table and the two graphs
    3. What patterns are you seeing in the performance curves?
    4. Why do you think the patterns look this way?
    5. Why is a BLOCKSIZE of 16 so much worse than the others?
    6. How do these performance results compare with what you got in Project #1? Why?
    7. What does this mean for the proper use of GPU parallel computing?
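The requirements above can be sketched as a host-side skeleton: fill the Xcs, Ycs, and Rs lookup tables ahead of time (item 7), copy them to the GPU, launch the kernel, and time it with CUDA events rather than OpenMP timing (item 5). This is only an illustrative outline under those assumptions, not the official montecarloTemplate.cu; the names `MonteCarlo` and `Ranf` are placeholders, and the kernel body (the per-trial geometry) is elided. One way to cover all the combinations is to recompile with `nvcc -DBLOCKSIZE=... -DNUMTRIALS=...` inside a script loop.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#ifndef BLOCKSIZE
#define BLOCKSIZE   64             // threads per block: 16, 32, 64, or 128
#endif
#ifndef NUMTRIALS
#define NUMTRIALS   ( 32*1024 )    // must be a multiple of 1024
#endif

// Placeholder kernel: one thread per trial, reading its (xc, yc, r)
// from the lookup tables and recording whether the beam hit the plate.
__global__ void MonteCarlo( float *Xcs, float *Ycs, float *Rs, int *Hits )
{
    unsigned int gid = blockIdx.x*blockDim.x + threadIdx.x;
    float xc = Xcs[gid], yc = Ycs[gid], r = Rs[gid];
    int hit = 0;
    // ... per-trial geometry: intersect, bounce, test the plate ...
    Hits[gid] = hit;
}

float Ranf( float low, float high )
{
    float r = (float) rand( );     // 0 .. RAND_MAX
    return low + r * ( high - low ) / (float) RAND_MAX;
}

int main( )
{
    // fill the lookup tables on the host ahead of time (requirement 7):
    float *hXcs  = new float [ NUMTRIALS ];
    float *hYcs  = new float [ NUMTRIALS ];
    float *hRs   = new float [ NUMTRIALS ];
    int   *hHits = new int   [ NUMTRIALS ];
    for( int i = 0; i < NUMTRIALS; i++ )
    {
        hXcs[i] = Ranf( 0.0f, 2.0f );
        hYcs[i] = Ranf( 0.0f, 2.0f );
        hRs[i]  = Ranf( 0.5f, 2.0f );
    }

    // allocate device arrays and copy the lookup tables over:
    float *dXcs, *dYcs, *dRs;  int *dHits;
    cudaMalloc( (void **)&dXcs,  NUMTRIALS*sizeof(float) );
    cudaMalloc( (void **)&dYcs,  NUMTRIALS*sizeof(float) );
    cudaMalloc( (void **)&dRs,   NUMTRIALS*sizeof(float) );
    cudaMalloc( (void **)&dHits, NUMTRIALS*sizeof(int) );
    cudaMemcpy( dXcs, hXcs, NUMTRIALS*sizeof(float), cudaMemcpyHostToDevice );
    cudaMemcpy( dYcs, hYcs, NUMTRIALS*sizeof(float), cudaMemcpyHostToDevice );
    cudaMemcpy( dRs,  hRs,  NUMTRIALS*sizeof(float), cudaMemcpyHostToDevice );

    // CUDA-event timing, not OpenMP timing (requirement 5):
    cudaEvent_t start, stop;
    cudaEventCreate( &start );
    cudaEventCreate( &stop );

    dim3 threads( BLOCKSIZE, 1, 1 );
    dim3 grid( NUMTRIALS / BLOCKSIZE, 1, 1 );

    cudaEventRecord( start, NULL );
    MonteCarlo<<< grid, threads >>>( dXcs, dYcs, dRs, dHits );
    cudaEventRecord( stop, NULL );
    cudaEventSynchronize( stop );

    float msecs;
    cudaEventElapsedTime( &msecs, start, stop );
    double megaTrialsPerSecond =
        (double)NUMTRIALS / ( msecs / 1000. ) / 1000000.;   // requirement 4

    // bring the results back and estimate the probability:
    cudaMemcpy( hHits, dHits, NUMTRIALS*sizeof(int), cudaMemcpyDeviceToHost );
    int numHits = 0;
    for( int i = 0; i < NUMTRIALS; i++ )
        numHits += hHits[i];
    float probability = (float)numHits / (float)NUMTRIALS;

    fprintf( stderr, "%d,%d,%.2lf,%.4f\n",
             BLOCKSIZE, NUMTRIALS, megaTrialsPerSecond, probability );
    return 0;
}
```

Printing one CSV line per run, as above, makes it easy to paste all 28 BLOCKSIZE × NUMTRIALS combinations into the performance table and the two graphs.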

Grading

| Feature                                                              | Points |
|----------------------------------------------------------------------|--------|
| Monte Carlo performance table                                        | 20     |
| Graph of performance vs. NUMTRIALS with multiple curves of BLOCKSIZE | 25     |
| Graph of performance vs. BLOCKSIZE with multiple curves of NUMTRIALS | 25     |
| Commentary                                                           | 30     |
| **Potential Total**                                                  | **100**|

Author: SafePoker
Copyright: Unless otherwise noted, all articles on this blog are licensed under CC BY 4.0. Please credit SafePoker when reposting!