Use MPI to complete the missing parts of the provided Gas Simulator code.
Introduction
Learning Goals
You will learn the following:
- Communication patterns of a parallel 2D van der Waals gas simulator
- Using MPI and AMPI for distributed-memory execution
Please read the entire document before beginning any part of the assignment,
as some parts are interdependent.
Assignment Tasks
The basic workflow of this assignment is as follows (there are more details in
the relevant sections):
- Clone your turnin repo to Campus Cluster
- You may need to iterate:
  - Implement the algorithms / code for the various programming tasks.
  - Build and test.
- Check in and push your complete, correct code.
- Benchmark on Campus Cluster as a batch job (sbatch scripts/batch_script.slurm). See README_CAMPUS_CLUSTER.md for details.
- Check in any benchmarking results you wish to keep and the final versions of your code files.
- Write and check in your writeup.
Part I: MPI
You will implement the communication functions in part1/solution.h and
part1/solution.cpp. You may modify the classes in these files to add member
variables/functions or even create other helper classes, etc. Although you are
allowed to create or alter other files and use them for debugging/testing
purposes, they will be discarded at grading time.
Implementation
The MPI code revolves around a class named MPISimulationBlock defined in
part1/solution.h, which inherits from
SimulationBlock defined in common/simblock.h. The parent class SimulationBlock
contains most of the program parameters as well as particle data, which can be
accessed from MPISimulationBlock’s member methods. Each MPI rank contains one
MPISimulationBlock that represents a block of the entire simulation grid.
Remember that with MPI, the decomposition (i.e. number of blocks/ranks used
for the simulation) is dependent on the number of CPU cores.
You may add code to the constructor and destructor bodies if you want (not
required). You should implement the following functions:
MPISimulationBlock::exchange_particles()
When this function is called, any particle outside the current block’s bounds
must be moved to the appropriate adjacent block. That is, it must be removed
from this block’s SimulationBlock::all_particles array (using the provided
SimulationBlock::remove_particle(int) function) and added to exactly one
appropriate recipient’s all_particles array (either via
SimulationBlock::add_particle(phys_particle_t) or by directly placing it in
the all_particles array and updating N_particles).
You can determine which direction a particle needs to move by calling
check_migrant_direction(particle). The return value of this function is an
int, which is one of the following:
SimulationBlock::DIR_SELF, SimulationBlock::DIR_N, SimulationBlock::DIR_S,
SimulationBlock::DIR_E, SimulationBlock::DIR_W, SimulationBlock::DIR_NE,
SimulationBlock::DIR_NW, SimulationBlock::DIR_SE, SimulationBlock::DIR_SW
SELF means the particle does not need to move to another block. N means north,
S means south, NW means northwest, and so on. You may use the macro DIR_EQ
(provided in common/simblock.h) to check the direction: e.g.
DIR_EQ(SimulationBlock::DIR_N, direction) will return true (i.e. 1) if the
particle should migrate to the north neighbor.
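As a rough illustration of the bookkeeping described above, the sketch below sorts out-of-bounds particles into one send buffer per direction before any communication happens. Everything in it is a hypothetical stand-in: the Particle struct, the Dir values, migrant_direction() (in place of check_migrant_direction()), and the vector swap-remove (in place of SimulationBlock::remove_particle(int)) are assumptions made so the example runs on its own, and the real definitions in common/simblock.h may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the assignment's particle type.
struct Particle { double x, y; };

// Hypothetical direction indices (0 = stay, 1..8 = the eight neighbors).
enum Dir { SELF = 0, N, S, E, W, NE, NW, SE, SW, DIR_COUNT };

// Stand-in for check_migrant_direction(): classifies a particle against a
// block covering [0, w) x [0, h), with north taken as increasing y.
Dir migrant_direction(const Particle& p, double w, double h) {
    const bool west = p.x < 0.0, east = p.x >= w;
    const bool south = p.y < 0.0, north = p.y >= h;
    if (north) return east ? NE : (west ? NW : N);
    if (south) return east ? SE : (west ? SW : S);
    if (east) return E;
    if (west) return W;
    return SELF;
}

// Moves every out-of-bounds particle from `local` into exactly one
// per-direction send buffer and returns the buffers.
std::array<std::vector<Particle>, DIR_COUNT>
bucket_migrants(std::vector<Particle>& local, double w, double h) {
    assert(w > 0.0 && h > 0.0);
    std::array<std::vector<Particle>, DIR_COUNT> outgoing;
    std::size_t i = 0;
    while (i < local.size()) {
        const Dir d = migrant_direction(local[i], w, h);
        if (d == SELF) { ++i; continue; }
        outgoing[d].push_back(local[i]);
        local[i] = local.back();  // swap-remove: particle order is not preserved
        local.pop_back();         // i is not advanced, so the moved element is re-checked
    }
    return outgoing;
}
```

Whatever removal routine is used, the loop index should only advance when the particle at that index stays, since a removal may place a different, not yet examined particle at the same position.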
For the particle exchange, you will need to use MPI communication calls
(MPI_Send, MPI_Recv, etc.) and proper synchronization (if necessary). Note
that the receiving rank needs to know how many particles it is going to
receive before posting a receive call, since you want to avoid sending
particles one by one over the network.
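One common way to let the receiver learn the size in advance is to send a small count message followed by a single bulk message. The sketch below imitates that pattern in plain C++ so that it runs without an MPI installation: Mailbox is a hypothetical in-memory stand-in for a link between two ranks, and the Particle layout is assumed. In the assignment, each send() would correspond to an MPI send and each recv() to an MPI receive of a known size; probing the incoming message with MPI_Probe and MPI_Get_count is another option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

struct Particle { double x, y, vx, vy; };  // hypothetical layout

// Hypothetical in-memory stand-in for a point-to-point link between two
// ranks: messages come out in the order they were put in.
class Mailbox {
public:
    void send(const void* data, std::size_t bytes) {
        const auto* p = static_cast<const unsigned char*>(data);
        queue_.emplace_back(p, p + bytes);
    }
    // The caller must already know the message size, as with a plain receive.
    void recv(void* data, std::size_t bytes) {
        assert(!queue_.empty() && queue_.front().size() == bytes);
        if (bytes > 0) std::memcpy(data, queue_.front().data(), bytes);
        queue_.pop_front();
    }
private:
    std::deque<std::vector<unsigned char>> queue_;
};

// Sender side: one small message with the count, then one bulk message.
void send_batch(Mailbox& link, const std::vector<Particle>& out) {
    const int count = static_cast<int>(out.size());
    link.send(&count, sizeof count);
    link.send(out.data(), out.size() * sizeof(Particle));
}

// Receiver side: learn the count, size the buffer, then receive the payload.
std::vector<Particle> recv_batch(Mailbox& link) {
    int count = 0;
    link.recv(&count, sizeof count);
    std::vector<Particle> in(static_cast<std::size_t>(count));
    link.recv(in.data(), in.size() * sizeof(Particle));
    return in;
}
```

The sketch sends the payload message even when the count is zero, which keeps sender and receiver symmetric; skipping empty payloads is also possible as long as both sides apply the same rule.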
MPISimulationBlock::communicate_ghosts()
Any time a particle from SimulationBlock::all_particles is within a certain
distance of the current block’s edge, it must be communicated to the
appropriate adjacent or corner-touching SimulationBlock and placed in that
block’s SimulationBlock::all_ghosts array. SimulationBlock::N_ghosts must be
set to the total number of ghost particles received in this iteration.
Communicating ghost particles is very similar to exchanging particles, except
that a single particle may be sent as a ghost to several adjacent blocks, and
particles that are sent are not removed from the local all_particles. You can
determine which direction(s) a particle needs to be sent by calling
check_ghost_direction(particle). Because a particle may be sent to multiple
neighbors, unlike in exchange_particles(), you should use the provided DIR_HAS
macro: e.g. DIR_HAS(SimulationBlock::DIR_N, direction) will return true if
north is one of the directions that the particle needs to be sent to.
You should use MPI communication calls to send and receive the ghosts, similar
to exchange_particles().
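To make the difference from migration concrete, the sketch below copies each near-edge particle into the buffer of every neighbor that needs it, leaving the local array untouched. It assumes that directions are encoded as bit flags, which the name DIR_HAS suggests but the handout does not confirm. The flag values, the Particle struct, and ghost_directions() (standing in for check_ghost_direction()) are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { double x, y; };  // hypothetical layout

// Hypothetical bit-flag encoding: one bit per neighbor direction.
enum : unsigned { N = 1u << 0, S = 1u << 1, E = 1u << 2, W = 1u << 3,
                  NE = 1u << 4, NW = 1u << 5, SE = 1u << 6, SW = 1u << 7 };
constexpr std::size_t kNumDirs = 8;

// Stand-in for check_ghost_direction(): sets a bit for every neighbor whose
// ghost region contains the particle, for a block [0, w) x [0, h) and a
// cutoff distance r (north taken as increasing y).
unsigned ghost_directions(const Particle& p, double w, double h, double r) {
    const bool near_n = p.y >= h - r, near_s = p.y < r;
    const bool near_e = p.x >= w - r, near_w = p.x < r;
    unsigned mask = 0;
    if (near_n) mask |= N;
    if (near_s) mask |= S;
    if (near_e) mask |= E;
    if (near_w) mask |= W;
    if (near_n && near_e) mask |= NE;
    if (near_n && near_w) mask |= NW;
    if (near_s && near_e) mask |= SE;
    if (near_s && near_w) mask |= SW;
    return mask;
}

// Copies (does not move) each particle into the buffer of every direction
// whose bit is set; buffer index d corresponds to flag (1u << d).
std::array<std::vector<Particle>, kNumDirs>
collect_ghosts(const std::vector<Particle>& local, double w, double h, double r) {
    assert(r >= 0.0 && w > 0.0 && h > 0.0);
    std::array<std::vector<Particle>, kNumDirs> outgoing;
    for (const Particle& p : local) {
        const unsigned mask = ghost_directions(p, w, h, r);
        for (std::size_t d = 0; d < kNumDirs; ++d) {
            if (mask & (1u << d)) outgoing[d].push_back(p);
        }
    }
    return outgoing;
}
```

A particle near a corner lands in three buffers here (two edges and the corner), which is why an equality test such as DIR_EQ is not enough for ghosts.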
MPISimulationBlock::init_communication()
MPISimulationBlock::finalize_communication()
These functions will be called by the main program before and after the
simulation runs. You may use them to do whatever setup and teardown you wish,
but do not call MPI_Init() or MPI_Finalize(), which our main program will do.
Mostly this is for setting up/tearing down whatever variables you will need
for communication. Try not to leak any memory; we may choose to test your code
under valgrind or another memory debugger.
Compilation and Testing
- Create a new build directory.
- In build, run cmake [path to mp3 dir]. This will go through the system configurations and generate a Makefile.
- Run make. This will compile binaries for all parts of the assignment (bin/part1, bin/part2, bin/part3).
This compilation process is identical for all parts of the assignment.
To test if your MPI program works, run bin/part1 -N 100 -i 1. This will run
the simulation for 100 iterations, outputting the iteration value every
iteration.
Part II: AMPI
Implementation
For this section, you do not need to modify/add any code, but please do read
the code in part1/main.cpp that is wrapped with #ifdef AMPI. These code blocks
demonstrate how load balancing can be invoked with AMPI, and will be included
in the AMPI version of the program.
You do need to benchmark this code, as explained in the last section.
Testing
Run bin/part2 +vp 4 -N 100 -i 1 +balancer GreedyRefine +isomalloc_sync.
This will run the AMPI program with 4 virtual ranks, with the GreedyRefine
load balancing strategy.
Benchmarking
We have provided you with a batch script that will run both parts of the
assignment (MPI and AMPI; Charm will be excluded this term). As always, you
should run this on the campus cluster. It will vary the number of utilized CPU
cores from 1 to 36, running on at most 2 physical nodes (each node has 20 CPU
cores) with a fixed decomposition. The results will be stored in
writeup/benchmark .txt, where is the one of mpi and ampi. You should plot
these results and evaluate the performance in your writeup. More specifically,
explain how the performance for each version of the simulation code scales
with the number of CPU cores.
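When evaluating scaling, it can help to convert raw runtimes into speedup, T(1) / T(p), and parallel efficiency, speedup / p. The helper below is a minimal sketch of that arithmetic; it does not parse the benchmark output files, whose format is not described here, and scaling_table and ScalingPoint are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of a scaling table derived from measured runtimes.
struct ScalingPoint {
    int cores;
    double seconds;
    double speedup;     // T(1) / T(p)
    double efficiency;  // speedup / p
};

// Computes speedup and parallel efficiency relative to the first entry,
// which is assumed to be the 1-core run.
std::vector<ScalingPoint> scaling_table(const std::vector<int>& cores,
                                        const std::vector<double>& seconds) {
    assert(!cores.empty() && cores.size() == seconds.size());
    assert(cores.front() == 1);
    const double t1 = seconds.front();
    std::vector<ScalingPoint> table;
    table.reserve(cores.size());
    for (std::size_t i = 0; i < cores.size(); ++i) {
        assert(cores[i] > 0 && seconds[i] > 0.0);
        const double s = t1 / seconds[i];
        table.push_back({cores[i], seconds[i], s, s / cores[i]});
    }
    return table;
}
```

An efficiency close to 1 means the extra cores are being used well; a value that drops as the core count grows points to communication, synchronization, or load imbalance overheads worth discussing in the writeup.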
Questions
As part of this assignment, you will need to answer multiple questions about
your experiments. The repo contains a file (mp3.answers) to put your answers
into. Each line of the file contains numbers corresponding to each question.
Put your answers on corresponding lines. Do not include any extra symbols
(e.g. no dots at the end of the line). IMPORTANT: we will be using automated
tools to grade your work, so make sure you follow the described format. The
answers to these questions may require multiple runs of the experiments, so
start early.
Question 1
In terms of runtime, does MPI or AMPI perform better on 4 processes? Answer
with either MPI or AMPI.
Question 2
In terms of runtime, does MPI or AMPI perform better on 36 processes? Answer
with either MPI or AMPI.
Question 3
Which scales better, MPI or AMPI? Answer with either MPI or AMPI.
Question 4
What is the most probable cause of one scaling better than the other? Answer
with either A, B, or C.
A) MPI scales better since for larger number of processes the frequency of
AMPI migration is too high.
B) MPI scales better since AMPI migration synchronizes processes.
C) AMPI scales better due to the effectiveness of load balancing.
Submission
You must commit at least the following files. These files, and only these
files, will be copied into a fresh repo, compiled (if needed), and tested at
grading time.
- part1/solution.h, part1/solution.cpp
- mp3.answers
Nothing prevents you from altering or adding any other file you like to help
your debugging or to do additional experiments. This includes the benchmark
code. (Which will just be reverted anyway.)
It goes without saying, however, that any attempt to subvert our grading
system through self-modifying code, linkage shenanigans, etc. in the above
files will be caught and dealt with harshly. Fortunately, it is absolutely
impossible to do any of these things unaware or by accident, so relax and
enjoy the assignment.