OpenMP代写:CS484MatrixTranspose


完成基本的Matrix操作优化。包括使用 [ Tiling ](https://penny-xu.github.io/blog/tiled-matrix-
multiplication “Tiling”) 等技术。
![Tiling](https://www.mathwarehouse.com/algebra/matrix/images/matrix-
multiplication/matrix-multiplication.png)

Introduction

Learning Goals

You will learn the following:

  • How memory access patterns can affect the performance of serial algorithms.
  • How to reason about the relative performance of two algorithms based on machine characteristics.
  • An introductory OpenMP directive.
  • How to calculate Speedup and Parallel Efficiency.

Assignment Tasks

The basic workflow of this assignment is as follows:

  • Clone your turnin/mp1 repo
  • Implement the algorithms / code for the various programming tasks.
  • Build and test on Campus Cluster (see README.md) (It should pass all tests except “Cache.TileSizeChosen” , which you will handle later.)
  • Push your complete, correct code.
  • Do at least two rounds of benchmarking on Campus Cluster ( scripts/submit batch.slurm ):
    • To determine the best tile size for the Campus Cluster nodes.
    • A final round to get the benchmark results you will use for your writeup.
  • Push any benchmarking results that you wish to and final versions of code files. (At this point, you should pass all tests.)
  • Write and push your writeup.

Matrix Transpose

This exercise introduces tiling as a way of improving cache performance. There
is a function named transpose tiled in transpose/transpose.cpp. Implement this
function (matrix transposition) using the tiling concept from class.
For simplicity, we will only deal with square matrices and tiles, and you may
assume that the matrices are optimally aligned in memory.
After implementing and verifying, benchmark this function on Campus Cluster
(see Section 2). Then answer the following questions in your writeup:

  1. For a square matrix of size N = 2048, plot the average execution times as the tile size varies over 2, 4, 8, …512. Explain the trend in terms of the machine’s cache specifications. (If you use the provided scripts/batch script.slurm, you will have these results in writeup/tile results.txt . ) You can find details about the cache using the following command: ‘getconf -a | grep CACHE’ . (On campus cluster, the run script will do this automatically and place the output in writeup/cache properties.txt )
  2. On the basis of the above results, alter the definition of BEST TILE SIZE in transpose/best tile size.h (and add/commit), re-run the batch script, reexamine the tiling results, and plot the execution time varying the matrix size N over 128, 256, 512, 1024, 2048, 4096, 8192. Explain these new results in terms of the cache specifications of the machine on which you are benchmarking (Campus Cluster).

Matrix Multiplication

In this exercise, you will analyze the performance difference of two variants
of matrix multiplication.
Implement the functions multiply tiled and multiply transposed in
mulitply/multiply.cpp , verify their correctness, and answer the following
questions.

  1. Explain performance differences that you observe between the two variants, Is one clearly better than the other? Why or why not? (Note that the benchmarks will automatically use the same BEST TILE SIZE that you determined in exercise 1.1.

Basic OpenMP

In this exercise you will apply simple OpenMP directives to two for loops
(wrapped by functions squares serial and primes serial ) and analyze their
effects on performance. You will use OpenMP pragmas to parallelized versions
of these two loops inside the functions squares parallel and primes parallel
located inside openmp/openmp.cpp .

  1. Explain performance differences that you observe between the two variants. Is one clearly better than the other? Why or why not?

Testing and Benchmarking

The process for running unit tests and benchmarks is described in see Section
2.1.
Though we reserve the right to subject your code to any tests of correctness,
benchmarks, or human code examination, these are essentially the tests that we
will use for grading. In some cases, tests that cannot be provided without
revealing an answer (“Cache.TileSizeChosen”) have been redacted to only check
if your submission is of the correct form, not to check the accuracy of your
answer itself.
Below is the grading rubric based on expected speedups for each test. Keep in
mind that you may need to run your code several times to find the best
speedups. We will also be running your code several times to find the best
speedups.

Compiling and Testing

To compile, just run the script compile script.sh in the scripts directory. To
test and benchmark your code, just run the script submit batch.sh in the
scripts directory.
You will have output files in your writeup directory that will give you
performance measurements, and your slurm-[JOBID].out file will tell you which
tests you passed or failed. We use Google Benchmarks for benchmarking your
code. Feel free to create your own benchmarks in tests/student benchmarks.cpp
if you wish, but you do not have to. If you do use your own benchmarks, then
uncomment the student benchmark run command at the bottom of scripts/batch
script.pbs.

Submission Guidelines

Your writeup must be a single PDF file that contains the tasks outlined above.
Please save the file as “./writeup/mp1 [NetID].pdf” , commit it to git, and
push it to the main branch of your turnin repo before the submission deadline.
You must also commit at least the following code files. These files, and only
these files, will be copied into a fresh repo, compiled, and tested at grading
time.

  • transpose/transpose.cpp
  • transpose/best tile size.h
  • mulitply/multiply.cpp
  • openmp/openmp.cpp
    Nothing prevents you from altering or adding any other file you like to help
    your debugging or to do additional experiments.
    This includes the unit test and benchmark code. (Which will just be reverted
    anyway.)
    It goes without saying, however, that any attempt to subvert our grading
    system through self-modifying code, linkage shenanigans, etc. in the above
    files will be caught and dealt with harshly. Fortunately, it is absolutely
    impossible to do any of these things unaware or by accident, so relax and
    enjoy the assignment.

文章作者: SafePoker
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 SafePoker !
  目录