代写多线程的应用作业,练习多线程的编程方法。写一个简易的Map-Reduce程序。
Requirement
In this assignment, we are going to create many multithreaded implementations
of the MapReduce-like program we started Homework 1. The major difference with
this assignment is that we are going to use multithreading to map and reduce
data more efficiently and effectively than in Homework 1. There is nothing in
this assignment that relies on your Homework 1 code. This assignment simply
uses the same concepts.
Multithreading allows us to run the map functions concurrently and get results
faster than mapping sequentially. There is hardly a device available today
that doesn’t have a multicore processor. Up until now your C programs have
only utilized a single core to execute your instructions.
Concurrent Programming
Some difficulties arise with concurrent programming as there are now
indeterminacy factors. This can lead to some disastrous consequences when
manipulating data structures or waiting for resources. You will use techniques
taught in lecture to avoid such situations.
Threads are made available by the POSIX Threading library or pthread.
Except there is one thing that is missing, naming threads. Naming your threads
will make debugging and grading your program easier.
You are required to name your threads using the following format. “map %d”
where %d is just an index of which map thread it is, and “reduce” for your
reduce thread.
When running your program, you can launch the program htop and view threads as
they are spawned, if you enable tree view in htop ‘s settings.
Terminator is a terminal window that allows you to split your terminal into
many panes. This may be useful when testing. sudo apt-get install terminator
Reentrant funcitons
You will write map-reduce programs all with the same functionality but with
different paradigms of thread safety. You will encounter a need for reetrant
safe functions such as readdir_r() for use when iterating through directories.
For example, strtok(3) isn’t reentrant. This is becuase it contains a static
variable that is reused on subsequent calls. This causes an issue when two
threads call strtok(3) because the static variable will be changed. Thread
unsafe functions like strtok(3) have reentrant versions such as strtok_r(3) .
Make sure to use the reentrant versions of functions.
We challenge you to a journey in multithreading in a series we’re calling:
LORD OF THE THREADS
Part 0: Map-Reduce: An Unexpected Thread
In this part, you will implement simple map(…) and reduce(…) functions that
will form the basis of each of the following parts.
- Navigate into your CSE320 git repository on your local machine and add the HW5 repository as a branch to your existing repository.
- Fetch information about this branch and merge it into the master branch. Refer back to HW1 for more details.
- There should be a folder called hw5 in your repository with all the assignment files. Inside the hw5 directory you can find the following files:
- src/part*.c - contains function prototypes for each of the assignment parts.
- src/lott.c - contains the main program to run your assignment
- include/lott.h - contains function prototypes and other dependencies.
- Makefile - this will build any .c and .h files you may have in the include and src directories.
- .gitignore - contains the directories to ignore with git
We will be using website data collected over the past two decades to find the
highs, lows, and averages of the data. The queries are described in detail
below.
Download the data directly from the link, and place it in a folder called data
in your hw5 directory. DO NOT FETCH AND MERGE!
Note: the data folder is in the .gitignore meaning it won’t be pushed to your
repo. DO NOT push the website data to your repo. Our disk space is precious.
THE DATA IS 1.5GB IF EVERYONE IN THE COURSE PUSHES THE DATA IT WILL USE 300GB.
Your map() function should open each of the files it is tasked with handling
and calculate the necessary values for a given query.
Your map() function should also take a void* as an argument and return a void*
. The void* argument should be a pointer to a struct of your own design
holding the necessary arguments.
Keep in mind how the sharing of data between threads is performed when passing
arguments.
Code organization and commenting IS KEY. If we cannot understand your code,
your grade will be affected.
Lord of the Threads main function
We will use the provided main (unmodified) to test your assignment. It simply
handles argument parsing and calls the functions that you implement.
Possible Queries
- A/B: Max/Min Average duration;
- map(): Average duration for each website
- reduce(): Max/Min average duration of all of the websites
- C/D: Max/Min Average # of users per year
- map(): Find the average user count per year
- reduce(): Max/Min average between all sites
- E: Country with most users
- map(): Count country codes to find maximum occurence for a website
- reduce(): Find country with the most users
The help menu is already implemented and is:
Lord of the Threads
USAGE: lott QUERY PART [THREADS]
QUERY - The calculation the program is to execute: A, B, C, D, or E.
PART - Part specification, parts 1,2,3,4, or 5 are valid choices.
THREADS - Number of threads for parts that take a specified amount
Part 1: The Desolation of CPU
TRIVIA: A decade ago for Return of the King, WETA’s renderwall had about 2,300
CPU cores and 5 terabytes of ram. To render everything for Desolation of
Smaug, though WETA relied on 50,000 CPU cores and 170 terabytes of ram!
In this section, as the title suggests, you are going to test the limits of
your CPU’s temperature throttling ability and your operating system’s context
switching ability.
Each file found in the recursive search for files will spawn a thread (
pthread_create() ). This thread will be responsible for running the map() half
of the program. After all threads have been spawned, the main function of your
program should call pthread_join() for each thread, storing the return value
of each thread in some structure of your design.
At this point, you should call the reduce() function to operate on the
returned values of the map() threads. This way your program can find a result
to the specified query.
Now you can output the results using the given format at the end of this
document.
You can expirement with less files to avoid severe slowdowns and/or crashes
while working on your VM.
Part 2: The Battle of the N Threads
TRIVIA: In the great war between Elves, Dwarves, Humans, Orcs, and Eagles
there was something to be said about an efficiency factor of not involving all
24+ species imagined in the great Tolkien universe.
Since most computers have only 2 to 4 cores, it is ideal that we limit our
program to utilize a specified number of threads.
From the function argument given you will limit the number of threads used.
This time you divide up the work evenly per thread, taking some portion of the
files and dividing it amongst your threads. You will then take the same action
as the previous part and call pthread_join() for each of the spawned threads.
When all threads have completed you can call reduce() on the total collected
data from each map() thread.
Part 3: The Fellowship of 1 Reader & N Writers
TRIVIA: All of Frodo’s friends working together to carry the ring across
middle earth was hardly efficient. If only there was a way they could’ve
worked in tandem maybe we wouldn’t have so many movies to watch. (19 hours and
31 minutes to be exact)
In this part, you will implement the reader/writer lock on a shared file
between “writer” map threads and a “reader” reduce thread. You will open a
single temporary file, mapred.tmp , which all map threads will write to.
Note: Your program should delete the temporary file when the map threads are
complete. Look at unlink(…)
You will need to write a wrapper function for the write() syscall such that
each map() thread will be guaranteed to write uninterrupted to the temporary
file. In addition, you will need a wrapper for the read() syscall so that the
reduce() thread can safely read while there is no writing being done. Your
read/write lock MUST give Reader-preference. The reduce thread should read
what has been written to the file so far and begin the reducing process on
this subset of the data immediately. When the map() threads are able to obtain
the lock they will write the data they have and end their execution.
NOTE: While testing you may encounter an unavoidable deadlock as the writer
threads can be starved from writing when the reduce thread is constantly
holding a lock for reading. You may want to introduce an artificial delay (man
usleep ) in your reduce’s calculation code so that the lock is left unlocked
long enough for a writer thread to obtain it.
When all map threads have joined with the main process you can
pthread_cancel() your reduce() thread. Since your thread can be canceled at
any time you’ll need to use pthread_cleanup_push() to handle printing the
current results.
Since your map thread can be cancelled at any time, you’ll want to make sure
that it is uninterruptable when calculating the query result. You should look
into the following function.
Part 4: The Producer/Consumer Towers
TRIVIA: Tolkien actually typed the Lord of the Rings Fellowship of the Ring by
chicken pecking two fingers on his typewriter. Be sure not to code like that.
:laughing:.
In this part you will create a global buffer that each map thread will store
their results into as they are collected. You will need to use mutexes or
semaphores to implement a producer/consumer lock on your structure to ensure
the stability of the structure.
The producer/consumer lock should be implemented such that the consumer will
be able to read new data whenever it is available, but producers are given
priority to insert data into the buffer as quickly as possible. Basically, you
MUST give writer priority, look here for some ideas.
In this part, you will spawn N threads much like in the previous example for
each of the map threads dividing the work evenly amongst each of them. You
will spawn one reduce() thread which will be the consumer. Your reduce()
thread should read data as it becomes available and compute the new result
given the new data immediately and then wait for further data.
When each map thread has been joined with your main process again you can
pthread_cancel() the reduce() thread. Since your thread can be canceled at any
time you’ll need to use pthread_cleanup_push() to handle printing the current
results. Refer to Part 3 for more details on these functions.
Part 5: The Return of the Multiplexor
TRIVIA: Hadoop is actually the name of a toy Elephant owned by the creator’s
son.
In the final part of the assignment you will spawn N threads giving each
thread one end of a socketpair(). The other ends will be given to your
reduce() thread. Your map() threads should do their job the same as the
previous parts of the assignment, but at the conclusion of their work they
will write the data to the unix socket they were passed.
Your reduce() thread will use I/O Multiplexing functions such as epoll(7) ,
poll(2) or select(2) to trigger on input from any of the given file
descriptors processing the data available on that socket.
Resources
You are expected to hand in at minimum the following files in your git
submission.
- All files relevant to hw5.
- README
* The README should contain the following information:- Name and SBU ID#
- If you did any Extra Credit tasks document them by writing EXTRA CREDIT # to the file where # is the number next to the task in the extra credit section. If you did multiple Extra Credit tasks, you must write them individually on their own lines.
- Anything relevant to grading your project when errors may occur.
Submitting your assignment
REMEMBER! DO NOT submit at the last minute. We will be using the time stamp of
the commit associated with the hw5 tag to determine if your homework
assignment is late or not. On top of that we will NOT grade late assignments.
- Make sure all files for this assignment are in the hw5 directory.
- Before you tag your submission, make sure that you have pulled from the remote and all committed changes have been pushed. Be sure you are done making any and all changes before proceeding as tags cannot be deleted.
- Log on to your gitlab account and navigate to the tags page from your repository’s main page.
- Enter hw5 for the tag name, the name of your current branch ( master only), and an optional completion message. Do not put important information in the message as we will not necessarily see it. Any relevant information to your submission should be in your README. Please tag hw5 in lowercase.
- To check if you submitted properly, you should now see a hw5 tag in the list of tags for your repository.