Implement the MapReduce model from distributed computing. The project exercises network programming on Linux.
Problem Statement
In this project you will implement a simple model of computational offloading
where a single client offloads some computation to a server which in turn
distributes the load over 3 backend servers. The server facing the client then
collects the results from the backend and communicates the same to the client
in the required format. This is an example of how a cloud-computing service
such as Amazon Web Services might implement MapReduce to speed up a large
computation task offloaded by the client.
The server communicating with the client is called AWS (Amazon Web Server) and
the three backend servers are named BackServer A, BackServer B and
BackServer C. The client and AWS communicate over a TCP connection, while
the communication between AWS and BackServers A, B and C is over UDP.
Input Files Used
The files specified below will be used as inputs in your programs in order to
dynamically configure the state of the system. The contents of the files
should NOT be “hardcoded” in your source code, because during grading, the
input files will be different, but the formats of the files will remain the
same.
If you are working in an environment other than UNIX, pay particular attention
to line endings or newlines. For this project, it is assumed that all files
follow the UNIX line ending convention. This is particularly important while
handling the input file(s). See the articles here and here for more
information.
nums.csv: An ASCII file that contains a single column of integers. Each row
consists of a single integer and ends with a newline. You may assume that each
integer is within the range of a long signed integer type. The number of rows
in the file will be a multiple of 3. This file will always reside in the same
directory as the client.
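The file format described above can be read with a short loop. The following is a minimal sketch, assuming one decimal integer per line; the helper name read_numbers and the array-growth strategy are illustrative choices, not part of the assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one integer per line from `path` into a
 * heap-allocated array. Returns the number of integers read, or -1 if
 * the file cannot be opened or memory runs out. The caller frees *out. */
long read_numbers(const char *path, long **out)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    size_t cap = 64, len = 0;
    long *nums = malloc(cap * sizeof *nums);
    if (nums == NULL) {
        fclose(fp);
        return -1;
    }

    char line[64];
    while (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        errno = 0;
        long value = strtol(line, &end, 10);
        if (end == line || errno == ERANGE)
            continue;               /* skip blank or out-of-range lines */
        if (len == cap) {           /* grow the array when it is full */
            cap *= 2;
            long *tmp = realloc(nums, cap * sizeof *tmp);
            if (tmp == NULL) {
                free(nums);
                fclose(fp);
                return -1;
            }
            nums = tmp;
        }
        nums[len++] = value;
    }

    fclose(fp);
    *out = nums;
    return (long)len;
}
```

Because strtol() stops at the first non-digit character, a stray carriage return from a non-UNIX line ending does not change the parsed value in this sketch.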
Source Code Files
Your implementation should include the source code files described below, for
each component of the system.
- AWS: You must name your code file: aws.c or aws.cc or aws.cpp (all small letters). Also you must call the corresponding header file (if you have one; it is not mandatory) aws.h (all small letters).
- BackServer A, B and C: You must use one of these names for this piece of code: server#.c or server#.cc or server#.cpp (all small letters except for #). Also you must call the corresponding header file (if you have one; it is not mandatory) server#.h (all small letters, except for #). The “#” character must be replaced by the server identifier (i.e. A or B or C), depending on the server it corresponds to.
Note: In case you are using one executable for all four servers (i.e. if you
choose to make a “fork” based implementation), you should call the file
servers.c or servers.cc or servers.cpp. Also you must call the corresponding
header file (if you have one; it is not mandatory) servers.h (all small
letters). In order to create four servers in your system using one executable,
you can use the fork() function inside your server’s code to create 4 child
processes. You must follow this naming convention! This piece of code
basically handles the server functionalities.
- Client: The name of this piece of code must be client.c or client.cc or client.cpp (all small letters) and the header file (if you have one; it is not mandatory) must be called client.h (all small letters).
More Detailed Explanations
Phase 1
All four server programs (AWS, BackServer A, B, & C) boot up in this phase.
While booting up, the servers must display a boot message on the terminal. The
format of the boot message for each server is given in the onscreen messages
tables at the end of the document. As the boot message indicates, each server
must listen on the appropriate port for incoming packets/connections.
Once the server programs have booted up, the client program is run. The client
displays a boot message as indicated in the onscreen messages table. Note that
the client code takes an input argument from the command line, that specifies
the computation that is to be run. The format for running the client code is
./client <function_name>
where function_name can take a value from {min, max, sum, sos}. As an
example, to find the sum of all the numbers in the input file, the client
should be run as follows:
./client sum
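The command-line argument can be validated before any socket is opened. The following is a minimal sketch; the enum reduction type and the parse_function_name helper are hypothetical names introduced here for illustration, and the comparison is assumed to be case-sensitive because the allowed values are listed in lowercase.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumeration of the four supported computations. */
enum reduction { RED_MIN, RED_MAX, RED_SUM, RED_SOS, RED_INVALID };

/* Maps the command-line string to an enum value. Returns RED_INVALID
 * for anything other than "min", "max", "sum" or "sos". */
enum reduction parse_function_name(const char *name)
{
    static const char *const names[] = { "min", "max", "sum", "sos" };

    assert(name != NULL);
    for (int i = 0; i < 4; i++) {
        if (strcmp(name, names[i]) == 0)
            return (enum reduction)i;
    }
    return RED_INVALID;
}
```

A client could call parse_function_name(argv[1]) after checking that argc is 2, and exit with a usage message when the result is RED_INVALID.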
After booting up, the client establishes a TCP connection with AWS. After
successfully establishing the connection, the client first sends the
function_name to AWS. Once the function_name is sent, the client should
print a message in the format given in the table. The client then reads all
integers from nums.csv and proceeds to send them to AWS over the same TCP
connection. After successfully sending the integers, the client should print
the number of integers sent to AWS. This ends Phase 1 and we now proceed to
Phase 2.
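On a TCP socket, send() may transmit fewer bytes than requested, so sending the function_name and the integers reliably usually needs a loop. The following is a minimal sketch of such a loop; the helper name send_all is an assumption, and how the data is framed on the wire is left to the implementer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: sends exactly `len` bytes on a connected stream
 * socket, retrying after partial writes and interrupted calls.
 * Returns 0 on success, -1 on error. */
int send_all(int sockfd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t sent = send(sockfd, p, len, 0);
        if (sent < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += sent;                  /* advance past the bytes already sent */
        len -= (size_t)sent;
    }
    return 0;
}
```

The client could call send_all() once for the function_name and once for the buffer holding the integers, printing the required onscreen message after each call succeeds.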
Phase 2
In Phase 1, you read the numbers from the file and sent them to the AWS server
over a TCP connection. Now in Phase 2, the AWS server will divide the data
into 3 non-overlapping components and send them to the 3 back-servers. If
there are N numbers in the file, then the first N/3 numbers must be sent to
back-server A, the next N/3 to back-server B and the last N/3 numbers to
back-server C. TAs will make sure that the number N is divisible by 3. Also
the function to be performed needs to be communicated to the back-servers.
The communication between the AWS server and the back-servers happens over UDP.
The AWS server will send the function_name along with the actual numbers.
Note that the function_name can be MIN, MAX, SUM or SOS (sum of squares).
The port numbers for back-servers A, B and C are specified in Table 2. Since
all the servers will run on the same machine in our project, all have the same
IP address (the IP address of localhost is usually 127.0.0.1).
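The three-way split described above reduces to simple index arithmetic. The following is a minimal sketch, assuming the numbers sit in one contiguous array on the AWS server; struct chunk and chunk_for_server are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one back-server's share of the data. */
struct chunk {
    size_t offset;  /* index of the first number in the share */
    size_t count;   /* how many numbers the share contains */
};

/* Returns the slice of an n-element array assigned to back-server
 * `index` (0 = A, 1 = B, 2 = C). Assumes n is a multiple of 3. */
struct chunk chunk_for_server(size_t n, int index)
{
    struct chunk c;

    assert(n % 3 == 0);
    assert(index >= 0 && index < 3);
    c.count = n / 3;
    c.offset = (size_t)index * c.count;
    return c;
}
```

For N = 90, back-server B (index 1) would receive the 30 numbers starting at offset 30.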
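Once a back-server receives the actual numbers (a total of N/3 numbers) and
the function to be performed, it computes the function value. Let this value
for server i be X(i). This step is also called map in MapReduce. If the
numbers received by back-server i are n(1), n(2), ..., n(N/3), then the map
operations it performs are as follows:
MIN: X(i) = min(n(1), n(2), ..., n(N/3))
MAX: X(i) = max(n(1), n(2), ..., n(N/3))
SUM: X(i) = n(1) + n(2) + ... + n(N/3)
SOS: X(i) = n(1)^2 + n(2)^2 + ... + n(N/3)^2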
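The map step on a back-server can be sketched in C as one pass over the received numbers. This is a minimal sketch under stated assumptions: the enum reduction type and the map_chunk function are hypothetical names, and arithmetic overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration of the four supported computations. */
enum reduction { RED_MIN, RED_MAX, RED_SUM, RED_SOS };

/* Computes X(i) for one back-server from its `count` numbers.
 * Requires at least one number. Overflow is not detected. */
long map_chunk(enum reduction op, const long *n, size_t count)
{
    assert(n != NULL && count > 0);

    /* Seed the accumulator with the first element (squared for SOS). */
    long x = (op == RED_SOS) ? n[0] * n[0] : n[0];

    for (size_t j = 1; j < count; j++) {
        switch (op) {
        case RED_MIN: if (n[j] < x) x = n[j]; break;
        case RED_MAX: if (n[j] > x) x = n[j]; break;
        case RED_SUM: x += n[j];              break;
        case RED_SOS: x += n[j] * n[j];       break;
        }
    }
    return x;
}
```

For the numbers 3, -1 and 4, this sketch yields -1 for MIN, 4 for MAX, 6 for SUM and 26 for SOS.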
Phase 3
At the end of Phase 2, all backend servers have their answers ready. Let’s
call the value calculated by backend server i X(i). This is to be sent to
the AWS server using UDP. The final answer needs to be calculated by the
frontend server (AWS) in the reduce step and then handed over to the user.
The frontend server (server D) looks at the type of reduction operation and
calculates the final answer, which we call X_final, based on the answers it
receives from back-servers A, B and C. This step is also called reduce
in MapReduce. Depending on the operation requested by the user, we have:
MIN: X_final = min(X(A), X(B), X(C))
MAX: X_final = max(X(A), X(B), X(C))
SUM: X_final = X(A) + X(B) + X(C)
SOS: X_final = X(A) + X(B) + X(C)
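The reduce step on the AWS server combines the three partial results. The following is a minimal sketch; enum reduction and reduce_results are hypothetical names, and the sketch relies on the fact that the sum of squares over the whole file equals the sum of the three partial sums of squares.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration of the four supported computations. */
enum reduction { RED_MIN, RED_MAX, RED_SUM, RED_SOS };

/* Combines the values X(A), X(B), X(C) reported by the back-servers
 * into the final answer. Overflow is not detected. */
long reduce_results(enum reduction op, const long x[3])
{
    assert(x != NULL);

    long result = x[0];
    for (int i = 1; i < 3; i++) {
        switch (op) {
        case RED_MIN: if (x[i] < result) result = x[i]; break;
        case RED_MAX: if (x[i] > result) result = x[i]; break;
        case RED_SUM: /* partial sums are added together */
        case RED_SOS: /* partial sums of squares are added as well */
            result += x[i];
            break;
        }
    }
    return result;
}
```

With partial sums 1000, 1001 and 1002, the SUM reduction in this sketch gives 3003.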
Example Output
BackendServer A Terminal:
The Server A is up and running using UDP on port 21319.
The Server A has received 30 numbers
The Server A has successfully finished the reduction SUM: 1000
The Server A has successfully finished sending the reduction value to AWS server.
BackendServer B Terminal:
The Server B is up and running using UDP on port 22319.
The Server B has received 30 numbers
The Server B has successfully finished the reduction SUM: 1001
The Server B has successfully finished sending the reduction value to AWS server.
BackendServer C Terminal:
The Server C is up and running using UDP on port 23319.
The Server C has received 30 numbers
The Server C has successfully finished the reduction SUM: 1002
The Server C has successfully finished sending the reduction value to AWS server.
AWS Terminal:
The AWS is up and running.
The AWS has received 90 numbers from the client using TCP over port 25319
The AWS has sent 30 numbers to Backend-Server A
The AWS has sent 30 numbers to Backend-Server B
The AWS has sent 30 numbers to Backend-Server C
The AWS received reduction result of SUM from Backend-Server A using UDP over port 24319 and it is 1000
The AWS received reduction result of SUM from Backend-Server B using UDP over port 24319 and it is 1001
The AWS received reduction result of SUM from Backend-Server C using UDP over port 24319 and it is 1002
The AWS has successfully finished the reduction SUM: 3003
The AWS has successfully finished sending the reduction value to client.
Client Terminal:
The client is up and running.
The client has sent the reduction type SUM to AWS.
The client has sent 90 numbers to AWS
The client has received reduction SUM: 3003
Assumptions
- It is recommended to start the processes in this order: backend server A, backend server B, backend server C, AWS (D), client.
- If you need to have more code files than the ones that are mentioned here, please use meaningful names and all small letters and mention them all in your README file.
- You are allowed to use blocks of code from Beej’s socket programming tutorial (Beej’s guide to network programming) in your project. However, you need to mark the copied part in your code.
- When you run your code, if you get the message “port already in use” or “address already in use”, please first check to see if you have a zombie process (from past logins or previous runs of code that are still not terminated and hold the port busy). If you do not have such zombie processes or if you still get this message after terminating all zombie processes, try changing the static UDP or TCP port number corresponding to this error message (all port numbers below 1024 are reserved and must not be used). If you have to change the port number, please do mention it in your README file. If you have zombie processes you can kill them using UNIX commands: kill or killall.
Requirements
- Do not hardcode the TCP or UDP port numbers that are to be obtained dynamically. Refer to Table 1 to see which ports are statically defined and which ones are dynamically assigned. Use the getsockname() function to retrieve the locally bound port number wherever ports are assigned dynamically as shown.
- Use gethostbyname() to obtain the IP address of the local host; however, the host name must be hardcoded as nunki.usc.edu or localhost in all pieces of code.
- You can either terminate all processes after completion of Phase 3 or assume that the user will terminate them at the end by pressing ctrl-c.
- All the naming conventions and the onscreen messages must conform to the previously mentioned rules.
- You are not allowed to pass any parameter or value or string or character as a command-line argument except while running the client in Phase 1.
- All the onscreen messages must conform exactly to the project description. You should not add any more onscreen messages. If you need to do so for debugging purposes, you must comment out all of the extra messages before you submit your project.
- Using fork() or similar system calls is not mandatory if you do not feel comfortable using them to create concurrent processes.
- Please do remember to close the socket and tear down the connection once you are done using that socket.
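The getsockname() requirement above can be illustrated with a socket that lets the kernel choose its port. The following is a minimal sketch, assuming a UDP socket bound to the loopback address; the helper name bind_dynamic_udp is hypothetical, and which sockets actually use dynamic ports is defined by Table 1, not by this example.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds a UDP socket to 127.0.0.1 with port 0 so
 * the kernel picks a free port, then reads that port back through
 * getsockname(). Returns the socket descriptor, or -1 on error. */
int bind_dynamic_udp(unsigned short *port_out)
{
    assert(port_out != NULL);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* 0 asks the kernel for any free port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);   /* convert to host byte order */
    return fd;
}
```

The caller is responsible for calling close() on the returned descriptor once the socket is no longer needed.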