Write a servlet-based Web service in Java.
Background
We are all familiar with how one accesses a Web server via a browser. The big
question is what is going on under the covers of the Web server: How does it
serve data? What is necessary in order to provide the notion of sessions? How
is it extended? And so on.
This assignment focuses on developing an application server, i.e., a Web
(HTTP) server that runs Java servlets, in two stages. In the first stage, you
will implement a simple HTTP server for static content (i.e., files like
images, style sheets, and HTML pages). In the second stage, you will expand
this work to emulate a full-fledged application server that runs servlets.
Java servlets are a popular method for writing dynamic Web applications. They
provide a cleaner and much more powerful interface to the Web server and Web
browser than previous methods, such as CGI scripts.
A Java servlet is simply a Java class that extends the class HttpServlet. It
typically overrides the doGet and doPost methods from that class to generate a
web page in response to a request from a Web browser. An XML file, web.xml,
lets the servlet developer specify a mapping from URLs to class names; this is
how the server knows which class to invoke in response to an HTTP request.
Further details about servlets, including links to tutorials and an API
reference, as well as sample servlets and a corresponding web.xml file, are
available on the course web site. We have also given you code to parse
web.xml.
Developing and running your code
We strongly recommend that you do the following before you start writing code:
- Carefully read the entire assignment (both milestones) from front to back and make a list of the features you need to implement.
- Think about how the key features will work. For instance, before you start with MS2, go through the steps the server will need to perform to handle a request. If you still have questions, have a look at some of the extra material on the assignments page, or ask one of us during office hours.
- Spend at least some time thinking about the design of your solution. What classes will you need? How many threads will there be? What will their interfaces look like? Which data structures need synchronization? And so on.
- Regularly check your changes into the Git repository. This will give you many useful features, including a recent backup and the ability to roll back any changes that have mysteriously broken your code.
We recommend that you continue using the VM image we have provided for HW0.
This image should already contain all the tools you will need for HW1. If you
have already checked out the code from your Git repository, all you need to do
to get the HW1 framework code is open a terminal and run “cd ~/workspace”
followed by “git pull”. After this, there should be a new “HW1” folder in the
workspace directory; you can import this folder as a project into Eclipse
using the same approach as in HW0.
Of course, you are free to use any other Java IDE you like, or no IDE at all,
and you do not have to use any of the tools we provide. However, to ensure
efficient grading, your submission must meet the requirements specified in 3.5
and 4.7 below - in particular, it must build and run correctly in the original
VM image and have a Maven build script (pom.xml). The VM image, and Maven,
will be the ‘gold standard’ for grading.
We strongly recommend that you regularly check the discussions on Piazza for
clarifications and solutions to common problems.
Testing your server
To test your server, you have several options:
- You can use the Developer Tools in Chrome to inspect the HTTP headers. Open the menu, choose “More Tools”, and click on “Developer tools”. This should pop up a new tab; click on “Network” to open a list of all the HTTP requests processed by Chrome, click on a request for extra details, and then click on “Headers” to see the headers.
- If you want to check whether you are using the correct headers, you may find the site web-sniffer.net useful.
- You can use the telnet command to directly interact with the server. Just run telnet localhost 80, type in the request, and hit Enter twice; you should see the server’s response. (If your server is running on a different port, replace ‘80’ with the port number.)
- You may also want to consider using the curl command-line utility to do some automated testing of your server. curl makes it easy to test HTTP/1.1 compliance by sending HTTP requests that are purposefully invalid - e.g., sending an HTTP/1.1 request without a Host header. ‘man curl’ lists a great many flags.
- To stress-test your server, you can use Apachebench (the ab command, which is already pre-installed in the VM). Apachebench can be configured to make many requests concurrently, which will help you find concurrency problems, deadlocks, etc.
We suggest that you use multiple options for testing; if you only use Firefox,
for instance, there is a risk that you hard-code assumptions about Firefox, so
your solution won’t work with Chrome, curl, or ab. You may also want to
compare your server’s behavior with that of a known-good server, e.g., the CIS
web server. Please do test your solution carefully!
Milestone 1: Multithreaded HTTP/1.1 Server
For the first milestone, your task is relatively simple. You will develop a
Web server that can be invoked from the command line, taking the following
parameters, in this order:
- Port to listen for connections on. Port 80 is the default HTTP port, but it is often blocked by firewalls, so your server should be able to run on any other port (e.g., 8080).
- Root directory of the static web pages. For example, if this is set to the directory /var/www, a request for /mydir/index.html will return the file /var/www/mydir/index.html. (Do not hard-code any part of the path in your code; your server needs to work on a different machine, which may have completely different directories!)
Note that the second milestone will add a third argument (see below). If your
server is invoked without any command-line arguments, it must output your full
name and SEAS login name.
Your program will accept incoming GET and HEAD requests from a Web browser
(such as the Firefox browser in the VM image), and it will make use of a
thread pool (as discussed in class) to invoke a worker thread to process each
request. The worker thread will parse the HTTP request, determine which file
was requested (relative to the root directory specified above) and return the
file. If a directory was requested, the request should return a listing of the
files in the directory. Your server should return the correct MIME types for
some basic file formats, based on the extension (.jpg, .gif, .png, .txt,
.html); keep in mind that image files must be sent in binary form – not with
println or equivalent – otherwise the browser will not be able to read them.
If a GET or HEAD request is made that is not a valid UNIX path specification,
if no file is found, or if the file is not accessible, you should return the
appropriate HTTP error. See the HTTP Made Really Easy paper for more details.
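As a rough illustration of the extension-based MIME lookup described above, here is a minimal sketch. The class name MimeTypes and the fallback type application/octet-stream for unknown extensions are our own assumptions, not part of the assignment.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a file extension to a MIME type. */
final class MimeTypes {
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "jpg", "image/jpeg",
            "gif", "image/gif",
            "png", "image/png",
            "txt", "text/plain",
            "html", "text/html");

    private MimeTypes() {}

    /** Returns the MIME type for a path, or a generic binary type if the extension is unknown. */
    static String forPath(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // A dot that belongs to a directory name, or a trailing dot, is not an extension.
        if (dot <= slash || dot == path.length() - 1) {
            return "application/octet-stream";
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }
}
```

The result would typically go into the Content-Type header of the response.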
MAJOR SECURITY CONCERN: You should make sure that users are not allowed to
request absolute paths or paths outside the root directory. We will validate,
e.g., that we can’t get hold of /etc/passwd!
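One possible way to enforce this is to normalize the requested path and check that it still lies under the root directory. The sketch below assumes the URL has already been percent-decoded; the class name SafePaths is hypothetical. It does not follow symbolic links, so a link inside the root that points elsewhere would need an extra check (for example with Path.toRealPath on an existing file).

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: confines request paths to the document root. */
final class SafePaths {
    private SafePaths() {}

    /** Returns the resolved file path, or empty if the request escapes the root. */
    static Optional<Path> resolve(Path root, String requestPath) {
        Path base = root.toAbsolutePath().normalize();
        // Strip leading slashes so the request is always treated as relative to the root.
        String relative = requestPath.replaceFirst("^/+", "");
        try {
            // normalize() collapses "." and ".." segments before the containment check.
            Path candidate = base.resolve(relative).normalize();
            return candidate.startsWith(base) ? Optional.of(candidate) : Optional.empty();
        } catch (InvalidPathException e) {
            return Optional.empty(); // e.g., a NUL character in the request
        }
    }
}
```

An empty result would map to an HTTP error response rather than a file read.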
HTTP protocol version and features
Your application server must be HTTP 1.1 compliant, and it must support all
the features described in HTTP Made Really Easy. This means that it must be
able to support HTTP 1.0 clients as well as 1.1 clients. Persistent
connections are suggested but not required for HTTP 1.1 servers. If you do not
wish to support persistent connections, be sure to include “Connection: close”
in the header of the response. Chunked encoding (sometimes called chunking) is
also not required. Support for persistent connections and chunking is extra
credit, described near the end of this assignment.
HTTP Made Really Easy is not a complete specification, so you will
occasionally need to look at RFC 2616 (the ‘real’ HTTP specification;
http://www.ietf.org/rfc/rfc2616.txt )
for protocol details. If you have a protocol-related question, please make an
effort to find the answer in the spec before you post the question to Piazza!
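As an example of a protocol detail from RFC 2616: an HTTP/1.1 request without a Host header must be answered with 400 Bad Request, whereas HTTP/1.0 requests need no Host header. The sketch below shows one way to make that first decision. The class and method names are hypothetical, the headers are assumed to be stored under lower-case names, and returning 501 for methods other than GET and HEAD is our reading of what is appropriate for MS1.

```java
import java.util.Map;

/** Hypothetical helper: first-pass validation of a parsed request. */
final class RequestCheck {
    private RequestCheck() {}

    /** Returns 200 if processing may continue, otherwise the error status to send. */
    static int initialStatus(String requestLine, Map<String, String> headersLowerCase) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return 400; // malformed request line
        }
        String method = parts[0];
        String version = parts[2];
        if (!method.equals("GET") && !method.equals("HEAD")) {
            return 501; // method not implemented in this milestone (assumption)
        }
        if (version.equals("HTTP/1.1") && !headersLowerCase.containsKey("host")) {
            return 400; // HTTP/1.1 requires a Host header
        }
        return 200;
    }
}
```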
Special URLs
Your application server should implement two special URLs. If someone issues a
GET /shutdown, your server should shut down immediately; however, any threads
that are still busy handling requests must be aborted properly (do not just
call System.exit!). If someone issues a GET /control, your server should
return a ‘control panel’ web page, which must contain at least a) your full
name and SEAS login, b) a list of all the threads in the thread pool, c) the
status of each thread (‘waiting’ or the URL it is currently handling), and d)
a button that shuts down the server, i.e., is linked to the special /shutdown
URL. It must be possible to open the special URLs in a normal web browser.
Implementation techniques
For efficiency, your application server must be implemented using a thread
pool that you implement, as discussed in class. Specifically, there should be
one thread that listens for incoming TCP requests and enqueues them, and some
number of threads that process the requests from the queue and return the
responses. We will examine your code to make sure it is free of race
conditions and the potential for deadlock, so code carefully!
We expect you to write your own thread pool code, not use one from the Java
system library or an external library. This includes the queue, which you
should implement by yourself, using condition variables to block and wake up
threads. You may not use the BlockingQueue that comes with Java, or any
similar classes.
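To illustrate the condition-variable pattern, here is a minimal sketch of a bounded queue built on Java's intrinsic locks with wait() and notifyAll(). The class name WorkQueue and its capacity parameter are hypothetical. We assume a plain collection such as ArrayDeque is acceptable as the underlying storage; if in doubt, ask the course staff or replace it with your own linked list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded work queue built on intrinsic locks and wait/notifyAll. */
final class WorkQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    WorkQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Blocks while the queue is full, then appends the item. */
    synchronized void put(T item) throws InterruptedException {
        while (items.size() >= capacity) { // loop guards against spurious wakeups
            wait();
        }
        items.addLast(item);
        notifyAll(); // wake any consumers waiting in take()
    }

    /** Blocks while the queue is empty, then removes the oldest item. */
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        T item = items.removeFirst();
        notifyAll(); // wake any producers waiting in put()
        return item;
    }

    synchronized int size() {
        return items.size();
    }
}
```

In this design the listener thread would call put() with each accepted socket and every worker would loop on take(). Interrupting a blocked worker makes wait() throw InterruptedException, which gives the worker a clean exit path during /shutdown.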
Tips for testing and debugging
When you test your solution with Firefox or Chrome, you will sometimes see
more than one request, even if you open only one web page. This is because the
browser sometimes checks whether there is an icon (favicon.ico) for the web
page; just handle this additional request as you would handle any other
request. Another common problem when requesting binary files, such as images,
is that the file is not displayed properly or is shown as ‘broken’. Typically,
the reason is that your server is sending a few extra bytes or is missing a
few bytes, e.g., due to an off-by-one error; it might also be converting some
character sequences, e.g., a \n to a \r\n. Try saving the file to disk in the
browser, and then compare its length and contents to the original file.
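One way to avoid such byte-level corruption is to copy the file through raw byte streams and never pass it through a Reader, Writer, or println. The following is a minimal sketch; the class name BinaryCopy and the 8 KB buffer size are arbitrary choices of ours.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a response body without any character conversion. */
final class BinaryCopy {
    private BinaryCopy() {}

    /** Copies all bytes from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write exactly the bytes that were read
            total += n;
        }
        out.flush();
        return total;
    }
}
```

The returned count should equal the value you send in the Content-Length header.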
When stress-testing your solution with Apachebench, you may sometimes see some
failed connections (“Connection reset by peer”). To fix this, try turning off
any console logging during the stress test, since this will slow down your
server and prevent it from keeping up. You can also increase the second
argument to ServerSocket, which limits the number of connections that can be
“waiting” at any given time. Finally, you may want to try running the
following two commands in a terminal:
sudo sh -c 'echo 1024 > /proc/sys/net/core/somaxconn'
sudo sh -c 'echo 0 > /proc/sys/net/ipv4/tcp_syncookies'
The first one increases a relevant kernel parameter, and the second disables a
defense against denial-of-service attacks that is sometimes triggered by the
benchmark.
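For reference, the backlog mentioned above is the second argument of the two-argument ServerSocket constructor. A minimal sketch follows; the class name ListenerSetup is hypothetical. Note that on Linux the kernel caps the effective backlog at the somaxconn value, so a large backlog only helps if that parameter is at least as large.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper: opens the listening socket with an explicit backlog. */
final class ListenerSetup {
    private ListenerSetup() {}

    /** The backlog is the requested maximum number of pending, not yet accepted connections. */
    static ServerSocket open(int port, int backlog) throws IOException {
        return new ServerSocket(port, backlog);
    }
}
```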
Requirements
Your solution must meet the following requirements (please read carefully!):
- Your main class must be called HttpServer, and it must be located in a package.
- Your submission must contain a) the entire source code, as well as any supplementary files needed to build your solution, b) a Maven build script called pom.xml (a template is included with the code in your Git repository), and c) a README file. The README file must contain 1) your full name and SEAS login name, 2) a description of features implemented, 3) any extra credit claimed, and 4) any special instructions for building or running.
- When your submission is unpacked in the original VM image and the Maven build script is run (mvn clean install), your solution must compile correctly. Please test this before submitting!
- Your server must accept the two command-line arguments specified above, and it must output your full name and SEAS login name when invoked without command-line arguments.
- Your solution must be submitted using the online submission system (see the link on the course web page) before the relevant deadline on the first page of this handout. The only exception is if you have obtained an extension online (using the “Extend” link in the submission system).
- You must check your submission into your Git repository after you submit it. Run “git status” in the workspace directory to check for modifications, use “git add” to add any new files or directories you created, and then run “git commit” followed by “git push”. Be sure to check whether these commands actually succeed. If necessary, consult https://git-scm.com/documentation .
- Your code must contain a reasonable amount of useful documentation.
You may not use any third-party code other than the standard Java libraries
(exceptions noted in the assignment) and any code we provide.
Milestone 2: Servlet Engine
The second milestone will build upon the Web server from Milestone 1, with
support for POST and for invoking servlet code. To ease implementation, your
application server will need to support only one web application at a time.
Therefore, you can simply add the class files for the web application to the
classpath when you invoke your application server from the command line, and
pass the location of the web.xml file as an argument. Furthermore, you need
not implement all of the methods in the various servlet classes; details as to
what is required may be found below.
The Servlet
A servlet is typically stored in a special “war file” (extension .war) which
is essentially a jar file with a special layout. The configuration information
for a servlet is specified in a file called web.xml, which is typically in the
WEB-INF directory. The servlet’s actual classes are typically in
WEB-INF/classes. The web.xml file contains information about the servlet class to
be invoked, its name for the app server, and various parameters to be passed
to the servlet.
The servlet and servlet-class elements are used to establish an internal name
for your servlet, and which class it binds to. The servlet-mapping associates
the servlet with a particular sub-URL ( http://my-server/MyServlet
or similar). There are two kinds of URL patterns
to handle:
- Exact pattern (must start with a /). This is the most common way of specifying a servlet.
- Path mapping (starts with a / and ends with a *, meaning it should match on the prefix up to the *). This is used in a certain Web service scheme called “REST” (which we discuss later in the term). As a special case, /foo/* should match /foo (without the trailing forward slash).
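The two pattern kinds can be matched with a few string comparisons. The sketch below assumes that path mappings end in /* (as in the servlet specification) and that the request path has already been stripped of its query string; the class name UrlPatterns is hypothetical.

```java
/** Hypothetical helper: matches a request path against a url-pattern from web.xml. */
final class UrlPatterns {
    private UrlPatterns() {}

    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            // Path mapping: "/foo/*" matches "/foo" itself and anything below "/foo/".
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        // Exact pattern: the whole path must be identical.
        return pattern.equals(path);
    }
}
```

With these rules, /foo/* matches /foo and /foo/bar but not /foobar. If several patterns match, the servlet specification prefers an exact match, then the longest path prefix.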
There are two ways that parameters can be specified from “outside” the
servlet, e.g., to describe setup information such as usernames and passwords,
servlet environment info, etc. These are through init-param elements, which
appear within servlet elements and establish name-value pairs for the servlet
configuration, and the context-param elements, which establish name-value
pairs for the servlet context. We will discuss how these are accessed
programmatically in a moment.
Basic Servlet Operation
All servlets extend the javax.servlet.http.HttpServlet class, which
implements the Servlet interface. You will need to build the “wrapping” that
invokes the HttpServlet instance, calling the appropriate functions and
passing in the appropriate objects.
Servlet initialization, config, and context. When the servlet is first
activated (by starting it in the app server), the server calls its init() method,
which is passed a ServletConfig object. The init() method may request certain resources,
open persistent connections, etc. The ServletConfig details information about
the servlet setup, including its ServletContext. Both of these can be used to
get parameters from web.xml.
ServletConfig represents the information a servlet knows about “itself”.
Calling getInitParameter() on the ServletConfig returns the servlet init-param
parameters. The method getInitParameterNames() returns the names of all of
these parameters. Finally, one can get the servlet’s name (from web.xml)
through this interface. ServletContext represents what the servlet sees about
its related Web application. Calling getInitParameter() on the ServletContext
returns the servlet context-param parameters. Here, too, the method
getInitParameterNames() returns the names of all of these parameters. Through
the context, the servlet can
also access resources that are within the .war file, and determine the real
path for a given “virtual” path (i.e., a path relative to the servlet).
Perhaps more important, the ServletContext provides a way of passing objects
(“attributes” that are name-object pairs) among different parts of a Web
application. You may ignore the context’s logging capabilities.
Service request and response. When a request is made of the servlet by an HTTP
client, the app server calls the service() method with a
javax.servlet.ServletRequest parameter containing request info, and a
javax.servlet.ServletResponse parameter for response info. For an HTTP servlet
(the only kind we are implementing), service() typically calls a handler for
the type of HTTP request. The only ones we care about are doGet() for GET
requests, and doPost() for POST requests. (There are other kinds of calls, but
these are seldom supported in practice.) Both doGet() and doPost() are given
parameter objects implementing javax.servlet.http.HttpServletRequest and
javax.servlet.http.HttpServletResponse (which extend the original
ServletRequest and ServletResponse). HttpServletRequest, naturally, contains
information about the HTTP request, including headers, parameters, etc. You
can get header information from getHeader() and its related methods, and get
form parameters through getParameter(). HttpSession is used to store state
across servlet invocations. The getAttribute() and related methods support
storing name-value pairs. The session should time-out after the designated
amount of time (specified as a default or in setMaxInactiveInterval()).
HttpServletResponse contains an object that is used to return information to
the Web browser or HTTP client. The getWriter() or getOutputStream() methods
provide a means of directly sending data that goes onto the socket to the
client. Also important are addHeader(), which adds a name-value pair for the
response header, and its sibling methods for adding header information. Note
that there are a variety of important fields you can set this way, e.g.,
server, content-length, refresh rate, content-type, etc. Note that you should
ensure that an HTTP response code (e.g., “200 OK”) is sent to the client
before any output from the writer or output stream is returned. If the
servlet throws an exception before sending output, you should return an error
code such as “500 Internal Server Error”. You should return a “302 Redirect”
if the servlet calls HttpServletResponse’s sendRedirect() method.
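One simple way to guarantee that the status line precedes any servlet output is to buffer the body in memory and write everything to the socket in a single commit step. The sketch below is only an illustration: the class BufferedResponse and its methods are hypothetical stand-ins, not the real HttpServletResponse interface, and it assumes the whole body fits in memory. It uses the RFC 2616 reason phrase “Found” for status 302.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical response buffer: body bytes are held back until commit(). */
final class BufferedResponse {
    private int status = 200;
    private String reason = "OK";
    private final Map<String, String> headers = new LinkedHashMap<>();
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    void setStatus(int status, String reason) {
        this.status = status;
        this.reason = reason;
    }

    void setHeader(String name, String value) {
        headers.put(name, value);
    }

    /** Appends servlet output to the in-memory body. */
    void write(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        body.write(bytes, 0, bytes.length);
    }

    /** Discards any buffered output and turns the response into a redirect. */
    void sendRedirect(String location) {
        body.reset();
        setStatus(302, "Found");
        setHeader("Location", location);
    }

    /** Writes status line, headers, blank line, and body, in that order. */
    void commit(OutputStream socketOut) throws IOException {
        headers.put("Content-Length", String.valueOf(body.size()));
        StringBuilder head = new StringBuilder("HTTP/1.1 " + status + " " + reason + "\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            head.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        head.append("\r\n");
        socketOut.write(head.toString().getBytes(StandardCharsets.ISO_8859_1));
        body.writeTo(socketOut);
        socketOut.flush();
    }
}
```

Because nothing reaches the socket before commit(), a servlet that throws an exception midway can still be answered with a clean 500 response.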
Servlet shutdown. When the servlet is deactivated, the server calls the servlet’s
destroy() method, which should release resources allocated by init().
Invocation of the application server
You should add a third command-line argument: the location of the web.xml file
for your web application. In your submission, this file should be located in
the conf subdirectory. You may accept additional optional arguments after the
initial three (such as number of worker threads, for example), but the
application should run with reasonable defaults if they are omitted.
Special URLs
You should now augment the special URLs you implemented for MS1. The /shutdown
URL should properly shut down all the servlets, by invoking their destroy
methods, and the /control URL should now provide a way to view the error log.
It may provide other (e.g., extra-credit) features as you see fit.
Implementation techniques
Dynamic loading of classes in Java (which you will need to do, since a servlet
can have any arbitrary name, as specified in web.xml) can be a bit tricky.
Start by calling the method Class.forName, with the string name of the class
as an argument, to get a Class object representing the class you want to
instantiate (i.e. a specific servlet). Since your servlets do not define a
constructor, you can then call the method newInstance() on that Class object,
and typecast it to an instance of your servlet. Now you can call methods on
this instance.
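The steps above can be sketched as follows. To keep the example independent of servlet-api.jar, it loads an arbitrary class by name and casts it to an expected type; in your server the expected type would be HttpServlet. The class name DynamicLoader is hypothetical. The sketch calls getDeclaredConstructor().newInstance() because Class.newInstance() has been deprecated since Java 9; on an older JDK the direct call behaves similarly.

```java
/** Hypothetical helper: instantiates a class given only its name. */
final class DynamicLoader {
    private DynamicLoader() {}

    /** Loads className, calls its no-argument constructor, and casts the result. */
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> loaded = Class.forName(className);
            Object instance = loaded.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Covers a missing class, a missing or inaccessible constructor, and a wrong type.
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }
}
```

For example, instantiate("java.util.ArrayList", java.util.List.class) returns a new, empty list.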
Required application server features
Your application server must provide functional implementations of all of the
non-deprecated methods in the interfaces HttpServletRequest,
HttpServletResponse, ServletConfig, ServletContext, and HttpSession of the
Servlet interface version 2.5
( http://download.oracle.com/otn-pub/jcp/servlet-2.5-mr5-oth-JSpec/servlet-2.5-mr5-spec.pdf/ ),
with the following exceptions:
- ServletContext.log
- ServletContext.getMimeType (return null)
- ServletContext.getNamedDispatcher
- ServletContext.getResource
- ServletContext.getResourceAsStream
- ServletContext.getResourcePaths
- HttpServletRequest.getPathTranslated
- HttpServletRequest.getUserPrincipal
- HttpServletRequest.isUserInRole
- HttpServletRequest.getRequestDispatcher
- HttpServletRequest.getInputStream
- HttpServletResponse.getOutputStream
- HttpServletRequest.getLocales
- ServletContext.getRequestDispatcher
- ServletContext.getContextPath
You may return null for the output of all of the above methods, as well as all
deprecated methods. We will also make the following simplifications and
clarifications of the spec:
- HttpServletRequest.getAuthType should always return BASIC AUTH (“BASIC”)
- HttpServletRequest.getPathInfo should always return the remainder of the URL request after the portion matched by the url-pattern in web.xml. It starts with a “/”.
- HttpServletRequest.getQueryString should return the HTTP GET query string, i.e., the portion after the “?” when a GET form is posted.
- HttpServletRequest.getCharacterEncoding should return “ISO-8859-1” by default, and the results of setCharacterEncoding if it was previously called.
- HttpServletRequest.getScheme should return “http”.
- HttpServletResponse.getCharacterEncoding should return “ISO-8859-1”.
- HttpServletResponse.getContentType should return “text/html” by default, and the results of setContentType if it was previously called.
- HttpServletResponse.getLocale should return null by default, or the results of setLocale if it was previously called.
- HttpServletRequest.isRequestedSessionIdFromUrl should always return false.
This means that your application server will need to support cookies, sessions
(using cookies; you don’t need to provide a fall-back like path encoding if the
client doesn’t support cookies), servlet contexts, initialization parameters
(from the web.xml file) - in other words, all of the infrastructure needed to
write real servlets. It also means that you won’t need to do HTTP-based
authentication, or implement the ServletInputStream and ServletOutputStream
classes.
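As a starting point for the session infrastructure, here is a minimal sketch of a session table with inactivity timeouts. The names SessionStore and SessionData are hypothetical, and a real HttpSession implementation needs many more methods. The current time is passed in explicitly so that the timeout logic can be tested without sleeping, and a negative interval is treated as “never expire”, which is our reading of the HttpSession documentation. The session ID would travel to the client in a cookie, which the servlet specification names JSESSIONID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical session table with inactivity timeouts. */
final class SessionStore {
    /** Per-session state: attributes plus the data needed to decide expiry. */
    static final class SessionData {
        final Map<String, Object> attributes = new HashMap<>();
        long lastAccessedMillis;
        int maxInactiveSeconds;
    }

    private final Map<String, SessionData> sessions = new HashMap<>();
    private final int defaultMaxInactiveSeconds;

    SessionStore(int defaultMaxInactiveSeconds) {
        this.defaultMaxInactiveSeconds = defaultMaxInactiveSeconds;
    }

    /** Creates a session and returns its ID. */
    synchronized String create(long nowMillis) {
        String id = UUID.randomUUID().toString();
        SessionData data = new SessionData();
        data.lastAccessedMillis = nowMillis;
        data.maxInactiveSeconds = defaultMaxInactiveSeconds;
        sessions.put(id, data);
        return id;
    }

    /** Returns the live session for the ID, or null if it is unknown or has timed out. */
    synchronized SessionData lookup(String id, long nowMillis) {
        SessionData data = sessions.get(id);
        if (data == null) {
            return null;
        }
        long idleMillis = nowMillis - data.lastAccessedMillis;
        if (data.maxInactiveSeconds >= 0 && idleMillis > data.maxInactiveSeconds * 1000L) {
            sessions.remove(id); // expired: forget the session entirely
            return null;
        }
        data.lastAccessedMillis = nowMillis; // every successful lookup counts as an access
        return data;
    }
}
```

The methods are synchronized because several worker threads may look up sessions at the same time.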
We suggest you start by determining what you need to implement:
- Print the JavaDocs for HttpServletRequest, HttpServletResponse, ServletConfig, ServletContext, and HttpSession, from the URL given previously.
- Create a skeleton class for each of the above, with methods that temporarily return null for each call. Be sure that your HttpServletRequest class implements the provided javax.servlet.http.HttpServletRequest interface (in the .jar file), and so forth.
- Print the sample web.xml from the extra/Servlets/web/WEB-INF directory. There is very useful information in the comments, which will help you determine where certain methods get their data.
You can find a simple parser for the web.xml file from the TestHarness code
(see 5.1 and the code in extra/TestHarness). For the ServletConfig and
ServletContext, note the following:
- There is a single ServletContext per “Web application,” and a single ServletConfig per “servlet page.” (For the base version of Milestone 2, you will only need to run one application at a time.) Assuming a single application will likely simplify some of what you need to implement in ServletContext (e.g., getServletNames).
- Most of the important ServletConfig info (servlet name, init parameter names, and init parameter list) comes directly from web.xml.
- The ServletContext init parameters come from the context-param elements in web.xml.
- The ServletContext attributes are essentially a hash map from name to value, and can be used, e.g., to communicate between multiple instances of the same servlet. By default, these can only be created programmatically by servlets themselves, unlike the initialization parameters, which are set in web.xml. The ServletContext name is set to the display name specified in web.xml.
- The real path of a file can be obtained by getting the canonical path of the path relative to the Web root. It is straightforward to return a stream to such a resource, as well. The URL to a relative path can similarly be generated relative to the Servlet’s URL.
Requirements
Your solution must meet the same requirements as MS1 (see 3.4 above), with two
exceptions. First, your solution must now support three command-line arguments
(the third is the location of web.xml), and second, you need to submit at
least one test case for each of the major classes you implemented.
Resources
The framework code in your Git repository includes (in the extra directory)
the source code for a simple application server that accepts requests from the
command line, calls a servlet, and prints results back out. It will give you a
starting point, though many of the methods are just stubs, which you will need
to implement. We have also provided a suite of simple test servlets and an
associated web.xml file and directory of static content; it should put your
application server through its paces. We will, however, test your application
server with additional servlets.
TestHarness: A primitive app server
TestHarness gives you a simple command-line interface to your servlets. It
reads your web.xml file to find out about servlets. Thus, in order to test a
servlet you need to add the appropriate entry in web.xml first (as you would
do in order to deploy it). You can then specify a series of requests to
servlets on the command line, which all get executed within a servlet session.
Suppose you have a servlet ‘demo’ in your web.xml file. To run:
- Put the TestHarness classes and the servlet code in the same Eclipse project.
- Make sure the file servlet-api.jar has been added to the project as an ‘external jar file.’
- Create a new run profile (Run → Run…), choose TestHarness as the main class, and give the command line arguments path/to/web.xml GET demo to have the TestHarness program run the demo servlet’s doGet method.
The servlet output is printed to the screen as unprocessed HTML. You can set
the profile’s root directory if it makes writing the path to the web.xml
easier; it defaults to the root of the Eclipse project. More interestingly, if
you had a servlet called login, you could also run it with the arguments:
path/to/web.xml POST login?login=aaa&passwd=bbb
This will call the doPost method with the parameters login and passwd passed
as if the servlet was invoked through Tomcat.
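In your own server, the same name=value pairs arrive in the query string of a GET request or in the body of a form POST, and getParameter() needs them decoded. The sketch below shows one way to parse them. The class name FormParams is hypothetical, and it keeps only the first value of each name, whereas a complete implementation would retain all values for getParameterValues(). Malformed percent escapes make URLDecoder throw an IllegalArgumentException, which a server would presumably turn into a 400 response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses "a=1&b=2" style parameter strings. */
final class FormParams {
    private FormParams() {}

    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.putIfAbsent(decode(name), decode(value)); // first value wins
        }
        return params;
    }

    /** Decodes percent escapes and turns '+' into a space. */
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }
}
```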
Finally, TestHarness also supports sequences of servlets while maintaining
session information that is passed between them. Suppose you had a servlet
called listFeeds, which a user can run only after logging in. You can simulate
this with the harness by doing:
path/to/web.xml POST login?login=aaa&passwd=bbb GET listFeeds
In general, since your servlets would normally expect to be passed the session
object when executed, in order to test them with this harness you should
simulate the steps that would be followed to get from the login page to that
servlet. If for example after login you go to formA and enter some values and
click a button to submit formA to servletA, and then you enter some more
values in formB and click a button and go to servletB, to test servletB
(assuming you use post everywhere) you would do.
Extra credit
HTTPS support
For this extra-credit task, you need to extend your server with support for
HTTPS. Add a boolean constant called useHTTPs in your code. When this constant
is set to true, your solution should accept HTTPS connections on the port that
was specified on the command line; otherwise it should work exactly as
described in the main part of the assignment. You may want to have a look at
the javax.net.ssl package, and particularly at the SSLServerSocket class.
Please document in your README file where the constant is, as well as any
other setup steps that are needed to run your server in HTTPS mode.
Event-driven server
This extra-credit item is only available for MS1. To claim it, you must submit
two working versions of your web server: One with a thread pool, and a second,
event-driven one. The event handlers must be non-blocking, i.e., all I/O must
be asynchronous (Java NIO). We recommend that you first implement the basic
server and then refactor it to be event-driven.
HTTP/2 support
For this extra-credit task, you need to add support for the more recent
protocol version, HTTP/2, which can be found in RFC7540. Add a boolean
constant called useHTTP2 in your code; when this constant is set to true, your
solution should accept HTTP/2 connections; otherwise it should work exactly as
described in the main part of the assignment. Please document in your README
file where the constant is. To complete this task by itself, you can implement
the protocol version that runs over cleartext TCP; alternatively, you can
combine this task with Task 6.1 (“HTTPS support”) and implement the TLS-based
version for +30% extra credit total. (In that case HTTPS support for HTTP/1.1
won’t count separately.) This task is not for the faint of heart, so you
should only attempt it if MS1 was relatively easy for you.
Persistent HTTP connections and chunked encoding
We do not require persistence or chunking in your basic HTTP 1.1 server.
However, each of these will count for part of up to 10% extra credit. To get
full credit for chunking support, your server needs to be able to both send
and receive chunked messages.
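For the sending side, the chunked transfer coding of RFC 2616 frames each chunk as its size in hexadecimal, CRLF, the data, and another CRLF, and ends the body with a zero-size chunk followed by an empty line. A minimal sketch follows, ignoring chunk extensions and trailers; the class name Chunked is hypothetical. A response encoded this way carries the header “Transfer-Encoding: chunked” instead of Content-Length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: encodes a body with the chunked transfer coding. */
final class Chunked {
    private Chunked() {}

    /** Encodes the given pieces as a chunked body without trailers. */
    static byte[] encode(byte[]... pieces) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] piece : pieces) {
            if (piece.length == 0) {
                continue; // a zero-length chunk would end the body early
            }
            writeAscii(out, Integer.toHexString(piece.length) + "\r\n");
            out.write(piece, 0, piece.length);
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n"); // last chunk plus the empty line that ends the body
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

Receiving chunked messages is the mirror image: read the hexadecimal size line, read exactly that many bytes plus the trailing CRLF, and stop at the zero-size chunk.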
Performance testing
The supplied servlet BusyServlet performs a computationally intensive task
that should take a number of seconds to perform on a modern computer.
Experimentally determine the effect of changing the thread pool size on
performance of the application server when many requests for BusyServlet come
in at the same time. Comment on any trends you see, and try to explain them.
Suggest the ideal thread pool size and describe how you chose it. Include
performance measures like tables, graphs, etc.
Multiple applications and dynamic loading
The project described above loads one web application and installs it at the
root context. Extend it to dynamically load and unload other applications at
different contexts. Add options to the main menu of the server to list
installed applications, install new applications, and remove any installed
applications. You’ll need to take special care to ensure that static variables
do not get shared between applications (i.e. the same class in two different
applications can have different values for the same static variable). Each
application should have its own servlet context as well. (Since each
application may have its own classpath, be sure to add the capability to
dynamically modify the classpath, too.)