回答数据库问题,包括 SQL , Logical
Database Design , Relational Algebra, Query Optimization以及Physical Database
Design相关部分。
![SQL](https://upload.wikimedia.org/wikipedia/commons/thumb/7/72/ER_Diagram_MMORPG.png/1009px-
ER_Diagram_MMORPG.png)
SQL
A climbing gym is deploying a new system to manage inventory and help members
track their progress.
The gym has walls with slots to which different holds can be attached. Each
hold has a color.
A route is a set of holds, all one color, attached to specific locations on a
wall and arranged roughly vertically. The intent is to have a member climb the
route to th
Here’s a proposed schema:
Slot(sid, wall, x, y)
Hold(hid, color, desc)
Route(rid, name, circuit)
Placement(rid, hid, sid)
- Slot represents the possible locations for a hold. sid is a surrogate key, wall is the name of the wall (e.g., “north,” “front”), (x,y) is the location on the wall, measured in meters.
- Hold manages the inventory of shaped resin pieces that simulate outcroppings on which to step or grab.
- Route is a set of holds attached to particular slots. name is a descriptive text string. circuit is a label indicating that this route is part of a set of related routes. sid, hid, ridare integers.
- Write CREATE TABLE statements for the schema above.
- A route is consistent if every hold has the same color, which indicates difficulty. Write a query that returns route ids that violate this condition. That is, your query should return route ids associated with holds of at least two different colors.
- A conflictis when two holds occupy the same slot. A set of routes are compatible if they have no conflicts. Write a query to check that all the routes in the circuit called “Beginner” are compatible. Your query should return the sid that is causing the conflict.
- Write a query to compute the color of each consistent route. In part (a), you found the routes that violate consistency. Here, you only want the routes that are consistent, and then return (rid, name, circuit, color) tuples. In other words, if rid is associated with holds of more than one color, it should not be in the result. If all holds associated with rid are of the same color, return that color.
- Write a query to compute the number of routes of each color. Assume you have your result from (d) saved as a view called RouteColor. That is, you can refer to your result from (d) as a table called RouteColor, and you can assume the answer is correct. Note that the number ofholdsof each color is not what we want; we want the number ofroutes. (Hint: The query is very simple, thanks to the view!)
- Write a query that returns the highest hold for each route. The highest hold is the one placed in the slot with the highest y elevation. Your query can just return the hid.
Logical Database Design
- Your client at Climbr has proposed some minor changes to the schema, and provided the following ER diagram. is_critical indicates whether this hold is required to make the route traversable. Placement is a weak entity, and note the arrows on the Occupies relationship. Write a CREATE TABLE statements for the Placement table based on this ER diagram. The tables Route, Slot, and Hold can be assumed to be unchanged from what you provided above.
- You are given a spreadsheet called equipment_reservation with the following columns (name, status, gym, address, equipment, type, trainer, date)
You are given the following functional dependencies: equipment type(each piece
of equipment has a specific type) gym, type trainer(each gym has one
specialist trainer for each type of equipment) name status(each member has a
specific status) gym address(each gym has a specific address)
Decompose equipment_reservation into BCNF. In your final answer indicate the
key of each relation. If you show your work, you are more likely to earn
partial credit in the case of mistakes!
Relational Algebra
- Write or draw a relational algebra expression that corresponds to the following SQL query over the Climbr schema.
SELECT rh.rid, s.x, s.y
FROM RouteHold rh, Slot s, Hold h
WHERE s.sid=rh.sid
AND rh.hid = h.hid
AND s. x < 5
—|— - Here is a relational algebra plan, P0, over the Climbr schema.
- Does this plan produce the same result as P0?
- Does this plan produce the same result as P0?
- Does this plan produce the same result as P0?
- Does this plan produce the same result as P0?
- Does this plan produce the same result as P0?
- Does this plan produce the same result as P0?
- The following query plan was create using EXPLAIN for a query executed over the Climbr database from Question 1. Each physical operator in the plan has been annotated with the corresponding relational algebra operator. The “leaves” of the tree are annotated with the table name (which you can see even without the annotations – look at the smaller font size for the name of the table in brackets.)
Write a SQL query that could produce this query plan. You don’t have enough
information to know exactly what the query was, but you can make an
appropriate guess. - Here is another example, this time without annotations. Write a SQL query that could produce this query plan.
- Write a SQL query that could produce the following query plan.
- Write a SQL query that could produce the following query plan.
- Write a SQL query that could produce the following query plan.
- Write a SQL query that could produce the following query plan.
Query Optimization
Cardinality Estimation: Consider the following schema:
CREATE TABLE A (
a int PRIMARY KEY,
b int REFERENCES B
)
CREATE TABLE B (
b int PRIMARY KEY,
c int
)
—|—
You are given the following statistics about the data:
|A| = 1000
|B| = 100
V(B,c) = 4
Consider the following RA plan:
Provide numbers for each of the following to make the statement true (copy and
paste the questions into your response):
V(A,a) = ?
V(B,b) = ?
V(A,b) <= ?
Cardinality Estimation. Consider the following RA plan.
You are given the following statistics:
|Placement| = 10000
|Hold| = 1600
|Slot| = 5000
|Route| = 1000
V(Hold,color) = 8
Range(Slot,x) = [0, 125]
V(Route,circuit) = 100
V(Placement,hid) = 500
V(Placement,sid) = 400
V(Placement,rid) = 1000
Physical Database Design (Index Selection)
You manage the database for a supply chain company. The schema includes the
following table:
Shipment(shipid, customer, product, fromwarehouse, towarehouse, date, quantity)
Your workload consists of 100000 daily queries of type QA, 100000 daily
queries of type QB, and 1000 daily inserts of type QX
QA queries are of the form:
SELECT * FROM Shipment WHERE fromwarehouse = ? AND towarehouse = ? AND product = ? AND date BETWEEN ? AND ?
—|—
QB queries are of the form:
SELECT quantity FROM Shipment WHERE date = ? AND product = ?
—|—
QX queries are of the form:
INSERT INTO Shipment VALUES (?, ?, ?, ?, ?, ?, ?);
—|—
The DB Administrator is considering building zero or more of the following
indexes and has asked you for advice:
CREATE INDEX ship_customer ON Shipment(customer)
CREATE INDEX ship_date ON Shipment(date)
CREATE INDEX ship_product ON Shipment(product)
CREATE INDEX ship_warehouse ON Shipment(fromwarehouse, towarehouse)
CREATE INDEX ship_date_product ON Shipment(date, product)
CREATE INDEX ship_product_date ON Shipment(product, date)
CREATE INDEX ship_allbydateON Shipment(date, product, fromwarehouse, towarehouse, quantity, customer, shipid)
—|—
- Which of the above indexes that could be used by queries of type QA? (List their names)
- Which of the above indexes that could be used by queries of type QB? (List their names)
- Which indexes would you recommend building? (List their names)
Concepts
For each statement below, indicate whether it is true or false:
- An index is clustered when the physical table is sorted by the same attributes as are included in the index.
- A table can have more than one clustered index, but only one unclustered index.
- The query optimizer may choose a different algorithm for a join depending on statistics about the data, such as number of tuples in a table or number of unique values in an attribute.
- If R(A,B,C,D) satisfies the functional dependencies ABC and CA then C is a key.
- If R(A,B,C,D) satisfies the functional dependencies AB and BC and CD then A is a key.
- If R(A,B,C,D) satisfies the functional dependencies AB and BC and CD and DA then C is a key.
- This schedule isserial.
- This schedule isserializable.
- If your database has only a single user, there is no need for transactions.
- If we were to convert our Climbr database to JSON, we could choose to list Routes (with nested Holds), or we could choose to list Slots (with nested Routes). Both of these organizations could be valid in different situations.