MSIT 630 Database Systems
ANSWER
Assignment#3 MSIT 630 Database Systems
1. What are two advantages of encrypting data stored in the database? (2 points)
- Data Security: Encrypting data ensures that even if unauthorized users gain access to the database, they cannot read or misuse the data without the encryption keys. This helps protect sensitive information from breaches.
- Regulatory Compliance: Many industries are required to comply with data protection regulations such as GDPR, HIPAA, and PCI-DSS, which mandate data encryption to safeguard user information. Compliance helps avoid legal penalties and maintains customer trust.
2. RAID systems typically allow you to replace failed disks without stopping access to the system. Thus, the data in the failed disk must be rebuilt and written to the replacement disk while the system is in operation. Which of the RAID levels yields the least amount of interference between the rebuild and ongoing disk accesses? Explain your answer. (4 points)
RAID 10 (or RAID 1+0) yields the least amount of interference between the rebuild and ongoing disk accesses.
- Explanation: RAID 10 combines the features of RAID 1 (mirroring) and RAID 0 (striping). When a disk fails, the system can continue operating using the mirrored copy. The rebuilding process involves copying data from the mirror, which is relatively fast and does not significantly impact ongoing operations because the read and write operations can be distributed across multiple disks. This minimizes performance degradation during the rebuild.
3. In the sequential file organization, why is an overflow block used even if there is, at the moment, only one overflow record? (4 points)
An overflow block is used to handle records that do not fit into the original block due to lack of space.
- Explanation: Even if there is only one overflow record at the moment, future insertions may cause more records to overflow. By using an overflow block, the file organization system ensures that all overflow records are managed efficiently and can be easily accessed without disrupting the sequential order of the primary blocks. This maintains the integrity and performance of the sequential file organization.
4. For each of the following two B+ trees, show the steps involved in the following queries: (10 points)
(Note: Since the B+ trees are not provided, the explanation will be general)
a. Find records with a search-key value of 11:
- Start at the root and compare the search-key value with the keys.
- Follow the appropriate pointer to the child node.
- Continue this process until reaching a leaf node.
- Search the leaf node for the key value 11.
b. Find records with a search-key value between 11 and 19, inclusive:
- Start at the root and compare the search-key value with the keys.
- Follow the appropriate pointer to the child node.
- Continue this process until reaching a leaf node.
- Search the leaf node for key values between 11 and 19.
- If the range spans multiple leaf nodes, follow the linked list of leaf nodes to retrieve all relevant records.
5. What are the causes of bucket overflow in a hash file organization? What can be done to reduce the occurrence of bucket overflows? (4 points)
- Causes of Bucket Overflow:
- High Load Factor: When the number of records exceeds the number of available buckets, leading to more records being hashed to the same bucket.
- Poor Hash Function: An inefficient hash function that does not distribute records evenly across buckets.
- Skewed Data: Data with certain values occurring more frequently, causing some buckets to overflow while others remain underutilized.
- Reducing Bucket Overflows:
- Use a Better Hash Function: Design a hash function that distributes records more evenly.
- Increase the Number of Buckets: Allocate more buckets to reduce the load factor.
- Dynamic Hashing: Implement extendible or linear hashing to dynamically adjust the number of buckets based on the number of records.
6. Why is a hash structure not the best choice for a search key on which range queries are likely? (4 points)
A hash structure is not suitable for range queries because hash functions distribute records based on the hash value, not the order of the keys.
- Explanation: In a hash table, records with consecutive keys are likely to be placed in different buckets. This makes it inefficient to retrieve a range of records since the query must search through multiple buckets, negating the advantage of direct access provided by hashing. Range queries are better suited for data structures like B+ trees that maintain key order.
7. A drawback of cost-based optimization is the cost of optimization itself. Optimizers use heuristics to reduce the cost of optimization. Please describe at least three heuristic approaches for transforming relational-algebra queries. (4 points)
- Rule-Based Optimization: Applying a set of predefined rules to transform queries into a more efficient form, such as pushing selections and projections closer to the base relations to reduce the size of intermediate results.
- Join Ordering Heuristics: Using heuristics like “left-deep join trees” where the system evaluates joins in a specific order (e.g., smaller tables first) to minimize intermediate result sizes and reduce computation time.
- Predicate Pushdown: Moving filter conditions as close to the data source as possible to reduce the amount of data processed in later stages. This reduces the overall processing cost by filtering out unnecessary rows early.
8. Consider the following two transactions: (10 points)
T13:
read(A);
read(B);
if A = 0 then B := B + 1;
write(B);
T14:
read(B);
read(A);
if B = 0 then A := A + 1;
write(A);
Let the consistency requirement be A = 0 or B = 0, with A = 0 and B = 0 as the initial values.
a. Serial Execution:
- T13 followed by T14:
lua
Initial: A = 0, B = 0
T13: read(A), read(B), B := B + 1, write(B) -> A = 0, B = 1
T14: read(B), read(A), (B ≠ 0) -> no change to A -> A = 0, B = 1
Consistent: A = 0 or B = 0 (true)
- T14 followed by T13:
lua
Initial: A = 0, B = 0
T14: read(B), read(A), A := A + 1, write(A) -> A = 1, B = 0
T13: read(A), read(B), (A ≠ 0) -> no change to B -> A = 1, B = 0
Consistent: A = 0 or B = 0 (true)
b. Concurrent Execution Producing Nonserializable Schedule:
Initial: A = 0, B = 0
T13: read(A) -> A = 0
T14: read(B) -> B = 0
T13: read(B) -> B = 0
T14: read(A) -> A = 0
T13: B := B + 1 -> B = 1
T14: A := A + 1 -> A = 1
T13: write(B) -> A = 1, B = 1
T14: write(A) -> A = 1, B = 1
Inconsistent: A = 0 or B = 0 (false)
c. Concurrent Execution Producing Serializable Schedule:
- T13 completes before T14 starts:
lua
Initial: A = 0, B = 0
T13: read(A), read(B), B := B + 1, write(B) -> A = 0, B = 1
T14: read(B), read(A), (B ≠ 0) -> no change to A -> A = 0, B = 1
Consistent: A = 0 or B = 0 (true)
9. Consider the log in Figure 16.5 below. Suppose there is a crash just before the < T0 abort> log record is written out. Explain what would happen during recovery. Describe the redo and undo phase of the recovery algorithm and the log records to be added during recovery. (8 points)
(Note: Figure 16.5 is not provided, so the explanation will be generic based on common log-based recovery procedures.)
- Redo Phase:
- Scan the log forward from the last checkpoint to redo all operations recorded in the log.
- For each < T, X, V > record, set X to the new value V.
- This phase ensures all committed transactions’ effects are applied to the database.
- Undo Phase:
- Scan the log backward to undo the effects of uncommitted transactions.
- For each < T, X, old_value > record of uncommitted transactions, set X back to old_value.
- Write a < T abort > log record for each transaction that is undone.
During recovery:
- Redo: Ensure all operations of committed transactions up to the point of the crash are redone.
- Undo: Undo the changes of transactions that were not committed at the time of the crash.
QUESTION
Assignment#3 MSIT 630 Database Systems
Total: 50 points
1. What are two advantages of encrypting data stored in the database? (2 points)
2. RAID systems typically allow you to replace failed disks without stopping access to the
system. Thus, the data in the failed disk must be rebuilt and written to the replacement disk while
the system is in operation. Which of the RAID levels yields the least amount of interference
between the rebuild and ongoing disk accesses? Explain your answer. (4 points)
3. In the sequential file organization, why is an overflow block used even if there is, at the
moment, only one overflow record? (4 points)
4. For each of the following two B+ trees, show the steps involved in the following queries: (10
points) (Note: there are two B+ trees. You are supposed to answer question #a and #b for both
trees).
a. Find records with a search-key value of 11
b. Find records with a search-key value between 11 and 19, inclusive.
B-tree 1
B-tree 2
5. What are the causes of bucket overflow in a hash file organization? What can be done to
reduce the occurrence of bucket overflows? (4 points)
6. Why is a hash structure not the best choice for a search key on which range queries are likely?
(4 points)
7. A drawback of cost-based optimization is the cost of optimization itself. Optimizers use
heuristics to reduce the cost of optimization. Please describe at least three heuristic approaches
for transforming relational-algebra queries. (4 points)
8. Consider the following two transactions: (10 points)
T13: read(A);
read(B);
if A = 0 then B := B + 1;
write(B).
T14: read(B);
read(A);
if B = 0 then A := A + 1;
write(A).
Let the consistency requirement be A = 0 or B = 0, with A = 0 and B = 0 as the initial values.
a. Show that every serial execution involving these two transactions preserves the consistency of
the database.
b. Show a concurrent execution of T13 and T14 that produces a nonserializable schedule.
c. Is there a concurrent execution of T13 and T14 that produces a serializable schedule?
9. Consider the log in Figure 16.5 below. Suppose there is a crash just before the < T0 abort> log
record is written out. Explain what would happen during recovery. Describe the redo and undo
phase of the recovery algorithm and the log records to be added during recovery. (8 points)