This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the…

This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the…

Question:

   

Transcribed Image Text:

This problem explores a variant of the external merge sort algorithm. Below in
the subproblems, you will calculate the cost of performing the modified external
merge sort.
Recall that sequential IO (i.e. involving reading from / writing to consecutive
pages) is generally much faster that random access 10 (any reading / writing that
is not sequential). Additionally, on newer memory technologies like SSD reading
data can be faster than writing data (if you want to read more about SSD access
patterns look here).
For example, if we read 8 consecutive pages from file A, this should be much faster
than reading 2 pages from A, then 4 pages from file B, then 2 pages from A.
In this problem, we will begin to model this, by assuming that 8 sequential
READS are “free”, i.e. the total cost of 8 sequential reads is 1 10. We will also
assume that the writes are always twice as expensive as a read. Sequential
writes are never free, therefore the cost of N writes is always 2N.
Please note the following:
• NO REPACKING: Consider the external merge sort algorithm using the basic
optimizations we present in class, but do not use the repacking optimization
covered in class. • ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for
output in a merge, e.g. a B-way merge would require B+1 buffer pages.
• REMEMBER TO ROUND: Take ceilings (i.e. rounding up to nearest integer
values) into account in this problem. Note that we have sometimes omitted
these (for simplicity) in lecture.
• Consider worst case cost: In other words, if 2 reads could happen to be
sequential, but in general might not be, consider these random 10.
Question 2.1
Consider a modification of the external merge sort algorithm where reads are
always read in 8-page chunks (i.e. 8 pages sequentially at a time) so as to take
advantage of sequential reads. Calculate the cost of performing the external merge
sort for a setup having B+1=40 buffer pages and an unsorted input file with 320
pages.
Show the steps of your work and make sure to explain your reasoning if necessary.
Question 2.1.1
What is the exact 10 cost of splitting and sorting the files? As is standard we want
runs of size B+1.
Question 2.1.2
After the file is split and sorted, we can merge n runs into 1 using the merge
process. What is the largest n we could have, given reads are always read in 8-page
chunks? Note: this is known as the arity of the merge.
Question 2.1.3
How many passes of merging are required?
Question 2.1.4
What is the total 10 cost of running this external merge sort algorithm? Do not
forget to add in the remaining passes (if any) of merging.
Question 2.2
Now, we’ll generalize the reasoning above by writing formula that compute the
approximate cost of performing this version of external merge sort for a setup
having B+1 buffer pages, a file with N pages, and where we now read in P-page
chunks (replacing our fixed 8 page chunks in the previous section.
*Note: our approximation will be a small one- for simplicity, we’ll assume that each
pass of the merge phase has the same 10 cost. We’ll calculate the 10 cost for each merge phase and compute the total cost as the
product of the cost of reading in and writing out all the data (which we do each
pass), and the number of passes we’ll have to do. Even though this is an
approximation, make sure to take care of floor / ceiling operations- i.e. rounding
down / up to integer values properly! Importantly, to simplify your calculations,
you can assume:
• (B+1) %P == 0 (i.e. the buffer size is divisible by the chunk size)
N% (B + 1) == 0 (i.e. the file size is divisible by the buffer size)
Question 2.2.1
First, write the formula that computes the exact total 10 cost to create the initial
runs in terms of B, N, and P.
Question 2.2.2
Next, write the formula that computes the approximate total 10 cost to read in and
then write out all the data during one pass of the merge in terms of B, N, and P.
Question 2.2.3
Next, write the formula that computes the exact total number of passes we’ll need
to do in terms of B, N, and P.
Question 2.2.4
Finally, write the formula that computes the total cost in terms of B, N, and P.

Expert Answer:

Answer rating: 100% (QA)

2 1 Here are the steps to calculate the cost of performing the modified external merge sort for a setup having B 1 40 buffer pages and an unsorted input file with 320 pages Step 1 Determine the number
View the full answer