This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the…

Question:

Transcribed Image Text:

This problem explores a variant of the external merge sort algorithm. Below in

the subproblems, you will calculate the cost of performing the modified external

merge sort.

Recall that sequential IO (i.e. involving reading from / writing to consecutive

pages) is generally much faster that random access 10 (any reading / writing that

is not sequential). Additionally, on newer memory technologies like SSD reading

data can be faster than writing data (if you want to read more about SSD access

patterns look here).

For example, if we read 8 consecutive pages from file A, this should be much faster

than reading 2 pages from A, then 4 pages from file B, then 2 pages from A.

In this problem, we will begin to model this, by assuming that 8 sequential

READS are “free”, i.e. the total cost of 8 sequential reads is 1 10. We will also

assume that the writes are always twice as expensive as a read. Sequential

writes are never free, therefore the cost of N writes is always 2N.

Please note the following:

• NO REPACKING: Consider the external merge sort algorithm using the basic

optimizations we present in class, but do not use the repacking optimization

covered in class. • ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for

output in a merge, e.g. a B-way merge would require B+1 buffer pages.

• REMEMBER TO ROUND: Take ceilings (i.e. rounding up to nearest integer

values) into account in this problem. Note that we have sometimes omitted

these (for simplicity) in lecture.

• Consider worst case cost: In other words, if 2 reads could happen to be

sequential, but in general might not be, consider these random 10.

Question 2.1

Consider a modification of the external merge sort algorithm where reads are

always read in 8-page chunks (i.e. 8 pages sequentially at a time) so as to take

advantage of sequential reads. Calculate the cost of performing the external merge

sort for a setup having B+1=40 buffer pages and an unsorted input file with 320

pages.

Show the steps of your work and make sure to explain your reasoning if necessary.

Question 2.1.1

What is the exact 10 cost of splitting and sorting the files? As is standard we want

runs of size B+1.

Question 2.1.2

After the file is split and sorted, we can merge n runs into 1 using the merge

process. What is the largest n we could have, given reads are always read in 8-page

chunks? Note: this is known as the arity of the merge.

Question 2.1.3

How many passes of merging are required?

Question 2.1.4

What is the total 10 cost of running this external merge sort algorithm? Do not

forget to add in the remaining passes (if any) of merging.

Question 2.2

Now, we’ll generalize the reasoning above by writing formula that compute the

approximate cost of performing this version of external merge sort for a setup

having B+1 buffer pages, a file with N pages, and where we now read in P-page

chunks (replacing our fixed 8 page chunks in the previous section.

*Note: our approximation will be a small one- for simplicity, we’ll assume that each

pass of the merge phase has the same 10 cost. We’ll calculate the 10 cost for each merge phase and compute the total cost as the

product of the cost of reading in and writing out all the data (which we do each

pass), and the number of passes we’ll have to do. Even though this is an

approximation, make sure to take care of floor / ceiling operations- i.e. rounding

down / up to integer values properly! Importantly, to simplify your calculations,

you can assume:

• (B+1) %P == 0 (i.e. the buffer size is divisible by the chunk size)

N% (B + 1) == 0 (i.e. the file size is divisible by the buffer size)

Question 2.2.1

First, write the formula that computes the exact total 10 cost to create the initial

runs in terms of B, N, and P.

Question 2.2.2

Next, write the formula that computes the approximate total 10 cost to read in and

then write out all the data during one pass of the merge in terms of B, N, and P.

Question 2.2.3

Next, write the formula that computes the exact total number of passes we’ll need

to do in terms of B, N, and P.

Question 2.2.4

Finally, write the formula that computes the total cost in terms of B, N, and P.

Expert Answer:

Answer rating: 100% (QA)

2 1 Here are the steps to calculate the cost of performing the modified external merge sort for a setup having B 1 40 buffer pages and an unsorted input file with 320 pages Step 1 Determine the number

View the full answer