Monday, January 30, 2012

MERGE SORT

 

Merge sort is an O(n log n) comparison-based sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. Merge sort is a divide and conquer algorithm that was invented by John von Neumann in 1945.[1] A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

 

[Figure: An example of merge sort. First divide the list into the smallest units (one element each), then compare each element with the adjacent list to sort and merge each pair of adjacent lists. Finally, all the elements are sorted and merged.]

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n log n)
Best case performance: O(n log n) typical; O(n) natural variant
Average case performance: O(n log n)
Worst case space complexity: O(n) auxiliary

 

 

Analysis

[Figure: A recursive merge sort algorithm used to sort an array of 7 integer values; these are the steps a human would take to emulate merge sort (top-down).]
In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm (apply the algorithm to two lists of half the size of the original list, and add the n steps taken to merge the resulting two lists). The closed form follows from the master theorem.
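The recurrence corresponds directly to a top-down implementation. As a minimal illustrative sketch (not the post's own code; the names follow the merge_sort/merge pseudocode the text refers to), in Python:

    def merge_sort(m):
        # Base case: zero or one element is already sorted.
        if len(m) <= 1:
            return m
        # Divide: two subproblems of half the size (the 2T(n/2) term).
        middle = len(m) // 2
        left = merge_sort(m[:middle])
        right = merge_sort(m[middle:])
        # Conquer: merging costs the extra n steps in the recurrence.
        return merge(left, right)

    def merge(left, right):
        result = []
        i = j = 0
        # Repeatedly take the smaller head element; "<=" keeps the sort stable.
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        # One run is exhausted; the rest of the other is already sorted.
        result.extend(left[i:])
        result.extend(right[j:])
        return result

Each of the roughly lg n levels of recursion does O(n) work in merge, giving the O(n log n) total.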
In the worst case, the number of comparisons merge sort makes is equal to or slightly smaller than n⌈lg n⌉ − 2^⌈lg n⌉ + 1, which is between (n lg n − n + 1) and (n lg n + n + O(lg n)).[3]
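For example, when n = 1024 = 2^10 the ceiling is exact, and the worst case is 1024·10 − 2^10 + 1 = 10240 − 1024 + 1 = 9217 comparisons, which coincides with the lower estimate n lg n − n + 1.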
For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches α·n fewer than the worst case, where α = −1 + Σ_{k=0}^{∞} 1/(2^k + 1) ≈ 0.2645.
In the worst case, merge sort does about 39% fewer comparisons than quicksort does in its average case. Merge sort always makes fewer comparisons than quicksort, except in extremely rare cases where the two tie because merge sort's worst case coincides with quicksort's best case. In terms of moves, merge sort's worst-case complexity is O(n log n), the same complexity as quicksort's best case, and merge sort's best case takes about half as many iterations as its worst case.
Recursive implementations of merge sort make 2n − 1 method calls in the worst case, compared to quicksort's n, so merge sort has roughly twice as much recursive overhead as quicksort. However, iterative, non-recursive implementations of merge sort that avoid the method call overhead are not difficult to code. Merge sort's most common implementation does not sort in place; memory equal to the size of the input must therefore be allocated for the sorted output (see below for versions that need only n/2 extra spaces).
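A bottom-up version is one such iterative formulation: it merges runs of width 1, 2, 4, ... with plain loops and no call stack. A hedged Python sketch (illustrative, not the post's own code):

    def merge_sort_bottom_up(m):
        n = len(m)
        src, dst = list(m), [None] * n
        width = 1
        while width < n:
            # Merge each adjacent pair of runs src[lo:mid] and src[mid:hi].
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j, k = lo, mid, lo
                while i < mid and j < hi:
                    if src[i] <= src[j]:
                        dst[k] = src[i]; i += 1
                    else:
                        dst[k] = src[j]; j += 1
                    k += 1
                # Copy whichever run still has elements left.
                dst[k:hi] = src[i:mid] if i < mid else src[j:hi]
            src, dst = dst, src   # the merged runs now live in src
            width *= 2
        return src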
Stable sorting in place is possible but more complicated, and usually a bit slower, even when the algorithm also runs in O(n log n) time (Katajainen, Pasanen & Teuhola 1996). One way to sort in place is to merge the blocks recursively.[4] Like the standard merge sort, in-place merge sort is also a stable sort. Stable sorting of linked lists is simpler: in this case the algorithm uses no more space than that already used by the list representation, apart from the O(log n) needed for the recursion stack.
Merge sort is more efficient than quicksort for some types of lists if the data to be sorted can only be accessed efficiently in sequence, and it is thus popular in languages such as Lisp, where sequentially accessed data structures are very common. Unlike some (efficient) implementations of quicksort, merge sort is a stable sort as long as the merge operation is implemented properly.
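For linked lists the whole algorithm can work by relinking nodes, which is why no auxiliary array is needed. A minimal sketch, assuming a hypothetical Node class (illustrative Python, not the post's code):

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def merge_sort_list(head):
        if head is None or head.next is None:
            return head
        # Split the list in two with slow/fast pointers.
        slow, fast = head, head.next
        while fast and fast.next:
            slow, fast = slow.next, fast.next.next
        mid, slow.next = slow.next, None
        left = merge_sort_list(head)
        right = merge_sort_list(mid)
        # Merge by relinking nodes; "<=" preserves stability.
        dummy = tail = Node(None)
        while left and right:
            if left.value <= right.value:
                tail.next, left = left, left.next
            else:
                tail.next, right = right, right.next
            tail = tail.next
        tail.next = left or right
        return dummy.next

Only O(log n) stack frames are live at any moment, matching the space claim above.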
Merge sort also has some drawbacks. One is its use of 2n locations; the additional n locations are commonly needed because merging two sorted sets in place is more complicated and would require more comparisons and move operations. Even with this extra space the algorithm still does a lot of copying: the contents of m are first copied into left and right, and later into the list result, on each invocation of merge_sort (variable names according to the pseudocode above). An alternative to this copying is to associate a new field of information with each key (the elements in m are called keys). This field is used to link the keys and any associated information together in a sorted list (a key and its related information is called a record). The merging of the sorted lists then proceeds by changing the link values; no records need to be moved at all. A field containing only a link will generally be smaller than an entire record, so less space is used as well.
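The merge step of that linked-record scheme might look like the following sketch (illustrative Python; keys, next_ and the −1 end marker are assumed conventions, not the post's notation). The keys never move; only the integer links change:

    def merge_links(keys, next_, a, b):
        # a and b are head indices of two chains sorted through next_;
        # -1 marks the end of a chain (an illustrative convention).
        if a == -1: return b
        if b == -1: return a
        # Pick the overall head; "<=" keeps equal keys stable.
        if keys[a] <= keys[b]:
            head = tail = a; a = next_[a]
        else:
            head = tail = b; b = next_[b]
        while a != -1 and b != -1:
            if keys[a] <= keys[b]:
                next_[tail] = a; tail = a; a = next_[a]
            else:
                next_[tail] = b; tail = b; b = next_[b]
        next_[tail] = a if a != -1 else b
        return head

For instance, with keys = [3, 9, 4, 7] and the two sorted chains 0→1 and 2→3 (next_ = [1, -1, 3, -1]), merge_links(keys, next_, 0, 2) returns 0 and rewrites the links to 0→2→3→1, i.e. 3, 4, 7, 9, without moving a single record.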
Another alternative for reducing the space overhead to n/2 is to maintain left and right as a combined structure, copy only the left part of m into temporary space, and direct the merge routine to place the merged output into m. With this version it is better to allocate the temporary space outside the merge routine, so that only one allocation is needed. The excessive copying mentioned in the previous paragraph is also mitigated, since the last pair of lines before the return result statement (function merge in the pseudocode above) becomes superfluous.
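A hedged sketch of this half-buffer merge (illustrative Python; the buffer is allocated once by the top-level call, as the paragraph suggests):

    def merge_half_space(m, lo, mid, hi, tmp):
        # Copy only the left run out; the right run stays in m.
        left_len = mid - lo
        tmp[:left_len] = m[lo:mid]
        i, j, k = 0, mid, lo
        while i < left_len and j < hi:
            if tmp[i] <= m[j]:          # "<=" keeps the sort stable
                m[k] = tmp[i]; i += 1
            else:
                m[k] = m[j]; j += 1
            k += 1
        # Left-run leftovers go back into m; right-run leftovers are
        # already in their final positions, so no final copy is needed.
        m[k:k + (left_len - i)] = tmp[i:left_len]

    def merge_sort_half(m, lo=0, hi=None, tmp=None):
        if hi is None:                  # top-level call: allocate once
            hi = len(m)
            tmp = [None] * (len(m) // 2 + 1)
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        merge_sort_half(m, lo, mid, tmp)
        merge_sort_half(m, mid, hi, tmp)
        merge_half_space(m, lo, mid, hi, tmp)

The sort mutates m in place and uses only about n/2 extra slots plus the recursion stack.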
Merge sort can also be done by merging more than two sublists at a time, using the n-way merge algorithm. However, the number of operations is approximately the same. Consider merging k sublists at a time, where for simplicity k is a power of 2. The recurrence relation becomes T(n) = kT(n/k) + O(n log k). (The last term comes from the merge step, which, when implemented optimally using a heap or a self-balancing binary search tree, takes O(log k) time per element.) If you take the recurrence relation for regular merge sort, T(n) = 2T(n/2) + O(n), and expand it out log₂ k times, you get the same recurrence relation. This is true even if k is not a constant.
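For concreteness, a k-way variant can lean on a heap-based merge; Python's standard library happens to provide one (heapq.merge, which is stable and does O(log k) work per element). An illustrative sketch, not the post's code:

    import heapq

    def merge_sort_kway(m, k=4):
        if len(m) <= 1:
            return list(m)
        # Split into k roughly equal parts and sort each recursively.
        step = -(-len(m) // k)          # ceil(len(m) / k)
        runs = [merge_sort_kway(m[i:i + step], k)
                for i in range(0, len(m), step)]
        # Heap-based k-way merge: O(log k) comparisons per element output.
        return list(heapq.merge(*runs))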
