There are couple of problems that have already been mentioned:
- Using lists is not going to give the performance you are looking for. Even this sample implementation using vector is factor x50 faster than using lists, since it does in-place element swaps. For this reason my answer will include implementation using the array library
massiv
, rather than lists.
- I tend to find Haskell scheduler far from perfect for CPU bound tasks, so, as @Edward Kmett noted in his answer, we need a work stealing scheduler, which I conveniently implemented for the above mentioned library:
scheduler
-- A helper function that partitions a region of a mutable array.
unstablePartitionRegionM ::
forall r e m. (Mutable r Ix1 e, PrimMonad m)
=> MArray (PrimState m) r Ix1 e
-> (e -> Bool)
-> Ix1 -- ^ Start index of the region
-> Ix1 -- ^ End index of the region
-> m Ix1
unstablePartitionRegionM marr f start end = fromLeft start (end + 1)
where
fromLeft i j
| i == j = pure i
| otherwise = do
x <- A.unsafeRead marr i
if f x
then fromLeft (i + 1) j
else fromRight i (j - 1)
fromRight i j
| i == j = pure i
| otherwise = do
x <- A.unsafeRead marr j
if f x
then do
A.unsafeWrite marr j =<< A.unsafeRead marr i
A.unsafeWrite marr i x
fromLeft (i + 1) j
else fromRight i (j - 1)
{-# INLINE unstablePartitionRegionM #-}
Here is the actual in-place quicksort
quicksortMArray ::
(Ord e, Mutable r Ix1 e, PrimMonad m)
=> Int
-> (m () -> m ())
-> A.MArray (PrimState m) r Ix1 e
-> m ()
quicksortMArray numWorkers schedule marr =
schedule $ qsort numWorkers 0 (unSz (msize marr) - 1)
where
qsort n !lo !hi =
when (lo < hi) $ do
p <- A.unsafeRead marr hi
l <- unstablePartitionRegionM marr (< p) lo hi
A.unsafeWrite marr hi =<< A.unsafeRead marr l
A.unsafeWrite marr l p
if n > 0
then do
let !n' = n - 1
schedule $ qsort n' lo (l - 1)
schedule $ qsort n' (l + 1) hi
else do
qsort n lo (l - 1)
qsort n (l + 1) hi
{-# INLINE quicksortMArray #-}
Now if we look at the arguments numWorkers
and schedule
they are pretty opaque. Say if we supply 1
for the first argument and id
for the second one, we will simply have a sequential quicksort, but if we would have a function available to us that could schedule each task to be computed concurrently, then we would get a parallel implementation of a quicksort. Luckily for us massiv
provides it out of the box withMArray
:
withMArray ::
(Mutable r ix e, MonadUnliftIO m)
=> Array r ix e
-> (Int -> (m () -> m ()) -> MArray RealWorld r ix e -> m a)
-> m (Array r ix e)
Here is a pure version that will make a copy of an array and than sort it in palce using the computation strategy specified within the array itself:
quicksortArray :: (Mutable r Ix1 e, Ord e) => Array r Ix1 e -> Array r Ix1 e
quicksortArray arr = unsafePerformIO $ withMArray arr quicksortMArray
{-# INLINE quicksortArray #-}
Here comes the best part, the benchmarks. The order of results:
- Intro sort from
vector-algorithms
- In-place quicksort using vector from this answer
- Implementation in
C
, which I grabbed from this question
- Sequential quicksort using
massiv
- Same as above, but in parallel on a computer with a humble 3rd gen i7 quad core processor with hyperthreading
benchmarking QuickSort/Vector Algorithms
time 101.3 ms (93.75 ms .. 107.8 ms)
0.991 R2 (0.974 R2 .. 1.000 R2)
mean 97.13 ms (95.17 ms .. 100.2 ms)
std dev 4.127 ms (2.465 ms .. 5.663 ms)
benchmarking QuickSort/Vector
time 89.51 ms (87.69 ms .. 91.92 ms)
0.999 R2 (0.997 R2 .. 1.000 R2)
mean 92.67 ms (91.54 ms .. 94.50 ms)
std dev 2.438 ms (1.468 ms .. 3.493 ms)
benchmarking QuickSort/C
time 88.14 ms (86.71 ms .. 89.41 ms)
1.000 R2 (0.999 R2 .. 1.000 R2)
mean 90.11 ms (89.17 ms .. 93.35 ms)
std dev 2.744 ms (387.1 μs .. 4.686 ms)
benchmarking QuickSort/Array
time 76.07 ms (75.77 ms .. 76.41 ms)
1.000 R2 (1.000 R2 .. 1.000 R2)
mean 76.08 ms (75.75 ms .. 76.28 ms)
std dev 453.7 μs (247.8 μs .. 699.6 μs)
benchmarking QuickSort/Array Par
time 25.25 ms (24.84 ms .. 25.61 ms)
0.999 R2 (0.997 R2 .. 1.000 R2)
mean 25.13 ms (24.80 ms .. 25.75 ms)
std dev 991.6 μs (468.5 μs .. 1.782 ms)
Benchmarks are sorting 1,000,000 random Int64
s. If you'd like to see full code you can find it here: https://github.com/lehins/haskell-quicksort
To sum it up, we got a x3 time speed up on a quad core processor and 8 capabilities, which sounds pretty good to me. Thanks for this question, now I can add sorting function to massiv
;)
Edit
Note, that the accepted answer which uses lists instead of a more appropriate data structure for this problem such as a mutable array, is x100 times slower on the same input:
benchmarking List/random/List Par
time 2.712 s (2.566 s .. 3.050 s)
0.998 R2 (0.996 R2 .. 1.000 R2)
mean 2.696 s (2.638 s .. 2.745 s)
std dev 59.09 ms (40.83 ms .. 72.04 ms)
variance introduced by outliers: 19% (moderately inflated)