Re: Using quicksort for every external sort run

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2016-01-29 11:41:32
Message-ID: CAD__OuhgUbZgszr-Ftf=bqgz-YwD60T5xMOikvXYiBQUCNpPqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 29, 2015 at 4:33 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>Attached is a revision that significantly overhauls the memory patch,
>with several smaller changes.

I just ran some tests on above patch. Mainly to compare
how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for
sorting.
I have 8GB of ram and ssd storage.

Settings and Results.
----------------------------
Work_mem= DEFAULT (4mb).
key width = 520.

CASE 1. Data is pre-sorted as per sort key order.

CASE 2. Data is sorted in opposite order of sort key.

CASE 3. Data is randomly distributed.

*Key length 520*

*Number of records* 3200000 6400000 12800000 25600000

1.7 GB 3.5GB 7 GB 14GB

*CASE 1*

*RS* 23654.677 35172.811 44965.442 106420.155
*Qsort* 14100.362 40612.829 101068.107 334893.391

*CASE 2*

*RS* 13427.378 36882.898 98492.644 310670.15
*Qsort* 12475.133 32559.074 100772.531 322080.602

*CASE 3*

*RS* 17202.966 45163.234 122323.299 337058.856
*Qsort* 12530.726 23343.753 59431.315 152862.837

If data is sorted as same as sort key order then current code performs
better than proposed patch
as sort size increases.

It appears new algo do not seem have any major impact if rows are presorted
in opposite order.

For randomly distributed order quick sort performs well when compared to
current sort method (RS).

======================================================
Now Increase the work_mem to 64MB and for 14 GB of data to sort.

CASE 1: We can see Qsort is able to catchup with current sort method(RS).
CASE 2: No impact.
CASE 3: RS is able to catchup with Qsort.

*CASE 1* *RS* 128822.735

*Qsort* 90857.496
*CSAE 2* *RS* *105631.775*

*Qsort* *105938.334*
*CASE 3* *RS* *152301.054*

*Qsort* *149649.347*

I think for long keys both old (RS) and new (Qsort) sort method has its own
characteristics
based on data distribution. I think work_mem is the key If properly set new
method(Qsort) will
be able to fit most of the cases. If work_mem is not tuned right it, there
are cases it can regress.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
Test Queries.sql application/sql 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-01-29 11:59:49 Re: [PATCH] Refactoring of LWLock tranches
Previous Message Oleg Bartunov 2016-01-29 11:15:23 Re: Fuzzy substring searching with the pg_trgm extension