Use quick select instead of qsort to get median

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Use quick select instead of qsort to get median
Date: 2021-07-22 12:07:06
Message-ID: OS0PR01MB5716A0A4B9CFC4D8A662C64994E49@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When I was writing an extension which need to get the median of an array, I
tried to find if postgres provide some api that can do that. I found all the
places in postgres invoke qsort() and then get the median. I was thinking can
we do better by using "quick select" and is it worth it.

Currently, there are some places[1] in the code that need the median and can
use "quick select" instead. And some of them(spg_box_quad_picksplit /
spg_range_quad_picksplit) are invoked frequently when INSERT or CREATE INDEX.
So, Peronally, It's acceptable to introduce a quick select api to improve these
places.

Since most of the logic of "quick select" is similar to quick sort, I think
we can reuse the code in sort_template.h. We only need to let the sort stop
when find the target top Kth element.

Attach a POC patch about this idea. I did some simple performance tests, I can
see about 10% performance gain in this test[2].

Thoughts ?

[1]
1.
entry_dealloc
...
/* Record the (approximate) median usage */
if (i > 0)
pgss->cur_median_usage = entries[i / 2]->counters.usage;
2.
spg_box_quad_picksplit
qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles);
...
centroid->low.x = lowXs[median];

3.
spg_range_quad_picksplit
qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound),
...
centroid = range_serialize(typcache, &lowerBounds[median],

4.
spg_quad_picksplit
qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp);
...
centroid->x = sorted[median]->x;

[2]
drop table quad_box_tbl;
CREATE unlogged TABLE quad_box_tbl (id int, b box);
truncate quad_box_tbl ;
explain (verbose, analyze)INSERT INTO quad_box_tbl
SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5))
FROM generate_series(1, 1000000) x order by random();

-----test create index
drop index quad_box_tbl_idx;
CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

-------test results
PATCH:
Time: 2609.664 ms (00:02.610)

HEAD:
Time: 2903.765 ms (00:02.944)

Best regards,
Houzj

Attachment Content-Type Size
0001-test-use-quick-select-to-get-median.patch application/octet-stream 10.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-07-22 12:11:24 Re: window build doesn't apply PG_CPPFLAGS correctly
Previous Message Andrew Dunstan 2021-07-22 12:04:34 Re: window build doesn't apply PG_CPPFLAGS correctly