Re: Polyphase merge is obsolete

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Polyphase merge is obsolete
Date: 2021-10-05 17:24:58
Message-ID: CAFBsxsEbWy=imDgm_HpQg2r98p5GnKBR0pkv4xBBbSNTNM8fRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 15, 2021 at 5:35 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Thanks, here's another rebase.
>
> - Heikki

I've had a chance to review and test out the v5 patches.

0001 is a useful simplification. Nothing in 0002 stood out as needing
comment.

I've done some performance testing of master versus both patches applied.
The full results and test script are attached, but I'll give a summary
here. A variety of value distributions were tested, with work_mem from 1MB
to 16MB, plus 2GB which will not use external sort at all. I settled on 2
million records for the sort, to have something large enough to work with
but also keep the test time reasonable. That works out to about 130MB on
disk. We have recent improvements to datum sort, so I used both single
values and all values in the SELECT list.

The system was on a Westmere-era Xeon with gcc 4.8. pg_prewarm was run on
the input tables. The raw measurements were reduced to the minimum of five
runs.

I can confirm that sort performance is improved with small values of
work_mem. That was not the motivating reason for the patch, but it's a nice
bonus. Even as high as 16MB work_mem, it's possible some of the 4-6%
differences represent real improvement and not just noise or binary
effects, but it's much more convincing at 4MB and below, with 25-30% faster
with non-datum integer sorts at 1MB work_mem. The nominal regressions seem
within the noise level, with one exception that only showed up in one set
of measurements (-10.89% in the spreadsheet). I'm not sure what to make of
that since it only happens in one combination of factors and nowhere else.

On Sat, Sep 11, 2021 at 4:28 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> + * Before PostgreSQL 14, we used the polyphase merge algorithm (Knuth's
> + * Algorithm 5.4.2D),
>
> I think the above 'Before PostgreSQL 14' should be 'Before PostgreSQL 15'
now that PostgreSQL 14 has been released.
>
> +static int64
> +merge_read_buffer_size(int64 avail_mem, int nInputTapes, int nInputRuns,
> + int maxOutputTapes)
>
> For memory to allocate, I think uint64 can be used (instead of int64).

int64 is used elsewhere in this file, and I see now reason to do otherwise.

Aside from s/14/15/ for the version as noted above, I've marked it Ready
for Committer.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
sort-bench-external-jcn.sh text/x-sh 4.9 KB
kway-merge-test-1.ods application/vnd.oasis.opendocument.spreadsheet 26.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-10-05 17:25:30 can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
Previous Message Jaime Casanova 2021-10-05 17:24:47 Re: How to retain lesser paths at add_path()?