Re: Sort performance cliff with small work_mem

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sort performance cliff with small work_mem
Date: 2018-05-02 17:43:44
Message-ID: 8aa3334f-49f7-e797-5b55-6dc7cf031500@iki.fi
Lists: pgsql-hackers

On 02/05/18 19:41, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, May 2, 2018 at 11:38 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> To fix, I propose that we change the above so that we always subtract
>>> tapeSpace, but if there is less than e.g. 32 kB of memory left after that
>>> (including, if it went below 0), then we bump availMem back up to 32 kB. So
>>> we'd always reserve 32 kB to hold the tuples, even if that means that we
>>> exceed 'work_mem' slightly.
>
>> Sounds very reasonable.
>
> Agreed. I think that was my code to start with, and the issue certainly
> didn't occur to me at the time.
>
> I don't like the idea of using hardwired "32kB" though; some multiple
> of TAPE_BUFFER_OVERHEAD seems more plausible. Perhaps MINORDER times
> TAPE_BUFFER_OVERHEAD would be good?

I don't think the amount that we reserve to hold the tuples should
depend on those things. The function is "allocating" memory for the tape
buffers, yes, but now we're talking about keeping some memory for the
tuples, while quicksorting the initial runs. That seems independent from
the number of tapes or the tape buffer size.

I'm not sure what you could derive that from, to make it less arbitrary.
At the moment, I'm thinking of just doing something like this:

/*
* Minimum amount of memory reserved to hold the sorted tuples in
* TSS_BUILDRUNS phase. This sets a minimum size for the initial runs
* when work_mem is very small.
*/
#define MIN_TUPLE_MEMORY (32 * 1024)

Independently of this, perhaps we should put in a special case in
dumptuples(), so that it would never create a run with fewer than
maxTapes tuples. The rationale is that you'll need to hold that many
tuples in memory during the merge phase anyway, so it seems silly to
bail out before that while building the initial runs. You're going to
exceed work_mem by roughly the same amount anyway, just in a different
phase. That's not the case in this example, but it might happen when
sorting extremely wide tuples.

- Heikki
