Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date: 2015-12-18 19:22:02
Message-ID: CAM3SWZT94eM-0BFTcaCLcWGCL3ehpAGXFDJqs8M1L8YuquKTfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 18, 2015 at 9:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Attached revision updates both the main commit (the optimization), and
>> the backpatch commit (updated the contract).
>
> - /* abbreviation is possible here only for by-reference types */
> + /*
> + * Abbreviation is possible here only for by-reference types.
> Note that a
> + * pass-by-value representation for abbreviated values is forbidden, but
> + * that's a distinct, generic restriction imposed by the SortSupport
> + * contract.
>
> I think that you have not written what you meant to write here. I
> think what you mean is "Note that a pass-by-REFERENCE representation
> for abbreviated values is forbidden...".

You're right. Sorry about that.

> + /*
> + * If we produced only one initial run (quite likely if the total data
> + * volume is between 1X and 2X workMem), we can just use that
> tape as the
> + * finished output, rather than doing a useless merge. (This obvious
> + * optimization is not in Knuth's algorithm.)
> + */
> + if (state->currentRun == 1)
> + {
> + state->result_tape = state->tp_tapenum[state->destTape];
> + /* must freeze and rewind the finished output tape */
> + LogicalTapeFreeze(state->tapeset, state->result_tape);
> + state->status = TSS_SORTEDONTAPE;
> + return;
> + }
>
> I don't understand the point of moving this code. If there's some
> reason to do this after rewinding the tapes rather than beforehand, I
> think we should articulate that reason in the comment block.

I thought that was made clear by the 0001 commit message. Think of
what happens when we don't disable abbreviated in the final
TSS_SORTEDONTAPE phase should the "if (state->currentRun == 1)" path
have been taken (but *not* the path that also ends in
TSS_SORTEDONTAPE, when caller requires randomAccess but we spill to
tape, or any other case). What happens is: The code in 0002 gets
confused, and attempts to pass back a pointer value as an "abbreviated
key". That's a bug.

> The last hunk in your 0001 patch properly belongs in 0002.

You could certainly argue that the last hunk of 0001 belongs in 0002.
I only moved it to 0001 when I realized that we might as well keep the
branches in sync, since the ordering is insignificant from a 9.5
perspective (although it might still be tidier), and there is a need
to backpatch anyway. I'm not insisting on doing it that way.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2015-12-18 19:43:40 Fuzzy substring searching with the pg_trgm extension
Previous Message Peter Geoghegan 2015-12-18 19:04:10 Re: Refactoring speculative insertion with unique indexes a little