From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates |
Date: | 2015-12-18 19:22:02 |
Message-ID: | CAM3SWZT94eM-0BFTcaCLcWGCL3ehpAGXFDJqs8M1L8YuquKTfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Dec 18, 2015 at 9:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Attached revision updates both the main commit (the optimization), and
>> the backpatch commit (updated the contract).
>
> - /* abbreviation is possible here only for by-reference types */
> + /*
> + * Abbreviation is possible here only for by-reference types.
> Note that a
> + * pass-by-value representation for abbreviated values is forbidden, but
> + * that's a distinct, generic restriction imposed by the SortSupport
> + * contract.
>
> I think that you have not written what you meant to write here. I
> think what you mean is "Note that a pass-by-REFERENCE representation
> for abbreviated values is forbidden...".
You're right. Sorry about that.
> + /*
> + * If we produced only one initial run (quite likely if the total data
> + * volume is between 1X and 2X workMem), we can just use that
> tape as the
> + * finished output, rather than doing a useless merge. (This obvious
> + * optimization is not in Knuth's algorithm.)
> + */
> + if (state->currentRun == 1)
> + {
> + state->result_tape = state->tp_tapenum[state->destTape];
> + /* must freeze and rewind the finished output tape */
> + LogicalTapeFreeze(state->tapeset, state->result_tape);
> + state->status = TSS_SORTEDONTAPE;
> + return;
> + }
>
> I don't understand the point of moving this code. If there's some
> reason to do this after rewinding the tapes rather than beforehand, I
> think we should articulate that reason in the comment block.
I thought that was made clear by the 0001 commit message. Think of
what happens when we don't disable abbreviated in the final
TSS_SORTEDONTAPE phase should the "if (state->currentRun == 1)" path
have been taken (but *not* the path that also ends in
TSS_SORTEDONTAPE, when caller requires randomAccess but we spill to
tape, or any other case). What happens is: The code in 0002 gets
confused, and attempts to pass back a pointer value as an "abbreviated
key". That's a bug.
> The last hunk in your 0001 patch properly belongs in 0002.
You could certainly argue that the last hunk of 0001 belongs in 0002.
I only moved it to 0001 when I realized that we might as well keep the
branches in sync, since the ordering is insignificant from a 9.5
perspective (although it might still be tidier), and there is a need
to backpatch anyway. I'm not insisting on doing it that way.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Artur Zakirov | 2015-12-18 19:43:40 | Fuzzy substring searching with the pg_trgm extension |
Previous Message | Peter Geoghegan | 2015-12-18 19:04:10 | Re: Refactoring speculative insertion with unique indexes a little |