Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date: 2015-12-09 19:31:35
Message-ID: CA+Tgmob8C_gDeOywGqmpf2tghYR6qhGhf9Va54Ct4j=MMpQHZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 10, 2015 at 9:03 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> This needs a rebase, there are several conflicts in src/backend/executor/nodeAgg.c
>
> I attached a revised version of the second patch in the series, fixing
> this bitrot.
>
> I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the
> previous patch, where no final on-the-fly merge step is required (no
> merge step is required whatsoever, because replacement selection
> managed to produce only one run). The function mergeruns() previously
> only "abandoned" abbreviated ahead of any merge step iff there was
> one. When there was only one run (not requiring a merge) it happened
> to continue to keep its state consistent with abbreviated keys still
> being in use. It didn't matter before, because abbreviated keys were
> only for tuplesort to compare, but that's different now.
>
> That bug is fixed in this revision by reordering things within
> mergeruns(). The previous order of the two things that were switched
> is not at all significant (I should know, I wrote that code).

I find the references to a "void" representation in this patch to be
completely opaque. I see that there are some such references in
tuplesort.c already, and most likely they were put there by commits
that I did, so I guess I have nobody but myself to blame, but I don't
know what this means, and I don't think we should let this terminology
proliferate.

My understanding is that the "void" representation is intended to
whatever Datum we originally got, which might be a pointer. Why not
just say that instead of referring to it this way?

My understanding is also that it's OK if the abbreviated key stays the
same even though the value has changed, but that the reverse could
cause queries to return wrong answers. The first part of that
justifies why this is safe when no abbreviation is available: we'll
return an abbreviated value of 0 for everything, but that's fine.
However, using the original Datum (which might be a pointer) seems
unsafe, because two binary-identical values could be stored at
different addresses and thus have different pointer representations.

I'm probably missing something here, so clue me in...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-12-09 19:40:10 Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message Andres Freund 2015-12-09 19:28:24 Re: Logical replication and multimaster