Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
Date: 2014-01-22 05:07:00
Message-ID: CAKuK5J2R44SwGPyKJtrDZfGbWZ44CM1HHvhfzJP_ngz3MGdNWg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 9:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> writes:
>> A rough summary of the patch follows:
>
>> - a GUC variable enables or disables this capability
>> - in nodeAgg.c, eliding duplicate tuples is enabled if the number of
>>   distinct columns is equal to the number of sort columns (and both are
>>   greater than zero).
>> - in createplan.c, eliding duplicate tuples is enabled if we are
>>   creating a unique plan which involves sorting first
>> - ditto planner.c
>> - all of the remaining changes are in tuplesort.c, and consist of:
>>   + a new macro, DISCARDTUP, and a new structure member, discardtup, are
>>     both defined and operate similarly to COMPARETUP, COPYTUP, etc.
>>   + in puttuple_common, when state is TSS_BUILDRUNS, we *may* simply
>>     throw out the new tuple if it compares as identical to the tuple at
>>     the top of the heap. Since we're already performing this comparison,
>>     this is essentially free.
>>   + in mergeonerun, we may discard a tuple if it compares as identical
>>     to the *last written tuple*. This is a comparison that did not take
>>     place before, so it's not free, but it saves a write I/O.
>>   + We perform the same logic in dumptuples
>
> [ raised eyebrow ... ] And what happens if the planner drops the
> unique step and then the sort doesn't actually go to disk?

I'm not familiar enough with the code to answer your question with any
authority, but I believe that if the sort never enters the
TSS_BUILDRUNS state (that is, it never goes to disk), then none of the
new code runs and the sort behaves exactly as it did before.
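
To make the question concrete, here is a toy model in Python. It is only
a sketch and is not the tuplesort.c code: the names sort_with_elision and
write_run are made up for illustration, work_mem is counted in tuples
rather than bytes, and runs are built by sorting fixed-size chunks rather
than with a heap. It assumes duplicates are dropped only by comparing
against the last tuple written, as in the mergeonerun and dumptuples
items of the summary.

```python
import heapq


def write_run(sorted_tuples, elide_duplicates):
    """Emit sorted tuples, skipping any tuple equal to the last one written."""
    out = []
    for tup in sorted_tuples:
        if elide_duplicates and out and out[-1] == tup:
            continue  # duplicate of the last written tuple: skip the write
        out.append(tup)
    return out


def sort_with_elision(tuples, work_mem, elide_duplicates=True):
    """Toy external sort; work_mem is a hypothetical limit in tuples."""
    tuples = list(tuples)
    if len(tuples) <= work_mem:
        # Everything fits in memory, so the run-building path is never
        # entered and no elision happens: duplicates survive.
        return sorted(tuples)

    # Build sorted runs of at most work_mem tuples, eliding while writing.
    runs = []
    for start in range(0, len(tuples), work_mem):
        chunk = sorted(tuples[start:start + work_mem])
        runs.append(write_run(chunk, elide_duplicates))

    # Merge the runs, again comparing against the last written tuple.
    return write_run(heapq.merge(*runs), elide_duplicates)
```

With the input [(3,), (1,), (3,), (2,), (1,), (3,)], a call with
work_mem=2 takes the external path and returns [(1,), (2,), (3,)], while
a call with work_mem=10 stays in memory and returns all six tuples,
duplicates included. In this model, at least, whatever consumes the sort
output cannot assume the duplicates are gone.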

--
Jon
