Re: Abbreviated keys for Numeric

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-03-21 02:10:34
Message-ID: 87a8z7gokj.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Peter" == Peter Geoghegan <pg(at)heroku(dot)com> writes:

Peter> Attached is a revision of this patch, that I'm calling v2. What
Peter> do you think, Andrew?

"No." is I think the best summary of my response.

I strongly suggest whichever committer ends up looking at this consider
my original version unchanged in preference to this. The cost/benefit
decision of supporting abbreviation on 32bit platforms is a point that
can be debated (I strongly support retaining the 32bit code, obviously),
but the substantive changes here are actively wrong.

Peter> Other than that, I've tried to keep things closer to the text
Peter> opclass. For example, the cost model now has a few debugging
Peter> traces (disabled by default). I have altered the ad-hoc cost
Peter> model so that it no longer concerns itself with NULL inputs,
Peter> which seemed questionable (not least since the abbreviation
Peter> conversion function isn't actually called for NULL inputs. Any
Peter> attempt to track the full count within numeric code therefore
Peter> cannot work.).

This is simply wrong. The reason why the cost model (in my version)
tracks non-null values by having its own counter is precisely BECAUSE
the passed-in memtupcount includes nulls, and therefore the code will
UNDERESTIMATE the fraction of successfully abbreviated values if the
comparison is based on memtupcount. In your version, if there are 9999
null values at the start of the input, then on the first non-null value
after that, memtupcount will be 10000 and there will be only 1 distinct
abbreviated value, causing abbreviation to spuriously abort.

The test to clamp the estimate to 1.0 is just nonsensical and serves no
purpose whatever, and the comment for it is wrong.

You should fix the text abbreviation code, not propagate your mistakes
further.

(BTW, there's an outright typo in your code, ';;' for ';' at the end of
a line. Sloppy.)

Peter> I also now allocate a buffer of scratch memory separately from
Peter> the main sortsupport object - doing one allocation for all
Peter> sortsupport state, bunched together as a buffer seemed like a
Peter> questionable micro-optimization.

It's yet another cache line... I admit I did not benchmark that choice,
but then neither did you.

Peter> It seemed unwise to silently disable abbreviation when someone
Peter> happened to build with DEC_DIGITS != 4. A static assertion now
Peter> gives these unusual cases the opportunity to make an informed
Peter> decision about either disabling abbreviation or not changing
Peter> DEC_DIGITS in light of the performance penalty, in a
Peter> self-documenting way.

A) Nobody in their right minds is ever going to do that anyway

B) Anybody who does that is either not concerned about performance or is
concerned only about performance of the low-level numeric ops, and
abbreviation is the last thing they're going to be worried about in
either case.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-03-21 02:16:00 Re: pg_recvlogical description
Previous Message Bruce Momjian 2015-03-21 02:02:26 Re: proposal: doc: simplify examples of dynamic SQL