Re: Abbreviated keys for Numeric

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-03-21 00:50:18
Message-ID: CAM3SWZQ_ZyuSjNwXYvdvAoW8otTFhXYs7jQ1HLnZtuHYXzoMzg@mail.gmail.com
Lists: pgsql-hackers

Attached is a revision of this patch, which I'm calling v2. What do you
think, Andrew?

I've cut the int32 representation/alternative !USE_FLOAT8_BYVAL
encoding scheme entirely, which basically means that 32-bit systems
don't get to have this optimization. 64-bit systems have been
commonplace now for about a decade. This year, new phones came out
with 64-bit architectures, so increasingly even people that work with
embedded systems don't care about 32-bit. I'm not suggesting that
legacy doesn't matter - far from it - but I care much less about
having the latest performance improvements on what are largely legacy
systems. Experience suggests that this is a good time of the cycle to
cut scope. The last commitfest has a way of clarifying what is
actually important.

It seems unwise to include what is actually a fairly distinct encoding
scheme, which the int32/!USE_FLOAT8_BYVAL variant really was (the
same can't really be said for text abbreviation, since that can
basically work the same way on 32-bit systems, with very little extra
code). This isn't necessarily the right decision in general, but I
feel it's the right decision in the present climate of everyone
frantically closing things out, and feeling burnt out. I'm sorry that
I threw some of your work away, but since we both have other pressing
concerns, perhaps this is understandable. It may be revisited, or I
may lose the argument on this point, but going this way cuts the code
by about 30%, and makes me feel a lot better about the risk of
regressing marginal cases, since I know we always have 8 bytes to work
with. There might otherwise be a danger of regressing under-tested
32-bit platforms, or indeed missing other bugs, and frankly I don't
have time to think about that right now.
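To make the scope cut concrete, here is a minimal sketch of gating abbreviation at compile time so that the encoder can always assume a full 8-byte key. The macro, struct, and function names are hypothetical. The width of `uintptr_t` is only a stand-in for Datum width; the patch itself would key off PostgreSQL's own build settings (the message mentions USE_FLOAT8_BYVAL).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "Datum is 8 bytes wide": on builds where it
 * is not, abbreviation is simply never enabled, so no separate 32-bit
 * encoding has to exist.
 */
#if UINTPTR_MAX == UINT64_MAX
#define SKETCH_ABBREV_ENABLED 1
#else
#define SKETCH_ABBREV_ENABLED 0
#endif

/* Hypothetical, pared-down sort support state. */
typedef struct SortSupportLite
{
    bool abbreviate;    /* true if abbreviated keys will be used */
} SortSupportLite;

/* Enable abbreviation only if the caller asked for it AND keys are 8 bytes. */
static void
numeric_sortsupport_sketch(SortSupportLite *ssup, bool requested)
{
    ssup->abbreviate = requested && SKETCH_ABBREV_ENABLED;
}
```

On a 32-bit build the request is quietly declined and sorting falls back to full comparisons, which is the trade-off argued for above.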

Other than that, I've tried to keep things closer to the text opclass.
For example, the cost model now has a few debugging traces (disabled
by default). I have altered the ad-hoc cost model so that it no longer
concerns itself with NULL inputs, which seemed questionable, not least
because the abbreviation conversion function isn't actually called for
NULL inputs; any attempt to track the full count within numeric code
therefore cannot work. I also now allocate a buffer of scratch
memory separately from the main sortsupport object - doing one
allocation for all sortsupport state, bunched together as a buffer
seemed like a questionable micro-optimization. For similar reasons, I
avoid playing tricks in the VARATT_IS_SHORT() case -- my preferred
approach to avoiding palloc()/pfree() cycles is to simply re-use the
same buffer across calls to numeric_abbrev_convert(), and maybe risk
having to enlarge the relatively tiny buffer once or twice. In other
words, it works more or less the same way as it does with text
abbreviation.
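Here is a minimal sketch of the two ideas above, under stated assumptions. The per-sort state owns a separately allocated, grow-only scratch buffer that is re-used across conversions. It also keeps an input counter maintained inside the conversion routine, which is the only code that sees exactly the non-NULL inputs. All names are hypothetical, and malloc/realloc stand in for palloc/repalloc. The distinct-key estimate is passed in rather than computed (text abbreviation uses HyperLogLog for this), and the abort thresholds are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-sort state; the buffer is a separate allocation. */
typedef struct AbbrevState
{
    char   *buf;          /* scratch buffer re-used across conversions */
    size_t  buflen;       /* current capacity of buf */
    long    input_count;  /* non-NULL inputs seen by the convert routine */
} AbbrevState;

/* Ensure buf holds at least len bytes; grows by doubling, never shrinks. */
static bool
scratch_reserve(AbbrevState *st, size_t len)
{
    if (len <= st->buflen)
        return true;                  /* common case: no allocation at all */

    size_t newlen = st->buflen ? st->buflen : 64;
    while (newlen < len)
        newlen *= 2;

    char *p = realloc(st->buf, newlen);
    if (p == NULL)
        return false;
    st->buf = p;
    st->buflen = newlen;
    return true;
}

/*
 * Called once per non-NULL input (NULLs never get here), so this is the
 * one place where an accurate input count can be kept.
 */
static bool
abbrev_convert_input(AbbrevState *st, const char *src, size_t len)
{
    if (!scratch_reserve(st, len))
        return false;
    memcpy(st->buf, src, len);        /* e.g. copy out a short-header value */
    st->input_count++;
    return true;
}

/*
 * Abort decision based on input_count rather than the total tuple count,
 * which would include NULLs. Thresholds are assumptions for illustration.
 */
static bool
abbrev_should_abort(const AbbrevState *st, double est_distinct)
{
    if (st->input_count < 10000)
        return false;                 /* too early to judge */
    return est_distinct < st->input_count / 10000.0 + 0.5;
}
```

Because the buffer only ever grows, a sort over small values pays for at most a couple of reallocations in total, rather than a palloc()/pfree() pair per input.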

It seemed unwise to silently disable abbreviation when someone
happened to build with DEC_DIGITS != 4. A static assertion now gives
these unusual cases the opportunity to make an informed decision about
either disabling abbreviation or not changing DEC_DIGITS in light of
the performance penalty, in a self-documenting way.
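Here is a sketch of the kind of self-documenting guard described above, using the C11 `_Static_assert` keyword. `DEC_DIGITS` is defined locally only so the fragment compiles on its own. PostgreSQL code would more likely use its own wrapper, such as `StaticAssertStmt()` inside a function body, and the exact condition and message in the patch may differ.

```c
/* Local stand-in for numeric.c's compile-time setting (normally 4). */
#define DEC_DIGITS 4

/*
 * Fail the build with an explanation instead of silently turning
 * abbreviation off when someone changes DEC_DIGITS.
 */
_Static_assert(DEC_DIGITS == 4,
               "numeric abbreviation assumes DEC_DIGITS == 4: either "
               "disable abbreviation or keep DEC_DIGITS at 4");

#define NBASE 10000   /* 10^DEC_DIGITS, the base of each digit word */
```

Anyone who edits `DEC_DIGITS` gets a compile error that names both options, which documents the decision in the code itself.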

The encoding scheme is unchanged. I think that your conclusions on
those details were sound. Numeric abbreviation has a more compelling
cost/benefit ratio than even that of text. I easily managed to get the
same 6x - 7x improvement that you reported when sorting 10 million
random numeric rows.
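The message does not restate the encoding scheme. The sketch below assumes a 63-bit magnitude made of a 7-bit excess-44 weight followed by the first four base-10000 digit words, packed 14 bits each. That layout matches the 64-bit scheme documented in later PostgreSQL numeric.c, but treat it as an illustration only. Sign handling is simplified here (plain negation), NaN is ignored, and `NumVarSketch` and `abbrev_sketch()` are hypothetical names.

```c
#include <stdint.h>

#define ABBREV_EXCESS 44     /* assumed excess-k bias for the weight */

/* Simplified, hypothetical stand-in for numeric.c's NumericVar. */
typedef struct NumVarSketch
{
    int            negative;   /* nonzero if the value is < 0 */
    int            weight;     /* weight of digits[0], in powers of 10000 */
    int            ndigits;    /* number of digit words; 0 means zero */
    const int16_t *digits;     /* each word in 0..9999, fits in 14 bits */
} NumVarSketch;

/*
 * Signed comparison of the returned keys agrees with numeric order.
 * Equal keys mean "undecided": the full comparator must break the tie.
 */
static int64_t
abbrev_sketch(const NumVarSketch *v)
{
    int64_t mag;

    if (v->ndigits == 0 || v->weight < -ABBREV_EXCESS)
        mag = 0;                        /* zero, or too small to represent */
    else if (v->weight > 127 - ABBREV_EXCESS)
        mag = INT64_MAX;                /* too large: clamp */
    else
    {
        /* 7-bit biased weight in bits 56..62 */
        mag = (int64_t) (v->weight + ABBREV_EXCESS) << 56;
        /* first four digit words, 14 bits each; missing words count as 0 */
        for (int i = 0; i < 4 && i < v->ndigits; i++)
            mag |= (int64_t) v->digits[i] << (42 - 14 * i);
    }
    return v->negative ? -mag : mag;
}
```

Truncating to four words and clamping out-of-range weights can only merge neighbouring values into ties, never reorder them. The full comparison therefore only runs when keys collide.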

Thanks
--
Peter Geoghegan

Attachment Content-Type Size
numeric_sortsup_v2.patch text/x-patch 15.2 KB
