Re: Progress on fast path sorting, btree index creation time

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, Jim Decibel! Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-08 15:57:38
Message-ID: 20120208155738.GD24440@momjian.us
Lists: pgsql-hackers

On Wed, Feb 08, 2012 at 01:33:30PM +0000, Peter Geoghegan wrote:
> It doesn't necessarily matter if we increase the size of the postgres
> binary by 10%, precisely because most of that is not going to be in
> play from one instant to the next. I'm thinking, in particular, of
> btree index specialisations, where it could make perfect sense to "go
> crazy". You cannot have a reasonable discussion about such costs
> without considering that they will perhaps never be paid, given any
> plausible workload. That's why the percentage that the postgres binary
> size has been shown to increase by really isn't pertinent at all. At
> best, it's a weak proxy for such costs, assuming you don't have a
> demoscene-like preoccupation with reducing binary size, and I don't
> believe that we should.

When you start a binary, the executable file is mapped into memory, and
you page-fault in only the pages you need. Now, if your optimization
were alone in its own 4k (x86) virtual pages, and you never called the
functions, odds are you would not pay a penalty, aside from the
distribution penalty of a larger binary, and perhaps a small penalty if
useful code sat just before and after your block in the same pages.
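As a rough illustration of the page and cache-line arithmetic, here is a minimal C sketch. The helper names (units_touched, neighbor_bytes) and the example addresses are hypothetical; it assumes 4 KiB pages and 64-byte cache lines and ignores how a real linker actually lays out functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many fixed-size units (4 KiB pages or 64-byte
 * cache lines) a block of machine code occupying the byte range
 * [start, start + len) touches. */
static uint64_t units_touched(uint64_t start, uint64_t len, uint64_t unit)
{
    assert(len > 0 && unit > 0);
    uint64_t first = start / unit;
    uint64_t last = (start + len - 1) / unit;
    return last - first + 1;
}

/* Hypothetical helper: bytes in those units that do NOT belong to the
 * block, i.e. whatever code sits just before and just after it. If that
 * neighboring code is hot, the edge units get faulted in and cached even
 * when the block itself is never executed. */
static uint64_t neighbor_bytes(uint64_t start, uint64_t len, uint64_t unit)
{
    assert(len > 0 && unit > 0);
    uint64_t end = start + len;
    uint64_t head = start % unit;                          /* before the block */
    uint64_t tail = (end % unit) ? unit - end % unit : 0;  /* after the block */
    return head + tail;
}
```

For a hypothetical 6000-byte block starting at 0x401100, units_touched(0x401100, 6000, 4096) returns 2 and neighbor_bytes(0x401100, 6000, 4096) returns 2192, so those two pages also hold 2192 bytes of other code; with a unit of 64 the same block touches 94 cache lines.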

The sort code expansion, however, is done in-line, in the middle of the
sort code, so you are clearly filling 64-byte (x86) CPU cache lines
with type-specific expansion code for every sort case, whether we use
the code or not. Now, I don't think it is a big problem, and I think
the speedup is worth it for common data types, but we can't say the cost
is zero.
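To make the trade-off concrete, here is a minimal C sketch, not the actual patch: a generic insertion sort that calls its comparator through a function pointer, next to a macro that stamps out one type-specialized copy of the loop per data type. The names generic_isort, cmp_int32, DEFINE_SPECIALIZED_ISORT, isort_int32 and isort_int64 are hypothetical, and insertion sort is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic path: a single copy of the sort loop shared by every data type;
 * each comparison goes through a function pointer (qsort-style interface). */
static void generic_isort(void *base, size_t n, size_t size,
                          int (*cmp)(const void *, const void *))
{
    unsigned char *a = base;
    unsigned char tmp[64];          /* hypothetical cap on element size */

    assert(size <= sizeof tmp);
    for (size_t i = 1; i < n; i++) {
        size_t j = i;

        memcpy(tmp, a + i * size, size);
        while (j > 0 && cmp(a + (j - 1) * size, tmp) > 0) {
            memcpy(a + j * size, a + (j - 1) * size, size);
            j--;
        }
        memcpy(a + j * size, tmp, size);
    }
}

/* Comparator for the generic path. */
static int cmp_int32(const void *x, const void *y)
{
    int32_t a = *(const int32_t *) x;
    int32_t b = *(const int32_t *) y;

    return (a > b) - (a < b);
}

/* Specialized path: every expansion of this macro emits another complete
 * copy of the loop with the comparison inlined, trading binary size (and
 * cache-line footprint) for the removed indirect calls. */
#define DEFINE_SPECIALIZED_ISORT(NAME, TYPE)          \
    static void NAME(TYPE *a, size_t n)               \
    {                                                 \
        for (size_t i = 1; i < n; i++) {              \
            TYPE tmp = a[i];                          \
            size_t j = i;                             \
                                                      \
            while (j > 0 && a[j - 1] > tmp) {         \
                a[j] = a[j - 1];                      \
                j--;                                  \
            }                                         \
            a[j] = tmp;                               \
        }                                             \
    }

DEFINE_SPECIALIZED_ISORT(isort_int32, int32_t)
DEFINE_SPECIALIZED_ISORT(isort_int64, int64_t)
```

Calling isort_int32(v, n) sorts an int32_t array with no indirect calls, but each additional DEFINE_SPECIALIZED_ISORT line adds another copy of the loop to the binary, whether or not that type is ever sorted at run time.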

Put another way, a binary in your file system that you never call costs
nothing beyond storage, but in this case the sort code expansion sits
inside critical functions we are already calling.

Frankly, it is this cost that has kept us away from such optimizations
for a long time. I recently posted that the zero-cost optimizations are
mostly complete for sort and COPY, and we have to start considering
non-zero-cost optimizations. Sad, but true.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
