Re: Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, "Jim \"Decibel!\" Nasby" <decibel(at)decibel(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-08 13:33:30
Message-ID: CAEYLb_UjoFUMdsJjOna4q4UfLq1FL7Hj49GPUhnKA5qPi9ZyYQ@mail.gmail.com
Lists: pgsql-hackers

It doesn't necessarily matter if we increase the size of the postgres
binary by 10%, precisely because most of that is not going to be in
play from one instant to the next. I'm thinking, in particular, of
btree index specialisations, where it could make perfect sense to "go
crazy". You cannot have a reasonable discussion about such costs
without considering that they will perhaps never be paid, given any
plausible workload. That's why the percentage that the postgres binary
size has been shown to increase by really isn't pertinent at all. At
best, it's a weak proxy for such costs, assuming you don't have a
demonscene-like preoccupation with reducing binary size, and I don't
believe that we should.

It would be difficult for me to measure such things objectively, but
I'd speculate that the proprietary databases have much larger binaries
than ours, while having far fewer features, precisely because they
started applying tricks like this a long time ago. You could counter
that their code bases probably look terrible, and you'd have a point,
but so do I.

On 8 February 2012 02:38, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I've been skeptical of this patch for a number of reasons, the major
> one of which is that most workloads spend only a very small amount of
> time in doing quicksorts, and most quicksorts are of very small amounts
> of data and therefore fast anyway. It is easy to construct an
> artificial test case that does lots and lots of in-memory sorting, but
> in real life I think that's not the great part of what people use
> databases for.

Fair enough, but if that's true, then it's also true that the cost due
to cache marginalisation - the only cost that I think is worth
considering at all - is correspondingly a small fraction of that very
small amount of sorting.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
