Re: rbtree code breaks GIN's adherence to maintenance_work_mem

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: rbtree code breaks GIN's adherence to maintenance_work_mem
Date: 2010-07-31 16:12:30
Message-ID: AANLkTi=CgdycmAKPP13wbpXNbDBVA8nyEPJEEa7YLov_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 31, 2010 at 12:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Jul 31, 2010 at 12:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> So, I would like somebody to show cause why that whole module shouldn't
>>> be ripped out and the code reverted to where it was in 8.4.  My
>>> recollection is that the argument for adding it was to speed things up
>>> in corner cases, but what I think it's actually going to do is slow
>>> things down in every case.
>
>> I've always been a bit suspicious of this code, too, even though I
>> didn't think about the memory consumption issue.  But see here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00307.php
>
> I did a bit of experimentation and confirmed my fears: HEAD is willing
> to eat about double the specified maintenance_work_mem.  If you cut
> back the setting so that its actual memory use is no more than 8.4's,
> it's about 33% slower on non-pathological data (I'm testing the dataset
> from Artur Dabrowski here).

That seems like a pretty serious regression.

> I'm tempted to suggest that making RBNode be a hidden struct containing
> a pointer to somebody else's datum is fundamentally the wrong way to
> go about things, because the extra void pointer is pure overhead,
> and we aren't ever going to be using these things in a context where
> memory usage isn't of concern.  If we refactored the API so that RBNode
> was intended to be the first field of some larger struct, as is done in
> dynahash tables for instance, we could eliminate the void pointer and
> the palloc inefficiency.  The added storage compared to what 8.4 used
> would be a parent link and the iteratorState/color fields, which would
> end up costing us 16 more bytes per EntryAccumulator rather than 64.
> Still not great but at least it's not a 2X penalty, and the memory
> allocation would become the caller's problem not rbtree's, so the
> problem of tracking usage would be no different from before.

Even if we do that, is it still going to be too much of a performance
regression overall?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-07-31 16:32:03 Re: rbtree code breaks GIN's adherence to maintenance_work_mem
Previous Message Tom Lane 2010-07-31 16:06:33 Re: ANALYZE versus expression indexes with nondefault opckeytype