Re: rbtree code breaks GIN's adherence to maintenance_work_mem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: rbtree code breaks GIN's adherence to maintenance_work_mem
Date: 2010-07-31 16:02:15
Message-ID: 1377.1280592135@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Jul 31, 2010 at 12:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So, I would like somebody to show cause why that whole module shouldn't
>> be ripped out and the code reverted to where it was in 8.4. My
>> recollection is that the argument for adding it was to speed things up
>> in corner cases, but what I think it's actually going to do is slow
>> things down in every case.

> I've always been a bit suspicious of this code, too, even though I
> didn't think about the memory consumption issue. But see here:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00307.php

I did a bit of experimentation and confirmed my fears: HEAD is willing
to eat about double the specified maintenance_work_mem. If you cut
back the setting so that its actual memory use is no more than 8.4's,
it's about 33% slower on non-pathological data (I'm testing the dataset
from Artur Dabrowski here). The referenced tests showing significant
speedup over 8.4 were presumably all done without controlling for memory
usage. I'm not sure how much those results need to be discounted, but
they can't be taken at face value.

> I think there may be a better way to avoid the pathological cases, but
> I'm not sure what it is.

After a bit of further study I think maybe something could be done
towards making rbtree.c less memory-hungry: it's basically been coded
with zero attention to memory efficiency. The RBNode structs are
individually palloc'd, which means that on a 64-bit machine they take
about 80 bytes apiece. The EntryAccumulator structs they are frontends
for are only 32 bytes apiece (plus the probably-pass-by-reference Datum
value, which we can't easily do anything with), and they are allocated in
groups to avoid palloc overhead. EntryAccumulator did get 16 bytes
smaller since 8.4 as a result of removing the left/right pointers that
the rbtree structure is substituting for, but that doesn't make up
for this.

I'm tempted to suggest that making RBNode be a hidden struct containing
a pointer to somebody else's datum is fundamentally the wrong way to
go about things, because the extra void pointer is pure overhead,
and we aren't ever going to be using these things in a context where
memory usage isn't of concern. If we refactored the API so that RBNode
was intended to be the first field of some larger struct, as is done in
dynahash tables for instance, we could eliminate the void pointer and
the palloc inefficiency. The added storage compared to what 8.4 used
would be a parent link and the iteratorState/color fields, which would
end up costing us 16 more bytes per EntryAccumulator rather than 64.
Still not great but at least it's not a 2X penalty, and the memory
allocation would become the caller's problem not rbtree's, so the
problem of tracking usage would be no different from before.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-31 16:02:23 Re: patch for check constraints using multiple inheritance
Previous Message Robert Haas 2010-07-31 15:46:52 Re: ANALYZE versus expression indexes with nondefault opckeytype