Re: slow join on postgresql6.5

From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Wenjin Zheng <wenjin(dot)zheng(at)lsbc(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: slow join on postgresql6.5
Date: 2000-03-31 03:00:05
Message-ID: 3.0.1.32.20000330190005.00f683ec@mail.pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 11:10 AM 3/30/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza(at)pacifier(dot)com> writes:
>> This is an example where commercial systems that have indices
>> synchronized with data such that queries referencing only the
>> fields in indices can win big vs. PG in SOME (not all) cases.
>> In particular, when the indices are to a table that has a bunch
>> of other, perhaps long, columns. PG has to read the table and
>> drag all that dead weight around to do RI referential checking
>> and semantic actions.
>
>Keep in mind, though, that once we have TOAST the long columns are
>likely to get pushed out to a secondary table, so that the amount
>of data you have to read is reduced (as long as you don't touch
>any of the long columns, of course).

Sure...and you can BLOB or CLOB longer data in Oracle, too. TOASTing
isn't without costs, either...life's a tradeoff!

>
>The main reason that Postgres indexes can't be used without also
>consulting the main table is that we do not store transaction status
>information in index entries, only in real tuples. After finding
>an index entry we must still consult the referenced tuple to see
>if it's been deleted, or even committed yet. I believe this is a
>pretty good tradeoff.

I must wonder, though, given that proper syncing seems to be the
norm in commercial systems. Or so I'm lead to believe when Gray's
book, for instance. Or a good book on speeding up Oracle queries.

Whatever ... in this particular case - referential integrity
with MATCH <unspecified> and MATCH PARTIAL and multi-column
foreign keys - performance will likely drop spectacularly once the
leading column is NULL, while (say) with Oracle you'd expect much
less of a performance hit.

The point of my note is that this is probably worth documenting.

Don't get me wrong, these semantics and RI and multi-column keys
appear to be pretty inefficient by nature, I don't think anyone
is likely to be horrified to read that it might well be even worse
in PG than in certain commercial systems...

>I suppose that keeping tuple status in index entries could be a win
>on nearly-read-only tables, but I think that on average it'd be
>a performance loser.

Well...I've personally not studied the issue in detail, but just
have to wonder if the folks at Oracle are really as stupid as the
above analysis would make them appear to be. I presume that they
have a pretty good idea of the kind of mix large database installations
make, and presumably make choices designed to win on average.

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Don Baccus 2000-03-31 03:08:56 Re: slow join on postgresql6.5
Previous Message Bruce Momjian 2000-03-30 23:45:23 Re: Release schedule (was Re: Improvement in SET commandsyntax)