RE: slow join on postgresql6.5

From: Wenjin Zheng <wenjin(dot)zheng(at)lsbc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: RE: slow join on postgresql6.5
Date: 2000-04-03 15:56:27
Message-ID: E3E65B3C4E12D311BE6900A0C9EC2FA11554AC@GENOMICS1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you all very much for the help. The post really solved my problem.
Your help is greatly appreciated.

Wenjin Zheng
Bioinformatic Analyst
Biosource Technologies, Inc.
3333 Vaca Valley Parkway
Vacaville, CA 95688
(707)469-2353
email: wenjin(dot)zheng(at)lsbc(dot)com

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Thursday, March 30, 2000 8:11 AM
To: Don Baccus
Cc: Wenjin Zheng; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] slow join on postgresql6.5

Don Baccus <dhogaza(at)pacifier(dot)com> writes:
> This is an example where commercial systems that have indices
> synchronized with data such that queries referencing only the
> fields in indices can win big vs. PG in SOME (not all) cases.
> In particular, when the indices are to a table that has a bunch
> of other, perhaps long, columns. PG has to read the table and
> drag all that dead weight around to do RI referential checking
> and semantic actions.

Keep in mind, though, that once we have TOAST the long columns are
likely to get pushed out to a secondary table, so that the amount
of data you have to read is reduced (as long as you don't touch
any of the long columns, of course).

The main reason that Postgres indexes can't be used without also
consulting the main table is that we do not store transaction status
information in index entries, only in real tuples. After finding
an index entry we must still consult the referenced tuple to see
if it's been deleted, or even committed yet. I believe this is a
pretty good tradeoff. If we kept a copy of the status info in the
index, then UPDATE and DELETE would get hugely slower and more
complex, since they'd have to be able to find and mark all the
index entries pointing at a tuple as well as the tuple itself.
The extra info would also increase the size of index entries,
which is bad because the point of an index is that reading it
takes less disk traffic than reading the underlying table.
(BTW, to return to the original thread, one of the biggest reasons
that indexes with many columns are a loser is that they're so big.)

I suppose that keeping tuple status in index entries could be a win
on nearly-read-only tables, but I think that on average it'd be
a performance loser.

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2000-04-03 17:07:07 Re: caching query results
Previous Message The Hermit Hacker 2000-04-03 14:31:32 Re: AW: src/test/locale/de_DE.ISO-8859-1