Joins and links

From: Leon <leon(at)udmnet(dot)ru>
To: pgsql-general <pgsql-general(at)postgreSQL(dot)org>
Subject: Joins and links
Date: 1999-07-05 12:46:36
Message-ID: 13740.990705@udmnet.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hello all!

You probably remember me - recently I complained about speed
of joins in Postgres. After a short investigation the way was
found in which Postgres's optimizer can do the right job. It
was constructive discussion. Now I want to tell you what could
make Postgres better and faster. And what will make us (our
development group) happy. Maybe I am bothering someone, if
I do - tell me that.

Let me begin.

First of all, some accounting reports need to be delivered
very fast - within minutes or so. And what's bad is that
quite a few of these reports are quite time-consuming and search
intensive. In particular, internals of these reports include
a lot of joins on tables.

Secondly, almost all of accounting information naturally
fits into network data model, which can be implemented very
efficiently.

This stuff described here is not accounting-specific, it
can be found in every database which uses master-detail
tables and other such types of relations.

So. How is join being performed in such cases? Although I am
not an expert, I can imagine the way: first there is an (index)
scan on first table, and then an (index) scan on the second.
It is the best way, reality could be much worse as we have seen.

How can we radically improve performance in such cases? There
is a simple and quite obvious way. (For you not to think that
I am hallucinating I will tell you that there exist some
real servers that offer such features I am talking about)
We should make a real reference in one table to another! That
means there could be special data type called, say, "link",
which is a physical record number in the foreign table.

Queries could look like this:

table1:
a int4
b link (->table2)

table2:
c int4
recnum (system auxiliary field, really a record number in the table)

select * from table2 where table1.a > 5 and table1.b = table2.recnum

Such joins can fly really fast, as practice shows :)
Just consider: the thing table1.b = table2.recnum is a READY-MADE
join, so server doesn't have to build anything on top of that. It
can simply perform lookup through link, and since it is a physical
record number, this is done with the efficiency of C pointers! Thus
performance gain is ENORMOUS.

And it simplifies the optimizer, because it doesn't have to decide
anything about whether to use indices and such like. The join is
performed always the same way, and it is the best way.

This feature, being implemented, could bring Postgres ahead
of most commercial servers, so proving creative abilities of
free software community. Let us make a step in the future!

Best regards,
Leon

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ansley, Michael 1999-07-05 13:18:59 EXPLAIN
Previous Message Tamas Nyitrai 1999-07-05 12:01:11 Re: [GENERAL] GROUP BY problem with 6.5

Browse pgsql-hackers by date

  From Date Subject
Next Message Herouth Maoz 1999-07-05 13:49:58 Re: [GENERAL] Joins and links
Previous Message Thomas Lockhart 1999-07-05 04:22:28 Re: Postgresql 6.5-1 rpms on RedHat 6.0