Re: Performance improvement for joins where outer side is unique

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-01-30 11:37:19
Message-ID: CAApHDvod_uCMoUPovdpXbNkw50O14x3wwKoJmZLxkbBn71zdEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 1 January 2015 at 02:47, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> Hi,
>
> I've been hacking a bit at the join code again... This time I've been
> putting some effort into optimising the case where the inner side of the
> join is known to be unique.
> For example, given the tables:
>
> create table t1 (id int primary key);
> create table t2 (id int primary key);
>
> And query such as:
>
> select * from t1 left outer join t2 on t1.id=t2.id;
>
> It is possible to deduce at planning time that "for each row in the outer
> relation, only 0 or 1 rows can exist in the inner relation", (inner being
> t2)
>

I've been hacking at this unique join idea again and I've now got it
working for all join types -- Patch attached.

Here's how the performance is looking:

postgres=# create table t1 (id int primary key);
CREATE TABLE
postgres=# create table t2 (id int primary key);
CREATE TABLE
postgres=# insert into t1 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# insert into t2 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# vacuum analyze;
VACUUM
postgres=# \q

Query: select count(t1.id) from t1 inner join t2 on t1.id=t2.id;

With Patch on master as of 32bf6ee

D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 78
latency average: 769.231 ms
tps = 1.288260 (including connections establishing)
tps = 1.288635 (excluding connections establishing)

Master as of 32bf6ee

D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 70
latency average: 857.143 ms
tps = 1.158905 (including connections establishing)
tps = 1.159264 (excluding connections establishing)

That's a 10% performance increase.

I still need to perform more thorough benchmarking with different data
types.

One weird thing that I noticed before is that in an earlier revision of the
patch in the executor's join Initialise node code, I had set the
unique_inner to true for semi joins and replaced the SEMI_JOIN check for a
unique_join check in the execute node for each join method. With this the
performance results barely changed from standard... I've yet to find out
why.

The patch also has added a property to the EXPLAIN (VERBOSE) output which
states if the join was found to be unique or not.

The patch also still requires a final pass of comment fix-ups. I've just
plain run out of time for now.

I'll pick this up in 2 weeks time.

Regards

David Rowley

Attachment Content-Type Size
unijoin_2015-01-31_9af9cfa.patch application/octet-stream 55.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-01-30 12:06:45 NULL checks of deferenced pointers in picksplit method of intarray
Previous Message Etsuro Fujita 2015-01-30 10:20:52 Odd behavior of updatable security barrier views on foreign tables