Re: Performance improvement for joins where outer side is unique

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-10-14 05:03:56
Message-ID: CAFj8pRBDTGh9EJ1BnfTTGMXdb81z-+ec1Gt0fkSKtE0aPOG7yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2015-10-13 23:28 GMT+02:00 David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:

> On 4 September 2015 at 04:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>
>> Also: very nice performance results.
>>
>>
> Thanks.
>
> On following a thread in [General] [1] it occurred to me that this patch
> can give a massive improvement on Merge joins where the mark and restore
> causes an index scan to have to skip over many filtered rows again and
> again.
>
> I mocked up some tables and some data from the scenario on the [General]
> thread:
>
> create table customers (id bigint, group_id bigint not null);
> insert into customers select x.x,x.x%27724+1 from
> generate_series(1,473733) x(x);
> alter table customers add constraint customer_pkey primary key (id);
> create table balances (id bigint, balance int not null, tracking_number
> int not null, customer_id bigint not null);
> insert into balances select x.x, 100, 12345, x.x % 45 + 1 from
> generate_Series(1,16876) x(x);
> create index balance_customer_id_index on balances (customer_id);
> create index balances_customer_tracking_number_index on balances
> (customer_id,tracking_number);
> analyze;
>
> Unpatched I get:
>
> test=# explain analyze SELECT ac.* FROM balances ac join customers o ON
> o.id = ac.customer_id WHERE o.group_id = 45;
>
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual
> time=6.110..1491.408 rows=375 loops=1)
> Merge Cond: (ac.customer_id = o.id)
> -> Index Scan using balance_customer_id_index on balances ac
> (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..5.206
> rows=16876 loops=1)
> -> Index Scan using customer_pkey on customers o (cost=0.42..16062.75
> rows=17 width=8) (actual time=0.014..1484.382 rows=376 loops=1)
> Filter: (group_id = 45)
> Rows Removed by Filter: 10396168
> Planning time: 0.207 ms
> Execution time: 1491.469 ms
> (8 rows)
>
> Patched:
>
> test=# explain analyze SELECT ac.* FROM balances ac join customers o ON
> o.id = ac.customer_id WHERE o.group_id = 45;
>
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual
> time=6.037..11.528 rows=375 loops=1)
> Merge Cond: (ac.customer_id = o.id)
> -> Index Scan using balance_customer_id_index on balances ac
> (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..4.978
> rows=16876 loops=1)
> -> Index Scan using customer_pkey on customers o (cost=0.42..16062.75
> rows=17 width=8) (actual time=0.015..5.141 rows=2 loops=1)
> Filter: (group_id = 45)
> Rows Removed by Filter: 27766
> Planning time: 0.204 ms
> Execution time: 11.575 ms
> (8 rows)
>
> Now it could well be that the merge join costs need a bit more work to
> avoid a merge join in this case, but as it stands as of today, this is your
> performance gain.
>
> Regards
>

it is great

Pavel

>
> David Rowley
>
> [1]
> http://www.postgresql.org/message-id/CACZYdDiAxEAM2RkMHBMuhwVcM4txH+5E3HQGgGyzfzbn-PnvFg@mail.gmail.com
>
> --
> David Rowley http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-10-14 05:08:19 Re: Support for N synchronous standby servers - take 2
Previous Message Michael Paquier 2015-10-14 04:34:42 Re: Support for N synchronous standby servers - take 2