Re: Using indices for UNION.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, peter_e(at)gmx(dot)net
Subject: Re: Using indices for UNION.
Date: 2013-11-22 07:59:27
Message-ID: 20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, As I was reconsidering after your advise, I came up with
an idea of hacking on query tree tranaformation phase in
subquery_planner, not on that of plan generation phase as
before. (And the older patch is totally scrapped:-)

Currently flatten_simple_union_all() flattens 'simple' UNION
'ALL' at the top level of 'current' query. I noticed that a
little modification there also could flatten simple UNION (w/o
ALL), and it has finally almost same effect regarding more usage
of indexes for UNION. Additionally, applying still more the
'UNION ALL and inheritance table' patch, even coveres UNION's on
inheritance tables.

This patch is far small and neat (though I say so myself :-) than
the previous ones. The following plans we got from unpatched
PostgreSQL.

original=# explain (analyze on, costs off)
| select a, b from cu11 union select a, b from cu12 order by a, b limit 10;
| QUERY PLAN
| -----------------------------------------------------------------------------
| Limit (actual time=272.252..272.259 rows=10 loops=1)
| -> Unique (actual time=272.251..272.258 rows=10 loops=1)
| -> Sort (actual time=272.249..272.252 rows=10 loops=1)
| Sort Key: cu11.a, cu11.b
| Sort Method: external sort Disk: 3520kB
| -> Append (actual time=0.020..70.935 rows=200000 loops=1)
| -> Seq Scan on cu11 (actual time=0.019..26.741 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.006..25.183 rows=100000 loops=1)
| Total runtime: 273.162 ms

And of course for * (= a, b, c, d) this,

original=# explain (analyze on, costs off)
| select * from cu11 union select * from cu12 order by a, b limit 10;
| QUERY PLAN
| ----------------------------------------------------------------------------
| Limit (actual time=234.880..234.891 rows=10 loops=1)
| -> Unique (actual time=234.879..234.889 rows=10 loops=1)
| -> Sort (actual time=234.878..234.881 rows=10 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| Sort Method: external sort Disk: 5280kB
| -> Append (actual time=0.017..49.502 rows=200000 loops=1)
| -> Seq Scan on cu11 (actual time=0.017..15.649 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.007..14.939 rows=100000 loops=1)
| Total runtime: 236.029 ms

We'll get the following plan changed with this patch plus the
latest "pathkey_and_uniqueindx_v4_20131122.patch" patcch. (It got
a small fix on pathkeys fixation for index paths)

https://commitfest.postgresql.org/action/patch_view?id=1280

patched=# explain (analyze on, costs off)
| select * from cu11 union select * from cu12 order by a, b limit 10;
| QUERY PLAN
| ------------------------------------------------------------------------------
| Limit (actual time=0.059..0.081 rows=10 loops=1)
| -> Unique (actual time=0.058..0.079 rows=10 loops=1)
| -> Merge Append (actual time=0.056..0.065 rows=10 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| -> Index Scan ... on cu11 (actual time=0.032..0.037 rows=10 loops=1)
| -> Index Scan ... on cu12 (actual time=0.021..0.021 rows=1 loops=1)
| Total runtime: 0.162 ms

Moreover, with the 'UNION ALL' patches , say,
union_inh_idx_typ1_v1_20131024.patch and
union_inh_idx_typ1_add_v1_20131024.patch, we'll get the following
plan for UNION on inhertance tables,

https://commitfest.postgresql.org/action/patch_view?id=1270

patched2=# explain (analyze on, costs off)
| select * from pu1 union select * from pu2 order by a, b limit 10;
| QUERY PLAN
| ---------------------------------------------------------------------------
| Limit (actual time=0.209..0.234 rows=10 loops=1)
| -> Unique (actual time=0.206..0.230 rows=10 loops=1)
| -> Merge Append (actual time=0.204..0.216 rows=10 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| -> Index Scan..on pu1 (actual time=0.004..0.004 rows=0 loops=1)
| -> Index Scan..on cu11 (actual time=0.050..0.058 rows=10 loops=1)
| -> Index Scan..on cu12 (actual time=0.052..0.052 rows=1 loops=1)
| -> Index Scan..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
| -> Index Scan..on cu21 (actual time=0.046..0.046 rows=1 loops=1)
| -> Index Scan..on cu22 (actual time=0.046..0.046 rows=1 loops=1)
| Total runtime: 0.627 ms

The attached union_uses_idx_v4_20131122.patch is changed as
follows.

- Rebased to current master.

- Scrapped the whold old stuff.

- flatten_simple_union_all() is renamed to
flatten_simple_union() and modified to do so. UNION is
flattend only if sortClause exists and distinctClause is NIL.

======
Test tables can be created following the command below,

| create table pu1 (a int not null, b int not null, c int, d text);
| create unique index i_pu1_ab on pu1 (a, b);
| create table cu11 (like pu1 including all) inherits (pu1);
| create table cu12 (like pu1 including all) inherits (pu1);
| create table pu2 (like pu1 including all);
| create table cu21 (like pu2 including all) inherits (pu2);
| create table cu22 (like pu2 including all) inherits (pu2);
| insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
| insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
| insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
| insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
=====

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
union_uses_idx_v4_20131122.patch text/x-patch 6.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2013-11-22 08:11:47 Re: new unicode table border styles for psql
Previous Message Kyotaro HORIGUCHI 2013-11-22 07:59:10 Re: Get more from indices.