Quick Links

Get more from indices.

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Get more from indices.
Date:	2013-10-31 10:43:10
Message-ID:	20131031.194310.212107585.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello, This is the last episode of the 'dance with indices'?
series.

Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.

> uniquetest=# create table t (a int, b int, c int, d text);
> uniquetest=# create unique index i_t_pkey on t(a, b);
> uniquetest=# insert into t
> (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
> uniquetest=# analyze;
>
> uniquetest=# explain (costs off, analyze on) select distinct * from t;
> QUERY PLAN
> ---------------------------------------------------------------------------
> Unique (actual time=149.625..245.978 rows=100001 loops=1)
> -> Sort (actual time=149.624..199.806 rows=100001 loops=1)
> Sort Key: a, b, c, d
> Sort Method: external merge Disk: 2328kB
> -> Seq Scan on t (actual time=0.016..15.597 rows=100001 loops=1)
> Total runtime: 251.272 ms

With this patch, the plan for this query becomes as follows,

> uniquetest=# explain (costs off, analyze on) select distinct * from t;
> QUERY PLAN
> ---------------------------------------------------------------------
> Index Scan using i_t_pkey on t
> (actual time=0.019..32.457 rows=100001 loops=1)
> Total runtime: 37.488 ms

This only seems not so valuable to archive, but with my previous
MergeAppend for UNION patch, this will have more value.

> uniquetest=# create table pu1 (a int, b int, c int, d text);
> uniquetest=# create unique index pu1_pkey_idx on pu1 (a, b);
> uniquetest=# create table cu11 (like pu1 including all) inherits (pu1);
> uniquetest=# create table cu12 (like pu1 including all) inherits (pu1);
> uniquetest=# create table pu2 (like pu1 including all);
> uniquetest=# create table cu21 (like pu1 including all) inherits (pu2);
> uniquetest=# create table cu22 (like pu1 including all) inherits (pu2);
> uniquetest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11'
> from generate_series(000000, 099999) a);
> uniquetest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12'
> from generate_series(100000, 199999) a);
> uniquetest=# insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21'
> from generate_series(150000, 249999) a);
> uniquetest=# insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22'
> from generate_series(250000, 349999) a);
> uniquetest=# explain (analyze on, costs off)
> select * from pu1 union select * from pu2 order by a, b limit 100;
> QUERY PLAN
> --------------------------------------------------------------------------
> Limit (actual time=0.226..0.591 rows=100 loops=1)
> -> Unique (actual time=0.223..0.561 rows=100 loops=1)
> -> Merge Append (actual time=0.223..0.470 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> -> Limit (actual time=0.125..0.326 rows=100 loops=1)
> -> Unique (actual time=0.125..0.303 rows=100 loops=1)
> -> Merge Append (actual time=0.123..0.220 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> -> Index Scan using..on pu1 (actual time=0.005..0.005 rows=0 loops=1)
> -> Index Scan using..on cu11(actual time=0.071..0.128 rows=100 loops=1)
> -> Index Scan using..on cu12 (actual time=0.045..0.045 rows=1 loops=1)
> -> Limit (actual time=0.096..0.096 rows=1 loops=1)
> -> Unique (actual time=0.096..0.096 rows=1 loops=1)
> -> Merge Append (actual time=0.094..0.094 rows=1 loops=1)
> Sort Key: pu2.a, pu2.b, pu2.c, pu2.d
> -> Index Scan using..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
> -> Index Scan using..on cu21 (actual time=0.047..0.047 rows=1 loops=1)
> -> Index Scan using..on cu22 (actual time=0.043..0.043 rows=1 loops=1)
> Total runtime: 1.987 ms

On the other hand, what we will get from the unpatched PostgreSQL is,

> uniquetest=# explain (analyze on, costs off)
> select * from pu1 union select * from pu2 order by a, b limit 100;
> QUERY PLAN
> -------------------------------------------------------------------------
> Limit (actual time=535.620..535.706 rows=100 loops=1)
> -> Unique (actual time=535.618..535.695 rows=100 loops=1)
> -> Sort (actual time=535.617..535.637 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> Sort Method: external sort Disk: 10568kB
> -> Append (actual time=0.012..144.144 rows=400000 loops=1)
> -> Append (actual time=0.012..49.570 rows=200000 loops=1)
> -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
> -> Seq Scan on cu11 (actual time=0.011..16.202 rows=100000 loops=1)
> -> Seq Scan on cu12 (actual time=0.008..16.334 rows=100000 loops=1)
> -> Append (actual time=0.010..54.052 rows=200000 loops=1)
> -> Seq Scan on pu2 (actual time=0.000..0.000 rows=0 loops=1)
> -> Seq Scan on cu21 (actual time=0.009..16.444 rows=100000 loops=1)
> -> Seq Scan on cu22 (actual time=0.021..18.923 rows=100000 loops=1)
> Total runtime: 537.765 ms

What do you think about this?

This could be realized by following modifications,

- Consider a query pathekeys is useful for ordering when a index
pathkey is a subset of that when the index is UNIQUE.
(build_index_paths, pathkeys_useful_for_ordering,
truncate_useless_pathkeys, grouping_planner)

- Judge a path is orderd or not counting the uniqueness of the
path.

- Regard IndexPath on UNIQUE index is reagarded uniquely
ordered.

- Regard MergeAppendPath of which all children is uniquely
orderd as (not necessarily uniquely) ordered.
(path_is_ordered)

- Judge the necessity of sorting on above
premises. (grouping_planner)

Needless to say, theses patches have no effect on any other set
operations than UNION(ALL). They are, INTERSECT(ALL),
EXCEPT(ALL).

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
pathkey_and_uniqueidx_v1_20131031.patch	text/x-patch	12.2 KB

Responses

Re: Get more from indices. at 2013-10-31 14:59:45 from Tom Lane
Re: Get more from indices. at 2013-11-07 22:10:49 from Peter Eisentraut

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Etsuro Fujita	2013-10-31 10:54:31	Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
Previous Message	Kyotaro HORIGUCHI	2013-10-31 10:42:51	Using indices for UNION.