Re: Ordered Partitioned Table Scans

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2019-03-27 23:04:41
Message-ID: CAKJS1f8Oey0qPq80Xk7thhtOK2mPJ_pTFnnc6ARYX3T3cDo0GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 27 Mar 2019 at 21:24, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Noticed a typo.
>
> + * multiple subpaths then we can't make guarantees about the
> + * order tuples in those subpaths, so we must leave the
>
> order of tuples?

I'll fix that. Thanks.

> Again, sorry if this was discussed, but I got curious about why
> partitions_are_ordered() thinks it can say true even for an otherwise
> ordered list-partitioned table, but containing a null-only partition,
> which is *always* scanned last. If a query says ORDER BY partkey NULLS
> FIRST, then it's not alright to proceed with assuming partitions are
> ordered even if partitions_are_ordered() said so.

If it's *always* scanned last then it's fine for ORDER BY partkey
NULLS LAST. If they have ORDER BY partkey NULLS FIRST then we won't
match on the pathkeys.

Or if they do ORDER BY partkey DESC NULLS FIRST, then we're also fine,
since we reverse the order of the subpaths list in that case. ORDER
BY partkey DESC NULLS LAST is not okay, and we don't optimise that
since it won't match the pathkeys we generate in
build_partition_pathkeys().

> Related, why does build_partition_pathkeys() pass what it does for
> nulls_first parameter of make_pathkey_from_sortinfo()?
>
> cpathkey = make_pathkey_from_sortinfo(root,
> keyCol,
> NULL,
> partscheme->partopfamily[i],
> partscheme->partopcintype[i],
> partscheme->partcollation[i],
> ScanDirectionIsBackward(scandir),
> ==> ScanDirectionIsBackward(scandir),
> 0,
> partrel->relids,
> false);
>
> I think null values are almost always placed in the last partition, unless
> the null-accepting list partition also accepts some other non-null value.
> I'm not sure exactly how we can determine the correct value to pass here,
> but what's there in the patch now doesn't seem to be it.

The code looks okay to me. It'll generate pathkeys like ORDER BY
partkey NULLS LAST for forward scans and ORDER BY partkey DESC NULLS
FIRST for backwards scans.

Can you explain what cases you think the code gets wrong?

Here's a preview of the actual and expected behaviour:

# explain (costs off) select * from listp order by a asc nulls last;
QUERY PLAN
------------------------------------------------------------
Append
-> Index Only Scan using listp1_a_idx on listp1
-> Index Only Scan using listp2_a_idx on listp2
-> Index Only Scan using listp_null_a_idx on listp_null
(4 rows)

# explain (costs off) select * from listp order by a asc nulls first;
-- not optimised
QUERY PLAN
------------------------------------
Sort
Sort Key: listp1.a NULLS FIRST
-> Append
-> Seq Scan on listp1
-> Seq Scan on listp2
-> Seq Scan on listp_null
(6 rows)

# explain (costs off) select * from listp order by a desc nulls first;
-- subpath list is simply reversed in this case.
QUERY PLAN
---------------------------------------------------------------------
Append
-> Index Only Scan Backward using listp_null_a_idx on listp_null
-> Index Only Scan Backward using listp2_a_idx on listp2
-> Index Only Scan Backward using listp1_a_idx on listp1
(4 rows)

# explain (costs off) select * from listp order by a desc nulls last;
-- not optimised
QUERY PLAN
--------------------------------------
Sort
Sort Key: listp1.a DESC NULLS LAST
-> Append
-> Seq Scan on listp1
-> Seq Scan on listp2
-> Seq Scan on listp_null
(6 rows)

We could likely improve the two cases that are not optimized by
putting the NULL partition in the correct place in the append
subpaths, but for now, we don't really have an efficient means to
identify which subpath that is. I've not looked at your partition
planning improvements patch for a while to see if you're storing a
Bitmapset of the non-pruned partitions in RelOptInfo. Something like
that would allow us to make this better. Julien and I have talked
about other possible cases to optimise if we have that. e.g if the
default partition is pruned then we can optimise a RANGE partitioned
table with a default. So there's definitely more to be done on this. I
think there's a general consensus that what we're doing in the patch
already is enough to be useful.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-03-27 23:07:43 Re: Fix XML handling with DOCTYPE
Previous Message legrand legrand 2019-03-27 22:39:56 Re: Planning counters in pg_stat_statements (using pgss_store)