Re: Ordered Partitioned Table Scans

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2019-03-28 11:00:00
Message-ID: cd776832-c2a7-0d77-55b5-863cdb428864@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

On 2019/03/28 8:04, David Rowley wrote:
> On Wed, 27 Mar 2019 at 21:24, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Noticed a typo.
>>
>> + * multiple subpaths then we can't make guarantees about the
>> + * order tuples in those subpaths, so we must leave the
>>
>> order of tuples?
>
> I'll fix that. Thanks.
>
>> Again, sorry if this was discussed, but I got curious about why
>> partitions_are_ordered() thinks it can say true even for an otherwise
>> ordered list-partitioned table, but containing a null-only partition,
>> which is *always* scanned last. If a query says ORDER BY partkey NULLS
>> FIRST, then it's not alright to proceed with assuming partitions are
>> ordered even if partitions_are_ordered() said so.
>
> If it's *always* scanned last then it's fine for ORDER BY partkey
> NULLS LAST. If they have ORDER BY partkey NULLS FIRST then we won't
> match on the pathkeys.

Sorry, I had meant to say that null values may or may not appear in the
last partition depending on how the null-accepting partition is defined.
I see that the code in partitions_are_ordered() correctly returns false if
null partition is not the last one, for example, for:

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2_null partition of p for values in (2, null);
create table p3 partition of p for values in (3);

Maybe, a small comment regarding how that works correctly would be nice.

> Or if they do ORDER BY partkey DESC NULLS FIRST, then we're also fine,
> since we reverse the order of the subpaths list in that case. ORDER
> BY partkey DESC NULLS LAST is not okay, and we don't optimise that
> since it won't match the pathkeys we generate in
> build_partition_pathkeys().

OK.

>> Related, why does build_partition_pathkeys() pass what it does for
>> nulls_first parameter of make_pathkey_from_sortinfo()?
>>
>> cpathkey = make_pathkey_from_sortinfo(root,
>> keyCol,
>> NULL,
>> partscheme->partopfamily[i],
>> partscheme->partopcintype[i],
>> partscheme->partcollation[i],
>> ScanDirectionIsBackward(scandir),
>> ==> ScanDirectionIsBackward(scandir),
>> 0,
>> partrel->relids,
>> false);
>>
>> I think null values are almost always placed in the last partition, unless
>> the null-accepting list partition also accepts some other non-null value.
>> I'm not sure exactly how we can determine the correct value to pass here,
>> but what's there in the patch now doesn't seem to be it.
>
> The code looks okay to me. It'll generate pathkeys like ORDER BY
> partkey NULLS LAST for forward scans and ORDER BY partkey DESC NULLS
> FIRST for backwards scans.
>
> Can you explain what cases you think the code gets wrong?
>
> Here's a preview of the actual and expected behaviour:
>
> # explain (costs off) select * from listp order by a asc nulls last;
> QUERY PLAN
> ------------------------------------------------------------
> Append
> -> Index Only Scan using listp1_a_idx on listp1
> -> Index Only Scan using listp2_a_idx on listp2
> -> Index Only Scan using listp_null_a_idx on listp_null
> (4 rows)
>
>
> # explain (costs off) select * from listp order by a asc nulls first;
> -- not optimised
> QUERY PLAN
> ------------------------------------
> Sort
> Sort Key: listp1.a NULLS FIRST
> -> Append
> -> Seq Scan on listp1
> -> Seq Scan on listp2
> -> Seq Scan on listp_null
> (6 rows)
>
> # explain (costs off) select * from listp order by a desc nulls first;
> -- subpath list is simply reversed in this case.
> QUERY PLAN
> ---------------------------------------------------------------------
> Append
> -> Index Only Scan Backward using listp_null_a_idx on listp_null
> -> Index Only Scan Backward using listp2_a_idx on listp2
> -> Index Only Scan Backward using listp1_a_idx on listp1
> (4 rows)
>
>
> # explain (costs off) select * from listp order by a desc nulls last;
> -- not optimised
> QUERY PLAN
> --------------------------------------
> Sort
> Sort Key: listp1.a DESC NULLS LAST
> -> Append
> -> Seq Scan on listp1
> -> Seq Scan on listp2
> -> Seq Scan on listp_null
> (6 rows)

Ah, everything seems to be working correctly. Thanks for the explanation
and sorry about the noise.

> We could likely improve the two cases that are not optimized by
> putting the NULL partition in the correct place in the append
> subpaths, but for now, we don't really have an efficient means to
> identify which subpath that is. I've not looked at your partition
> planning improvements patch for a while to see if you're storing a
> Bitmapset of the non-pruned partitions in RelOptInfo. Something like
> that would allow us to make this better. Julien and I have talked
> about other possible cases to optimise if we have that. e.g if the
> default partition is pruned then we can optimise a RANGE partitioned
> table with a default. So there's definitely more to be done on this. I
> think there's a general consensus that what we're doing in the patch
> already is enough to be useful.

Certainly. When trying out various combinations of ORDER BY ASC/DESC NULL
FIRST/LAST yesterday, I wrongly thought the plan came out wrong in one or
two cases, but now see that that's not the case.

Also, the comment you added in the latest patch sheds some light on the
matter, so that helps too.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Darafei Komяpa Praliaskouski 2019-03-28 11:36:58 Re: Berserk Autovacuum (let's save next Mandrill)
Previous Message Masahiko Sawada 2019-03-28 10:28:27 Re: Berserk Autovacuum (let's save next Mandrill)