Re: Ordered Partitioned Table Scans

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2018-12-19 10:26:14
Message-ID: CAOBaU_Y8xdYJt8L2CQwpXhWz5Ohvw-kxwyRUYQO3PjgMoMpPBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 19, 2018 at 10:51 AM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Wed, 19 Dec 2018 at 20:40, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> > If I understand correctly, the new behavior is controlled by
> > partitions_are_ordered(), but it only checks for declared partitions,
> > not partitions that survived pruning. Did I miss something or is it
> > the intended behavior?
>
> Yeah, it was mentioned up-thread a bit.
>
> I wrote:
> > I retrospectively read that thread after Amit mentioned about your
> > patch. I just disagree with Robert about caching this flag. The
> > reason is, if the flag is false due to some problematic partitions, if
> > we go and prune those, then we needlessly fail to optimise that case.
> > I propose we come back and do the remaining optimisations with
> > interleaved LIST partitions and partitioned tables with DEFAULT
> > partitions later, once we have a new "live_parts" field in
> > RelOptInfo. That way we can just check the live parts to ensure
> > they're compatible with the optimization. If we get what's done
> > already in then we're already a bit step forward.

Ah, sorry I did read this but I misunderstood it. I really need to
catchup what changed for partitioning since pg11 more thoroughly.

> The reason I'm keen to leave this alone today is that determining
> which partitions are pruned requires looking at each partition's
> RelOptInfo and checking if it's marked as a dummy rel. I'm trying to
> minimise the overhead of this patch by avoiding doing any
> per-partition processing. If we get the "live_parts" Bitmapset, then
> this becomes cheaper as Bitmapsets are fairly efficient at finding the
> next set member, even when they're large and sparsely populated.

I see. But since for now the optimisation will only be done
considering all partitions, I still think that it's better to store a
bool flag in the PartitionDesc to describe if it's natively ordered or
not, and therefore also handle the case for
non-intervleaved-multi-datums list partitioning. It won't add much
overhead and will benefit way more cases.

We can still revisit that when a live_parts Bitmapset is available in
RelOptInfo (and maybe other flag that say if partitions were pruned or
not, and/or if the default partition was pruned).

> > Also, generate_mergeappend_paths should
> > probably be renamed to something like generate_sortedappend_paths
> > since it can now generate either Append or MergeAppend paths.
>
> You might be right about naming this something else, but
> "sortedappend" sounds like an Append node with a Sort node above it.
> "orderedappend" feels slightly better, although my personal vote would
> be not to rename it at all. Sometimes generating an Append seems like
> an easy enough corner case to mention in the function body.

Ok, I don't have a very strong opinion on it and orderedappend sounds
less ambiguous.

> > I'm also wondering if that's ok to only generate either a (sorted)
> > Append or a MergeAppend. Is it possible that in some cases it's
> > better to have a MergeAppend rather than a sorted Append, given that
> > MergeAppend is parallel-aware and the sorted Append isn't?
>
> That might have been worth a thought if we had parallel MergeAppends,
> but we don't. You might be thinking of GatherMerge.

Ah, oups indeed :)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-12-19 11:02:38 Re: What to name the current heap after pluggable storage / what to rename?
Previous Message Thomas Munro 2018-12-19 10:15:56 Re: What to name the current heap after pluggable storage / what to rename?