Re: Ordered Partitioned Table Scans

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2018-12-19 09:51:01
Message-ID: CAKJS1f-=-drzd7H1hFihPYgxyDKU+CRbZ-fMV7KFDPfouktW8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Wed, 19 Dec 2018 at 20:40, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> I started to look at v5.

Thanks for giving this a look over.

> If I understand correctly, the new behavior is controlled by
> partitions_are_ordered(), but it only checks for declared partitions,
> not partitions that survived pruning. Did I miss something or is it
> the intended behavior?

Yeah, it was mentioned up-thread a bit.

I wrote:
> I retrospectively read that thread after Amit mentioned about your
> patch. I just disagree with Robert about caching this flag. The
> reason is, if the flag is false due to some problematic partitions, if
> we go and prune those, then we needlessly fail to optimise that case.
> I propose we come back and do the remaining optimisations with
> interleaved LIST partitions and partitioned tables with DEFAULT
> partitions later, once we have a new "live_parts" field in
> RelOptInfo. That way we can just check the live parts to ensure
> they're compatible with the optimization. If we get what's done
> already in then we're already a bit step forward.

The reason I'm keen to leave this alone today is that determining
which partitions are pruned requires looking at each partition's
RelOptInfo and checking if it's marked as a dummy rel. I'm trying to
minimise the overhead of this patch by avoiding doing any
per-partition processing. If we get the "live_parts" Bitmapset, then
this becomes cheaper as Bitmapsets are fairly efficient at finding the
next set member, even when they're large and sparsely populated.

> Also, generate_mergeappend_paths should
> probably be renamed to something like generate_sortedappend_paths
> since it can now generate either Append or MergeAppend paths.

You might be right about naming this something else, but
"sortedappend" sounds like an Append node with a Sort node above it.
"orderedappend" feels slightly better, although my personal vote would
be not to rename it at all. Sometimes generating an Append seems like
an easy enough corner case to mention in the function body.

> I'm also wondering if that's ok to only generate either a (sorted)
> Append or a MergeAppend. Is it possible that in some cases it's
> better to have a MergeAppend rather than a sorted Append, given that
> MergeAppend is parallel-aware and the sorted Append isn't?

That might have been worth a thought if we had parallel MergeAppends,
but we don't. You might be thinking of GatherMerge.

I've attached a v6 patch. The only change is the renamed the
generate_mergeappend_paths() function to
generate_orderedappend_paths(), and also the required comment updates
to go with it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v6-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-f.patch application/octet-stream 58.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexey Kondratov 2018-12-19 09:58:58 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Michael Paquier 2018-12-19 09:22:55 Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS