Re: Partitioned tables and relfilenode

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-21 17:21:06
Message-ID: CANP8+jLYZ7NMjXzZH+MH57ME3Hq5hR-y9hfczwgH6wnTVqMgdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 March 2017 at 16:33, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 16 March 2017 at 10:03, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> On 2017/03/15 7:09, Robert Haas wrote:
>>
>>>> I think that eliding the Append node when there's only one child may
>>>> be unsafe in the case where the child's attribute numbers are
>>>> different from the parent's attribute numbers. I remember Tom making
>>>> some comment about this when I was working on MergeAppend, although I
>>>> no longer remember the specific details.
>>>
>>> Append node elision does not occur in the one-child case. With the patch:
>> ...
>>> create table q1 partition of q for values in (1);
>>> explain select * from q;
>>> QUERY PLAN
>>> ------------------------------------------------------------
>>> Append (cost=0.00..35.50 rows=2550 width=4)
>>> -> Seq Scan on q1 (cost=0.00..35.50 rows=2550 width=4)
>>> (2 rows)
>>>
>>> Maybe that should be done, but this patch doesn't implement that.
>>
>> Robert raises the possible problem that removing the Append wouldn't
>> work when the parent and child attribute numbers don't match. Surely
>> that never happens with partitions, by definition?
>
> No, the attribute numbers don't have to match. This decision was made
> a long time ago, and there have been a whole bunch of followup commits
> since the original partitioning patch that were dedicated to fixing up
> cases where that wasn't working properly in the original commit. It
> seems superficially attractive to require the attribute numbers to
> match, but it creates some really unpleasant cases. For example,
> suppose a user tries to creates a partitioned table, drops a column,
> then creates a standalone table which matches the apparent column list
> of the partitioned table, then tries to attach it as a partition.
>
> ERROR: the columns you previously dropped from the parent that you
> can't see and don't know about aren't the same as the ones dropped
> from the standalone table you're trying to attach as a partition
> DETAIL: Try recreating your proposed new partition with a
> pass-by-value column of width 8 after the third column, and then
> dropping that column before trying to attach it as a partition.
> HINT: Abandon all hope, ye who enter here.
>
> Not cool with that.

Thanks for the explanation.

> The decision not to require the attribute numbers to match doesn't
> necessarily mean we can't get rid of the Append node, though. First
> of all, in a lot of practical cases the attribute numbers will all
> match. Second, if they don't, the most that would be required is a
> projection step, which could usually be done without a separate node
> because most nodes are projection-capable. And maybe not even that
> much is needed; I'd have to go back and look at what Tom was worried
> about the last time this came up. (Hmm, maybe the problem had to do
> with varnos matching, rather then attribute numbers?)

There used to be some code there to fix them up, not sure where that went.

> Another and independent problem with eliding the Append node is that,
> if we did that, we'd still have to guarantee that the parent relation
> corresponding to the Append node got locked somehow. Otherwise, we'll
> be accessing the tuple routing information for a table on which we
> don't have a lock. That's probably a solvable problem, too, but it
> hasn't been solved yet.

Hmm, why would we need to access tuple routing information?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-03-21 17:23:06 Re: GUC for cleanup indexes threshold.
Previous Message Bruce Momjian 2017-03-21 17:19:00 Re: Patch: Write Amplification Reduction Method (WARM)