Re: Bug in pg_dump --table and --exclude-table for declarative partition table handling.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in pg_dump --table and --exclude-table for declarative partition table handling.
Date: 2017-05-09 09:40:14
Message-ID: CAFjFpRdDCrk_DNk1+GHCLjmEZaKrVbQNsPrgKipA6TTbUmuXgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 9, 2017 at 12:43 PM, amul sul <sulamul(at)gmail(dot)com> wrote:
> Hi,
>
> Current pg_dump --exclude-table option excludes partitioned relation
> and dumps all its child relations and vice versa for --table option, which
> I think is incorrect.
>
> In this case we might need to explore all partitions and exclude or include
> from dump according to given pg_dump option, attaching WIP patch proposing
> same fix. Thoughts/Comments?
>

+1 for fixing this.

Since we didn't catch this issue earlier it looks like we don't have
testcases testing this scenario. May be we should add those in the
patch.

The way this patch has been written, there is a possibility that a
partition would be included multiple times in the list of tables, if
names of the partition and the parent table both match the pattern.
That's not a correctness issue, but it would slow down
simple_oid_list_member() a bit because of longer OID list.

I am not sure what would be the use of dumping a partitioned table
without its partitions or vice-versa. I think, instead, look
partitioned table and its partitions as a single unit as far as dump
is concerned. So, either we dump partitioned table along with all its
partitions or we don't dump any of those. In that case, we should
modify the query in expand_table_name_patterns(), to not include
partitions in the result. Rest of the code in the patch would take
care of including/excluding the partitions when the parent table is
included/excluded.

This patch looks up pg_inherits for every partitioned tables that it
encounters, which is costly. Instead, a fix with better performance is
to set a table dumpable based on the corresponding setting of its
parent. The parent is available in TableInfo structure, but the
comment there says that it's set only for dumpable objects. The
comment is correct since flagInhTables() skips the tables with dump =
false. May be we should modify this function not to skip the
partitions, find their parent tables and set the dump flat based on
the parents. This means that the immediate parent's dump flag should
be set correctly before any of its children are encountered by the
flagInhTables() for the case of multi-level partitioning to work. I
don't think we can guarantee that. May be we can modify
flagInhTables() to set the parent tables of parent if that's not done
already and then set the dump flag from top parent to bottom. If we do
that, we will have to add code in tflagInhTables() to skip the entries
whose parents have been set already since those might have been set
because of an earlier grand child.

Even better if we could dump the partitions along with partitioned
table instead of creating separate TableInfo entries for those. But I
think that's a lot of change.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-05-09 09:43:19 Re: Bug in pg_dump --table and --exclude-table for declarative partition table handling.
Previous Message Amit Langote 2017-05-09 09:29:18 Re: Bug in pg_dump --table and --exclude-table for declarative partition table handling.