Re: speedup COPY TO for partitioned table.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speedup COPY TO for partitioned table.
Date: 2025-10-16 01:21:10
Message-ID: CAD21AoCW4pmwk3f0UJ_F5XJR_FDHrmsWNaFr43dYqgR0SU1BvQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 14, 2025 at 6:53 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Tue, Oct 14, 2025 at 4:08 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > > +-- Test COPY TO with a foreign table or when the foreign table is a partition
> > > > +COPY async_p3 TO stdout; --error
> > > > +ERROR: cannot copy from foreign table "async_p3"
> > > > +HINT: Try the COPY (SELECT ...) TO variant.
> > > >
> > > > async_p3 is a foreign table so it seems not related to this patch.
> > > >
> > > I replied in
> > > https://postgr.es/m/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ%40mail.gmail.com
> > > I kind of doubt anyone would submit a patch just to rewrite a coverage test for
> > > the sake of coverage itself. While we're here, adding nearby coverage tests
> > > should be fine?
> >
> > For me, it's perfectly fine to have patches just for improving the
> > test coverage and I think we have had such patches ever. Given this
> > patch expands the supported relation kind, I guess it makes sense to
> > cover other cases as well in this patch (i.e., foreign tables and
> > sequences) or to have a separate patch to increase the overall test
> > coverage of copyto.c.
> >
>
> Let's have a seperate patch to handle COPY test coverage.
>
> > >
> > > i just found out I ignored the case when partitioned tables have RLS.
> > > when exporting a partitioned table,
> > > find_all_inheritors will sort the returned partition by oid.
> > > in DoCopy, we can do the same:
> > > make a SortBy node for SelectStmt->sortClause also mark the
> > > RangeVar->inh as true.
> > > OR
> > > ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
> > >
> > > please see the change I made in DoCopy.
> >
> > Good catch. However, I guess adding a SortBy node with "tableoid"
> > doesn't necessarily work in the same way as the 'COPY TO' using
> > find_all_inheritors():
> >
> > + /*
> > + * To COPY data from multiple partitions, we rely on the order of
> > + * the partitions' tableoids, which matches the order produced by
> > + * find_all_inheritors.
> > + */
> >
> > The table list returned by find_all_inheritors() is deterministic, but
> > it doesn't sort the whole list by their OIDs. If I understand
> > correctly, it retrieves all descendant tables in a BFS order. For
> > example, if I create the tables in the following sequence:
> >
> > create table p (i int) partition by list (i);
> > create table c12 partition of p for values in (1, 2) partition by list (i);
> > create table c12_1 partition of c12 for values in (1);
> > create table c12_2 partition of c12 for values in (2);
> > create table c3 partition of p for values in (3);
> > insert into p values (1), (2), (3);
> > alter table p enable row level security;
> > create policy policy_p on p using (i > 0);
> > create user test_user;
> > grant select on table p to test_user;
> >
> > I got the result without RLS:
> >
> > copy p to stdout;
> > 3
> > 1
> > 2
> >
> > whereas I got the results with RLS:
> >
> > copy p to stdout;
> > 1
> > 2
> > 3
> >
> > I think that adding SortBy doesn't help more than making the results
> > deterministic. Or we can re-sort the OID list returned by
> > find_all_inheritors() to match it. However, I'm not sure that we need
> > to make COPY TO results deterministic in the first place. It's not
> > guaranteed that the order of tuples returned from 'COPY TO rel' where
> > rel is not a partitioned table is sorted nor even deterministic (e.g.,
> > due to sync scans). If 'rel' is a partitioned table without RLS, the
> > order of tables to scan is deterministic but returned tuples within a
> > single partition is not. Given that sorting the whole results is not
> > cost free, I'm not sure that guaranteeing this ordering also for
> > partitioned tables with RLS would be useful for users.
> >
>
> I removed the "SortBy node", and also double checked the patch again.
> Please check the attached v18.

Thank you for updating the patch!

I've reviewed the patch and here is one review comment:

from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;

It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind() as it checks syscache.

I've attached a patch to fix the above and includes some cosmetic
changes. Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v18_masahiko_fix.patch application/octet-stream 4.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-10-16 01:25:45 Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath
Previous Message Michael Paquier 2025-10-16 01:18:39 Re: [BUG] temporary file usage report with extended protocol and unnamed portals