Re: Data is copied twice when specifying both child and parent table in publication

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-27 23:01:59
Message-ID: 185c4ac8-fbc8-11a5-d721-55f639969eb8@timescale.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2023 at 11:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> If the tests you have in mind are only related to this patch set then
> feel free to propose them here if you feel the current ones are not
> sufficient.

I think the new tests added by Wang cover my concerns (thanks!). I share
Peter's comment that we don't seem to have a regression test covering
only the bug description itself -- just ones that combine that case with
row and column restrictions -- but if you're all happy with the existing
approach then I have nothing much to add there.

I was staring at this subquery in fetch_table_list():

> + " ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> + " FROM pg_attribute a\n"
> + " WHERE a.attrelid = gpt.relid AND\n"
> + " a.attnum = ANY(gpt.attrs)\n"
> + " ) AS attnames\n"

On my machine this takes up roughly 90% of the runtime of the query,
which makes for a noticeable delay with a bigger test case (a couple of
FOR ALL TABLES subscriptions on the regression database). And it seems
like we immediately throw all that work away: if I understand correctly,
we only use the third column for its interaction with DISTINCT. Would it
be enough to just replace that whole thing with gpt.attrs?

Thanks,
--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-03-27 23:03:42 Re: Parallel Full Hash Join
Previous Message Sandro Santilli 2023-03-27 22:58:47 Re: [PATCH] Support % wildcard in extension upgrade filenames