Re: Data is copied twice when specifying both child and parent table in publication

From:	Jacob Champion <jchampion(at)timescale(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject:	Re: Data is copied twice when specifying both child and parent table in publication
Date:	2023-03-27 23:01:59
Message-ID:	185c4ac8-fbc8-11a5-d721-55f639969eb8@timescale.com
Lists:	pgsql-hackers

On Mon, Mar 20, 2023 at 11:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> If the tests you have in mind are only related to this patch set then
> feel free to propose them here if you feel the current ones are not
> sufficient.

I think the new tests added by Wang cover my concerns (thanks!). I share
Peter's comment that we don't seem to have a regression test covering
only the bug description itself -- just ones that combine that case with
row and column restrictions -- but if you're all happy with the existing
approach then I have nothing much to add there.

I was staring at this subquery in fetch_table_list():

> + " ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> + " FROM pg_attribute a\n"
> + " WHERE a.attrelid = gpt.relid AND\n"
> + " a.attnum = ANY(gpt.attrs)\n"
> + " ) AS attnames\n"

On my machine this takes up roughly 90% of the runtime of the query,
which makes for a noticeable delay with a bigger test case (a couple of
FOR ALL TABLES subscriptions on the regression database). And it seems
like we immediately throw all that work away: if I understand correctly,
we only use the third column for its interaction with DISTINCT. Would it
be enough to just replace that whole thing with gpt.attrs?
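
(Illustrative note, not part of the patch: a minimal Python sketch of why the DISTINCT result should be the same either way. It assumes that within one relation each attnum maps to exactly one attname, and that gpt.attrs is always reported in a consistent order. The table contents, relids, and function names below are hypothetical.)

```python
# Hypothetical stand-in for pg_attribute: relid -> {attnum: attname}.
PG_ATTRIBUTE = {
    16384: {1: "id", 2: "payload", 3: "created_at"},
    16390: {1: "id", 2: "payload"},
}

# Hypothetical rows as the outer query might see them:
# (nspname, relname, relid, attrs), one row per publication listing the table.
GPT_ROWS = [
    ("public", "parent", 16384, (1, 2)),
    ("public", "parent", 16384, (1, 2)),     # same table via a second publication
    ("public", "parent", 16384, (1, 2, 3)),  # same table, different column list
    ("public", "child", 16390, (1, 2)),
]


def attnames(relid, attrs):
    """Mimic the correlated subquery: look up names, ordered by attnum."""
    cols = PG_ATTRIBUTE[relid]
    return tuple(cols[n] for n in sorted(attrs))


def distinct_by_names(rows):
    """DISTINCT over (nspname, relname, attnames): needs a lookup per row."""
    return {(nsp, rel, attnames(relid, attrs)) for nsp, rel, relid, attrs in rows}


def distinct_by_attrs(rows):
    """DISTINCT over (nspname, relname, attrs): no catalog lookup at all."""
    return {(nsp, rel, tuple(attrs)) for nsp, rel, relid, attrs in rows}


by_names = distinct_by_names(GPT_ROWS)
by_attrs = distinct_by_attrs(GPT_ROWS)

# Both variants keep the same (schema, table) pairs and the same row count.
same_tables = {(n, r) for n, r, _ in by_names} == {(n, r) for n, r, _ in by_attrs}
print(len(by_names), len(by_attrs), same_tables)
```

If either assumption fails (for example, if the same column list could be reported in two different orders), the two DISTINCT variants could disagree, so that would need checking against the actual output of pg_get_publication_tables().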

Thanks,
--Jacob

In response to

Re: Data is copied twice when specifying both child and parent table in publication at 2023-03-21 06:21:51 from Amit Kapila

Responses

RE: Data is copied twice when specifying both child and parent table in publication at 2023-03-28 09:59:49 from wangw.fnst@fujitsu.com
