Re: Column Filtering in Logical Replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2021-09-25 07:45:08
Message-ID: CALDaNm0yN8LFRETa-0SV1EGH6_VsFJPmpx_fZ3EjSoCb1zfF-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 24, 2021 at 6:55 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2021-Sep-23, Amit Kapila wrote:
>
> > Alvaro, do you have any thoughts on these proposed grammar changes?
>
> Yeah, I think pubobj_name remains a problem in that you don't know its
> return type -- could be a String or a RangeVar, and the user of that
> production can't distinguish. So you're still (unnecessarily, IMV)
> stashing an object of undetermined type into ->object.
>
> I think you should get rid of both pubobj_name and pubobj_expr and do
> somethine like this:
>
> /* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
> PublicationObjSpec: TABLE ColId
> {
> $$ = makeNode(PublicationObjSpec);
> $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> $$->rangevar = makeRangeVarFromQualifiedName($1, NULL, @1, yyscanner);
> }
> | TABLE ColId indirection
> {
> $$ = makeNode(PublicationObjSpec);
> $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> }
> | ALL TABLES IN_P SCHEMA ColId
> {
> $$ = makeNode(PublicationObjSpec);
> $$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
> $$->name = $4;
> }
> | ALL TABLES IN_P SCHEMA CURRENT_SCHEMA /* XXX should this be "IN_P CURRENT_SCHEMA"? */
> {
> $$ = makeNode(PublicationObjSpec);
> $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> $$->name = $4;
> }
> | ColId
> {
> $$ = makeNode(PublicationObjSpec);
> $$->name = $1;
> $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> }
> | ColId indirection
> {
> $$ = makeNode(PublicationObjSpec);
> $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> }
> | CURRENT_SCHEMA
> {
> $$ = makeNode(PublicationObjSpec);
> $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> }
> ;

Apart from the issue that Hou San pointed, I found one issue with
introduction of PUBLICATIONOBJ_CURRSCHEMA, I was not able to
differentiate if it is table or schema in the following cases:
CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA CURRENT_SCHEMA;
CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA sch1, CURRENT_SCHEMA;
CREATE PUBLICATION pub1 FOR table t1, CURRENT_SCHEMA;
The differentiation is required to differentiate and add a schema or a table.

I felt it was better to use PUBLICATIONOBJ_CONTINUATION in case of
CURRENT_SCHEMA in multiple object cases like:
PublicationObjSpec: TABLE relation_expr
{
$$ =
makeNode(PublicationObjSpec);
$$->pubobjtype =
PUBLICATIONOBJ_TABLE;
$$->rangevar = $2;
}
| ALL TABLES IN_P SCHEMA ColId
{
$$ =
makeNode(PublicationObjSpec);
$$->pubobjtype =
PUBLICATIONOBJ_REL_IN_SCHEMA;
$$->name = $5;
$$->location = @5;
}
| ALL TABLES IN_P SCHEMA CURRENT_SCHEMA /* XXX
should this be "IN_P CURRENT_SCHEMA"? */
{
$$ =
makeNode(PublicationObjSpec);
$$->pubobjtype =
PUBLICATIONOBJ_REL_IN_SCHEMA;
$$->name = "CURRENT_SCHEMA";
$$->location = @5;
}
| ColId
{
$$ =
makeNode(PublicationObjSpec);
$$->name = $1;
$$->pubobjtype =
PUBLICATIONOBJ_CONTINUATION;
$$->location = @1;
}
| ColId indirection
{
$$ =
makeNode(PublicationObjSpec);
$$->rangevar =
makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
$$->pubobjtype =
PUBLICATIONOBJ_CONTINUATION;
$$->location = @1;
}
| CURRENT_SCHEMA
{
$$ =
makeNode(PublicationObjSpec);
$$->pubobjtype =
PUBLICATIONOBJ_CONTINUATION;
$$->name = "CURRENT_SCHEMA";
$$->location = @1;
}
| extended_relation_expr /* grammar
like tablename * , ONLY tablename, ONLY ( tablename )*/
{
$$ =
makeNode(PublicationObjSpec);
/*$$->rangevar =
makeRangeVarFromQualifiedName($1, $2, @1, yyscanner); */
$$->rangevar = $1;
$$->pubobjtype =
PUBLICATIONOBJ_CONTINUATION;
}
;

I'm ok with your suggestion along with the above proposed changes. I
felt the changes proposed at [1] were also fine. Let's change it to
whichever is better, easily extendable and can handle the Column
filtering project, ALL TABLES IN SCHEMA, ALL SEQUENCES IN SCHEMA
projects, and other projects in the future. Based on that we can check
in the parser changes independently and then the remaining series of
the patches can be rebased on top of it accordingly. Thoughts?

> so in AlterPublicationStmt you would have stanzas like
>
> | ALTER PUBLICATION name ADD_P pub_obj_list
> {
> AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
> n->pubname = $3;
> n->pubobjects = preprocess_pubobj_list($5);
> n->action = DEFELEM_ADD;
> $$ = (Node *)n;
> }
>
> where preprocess_pubobj_list (defined right after processCASbits and
> somewhat mimicking it and SplitColQualList) takes all
> PUBLICATIONOBJ_CONTINUATION and turns them into either
> PUBLICATIONOBJ_TABLE entries or PUBLICATIONOBJ_REL_IN_SCHEMA entries,
> depending on what the previous entry was. (And of course if there is no
> previous entry, raise an error immediately). Note that node
> PublicationObjSpec now has two fields, one for RangeVar and another for
> a plain name, and tables always use the second one, except when they are
> continuations, but of course those continuations that use name are
> turned into rangevars in the preprocess step. I think that would make
> the code in ObjectsInPublicationToOids less messy.

I agree with this. I will make the changes for this in the next version.

> (I don't think using the string "CURRENT_SCHEMA" is a great solution.
> Did you try having a schema named CURRENT_SCHEMA?)

Here CURRENT_SCHEMA is not used for the schema name, it will be
replaced with the name of the schema that is first in the search path.

[1] - https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-09-25 09:28:52 Re: psql - add SHOW_ALL_RESULTS option
Previous Message Fabien COELHO 2021-09-25 07:40:50 Re: rand48 replacement