Re: Logical Replication Custom Column Expression

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Stavros Koureas <koureasstavros(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Logical Replication Custom Column Expression
Date: 2022-11-21 11:35:15
Message-ID: CAExHW5vHgM1O-1b=BwfqnbAJCNX=pcD2ydRFpokzzTxRBzDKdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
<koureasstavros(at)gmail(dot)com> wrote:
>
> Hi all,
>
> Working with PostgreSQL Logical Replication is just great! It helps a lot doing real time replication for analytical purposes without using any other 3d party service. Although all these years working as product architect of reporting i have noted a few requirements which are always a challenge and may help enhance logical replication even better.
>
> To the point:
> PostgreSQL14 Logical Replication allows replication of a table to another table that exists in another database or even in another host. It also allows multiple upstream tables using the same structure to downstream into a single table.
> CREATE PUBLICATION pb_test FOR TABLE test
>
> PostgreSQL15 Logical Replication allows even better replication options, like selecting subsets of the columns from publisher tables. It also supports plenty of options like disable_on_error etc.
> CREATE PUBLICATION pb_test FOR TABLE test ("id", "name")
>
> What does not support is the option for defining custom column expressions, as keys or values, into the upstream (publication). This will give more flexibility into making replication from multiple upstreams into less downstreams adding more logic. For instance, in a project for analytical purposes there is the need to consolidate data from multiple databases into one and at the same time keep the origin of each replicated data identified by a tenanant_id column. In this case we also need the ability to define the new column as an additional key which will participate into the destination table.
>
> Tenant 1 table
> id serial pk
> description varchar
>
> Tenant 2 table
> id integer pk
> description varchar
>
> Group table
> tenant integer pk
> id integer pk
> description varchar
>
> Possible syntax to archive that
> CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} ,"id", "name")
>
> Example
> CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", "name")

I think that's a valid usecase.

This looks more like a subscription option to me. In multi-subscriber
multi-publisher scenarios, on one subscriber a given upstream may be
tenant 1 but on some other it could be 2. But I don't think we allow
specifying subscription options for a single table. AFAIU, the origin
ids are available as part of the commit record which contained this
change; that's how conflict resolution is supposed to know it. So
somehow the subscriber will need to fetch those from there and set the
tenant.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-11-21 11:43:38 Re: Reducing power consumption on idle servers
Previous Message Matthias van de Meent 2022-11-21 11:34:12 Re: Cleanup: Duplicated, misplaced comment in HeapScanDescData