Re: Detecting schema changes during logical replication

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Detecting schema changes during logical replication
Date: 2017-05-07 21:54:26
Message-ID: CA+mi_8bJ_uPr67j-6mbin537DVvfk=bOhmWneyBRfbZu89q0tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 7, 2017 at 8:04 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2017-05-07 19:27:08 +0100, Daniele Varrazzo wrote:
>> I'm putting together a replication system based on logical
>> replication.
>
> Interesting. If you very briefly could recap what it's about... ;)

I need to replicate some tables from a central database into the
database that should run a secondary system. For a similar use case we
have used Londiste in the past, which has served us good, but its
usage has not been problem-free. Logical decoding seems much less
invasive on the source database than a trigger-based replication
solution, and has less moving part to care about and maintain.

For the moment I'm hacking into a fork of Euler project for wal
decoding into json (https://github.com/dvarrazzo/wal2json), mostly
adding configurability, so that we may be able to replicate only the
tables we need, skip certain fields etc. I'm also taking a look at
minimising the amount of information produced: sending over and over
the column names and types for every record seems a waste, hence my
question.

>> I would like to send table information only the first
>> time a table is seen by the 'change_cb' callback, but of course there
>> could be some schema change after replication started. So I wonder: is
>> there any information I can find in the 'Relation' structure of the
>> change callback, which may suggest that there could have been a change
>> in the table schema, hence a new schema should be sent to the client?
>
> The best way I can think of - which is also what is implemented in the
> in-core replication framework - is to have a small cache on-top of the
> relcache. That cache is kept coherent using
> CacheRegisterRelcacheCallback(). Then whenever there's a change you
> look up that change in that cache, and send the schema information if
> it's been invalidated since you last sent something. That's also how
> the new stuff in v10 essentially works:
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_change(), does a lookup for its own metadata using get_rel_sync_entry()
> which then checks relentry->schema_sent. Invalidation unsets
> schema_sent in rel_sync_cache_relation_cb.

Thank you very much, it seems exactly what I need. I'll try hacking
around this callback.

-- Daniele

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-05-07 23:17:31 logical replication deranged sender
Previous Message Fabien COELHO 2017-05-07 20:55:44 Re: proposal psql \gdesc