Re: Logcal replication in large scale

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Nicklas Avén <nicklas(dot)aven(at)jordogskog(dot)no>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logcal replication in large scale
Date: 2017-09-20 07:44:06
Message-ID: CABUevEzcyqRgNDoHLsempevA8Yj_43LuYnbf0mvWiXP00VDs8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén <nicklas(dot)aven(at)jordogskog(dot)no>
wrote:

> Hallo all
>
> I am thrilled about logical replication in PostgreSQL 10. My head have
> started spinning about use cases.
>
> Would it be possible to use logical replication as a distribution
> method of data?
>

As an answer to the generic question: yes :)

> I think about map data from national mapping authorities. The problem
> is to get the updates of their data sets. Especially the open data sets
> are now distributed as files (shape files) in Sweden and as pg_dump in
> Norway.
>
> I guess it is beyond what logical replication is designed for, so I ask
> what problems that might arise for a scenario like:
>
> The distributor has a publication database with logical replication
> publications of the tables. All users, probably thousands or more,
> would subscribe to that publication to get an updated copy of the data
> set.
>
> How would the publication server react? I guess the WAL-files will only
> be written once anyway?
>

Yes. But they will need to be kept around until *all* subscribers have
pulled down their changes. So even one subscriber falling behind will mean
your WAL will never get cleaned up.

Of course, you can keep some sort of watcher process that kills old
replication slots.

I am also not sure how well PostgreSQL will react to having thousands of
replication slots. It's not what the system was designed for I believe :)

You might be better of using logical decoding (which goes back to 9.4) to
stream the data out, but not having each subscriber be a postgresql
subscriber. Either using it to generate some sort of "diff files" that can
then be consumed downstream, or by distributing it via some kind of
dedicated queuing system designed to handle that many downstreams.

> My guess is that it will be a lower payload than today anyway when the
> whole data set have to be fetched to get updates.
>

As always that's going to depend on the number of updates. If the same
thing is updated 1000 times, then if you use logical replication it gets
transferred 1000 times. So there are definitely cases when there will be
*more* traffic with logical replication, but in cases like yours I would
guess it will be less.

//Magnus

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Güttler 2017-09-20 07:59:44 Re: Final pg_dumpall should happen in Single-User-Mode
Previous Message Michael Paquier 2017-09-20 07:43:19 Re: Logical decoding client has the power to crash the server