Logical replication for async service communication?

From: Sean Huber <sean(dot)m(dot)huber(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Logical replication for async service communication?
Date: 2020-05-15 04:33:32
Message-ID: CAM8f5Mi1Ftj+48PZxN1AbM-P=4YMLENY5zRaPwTbmbkFwCsTkA@mail.gmail.com
Lists: pgsql-general

Has anyone attempted to use logical replication with table partitioning for
async service communication?

Proof of concept:
https://gist.github.com/shuber/8e53d42d0de40e90edaf4fb182b59dfc

Services would commit messages to their own databases along with the rest
of their data (with the same transactional guarantees), and those messages
would then be replicated in "realtime" (with all of logical replication's
features and guarantees) to the receiving service's database. There, that
service's workers (e.g. que-rb <https://github.com/que-rb/que>, SKIP LOCKED
polling, etc.) would be waiting to respond by inserting messages into
*their* database to be replicated back.

Throw in a trigger to automatically acknowledge/clean up/notify messages and
I think we've got something that resembles a queue? Maybe make that same
trigger match incoming messages against a "routes" table (based on message
type, certain JSON schemas
<https://github.com/gavinwahl/postgres-json-schema> in the payload, etc.)
and write matches to the que-rb jobs table instead, for some kind of
distributed/replicated work queue hybrid?
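Here is a sketch of the routing trigger, again rendered as SQL from Python. One detail matters here. The logical replication apply worker runs with `session_replication_role = replica`, so an ordinary trigger on the receiving table does not fire for replicated rows. It has to be switched on with `ALTER TABLE ... ENABLE ALWAYS TRIGGER` (or `ENABLE REPLICA TRIGGER`), and only row-level triggers fire during apply. The `routes` and `jobs` tables are hypothetical stand-ins; in particular, the `jobs` layout is not que-rb's actual schema. Matching is by message type only, and JSON-schema matching is left out. `EXECUTE FUNCTION` assumes PostgreSQL 11 or later.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")

# Hypothetical tables: "routes" maps a message type to a job class, and
# "jobs" is a stand-in work-queue table (NOT que-rb's real schema).
SHARED_DDL = """\
CREATE TABLE IF NOT EXISTS routes (
  message_type text NOT NULL,
  job_class text NOT NULL,
  PRIMARY KEY (message_type, job_class)
);
CREATE TABLE IF NOT EXISTS jobs (
  id bigserial PRIMARY KEY,
  job_class text NOT NULL,
  args jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);
CREATE OR REPLACE FUNCTION route_message() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- One job per matching route; unmatched messages produce no jobs.
  INSERT INTO jobs (job_class, args)
  SELECT r.job_class, NEW.payload
  FROM routes r
  WHERE r.message_type = NEW.type;
  RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$;"""


def routing_ddl(inbox_table: str) -> str:
    """Attach the routing trigger to a table that logical replication writes to."""
    if not _IDENT.match(inbox_table):
        raise ValueError(f"unsafe identifier: {inbox_table!r}")
    return "\n".join([
        SHARED_DDL,
        f"CREATE TRIGGER route_message AFTER INSERT ON {inbox_table}\n"
        "  FOR EACH ROW EXECUTE FUNCTION route_message();",
        # The apply worker runs with session_replication_role = replica, so
        # without this the trigger would not fire for replicated rows.
        f"ALTER TABLE {inbox_table} ENABLE ALWAYS TRIGGER route_message;",
    ])


if __name__ == "__main__":
    print(routing_ddl("messages_to_billing"))
```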

My motivations for this line of thinking were mostly high availability and
isolating service downtime/failures from each other. Our PostgreSQL
databases are the most critical pieces of infrastructure for all of our
services: if one is down, we don't want the impacted service to even
attempt to do work. On the other hand, we don't want a service's downtime
to impact its ability to receive (queued) messages from other services,
which it can resume consuming (once, in order) when it's back up.
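The "once, in order" requirement can be sketched as follows: take the oldest message, run the handler, and delete the message in the same transaction, so a failed handler leaves it in place for a retry. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for the receiver's PostgreSQL database. The `inbox` table and the handler signature are hypothetical. Because SQLite has no `FOR UPDATE SKIP LOCKED`, the sketch models a single worker.

```python
import sqlite3
from typing import Callable

# Hypothetical handler signature: handle(conn, msg_type, payload).
Handler = Callable[[sqlite3.Connection, str, str], None]


def consume_one(conn: sqlite3.Connection, handle: Handler) -> bool:
    """Handle the oldest inbox message and delete it in the same transaction.

    sqlite3 stands in for PostgreSQL and has no FOR UPDATE SKIP LOCKED, so
    this models a single worker. Returns False when the inbox is empty.
    """
    with conn:  # commit on success; roll back (and re-raise) on any error
        row = conn.execute(
            "SELECT id, type, payload FROM inbox ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        msg_id, msg_type, payload = row
        # The handler's own writes share this transaction, so a failure
        # undoes them and leaves the message in place for a retry.
        handle(conn, msg_type, payload)
        conn.execute("DELETE FROM inbox WHERE id = ?", (msg_id,))
        return True


def drain(conn: sqlite3.Connection, handle: Handler) -> int:
    """Consume messages until the inbox is empty; return how many were handled."""
    count = 0
    while consume_one(conn, handle):
        count += 1
    return count
```

With several PostgreSQL workers, the SELECT would typically gain `FOR UPDATE SKIP LOCKED`. The workers then handle messages concurrently, however, and strict ordering is lost, so "in order" generally implies one consumer per ordering key. "Once" likewise only covers effects written in the same transaction: an external call made by the handler can repeat if the worker crashes before commit.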

We're exploring other message queues but keep getting drawn back to
PostgreSQL because we get the same transactional guarantees for our
messages/jobs as for the rest of our data. Even the act of enqueuing a job or
sending a message to another service is something that must be committed
and can be rolled back like everything else.
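The transactional enqueue described above is the transactional outbox pattern, and it can be sketched briefly. The business row and the outgoing message are inserted in one transaction, so a failure at any step rolls back both. As before, `sqlite3` stands in for PostgreSQL. The `orders`/`messages` tables and the `reserve_stock` step are hypothetical.

```python
import json
import sqlite3
from typing import Callable

# sqlite3 stands in for the sending service's PostgreSQL database; the table
# names and the reserve_stock step are hypothetical.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
CREATE TABLE messages (
  id INTEGER PRIMARY KEY,
  recipient TEXT NOT NULL,
  type TEXT NOT NULL,
  payload TEXT NOT NULL
);
"""


def place_order(
    conn: sqlite3.Connection,
    order_id: int,
    total_cents: int,
    reserve_stock: Callable[[sqlite3.Connection, int], None],
) -> None:
    """Write the order and its outgoing message as one unit of work."""
    with conn:  # both rows commit together or neither does
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO messages (recipient, type, payload) VALUES (?, ?, ?)",
            ("billing", "order_placed",
             json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )
        # A later step in the same transaction; if it raises, the message
        # is rolled back with the order and is never sent.
        reserve_stock(conn, order_id)
```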

For our potential use case specifically, we're not dealing with high levels
of realtime traffic; we're not even close to 1k jobs/messages per
second.

I'm looking to poke holes in this concept before sinking any more time into
exploring the idea. Any feedback/warnings/concerns would be much
appreciated, thanks for your time!

Sean Huber
