Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility
Date: 2012-10-19 18:26:35
Message-ID: 50819B5B.7030507@agliodbs.com
Lists: pgsql-hackers


> If we describe a queue as something you put stuff in at one end and
> get it out in same or some other specific order at the other end, then
> WAL _is_ a queue when you use it for replication (if you just write to it,
> then it is "Log", if you write and read, it is "Queue")

For that matter, WAL is also a queue you use for recovery. But by the same
logic, BerkeleyDB is a database just as PostgreSQL is a database. That
doesn't mean you can use BerkeleyDB and PostgreSQL for all the same tasks.

> All it takes to make this scenario work is keeping track of LSN or simply
> log position on the slave side.
>
> What you seem to want is support for cooperative consumers,
> that is, multiple consumers on the same queue working together and
> sharing the work of processing the incoming events.
>
> This can be easily achieved using a single ordered event stream and
> extra bookkeeping structures on the consumer side (look at cooperative
> consumer samples in skytools).
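To make the cooperative-consumer idea concrete, here is a minimal, hypothetical Python sketch. It is not the skytools/pgQ code: `EventLog`, `CoopGroup` and `batch_size` are invented names, and a list index stands in for the LSN or log position tracked on the consumer side. Workers share one cursor into a single ordered stream, claim disjoint batches, and the group only confirms a position once everything before it has been processed.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """A single ordered event stream; the list index stands in for an LSN."""
    events: list = field(default_factory=list)

    def append(self, payload):
        self.events.append(payload)
        return len(self.events) - 1  # position assigned to the new event


class CoopGroup:
    """Hypothetical consumer-side bookkeeping shared by cooperating workers.

    The group keeps one cursor into the log. Each worker claims the next
    batch under a lock, so no event is handed to two workers.
    """

    def __init__(self, log, batch_size=2):
        self.log = log
        self.batch_size = batch_size
        self.next_pos = 0     # first position not yet handed to any worker
        self.done = set()     # positions whose processing has finished
        self.lock = threading.Lock()

    def claim_batch(self):
        """Hand out the next run of unclaimed positions (may be empty)."""
        with self.lock:
            start = self.next_pos
            end = min(start + self.batch_size, len(self.log.events))
            self.next_pos = end
            return list(range(start, end))

    def finish(self, positions):
        """Record that a worker has processed the given positions."""
        with self.lock:
            self.done.update(positions)

    def safe_position(self):
        """Smallest position not yet finished; everything below it is done.

        This is the value the group could report back as its confirmed
        position, since no earlier event is still in flight.
        """
        with self.lock:
            pos = 0
            while pos in self.done:
                pos += 1
            return pos


if __name__ == "__main__":
    log = EventLog()
    for i in range(5):
        log.append(f"event-{i}")
    group = CoopGroup(log, batch_size=2)
    a = group.claim_batch()   # worker A gets [0, 1]
    b = group.claim_batch()   # worker B gets [2, 3]
    group.finish(b)
    print(group.safe_position())  # 0: worker A is still busy
    group.finish(a)
    print(group.safe_position())  # 4
```

The design choice here is that ordering lives only in the one shared stream, while work distribution and progress tracking live entirely on the consumer side.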

What I'm saying is, we'll get nowhere promoting an application queue
which is permanently inferior to existing, popular open source software.
My advice: Forget about the application queue aspects of this. Focus
on making it work for replication and matviews, which are already hard
use cases to optimize.

If someone can turn this feature into the base for a distributed
queueing system later, then great. But let's not complicate this
feature by worrying about a use case it may never fulfill.

> Now that logical replication is being introduced, it makes sense to have
> actions recorded _only_ in this queue, and this is what the whole RFC was
> about.

Yes, I agree.

I'm just pointing out that the needs of a replication queue and of an
application queue are divergent.

> Currently the "clear tech spec" is just this:
>
> * works as a table on INSERTs up to inserting a logical WAL record
> describing the insert, but no data is inserted locally.

Yeah, I think where you confused a bunch of people here is the
definition of "locally". Let me see if I understand this:

* a Writer would INSERT data into the LOG ONLY TABLE (L.O.T.). That
write would be synced to WAL, but no in-memory or on-disk version of
the table would be updated.

* Readers could subscribe to the LSN for the L.O.T. and would receive a
stream of INSERTs, which they could handle as they wished.

Is my understanding correct? If it is, I have more questions!

> with all things that follow from the local table having no data
> - unique constraints don't make sense
> - indexes make no sense
> - updates and deletes hit no data
> - etc. . .

Right.
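The semantics described above (writes go only to the log, Readers follow the stream from an LSN, nothing is stored locally, so UPDATE and DELETE hit no data) can be modeled in a few lines. This is a hypothetical Python sketch, not proposed PostgreSQL code: `Wal`, `LogOnlyTable` and `Reader` are invented names, and the LSN is a plain integer rather than a real WAL position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    table: str
    row: dict


class Wal:
    """Append-only log; each record gets a monotonically increasing LSN."""

    def __init__(self):
        self.records = []

    def append(self, table, row):
        rec = LogRecord(lsn=len(self.records) + 1, table=table, row=dict(row))
        self.records.append(rec)
        return rec.lsn

    def read_from(self, start_lsn):
        """Yield records with lsn >= start_lsn, in log order."""
        for rec in self.records:
            if rec.lsn >= start_lsn:
                yield rec


class LogOnlyTable:
    """Hypothetical log-only table: INSERT writes a log record, nothing else."""

    def __init__(self, name, wal):
        self.name = name
        self.wal = wal

    def insert(self, row):
        return self.wal.append(self.name, row)  # no local heap, no index entries

    def scan(self):
        return []  # there is no local data to read back

    def update(self, predicate, changes):
        return 0  # nothing is stored locally, so no row can match

    def delete(self, predicate):
        return 0


class Reader:
    """Subscriber that remembers the next LSN it still has to consume."""

    def __init__(self, wal, table, start_lsn=1):
        self.wal = wal
        self.table = table
        self.next_lsn = start_lsn

    def poll(self):
        rows = []
        for rec in self.wal.read_from(self.next_lsn):
            if rec.table == self.table:
                rows.append(rec.row)
            self.next_lsn = rec.lsn + 1  # skip records for other tables too
        return rows


if __name__ == "__main__":
    wal = Wal()
    events = LogOnlyTable("events", wal)
    events.insert({"id": 1})
    events.insert({"id": 2})
    reader = Reader(wal, "events")
    print(reader.poll())   # [{'id': 1}, {'id': 2}]
    print(events.scan())   # []
```

A Reader on the master would simply be another `Reader` over the same log, which matches the idea of the master subscribing to its own stream.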

> As Simon explained, the initial RFC was just about not keeping the
> data in local table if we know it will never be accessed

Ah, so to answer Simon's question: no, this RFC makes no sense without a
description of expected Reader activity.

> (at least not
> for anything except vacuum and delete/truncate)

If the table is not being represented as a table in the catalog or on
disk, why would it ever need to be vacuumed?

> It is very hard for me to tell for sure whether the walsender->walreceiver
> combo "reads the events" on the master or the slave side

Well, presumably the only way a Reader on the master could get the queue
would be for the master to subscribe to its own LSN. No?

> HEAD is the queue producer, where the events go in (any insert on master)
>
> TAIL (to avoid another word) is where they come out
> (walreader -> walreceiver moving the events to slave)

BTW, I suggest using "Writer" and "Reader" for the queue roles, not
"Head" and "Tail", which terms are rather unclear.

> Think of an analogy with a snake feeding on berries used by
> an ant colony to get the nutrients in the berries to its nest :)

That's a very ... unique analogy. ;-)

>>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?
>>
> I guess WRITE ONLY tables would get us more publicity, but the name would
> not be entirely correct, as the data is readable from the log.

I like LOG ONLY TABLES, actually; it's the mirror of UNLOGGED TABLEs.
Or REPLICATION MESSAGE TABLE.

Now, since I've pointed out what use case this mechanism does not apply
to (replacing a generic application queue), let me point out some that
it *does* apply to, and handily:

* Updating matviews on a replica
* Updating a cache (assuming an autonomous LSN reader)
* Remote security logging (especially if combined with command triggers)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
