Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

From: Steve Singer <steve(at)ssinger(dot)info>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Date: 2012-06-26 00:14:22
Message-ID: BLU0-SMTP67D92025ED330B4714F8DDCE00@phx.gbl
Lists: pgsql-hackers

On 12-06-21 04:37 AM, Andres Freund wrote:
> Hi Steve,
> Thanks!
>

>> Attached is a detailed review of the patch.
> Very good analysis, thanks!
> Another reasons why we cannot easily do 1) is that subtransactions aren't
> discernible from top-level transactions before the top-level commit happens,
> we can only properly merge in the right order (by "sorting" via lsn) once we
> have seen the commit record which includes a list of all committed
> subtransactions.
>

Based on that and your comments further down in your reply (and that no
one spoke up and disagreed with you), it sounds like doing (1) isn't
going to be practical.

> I also don't think 1) would be particularly welcome by people trying to
> replicate into foreign systems.
>

They could still sort the changes into transaction groups before
applying them to the foreign system.
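
For what it's worth, that is all I mean by it -- something like the
sketch below, ignoring the subtransaction-merging problem you describe.
Every name in it (Change, txn_bucket, foreign_apply) is made up for
illustration, not anything from your patch:

#include <stdint.h>

typedef struct Change
{
    uint64_t       lsn;
    uint32_t       xid;
    struct Change *next;    /* per-transaction list, kept in arrival order */
    /* ... decoded row data ... */
} Change;

extern Change *txn_bucket(uint32_t xid);            /* hash table keyed by xid  */
extern void    txn_bucket_append(uint32_t xid, Change *c);
extern void    txn_bucket_discard(uint32_t xid);
extern void    foreign_apply(const Change *c);      /* send to the other system */

/* Called for every interleaved change as it arrives. */
static void
handle_change(Change *c)
{
    txn_bucket_append(c->xid, c);
}

/* Called when the commit record for 'xid' is decoded: replay that
 * transaction's changes as one group, then throw the bucket away. */
static void
handle_commit(uint32_t xid)
{
    for (Change *c = txn_bucket(xid); c != NULL; c = c->next)
        foreign_apply(c);
    txn_bucket_discard(xid);
}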

> I planned to have some cutoff 'max_changes_in_memory_per_txn' value.
> If it has
> been reached for one transaction all existing changes are spilled to disk. New
> changes again can be kept in memory till its reached again.
>
Do you want max_changes_in_memory_per_txn, or do you want to put a limit
on the total amount of memory that the cache is able to use? How are you
going to tell a DBA to tune max_changes_in_memory_per_txn? They know how
much memory their system has and how much of it they can devote to the
apply cache versus other things. Giving them guidance that requires
estimating how many open transactions they might have at a point in time,
and how many WAL change records each transaction generates, seems like a
step backwards from the progress we've been making in getting PostgreSQL
to be easier to tune. The maximum number of transactions that could be
open at a time is governed by max_connections on the master at the time
the WAL was generated, so I don't even see how the machine processing the
WAL records could autotune/guess that.
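
What I would find easier to explain to a DBA is a single knob for the
total memory the cache may use, with the cache spilling its largest
transaction whenever that limit is exceeded. Roughly along these lines --
just a sketch, and apply_cache_mem, TXNEntry and spill_to_disk are names
I made up, not anything from the patch:

#include <stddef.h>

typedef struct TXNEntry
{
    struct TXNEntry *next;
    size_t           mem_used;        /* bytes of changes buffered in memory */
    /* ... buffered changes ... */
} TXNEntry;

extern size_t apply_cache_mem;            /* the single DBA-visible limit    */
extern void   spill_to_disk(TXNEntry *);  /* writes changes out, frees them  */

/* Spill the largest in-memory transactions until we are back under the
 * total-memory limit. */
static void
check_memory_limit(TXNEntry *txns, size_t *total_mem)
{
    while (*total_mem > apply_cache_mem)
    {
        TXNEntry *victim = NULL;

        for (TXNEntry *t = txns; t != NULL; t = t->next)
            if (victim == NULL || t->mem_used > victim->mem_used)
                victim = t;

        if (victim == NULL || victim->mem_used == 0)
            break;                        /* nothing left to spill */

        *total_mem -= victim->mem_used;
        spill_to_disk(victim);
        victim->mem_used = 0;
    }
}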

> We need to support serializing the cache for crash recovery + shutdown of the
> receiving side as well. Depending on how we do the wal decoding we will need
> it more frequently...
>

Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state
every time it calls cache->commit() to ensure that consumers don't get
invoked twice on the same transaction.
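
By "some state" I am imagining something as small as this (field names
and layout invented for illustration):

#include <stdint.h>

typedef struct ApplyCacheCheckpoint
{
    uint64_t last_committed_lsn;  /* commit LSN last handed to the consumers */
    uint32_t n_spill_files;       /* spill files still holding live txns     */
    uint32_t crc;                 /* guard against torn/partial writes       */
} ApplyCacheCheckpoint;

That record would have to be written and fsynced before cache->commit()
returns, and even then it only covers the apply cache's own bookkeeping;
it says nothing about whether the apply module's target actually
committed, which is the problem below.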

If the apply module is making changes to the same backend that the apply
cache serializes to, then both the apply cache's state and the changes
made by committed transactions will be persisted (or not persisted)
together. What if I am replicating from x86 to x86_64 via an apply
module that does textout conversions?

x86                  Proxy                           x86_64
 ----WAL------->  apply
                  cache
                     |       (proxy catalog)
                     |
                  apply module
                  textout    --------------------->
                                SQL statements

How do we ensure that the catalog on the proxy instance used for decoding
WAL, the destination database, and the state + spill files of the apply
cache stay consistent with each other in the event of a crash of either
the proxy or the target (that is, the commits are visible in all of them,
or in none of them)?
I don't think you can (unless we consider two-phase commit, and I'd
rather we didn't). Can we come up with a way of avoiding the need for
them to be consistent with each other?

I think apply modules will need to be able to be passed the same
transaction twice (once before the crash and again after) and have a
way of deciding whether that transaction has a) been applied to the
translation/proxy catalog and b) been applied on the replica instance.
How is the walreceiver going to decide which WAL segments it needs to
re-process after a crash? I would want to see more of these details
worked out before we finalize the interface between the apply cache and
the apply modules and how the serialization works.
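
One way I could see an apply module coping with that, assuming the
target is transactional, is to store the origin commit LSN on the
replica in the same transaction as the applied changes and skip anything
at or below it on replay. A rough sketch, with all of the replica_*
names invented for illustration:

#include <stdbool.h>
#include <stdint.h>

extern uint64_t replica_get_last_origin_lsn(void);          /* reads a progress table  */
extern void     replica_begin(void);
extern void     replica_apply_changes(void *txn);
extern void     replica_set_last_origin_lsn(uint64_t lsn);  /* same txn as the changes */
extern void     replica_commit(void);

/* Returns false if the transaction was already applied before a crash. */
static bool
apply_transaction(void *txn, uint64_t origin_commit_lsn)
{
    if (origin_commit_lsn <= replica_get_last_origin_lsn())
        return false;                  /* seen before the crash, skip it       */

    replica_begin();
    replica_apply_changes(txn);
    replica_set_last_origin_lsn(origin_commit_lsn);
    replica_commit();                  /* changes and progress commit together */
    return true;
}

The important part is that the progress update and the data changes
commit together on the target; a non-transactional foreign system would
need some other idempotence story.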

Code Review
=========

applycache.h
-----------------------
+typedef struct ApplyCacheTupleBuf
+{
+ /* position in preallocated list */
+ ilist_s_node node;
+
+ HeapTupleData tuple;
+ HeapTupleHeaderData header;
+ char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big
the data in the transaction is? Wouldn't workloads with inserts of lots
of small rows in a transaction eat up lots of memory that is allocated
but storing nothing? The only alternative I can think of is dynamically
allocating these, and I don't know what the cost/benefit of that overhead
will be versus spilling to disk sooner.
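
To be concrete about the dynamic-allocation alternative, I am thinking of
something like this (simplified stand-in types, not the patch's
definitions), trading malloc overhead and the loss of the preallocated
free list for less wasted space on small rows:

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct SmallTupleBuf
{
    uint32_t len;       /* length of this tuple's data            */
    char     data[];    /* flexible array member, exactly len long */
} SmallTupleBuf;

/* Allocate a buffer sized to the tuple actually being stored instead of
 * always reserving MaxHeapTupleSize. */
static SmallTupleBuf *
tuplebuf_alloc(const char *tuple_data, uint32_t tuple_len)
{
    SmallTupleBuf *buf = malloc(offsetof(SmallTupleBuf, data) + tuple_len);

    if (buf == NULL)
        return NULL;
    buf->len = tuple_len;
    memcpy(buf->data, tuple_data, tuple_len);
    return buf;
}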

+* FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?

+/*
+ * Return an unused ApplyCacheChange struct
+*/
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);

ApplyCacheReleaseChange(...) ? I keep thinking of 'Return' as us
returning the data somewhere, not the memory.

applycache.c:
-------------------

I've taken a quick look through this file and I don't see any issues
other than the many FIXMEs and other issues you've identified already,
which I don't expect you to address in this CF.

> Andres
>
