Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2018-12-18 23:56:07
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Hi Alexey,

Attached is an updated version of the patches, with all the fixes I've
done in the past. I believe it should fix at least some of the issues
you reported - certainly the problem with stream_cleanup_files, but
perhaps some of the other issues too.

I'm a bit confused by the changes to TAP tests. Per the patch summary,
some .pl files get renamed (nor sure why), a new one is added, etc. So
I've instead enabled streaming subscriptions in all tests, which with
this patch produces two failures:

Test Summary Report
t/ (Wstat: 7424 Tests: 1 Failed: 0)
Non-zero exit status: 29
Parse errors: Bad plan. You planned 7 tests but ran 1.
t/ (Wstat: 256 Tests: 2 Failed: 1)
Failed test: 2
Non-zero exit status: 1

So yeah, there's more stuff to fix. But I can't directly apply your
fixes because the updated patches are somewhat different.

On 12/18/18 3:07 PM, Alexey Kondratov wrote:
> On 18.12.2018 1:28, Tomas Vondra wrote:
>>> 4) There was a problem with marking top-level transaction as having
>>> catalog changes if one of its subtransactions has. It was causing a
>>> problem with DDL statements just after subtransaction start (savepoint),
>>> so data from new columns is not replicated.
>>> 5) Similar issue with schema send. You send schema only once per each
>>> sub/transaction (IIRC), while we have to update schema on each catalog
>>> change: invalidation execution, snapshot rebuild, adding new tuple cids.
>>> So I ended up with adding is_schema_send flag to ReorderBufferTXN, since
>>> it is easy to set it inside RB and read in the output plugin. Probably,
>>> we have to choose a better place for this flag.
>> Hmm. Can you share an example how to trigger these issues?
> Test cases inside and old ones (with
> streaming=true option added) should reproduce all these issues. In
> general, it happens in a txn like:
> then the second insert may discover old version of catalog.

Yeah, that's the issue I've discovered before and thought it got fixed.

>> Interesting. Any idea where does the extra overhead in this particular
>> case come from? It's hard to deduce that from the single flame graph,
>> when I don't have anything to compare it with (i.e. the flame graph for
>> the "normal" case).
> I guess that bottleneck is in disk operations. You can check
> logical_repl_worker_new_perf.svg flame graph: disk reads (~9%) and
> writes (~26%) take around 35% of CPU time in summary. To compare,
> please, see attached flame graph for the following transaction:
> INSERT INTO large_text
> SELECT (SELECT string_agg('x', ',')
> FROM generate_series(1, 2000)) FROM generate_series(1, 1000000);
> Execution Time: 44519.816 ms
> Time: 98333,642 ms (01:38,334)
> where disk IO is only ~7-8% in total. So we get very roughly the same
> ~x4-5 performance drop here. JFYI, I am using a machine with SSD for tests.
> Therefore, probably you may write changes on receiver in bigger chunks,
> not each change separately.

Possibly, I/O is certainly a possible culprit, although we should be
using buffered I/O and there certainly are not any fsyncs here. So I'm
not sure why would it be cheaper to do the writes in batches.

BTW does this mean you see the overhead on the apply side? Or are you
running this on a single machine, and it's difficult to decide?

>> So I'm not particularly worried, but I'll look into that. I'd be much
>> more worried if there was measurable overhead in cases when there's no
>> streaming happening (either because it's disabled or the memory limit
>> was not hit).
> What I have also just found, is that if a table row is large enough to
> be TOASTed, e.g.:
> INSERT INTO large_text
> SELECT (SELECT string_agg('x', ',')
> FROM generate_series(1, 1000000)) FROM generate_series(1, 1000);
> then logical_work_mem limit is not hit and we neither stream, nor spill
> to disk this transaction, while it is still large. In contrast, the
> transaction above (with 1000000 smaller rows) being comparable in size
> is streamed. Not sure, that it is easy to add proper accounting of
> TOAST-able columns, but it worth it.

That's certainly strange and possibly a bug in the memory accounting
code. I'm not sure why would that happen, though, because TOAST data
look just like regular INSERT changes. Interesting. I wonder if it's
already fixed in this updated version, but it's a bit too late to
investigate that today.


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Add-logical_work_mem-to-limit-ReorderBuffer-20181219.patch.gz application/gzip 9.9 KB
0002-Immediately-WAL-log-assignments-20181219.patch.gz application/gzip 6.4 KB
0003-Issue-individual-invalidations-with-wal_lev-20181219.patch.gz application/gzip 4.9 KB
0004-Extend-the-output-plugin-API-with-stream-me-20181219.patch.gz application/gzip 6.1 KB
0005-Implement-streaming-mode-in-ReorderBuffer-20181219.patch.gz application/gzip 12.3 KB
0006-Add-support-for-streaming-to-built-in-repli-20181219.patch.gz application/gzip 20.1 KB
0007-Track-statistics-for-streaming-spilling-20181219.patch.gz application/gzip 4.2 KB
0008-BUGFIX-set-final_lsn-for-subxacts-before-cl-20181219.patch.gz application/gzip 635 bytes

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-12-19 00:20:09 Re: Doc typo?
Previous Message Tom Lane 2018-12-18 23:47:42 Re: Doc typo?