Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2018-12-17 16:23:44
Message-ID: 40c38758-04b5-74f4-c963-cf300f9e5dff@postgrespro.ru
Lists: pgsql-hackers

Hi Tomas,

> This new version is mostly just a rebase to current master (or almost,
> because 2pc decoding only applies to 29180e5d78 due to minor bitrot),
> but it also addresses the new stuff committed since last version (most
> importantly decoding of TRUNCATE). It also fixes a bug in WAL-logging of
> subxact assignments, where the assignment was included in records with
> XID=0, essentially failing to track the subxact properly.

I started reviewing your patch about a month ago and tried to do an
in-depth review, since I am very interested in this patch too. The new
version does not apply to master at 29180e5d78, but everything is OK
after first applying the 2pc patch. Still, I guess this may complicate
further testing and review, since any potential reviewer has to take
both patches into account at once. The previous version applied to
master and worked fine for me on its own (except for a few
patch-specific issues, which I try to explain below).

Patch review
========

First of all, I want to say thank you for all the work done. Here
are some problems which I have found and hopefully fixed with my
additional patch (attached; it should apply on top of the last commit
of your newest patch version):

1) The most important issue is that your TAP tests were broken: the
option "WITH (streaming=true)" was missing from the CREATE SUBSCRIPTION
statement. As a result, the spilling mechanism was being tested rather
than streaming.

2) After fixing the tests, the first one (simple streaming) immediately
fails with a segmentation fault in the logical replication worker. It
happens because the worker calls stream_cleanup_files inside
stream_open_file at stream start while nxids is still zero; the counter
then goes negative and everything crashes. Something similar may happen
with the xids array, so I added two checks there.
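To illustrate the failure mode, here is a minimal stand-alone sketch of
the guard I added. This is not the actual PostgreSQL code; the struct and
function names are illustrative stand-ins for the worker's per-stream
bookkeeping:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified model of the per-stream subxact bookkeeping: nxids counts
 * the registered (sub)xact files, xids is an array of that length.
 */
typedef struct StreamXidState
{
    int     nxids;      /* number of registered (sub)xact files */
    int    *xids;       /* array of length nxids, may be NULL */
} StreamXidState;

static void
stream_cleanup_files_guarded(StreamXidState *state)
{
    /*
     * Guard against being entered at stream start, before any files
     * have been registered: without this check the counter is driven
     * negative and subsequent array accesses run off the end.
     */
    if (state->nxids <= 0 || state->xids == NULL)
        return;

    /* ... remove the per-xid files here ... */
    free(state->xids);
    state->xids = NULL;
    state->nxids = 0;
}
```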

3) The next problem is much more critical and concerns historic MVCC
visibility rules. Previously, the walsender started decoding a
transaction at commit time, so we were able to resolve all xmin, xmax
and combocids to cmin/cmax, build the tuplecids hash and so on; now we
do all of this on the fly.

This leads to a rather difficult situation: HeapTupleSatisfiesHistoricMVCC
may have to validate catalog tuples which are still in the future
relative to the decoder's current position inside the transaction. For
example, we may need to resolve cmin/cmax of a tuple which was created
with cid 3 and deleted with cid 5 while we are currently at cid 4, so
our tuplecids hash is not yet complete enough to handle such a case.

I have updated HeapTupleSatisfiesHistoricMVCC visibility rules with two
options:

/*
 * If we see a tuple from our own transaction but cannot resolve its
 * cmin, it is probably from the future, so treat it as invisible.
 */
if (!resolved)
    return false;

and

/*
 * If we see a tuple from our own transaction but cannot resolve its
 * cmax, or cmax == InvalidCommandId, it is probably still valid, so
 * accept it.
 */
if (!resolved || cmax == InvalidCommandId)
    return true;

4) There was a problem with marking the top-level transaction as having
catalog changes when one of its subtransactions does. It caused a
problem with DDL statements issued just after a subtransaction start
(savepoint): data from the newly added columns was not replicated.
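The fix boils down to propagating the flag upward. Here is a minimal
stand-alone sketch using simplified stand-ins for ReorderBufferTXN (the
field names are illustrative, not the actual PostgreSQL structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified transaction node: a subtransaction points at its
 * top-level transaction via toptxn (NULL for a top-level txn).
 */
typedef struct Txn
{
    bool        has_catalog_changes;
    struct Txn *toptxn;     /* NULL for a top-level transaction */
} Txn;

static void
txn_mark_catalog_changes(Txn *txn)
{
    txn->has_catalog_changes = true;

    /*
     * Propagate to the top-level transaction as well; otherwise the
     * schema for columns added under a savepoint is never re-sent.
     */
    if (txn->toptxn != NULL)
        txn->toptxn->has_catalog_changes = true;
}
```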

5) A similar issue exists with sending the schema. You send the schema
only once per (sub)transaction (IIRC), while we actually have to update
it on every catalog change: invalidation execution, snapshot rebuild,
adding new tuple cids. So I ended up adding an is_schema_send flag to
ReorderBufferTXN, since it is easy to set inside the reorder buffer and
to read in the output plugin. Probably we should choose a better place
for this flag.
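The intended protocol for that flag can be sketched as a toy model
(illustrative names only, not the real reorder buffer API): the reorder
buffer invalidates the "schema sent" state on every catalog change, and
the output plugin re-sends the schema before the next data tuple when
needed:

```c
#include <assert.h>
#include <stdbool.h>

/* Per-transaction schema state, a stand-in for the flag in ReorderBufferTXN */
typedef struct TxnSchemaState
{
    bool    schema_sent;    /* schema already on the wire? */
} TxnSchemaState;

/* Reorder buffer side: any catalog change invalidates the sent schema */
static void
txn_on_catalog_change(TxnSchemaState *txn)
{
    txn->schema_sent = false;
}

/* Output plugin side: returns true when the schema must be (re)sent */
static bool
txn_need_schema(TxnSchemaState *txn)
{
    if (txn->schema_sent)
        return false;
    txn->schema_sent = true;    /* caller sends the schema now */
    return true;
}
```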

6) To better cover all these tricky cases I added a new TAP
test, 014_stream_tough_ddl.pl, which consists of a really tough
combination of DDL, DML, savepoints and ROLLBACK/RELEASE in a single
transaction.

I marked all my fixes and every questionable place with a comment and a
"TOCHECK:" label for easy searching. Removing almost any of these fixes
makes the tests fail with a segmentation fault or a replication
mismatch. Although I mostly read and tested the old version of the
patch, after a quick look all these fixes seem applicable to the new
version as well.

Performance
========

I have also run a series of performance tests and found that the
patch adds a huge overhead for a large transaction consisting
of many small rows, e.g.:

CREATE TABLE large_test (num1 bigint, num2 double precision, num3 double
precision);

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_test (num1, num2, num3)
SELECT round(random()*10), random(), random()*142
FROM generate_series(1, 1000000) s(i);

Execution Time: 2407.709 ms
Total Time: 11494.238 ms (00:11.494)

With synchronous_standby_names set and logical_work_mem = 64 MB the
total time is up to 5x the execution time, while without the patch it is
about 2x. Thus, streaming makes logical replication roughly 2.5x slower
for such transactions.

However, dealing with large transactions consisting of a small number of
large rows is much better:

CREATE TABLE large_text (t TEXT);

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 1000000)) FROM generate_series(1, 125);

Execution Time: 3545.642 ms
Total Time: 7678.617 ms (00:07.679)

It is around the same 2x as without the patch. In case anyone is
interested, I have also attached flamegraphs of the walsender and the
logical replication worker captured while processing the first
(numeric) transaction.

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
logical_repl_worker_new_perf.svg.zip application/zip 56.0 KB
walsender_new_perf.svg.zip application/zip 34.7 KB
0009-Fix-worker-historic-MVCC-visibility-rules-subxacts-s.patch text/x-patch 20.4 KB
