Re: Speedup twophase transactions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Speedup twophase transactions
Date: 2016-12-21 22:35:25
Message-ID: CAB7nPqQ0WbUNrzjbGhnzxV96iQGb_jfb5HUX9+PkCMRSd+LEPA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 21, 2016 at 10:37 PM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
> On 21 Dec 2016, at 19:56, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>> That's indeed way simpler than before. Have you as well looked at the
>> most simple approach discussed? That would be just roughly replacing
>> the pg_fsync() calls currently in RecreateTwoPhaseFile() by a save
>> into a list as you are doing, then issue them all at checkpoint. Even for
>> 2PC files that are created and then removed before the next
>> checkpoint, those will likely be in system cache.
>
> Yes, I tried that as well. But with that approach another bottleneck arises:
> new file creation isn’t a very cheap operation itself. A dual Xeon with 100
> backends quickly hits that, and OS file-creation routines occupy the first
> places in perf top. That probably depends on the filesystem (I used ext4), but
> avoiding file creation when it isn’t necessary seems like the cleaner approach.
> On the other hand, it is possible to skip file creation by reusing files,
> for example by naming them after the dummy PGPROC offset, but that would
> require some changes to places that right now look only at filenames.

Interesting. Thanks for looking at it. Based on that, your latest approach
looks more promising.
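
For comparison, the simpler approach mentioned above (remember which files need syncing, then issue the fsyncs in one batch at checkpoint) could look roughly like the sketch below. This is plain POSIX C, not PostgreSQL code: remember_fsync(), flush_pending_fsyncs() and the fixed-size array are hypothetical names and simplifications chosen for illustration. Note that it still creates one file per transaction, which is the cost measured above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING  64     /* hypothetical fixed capacity */
#define MAX_PATH_LEN 256

static char pending_paths[MAX_PENDING][MAX_PATH_LEN];
static int  pending_count = 0;

/* Remember a file to sync later. Returns 0 on success, -1 if it cannot be queued. */
static int
remember_fsync(const char *path)
{
    assert(path != NULL);
    if (pending_count >= MAX_PENDING || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(pending_paths[pending_count++], path);
    return 0;
}

/*
 * At checkpoint time, fsync every remembered file that still exists.
 * Returns the number of files synced, or -1 if any open/fsync failed
 * for a reason other than the file having been removed.
 */
static int
flush_pending_fsyncs(void)
{
    int synced = 0;
    int failed = 0;

    for (int i = 0; i < pending_count; i++)
    {
        int fd = open(pending_paths[i], O_RDWR);

        if (fd < 0)
        {
            /* File removed before the checkpoint: nothing to sync. */
            if (errno != ENOENT)
                failed = 1;
            continue;
        }
        if (fsync(fd) != 0)
            failed = 1;
        else
            synced++;
        close(fd);
    }
    pending_count = 0;
    return failed ? -1 : synced;
}
```

A caller would invoke remember_fsync() where the file is written and flush_pending_fsyncs() once per checkpoint; files that were already removed by then cost only a failed open().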

>> And this saves lookups at the WAL segments
>> still present in pg_xlog, making the operation at checkpoint much
>> faster with many 2PC files to process.
>
> ISTM your reasoning about filesystem cache applies here as well, but just
> without spending time on file creation.

True. The more spread out the checkpoints and 2PC files are, the higher the
risk of needing disk access. Memory's cheap anyway. How much memory did the
system have? How many checkpoints did you trigger, and for how many 2PC
files created? Perhaps it would be a good idea to look for the 2PC files
from WAL records in a specific order. Did you try to use
dlist_push_head instead of dlist_push_tail? This may make a difference
on systems where WAL segments don't fit in system cache, as the latest
files generated would be looked at first for 2PC data.
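
To make the ordering question concrete, here is a minimal standalone sketch of how pushing at the head versus the tail changes the scan order. It does not use PostgreSQL's ilist.h; PendingEntry, PendingList and the pending_* functions are hypothetical stand-ins, and prepare_lsn is just an illustrative field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-memory 2PC entry; not PostgreSQL's struct. */
typedef struct PendingEntry
{
    struct PendingEntry *prev;
    struct PendingEntry *next;
    uint64_t    prepare_lsn;    /* illustrative WAL position of the PREPARE */
} PendingEntry;

typedef struct PendingList
{
    PendingEntry *head;
    PendingEntry *tail;
} PendingList;

/* Append: a head-to-tail scan then visits the oldest entry first. */
static void
pending_push_tail(PendingList *list, PendingEntry *e)
{
    assert(list != NULL && e != NULL);
    e->next = NULL;
    e->prev = list->tail;
    if (list->tail)
        list->tail->next = e;
    else
        list->head = e;
    list->tail = e;
}

/* Prepend: a head-to-tail scan then visits the newest entry first. */
static void
pending_push_head(PendingList *list, PendingEntry *e)
{
    assert(list != NULL && e != NULL);
    e->prev = NULL;
    e->next = list->head;
    if (list->head)
        list->head->prev = e;
    else
        list->tail = e;
    list->head = e;
}

/* Copy LSNs in head-to-tail order; returns the number of entries written. */
static size_t
pending_scan_order(const PendingList *list, uint64_t *out, size_t max)
{
    size_t n = 0;

    for (const PendingEntry *e = list->head; e != NULL && n < max; e = e->next)
        out[n++] = e->prepare_lsn;
    return n;
}
```

With entries added in WAL order, the head variant makes the checkpoint-time scan start from the most recently written records, which are the ones most likely to still be cached.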
--
Michael
