Re: Speedup twophase transactions

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speedup twophase transactions
Date: 2016-01-11 18:43:47
Message-ID: CANP8+j+S3ed_2zLRR7jST8-qv2=tQc_L=1YVscm=cpT6k53r5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11 January 2016 at 12:58, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:

>
> > On 10 Jan 2016, at 12:15, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > So we've only optimized half the usage? We're still going to cause
> replication delays.
>
> Yes, replica will go through old procedures of moving data to and from
> file.
>
> > We can either
> >
> > 1) Skip fsyncing the RecreateTwoPhaseFile and then fsync during
> restartpoints
>
> From what i’ve seen with old 2pc code main performance bottleneck was
> caused by frequent creating of files. So better to avoid files if possible.
>
> >
> > 2) Copy the contents to shmem and then write them at restartpoint as we
> do for checkpoint
> > (preferred)
>
> Problem with shared memory is that we can’t really predict size of state
> data, and anyway it isn’t faster then reading data from WAL
> (I have tested that while preparing original patch).
>
> We can just apply the same logic on replica that on master: do not do
> anything special on prepare, and just read that data from WAL.
> If checkpoint occurs during recovery/replay probably existing code will
> handle moving data to files.
>
> I will update patch to address this issue.
>

I'm looking to commit what we have now, so lets do that as a separate but
necessary patch please.

> > I think padding will negate the effects of the additional bool.
> >
> > If we want to reduce the size of the array GIDSIZE is currently 200, but
> XA says maximum 128 bytes.
> >
> > Anybody know why that is set to 200?
>
> Good catch about GID size.
>

I'll apply that as a separate patch also.

> If we talk about further optimisations i see two ways:
>
> 1) Optimising access to GXACT. Here we can try to shrink it; introduce
> more granular locks,
> e.g. move GIDs out of GXACT and lock GIDs array only once while checking
> new GID uniqueness; try to lock only part of GXACT by hash; etc.
>

Have you measured lwlocking as a problem?

> 2) Be optimistic about consequent COMMIT PREPARED. In normal workload next
> command after PREPARE will be COMMIT/ROLLBACK, so we can save
> transaction context and release it only if next command isn’t our
> designated COMMIT/ROLLBACK. But that is a big amount of work and requires
> changes to whole transaction pipeline in postgres.
>

We'd need some way to force session pools to use that correctly, but yes,
agreed.

> Anyway I suggest that we should consider that as a separate task.

Definitely. From the numbers, I can see there is still considerable
performance gain to be had.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-01-11 18:46:05 Re: proposal: PL/Pythonu - function ereport
Previous Message Jim Nasby 2016-01-11 18:41:50 Re: proposal: PL/Pythonu - function ereport