From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speedup twophase transactions
Date: 2016-01-11 18:43:47
Message-ID: CANP8+j+S3ed_2zLRR7jST8-qv2=tQc_L=1YVscm=cpT6k53r5g@mail.gmail.com
Lists: pgsql-hackers
On 11 January 2016 at 12:58, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
>
> > On 10 Jan 2016, at 12:15, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > So we've only optimized half the usage? We're still going to cause
> replication delays.
>
> Yes, the replica will still go through the old procedure of moving data to
> and from files.
>
> > We can either
> >
> > 1) Skip fsyncing the RecreateTwoPhaseFile and then fsync during
> restartpoints
>
> From what I've seen with the old 2PC code, the main performance bottleneck
> was the frequent creation of files. So it's better to avoid files if possible.
>
> >
> > 2) Copy the contents to shmem and then write them at restartpoint as we
> do for checkpoint
> > (preferred)
>
> The problem with shared memory is that we can't really predict the size of
> the state data, and in any case it isn't faster than reading the data from WAL
> (I tested that while preparing the original patch).
>
> We can just apply the same logic on the replica as on the master: do nothing
> special on prepare, and just read that data from WAL.
> If a checkpoint occurs during recovery/replay, the existing code will
> probably handle moving the data to files.
>
> I will update patch to address this issue.
>
I'm looking to commit what we have now, so let's do that as a separate but
necessary patch please.
> > I think padding will negate the effects of the additional bool.
> >
> > If we want to reduce the size of the array: GIDSIZE is currently 200, but
> XA says the maximum is 128 bytes.
> >
> > Anybody know why that is set to 200?
>
> Good catch about GID size.
>
I'll apply that as a separate patch also.
> If we talk about further optimisations, I see two ways:
>
> 1) Optimising access to GXACT. Here we can try to shrink it; introduce
> more granular locks,
> e.g. move GIDs out of GXACT and lock the GIDs array only once while checking
> new-GID uniqueness; try to lock only part of GXACT by hash; etc.
>
Have you measured lwlocking as a problem?
> 2) Be optimistic about a consequent COMMIT PREPARED. In a normal workload the
> next command after PREPARE will be COMMIT/ROLLBACK, so we can save the
> transaction context and release it only if the next command isn't our
> designated COMMIT/ROLLBACK. But that is a large amount of work and requires
> changes to the whole transaction pipeline in Postgres.
>
We'd need some way to force session pools to use that correctly, but yes,
agreed.
> Anyway, I suggest we consider that as a separate task.
Definitely. From the numbers, I can see there is still considerable
performance gain to be had.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services