Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-05-01 00:22:59
Message-ID: CAD21AoDH6zaUFJLyntZ_aiA-ij_NvqyMox+f7sg1TygMUJOSDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda
<ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2021/03/17 12:03, Masahiko Sawada wrote:
> > I've attached the updated version patch set.
>
> Thanks for updating the patches! I'm now restarting to review of 2PC because
> I'd like to use this feature in PG15.

Thank you for reviewing the patch! Much appreciated.

>
>
> I think the following logic of resolving and removing the fdwxact entries
> by the transaction resolver needs to be fixed.
>
> 1. check if pending fdwxact entries exist
>
> HoldInDoubtFdwXacts() checks if there are entries which the condition is
> InvalidBackendId and so on. After that it gets the indexes of the fdwxacts
> array. The fdwXactLock is released at the end of this phase.
>
> 2. resolve and remove the entries held in 1th phase.
>
> ResolveFdwXacts() resloves the status per each fdwxact entry using the
> indexes. The end of resolving, the transaction resolver remove the entry in
> fdwxacts array via remove_fdwact().
>
> The way to remove the entry is the following. Since to control using the
> index, the indexes of getting in the 1st phase are meaningless anymore.
>
> /* Remove the entry from active array */
> FdwXactCtl->num_fdwxacts--;
> FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
>
> This seems to lead resolving the unexpected fdwxacts and it can occur the
> following assertion error. That's why I noticed. For example, there is the
> case which a backend inserts new fdwxact entry in the free space, which the
> resolver removed the entry right before, and the resolver accesses the new
> entry which doesn't need to resolve yet because it use the indexes checked in
> 1st phase.
>
> Assert(fdwxact->lockeing_backend == MyBackendId);

Good point. I agree with your analysis.

>
>
>
> The simple solution is that to get fdwXactLock exclusive all the time from the
> begining of 1st phase to the finishing of 2nd phase. But, I worried that the
> performance impact became too big...
>
> I came up with two solutions although there may be better solutions.
>
> A. to remove resolved entries at once after resolution for all held entries is
> finished
>
> If so, we don't need to take the exclusive lock for a long time. But, this
> have other problems, which pg_remove_foreign_xact() can still remove entries
> and we need to handle the fail of resolving.
>
> I wondered that we can solve the first problem to introduce a new lock like
> "removing lock" and only the processes which hold the lock can remove the
> entries. The performance impact is limited since the insertion the fdwxact
> entries is not blocked by this lock. And second problem can be solved using
> try-catch sentence.
>
>
> B. to merge 1st and 2nd phase
>
> Now, the resolver resolves the entries together. That's the reason why it's
> difficult to remove the entries. So, it seems to solve the problem to execute
> checking, resolving and removing per each entry. I think it's better since
> this is simpler than A. If I'm missing something, please let me know.

It seems to me that solution B would be simpler and better. I'll try
to fix this issue by using solution B and rebase the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-05-01 00:23:06 Re: RFE: Make statistics robust for unplanned events
Previous Message Tom Lane 2021-05-01 00:13:33 Re: Granting control of SUSET gucs to non-superusers