Re: MultiXact\SLRU buffers configuration

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 05:19:42
Message-ID: 3B099683-ECCD-43CD-A3D6-F08C3745002A@yandex-team.ru
Lists: pgsql-hackers

> On 14 May 2020, at 06:25, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>>
>>
>>> On 11 May 2020, at 16:17, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>
>>> I've gone ahead and created 3 patches:
>>> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
>>> 2. Reduce locking level to shared on read of MultiXactId members
>>> 3. Configurable cache size
>>
>> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>>
>> When we create a new MultiXact we do:
>> 1. Generate new MultiXactId under MultiXactGenLock
>> 2. Record new mxid with members and offset to WAL
>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>> 4. Write members to SLRU under MultiXactMemberControlLock
>
> But, don't we hold exclusive lock on the buffer through all the steps
> above?
Yes... unless the MultiXact is observed on a standby. This can lead to observing an inconsistent snapshot: one of the lockers has committed a tuple delete, but the standby still sees the tuple as alive.
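
For reference, the primary-side ordering is roughly this (a condensed paraphrase of MultiXactIdCreateFromMembers() in multixact.c, not the actual code):

    /* step 1: assign the mxid and members offset under MultiXactGenLock */
    multi = GetNewMultiXactId(nmembers, &offset);

    /*
     * step 2: the XLOG_MULTIXACT_CREATE_ID record goes to WAL before the
     * local SLRU writes (data registration elided here)
     */
    (void) XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_CREATE_ID);

    /*
     * steps 3 and 4: RecordNewMultiXact() writes the offset under
     * MultiXactOffsetControlLock and then the members under
     * MultiXactMemberControlLock
     */
    RecordNewMultiXact(multi, offset, nmembers, members);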

>> When we read MultiXact we do:
>> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
>> 2. If the offset is 0, it has not yet been filled in by step 3 of the previous algorithm, so we sleep and goto 1
>> 3. Retrieve members from SLRU under MultiXactMemberControlLock
>> 4. ..... what do we do if there are just zeroes because step 4 is not executed yet? Nothing, we return an empty members list.
>
> So transactions never see such incomplete mxids, I believe.
I've observed the sleep in step 2. I believe it's possible to observe the effects of step 4 (an empty members list) too.
Maybe we could take a lock on the standby to avoid this 1000us wait? Sometimes it hits standbys hard: if someone locks a whole table on the primary, all seq scans on the standbys pile up behind it with MultiXactOffsetControlLock contention.

It looks like this:
#0 0x00007fcd56896ff7 in __GI___select (nfds=nfds(at)entry=0, readfds=readfds(at)entry=0x0, writefds=writefds(at)entry=0x0, exceptfds=exceptfds(at)entry=0x0, timeout=timeout(at)entry=0x7ffd83376fe0) at ../sysdeps/unix/sysv/linux/select.c:41
#1 0x000056186e0d54bd in pg_usleep (microsec=microsec(at)entry=1000) at ./build/../src/port/pgsleep.c:56
#2 0x000056186dd5edf2 in GetMultiXactIdMembers (from_pgupgrade=0 '\000', onlyLock=<optimized out>, members=0x7ffd83377080, multi=3106214809) at ./build/../src/backend/access/transam/multixact.c:1370
#3 GetMultiXactIdMembers () at ./build/../src/backend/access/transam/multixact.c:1202
#4 0x000056186dd2d2d9 in MultiXactIdGetUpdateXid (xmax=<optimized out>, t_infomask=<optimized out>) at ./build/../src/backend/access/heap/heapam.c:7039
#5 0x000056186dd35098 in HeapTupleGetUpdateXid (tuple=tuple(at)entry=0x7fcba3b63d58) at ./build/../src/backend/access/heap/heapam.c:7080
#6 0x000056186e0cd0f8 in HeapTupleSatisfiesMVCC (htup=<optimized out>, snapshot=0x56186f44a058, buffer=230684) at ./build/../src/backend/utils/time/tqual.c:1091
#7 0x000056186dd2d922 in heapgetpage (scan=scan(at)entry=0x56186f4c8e78, page=page(at)entry=3620) at ./build/../src/backend/access/heap/heapam.c:439
#8 0x000056186dd2ea7c in heapgettup_pagemode (key=0x0, nkeys=0, dir=ForwardScanDirection, scan=0x56186f4c8e78) at ./build/../src/backend/access/heap/heapam.c:1034
#9 heap_getnext (scan=scan(at)entry=0x56186f4c8e78, direction=direction(at)entry=ForwardScanDirection) at ./build/../src/backend/access/heap/heapam.c:1801
#10 0x000056186de84f51 in SeqNext (node=node(at)entry=0x56186f4a4f78) at ./build/../src/backend/executor/nodeSeqscan.c:81
#11 0x000056186de6a3f1 in ExecScanFetch (recheckMtd=0x56186de84ef0 <SeqRecheck>, accessMtd=0x56186de84f20 <SeqNext>, node=0x56186f4a4f78) at ./build/../src/backend/executor/execScan.c:97
#12 ExecScan (node=0x56186f4a4f78, accessMtd=0x56186de84f20 <SeqNext>, recheckMtd=0x56186de84ef0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:164
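
The sleep in frame #1 comes from the retry loop in GetMultiXactIdMembers(); condensed (not the actual code, variable names abbreviated), it is roughly:

retry:
    LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);

    /* ... read the offsets of 'multi' and 'multi + 1' from the offsets SLRU ... */

    if (nextOffset == 0)
    {
        /* the creator of the next mxid has not finished its SLRU writes yet */
        LWLockRelease(MultiXactOffsetControlLock);
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);       /* the 1000us wait seen in frames #0/#1 */
        goto retry;
    }

    LWLockRelease(MultiXactOffsetControlLock);
    /* the members are then fetched under MultiXactMemberControlLock */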

Best regards, Andrey Borodin.
