Re: Make tuple deformation faster

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Victor Yegorov <vyegorov(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make tuple deformation faster
Date: 2025-06-14 07:04:28
Message-ID: CAApHDvrs7sqcv7ZPg9JMuKwSeEsQybTfeUivpieTdZZb2xjHyw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 7 Jun 2025 at 23:00, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> #5 0x00005dd0a1788377 in ExceptionalCondition (conditionName=0x5dd0a1822ee8 "memcmp(&tmp, cattr, sizeof(CompactAttribute)) == 0",
> fileName=0x5dd0a1822ed9 "tupdesc.c", lineNumber=165) at assert.c:66

Thanks for finding that and for the test case. I can recreate this locally.

It seems to be down to a race condition with the shared TupleDesc. If
I expand that memcmp Assert to check equality on each of the
individual fields, I see that it's failing on attcacheoff. This is
because a worker does some tuple deforming while the failing Assert
backend is calling verify_compact_attribute(). That results in the
attcacheoff being changed by the deforming backend and the new
attcacheoff becomes visible sometime after the
verify_compact_attribute() backend does tmp.attcacheoff =
cattr.attcacheoff; and before it does the Assert.

Using your script with the attached more_asserting.patch applied, I get:

ITERATION 13
NOTICE: tmp.attcacheoff = -1, cattr->attcacheoff = 4

(So, it seems my concerns about this in [1] were warranted...) This
whole shared TupleDesc code seems quite dangerous. Unfortunately, this
isn't a new problem to PostgreSQL 18. This race condition existed
prior to CompactAttribute, it's just we never noticed because there
was no code checking for this before. The hazard here is that any code
checking a shared TupleDesc's attcacheoff might not get the same value
on subsequent reads. Maybe that's not too bad as the compiler would
likely only load the value from the struct once and store that in a
register. Also, since the offset is an int, store should be written
atomically and won't leave readers susceptible to torn reads.

If we can live with the race condition for now, then it's possible to
fix the failing Assert by making a copy of the TupleDesc's
CompactAttribute so we're not Asserting against the shared memory one.
I've done that in the attached fix_idea.patch.

I do have a half-done patch which changes where attcacheoff is
populated so it gets populated when the TupleDesc is made rather than
during tuple deformation. That allows the deforming code to be shrunk
down and allows us to use a special-case deformer to deform all
leading non-nullable fixed-width attributes. That should also fix this
issue, but that's v19 material.

I propose I just push the fix_idea.patch and leave it at that for v18.

Does anyone have any other ideas?

David

[1] https://postgr.es/m/CAApHDvr8e8-7tL5SUHoB-CDKF162BEMszDumH0W8-%2BxZzrpP0w%40mail.gmail.com

Attachment Content-Type Size
more_asserting.patch application/octet-stream 1.2 KB
fix_idea.patch application/octet-stream 1.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-06-14 07:05:36 Re: pg_dump --with-* options
Previous Message Amit Kapila 2025-06-14 05:37:49 Re: Replication slot is not able to sync up