Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Date: 2023-08-17 14:27:36
Message-ID: 34ff6307-d5be-d008-56ed-6249f3cd0749@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On 2023-07-15 Sa 03:00, Alexander Lakhin wrote:
> Hello,
>
> 28.06.2023 21:49, Tom Lane wrote:
>> The bad news is that while investigating this I came across
>> another hazard that seems much messier to fix.  To wit, if
>> you have a tuple with "missing" pass-by-ref columns, then
>> some of the columns output by heap_deform_tuple() will
>> contain pointers into the tupdesc (cf. getmissingattr()).
>> That's not sufficient lifespan in the scenario we're dealing
>> with here: if the tupdesc gets trashed anytime before the
>> eventual ExecEvalFieldStoreForm(), we'll have garbage in the
>> result tuple.
>
> Please excuse me, as I'm definitely late to the party, but may I ask some
> questions to understand the issue discussed? I still can't see a
> practical
> way to get garbage data in a tuple with missing columns, so maybe I'm
> missing something (except for columns).
>
> If we talk about ExecEvalFieldStoreDeForm() -> heap_deform_tuple() ->
> getmissingattr(), then I can't find a way to add a non-null default for a
> record type to exploit this path. The rowtypes test contains:
> -- at the moment this will not work due to ALTER TABLE inadequacy:
> alter table fullname add column suffix text default '';
> ERROR:  cannot alter table "fullname" because column "people.fn" uses
> its row type
>
> So it seems to me, that ExecEvalFieldStoreDeForm() can't be called for a
> record/tuple having missing non-null attributes. And even if ALTER
> TABLE was
> "adequate", does it mean that ExecEvalFieldStoreDeForm() would return
> default
> values from the underlying table type?
>
>> Worse, I fear there may be many other places with latent bugs of the
>> same ilk.  Before the attmissingval code was added, one could assume
>> that the result of heap_deform_tuple would hold good as long as you
>> had a pin on the source tuple.  But now it depends on metadata as
>> well, and I don't think we have a coherent story about that.
>
> But is it true, that having a pin on tupdesc is enough to make sure that
> the result of heap_deform_tuple() holds good?
>
> If so, then probably we should find places, where tupdesc gets unpinned
> before that result is saved somewhere.
> I see the following callers of the heap_deform_tuple():
> src/backend/replication/logical/reorderbuffer.c
>     ReorderBufferToastReplace(): heap_deform_tuple() followed by
> heap_form_tuple()
> src/backend/executor/execTuples.c
>     ExecForceStoreHeapTuple(), ExecForceStoreMinimalTuple(),
> ExecStoreHeapTupleDatum(): use long-living and hopefully pinned
> slot->tts_tupleDescriptor
> src/backend/executor/execExprInterp.c
>     ExecEvalFieldStoreDeForm(): the above question applies here
> src/backend/executor/spi.c
>     SPI_modifytuple(): heap_deform_tuple() followed by heap_form_tuple()
> src/backend/utils/adt/rowtypes.c
>     record_out(), hash_record(), hash_record_extended(): tupdesc
> released after extracting/processing data
>     record_send(): tupdesc released after sending data
>     record_cmp(), record_eq(), record_image_cmp(), record_image_eq():
> tupdesc1. tupdesc2 released after comparing data
> src/backend/utils/adt/jsonfuncs.c
>     populate_record(): heap_deform_tuple() followed by heap_form_tuple()
> src/backend/utils/adt/expandedrecord.c
>     deconstruct_expanded_record() uses long-living (pinned)
> erh->er_tupdesc
> src/backend/utils/adt/ri_triggers.c
>     RI_Initial_Check(), RI_PartitionRemove_Check(): use long-living?
> SPI_tuptable->tupdesc
> src/backend/access/heap/heapam_handler.c
>     reform_and_rewrite_tuple(): uses RelationGetDescr(OldHeap)
> (pinned, I suppose)
> src/backend/access/heap/heaptoast.c
>     heap_toast_delete(), heap_toast_insert_or_update() use long-living
> rel->rd_att;
>     toast_flatten_tuple(): heap_deform_tuple(..., tupleDesc, ...)
> followed by heap_form_tuple(..., tupleDesc, ...)
>     toast_flatten_tuple_to_datum(): heap_deform_tuple followed by
> detoasting attrs
> src/backend/access/heap/heapam.c
>     ExtractReplicaIdentity(): heap_deform_tuple(..., desc, ...)
> followed by heap_form_tuple(..., desc, ...)
> src/backend/access/common/heaptuple.c
>     heap_modify_tuple(): heap_deform_tuple(..., tupleDesc, ...)
> followed by heap_form_tuple(..., tupleDesc, ...)
>     heap_modify_tuple_by_cols(): heap_deform_tuple(..., tupleDesc,
> ...) followed by heap_form_tuple(..., tupleDesc, ...)
> src/backend/access/common/tupconvert.c
>     execute_attr_map_tuple(): heap_deform_tuple() followed by
> heap_form_tuple()
> src/test/regress/regress.c
>     make_tuple_indirect(): heap_deform_tuple() followed by
> heap_form_tuple()
> src/pl/plpgsql/src/pl_exec.c
>     exec_move_row() called with tupdesc = NULL, SPI_tuptable->tupdesc,
> or tupdesc released after the call
> contrib/hstore/hstore_io.c
>     hstore_from_record(), hstore_populate_record(): tupdesc released
> after extracting data
>
> And even if I missed some possibly problematic calls, maybe it's worth to
> consider fixing exactly that places...
>
> Also, besides heap_deform_tuple(), getmissingattr() is called from
> heap_getattr(). And heap_getattr() is used in many places, but most of
> them
> are protected too due to tupdesc pinned.
> Two suspicious places that I found are GetAttributeByName() and
> GetAttributeByNum(), so when using these functions you can really get the
> missing attribute value with the tupdesc released. But what
> circumstances do
> we need to end up with an invalid data?
> 1) Write a function that will call GetAttributeByName() and hold it's
>  result inside for some time.
> 2) Add a passed-by-ref column with a default value to a table.
> 3) Pass to that function a tuple without the "missing" column (just
>  "SELECT func(table) FROM table" won't do, but func(OLD) in a trigger
> should).
> 4) Trigger a relcache invalidation.
> 5) Perform a database access inside that function to process the
> invalidation.
> 6) Access the result of GetAttributeByName() after that.
>
> Maybe we could construct such test case, e.g. add to pg_regress one more
> SQL-accessible C function, bu9ccvhhihvcnb·t I wonder, is this the way
> to go?
>
> On the other hand, probably there are extensions, that use
> GetAttributeByName() in the non-safe way (as the function documentation
> doesn't warn about such issues), but maybe just perform datumCopy inside
> that function (and similar one(s))?
> Though, perhaps I just don't notice the elephant in the core...

I was waiting to see if others had a reaction to this, but ...

I think the last point (possible unsafe use by extensions) is reason
enough to proceed with the proposed mitigation, which is pretty simple
and non-invasive..

cheers

andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-08-17 14:35:23 BUG #18059: Unexpected error 25001 in stored procedure
Previous Message Masahiko Sawada 2023-08-17 14:03:30 Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()