Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Catalog invalidations vs catalog scans vs ScanPgRelation()
Date: 2020-02-29 06:10:52
Message-ID: 20200229061052.qceiitudgo5es3ns@alap3.anarazel.de

Hi,

On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> Turns out that I am to blame for that. All the way back in 9.4. For
> logical decoding I needed to make ScanPgRelation() use a specific type
> of snapshot during one corner case of logical decoding. For reasons lost
> to time, I didn't continue to pass NULL to systable_beginscan() in the
> plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> Note the missing RegisterSnapshot()...

Oh, I pierced through the veil: It's likely because the removal of
SnapshotNow happened concurrently with the development of logical
decoding. Before we used proper snapshots for catalog scans, passing in
SnapshotNow was precisely the right thing to do...
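
Roughly, the pattern in question looks like the sketch below. This is a
paraphrase for illustration only, not the actual relcache.c code; the function
name and the argument handling are simplified:

#include "postgres.h"

#include "access/genam.h"
#include "access/skey.h"
#include "catalog/indexing.h"
#include "catalog/pg_class.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"

/* Simplified sketch of ScanPgRelation()'s snapshot handling, not verbatim. */
static void
scan_pg_relation_sketch(Relation pg_class_desc, ScanKeyData *key,
                        bool force_non_historic)
{
    Snapshot    snapshot;
    SysScanDesc scan;

    /*
     * The logical-decoding corner case needs a non-historic snapshot; but in
     * the plain case an explicit catalog snapshot is fetched as well, instead
     * of just passing NULL below.  Note that neither snapshot is registered.
     */
    if (force_non_historic)
        snapshot = GetNonHistoricCatalogSnapshot(RelationRelationId);
    else
        snapshot = GetCatalogSnapshot(RelationRelationId);

    /*
     * Passing NULL here instead would let systable_beginscan() acquire and
     * register the catalog snapshot itself, and systable_endscan() would
     * unregister it again.
     */
    scan = systable_beginscan(pg_class_desc, ClassOidIndexId, true,
                              snapshot, 1, key);

    /* ... fetch the single pg_class tuple ... */

    systable_endscan(scan);
}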

I think that's somewhat unlikely to actively cause problems in practice,
as ScanPgRelation() requires that we already have a lock on the
relation, we only look for a single row, and I don't think we rely on
the result's tid to be correct. I don't immediately see a case where
this would trigger in a problematic way.

After fixing the ScanPgRelation() case I found another occurrence of the
problem. The catalog snapshot taken at (please ignore the slightly off
line numbers):

#0 GetCatalogSnapshot (relid=1249) at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:454
#1 0x0000560d725b198d in systable_beginscan (heapRelation=0x7f13429f2a08, indexId=2659, indexOK=true, snapshot=0x0, nkeys=2, key=0x7fff26e04db0)
at /home/andres/src/postgresql/src/backend/access/index/genam.c:378
#2 0x0000560d72b4117f in SearchCatCacheMiss (cache=0x560d74590800, nkeys=2, hashValue=1029784422, hashIndex=102, v1=697088, v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1359
#3 0x0000560d72b41045 in SearchCatCacheInternal (cache=0x560d74590800, nkeys=2, v1=697088, v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1299
#4 0x0000560d72b40d09 in SearchCatCache (cache=0x560d74590800, v1=697088, v2=3, v3=0, v4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1153
#5 0x0000560d72b5b65f in SearchSysCache (cacheId=7, key1=697088, key2=3, key3=0, key4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1112
#6 0x0000560d72b5b9dd in SearchSysCacheCopy (cacheId=7, key1=697088, key2=3, key3=0, key4=0)
at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1187
#7 0x0000560d72645501 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1821
#8 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397
#9 0x0000560d7263fa17 in deleteOneObject (object=0x560d745d37a0, depRel=0x7fff26e052d0, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1261
#10 0x0000560d7263e4d6 in deleteObjectsInList (targetObjects=0x560d745d1ec0, depRel=0x7fff26e052d0, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:271
#11 0x0000560d7263e58a in performDeletion (object=0x7fff26e05304, behavior=DROP_CASCADE, flags=29)
at /home/andres/src/postgresql/src/backend/catalog/dependency.c:356
#12 0x0000560d72655f3f in RemoveTempRelations (tempNamespaceId=686167) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4155
#13 0x0000560d72655f72 in RemoveTempRelationsCallback (code=0, arg=0) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4174
#14 0x0000560d729a58e2 in shmem_exit (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:239
#15 0x0000560d729a57b5 in proc_exit_prepare (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:194

is released at the end of systable_endscan(), and then xmin is reset at:

#0 SnapshotResetXmin () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:1038
#1 0x0000560d72bb9bfc in InvalidateCatalogSnapshot () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:521
#2 0x0000560d72b43d62 in LocalExecuteInvalidationMessage (msg=0x7fff26e04e70) at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:562
#3 0x0000560d729b277b in ReceiveSharedInvalidMessages (invalFunction=0x560d72b43d26 <LocalExecuteInvalidationMessage>,
resetFunction=0x560d72b43f92 <InvalidateSystemCaches>) at /home/andres/src/postgresql/src/backend/storage/ipc/sinval.c:120
#4 0x0000560d72b44070 in AcceptInvalidationMessages () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:683
#5 0x0000560d729b8f4f in LockRelationOid (relid=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:136
#6 0x0000560d725341e3 in relation_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/common/relation.c:56
#7 0x0000560d725b22c3 in index_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:130
#8 0x0000560d727b6991 in ExecOpenIndices (resultRelInfo=0x560d7468ffa0, speculative=false)
at /home/andres/src/postgresql/src/backend/executor/execIndexing.c:199
#9 0x0000560d7264fc7e in CatalogOpenIndexes (heapRel=0x7f13429f2a08) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:51
#10 0x0000560d72650010 in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0)
at /home/andres/src/postgresql/src/backend/catalog/indexing.c:228
#11 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#12 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

which then hits an assertion at:

#2 0x0000560d725a4e86 in heap_page_prune_opt (relation=0x7f13429f2a08, buffer=3153) at /home/andres/src/postgresql/src/backend/access/heap/pruneheap.c:131
#3 0x0000560d7259a5b3 in heapam_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8,
call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:137
#4 0x0000560d725eeedc in table_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8,
call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/include/access/tableam.h:1020
#5 0x0000560d725ef478 in table_index_fetch_tuple_check (rel=0x7f13429f2a08, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, all_dead=0x7fff26e04c59)
at /home/andres/src/postgresql/src/backend/access/table/tableam.c:213
#6 0x0000560d725b4ef7 in _bt_check_unique (rel=0x7f1342a29ce0, insertstate=0x7fff26e04d90, heapRel=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES,
is_unique=0x7fff26e04dc1, speculativeToken=0x7fff26e04d88) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:452
#7 0x0000560d725b48e0 in _bt_doinsert (rel=0x7f1342a29ce0, itup=0x560d746892b0, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7f13429f2a08)
at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:247
#8 0x0000560d725c0167 in btinsert (rel=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, ht_ctid=0x560d746901d4, heapRel=0x7f13429f2a08,
checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtree.c:207
#9 0x0000560d725b2530 in index_insert (indexRelation=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, heap_t_ctid=0x560d746901d4,
heapRelation=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:186
#10 0x0000560d7264ff34 in CatalogIndexInsert (indstate=0x560d7468ffa0, heapTuple=0x560d746901d0)
at /home/andres/src/postgresql/src/backend/catalog/indexing.c:157
#11 0x0000560d7265003e in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0)
at /home/andres/src/postgresql/src/backend/catalog/indexing.c:232
#12 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#13 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

So, um. What happens is that doDeletion() does a catalog scan, which
takes a snapshot. The results of that catalog scan are then used to
perform modifications. But at that point there's no guarantee that we
still hold *any* snapshot, as e.g. invalidations can cause the catalog
snapshot to be released.

I don't see how that's safe. Without ->xmin preventing it, intermediate
row versions that we looked up could just be vacuumed away and replaced
with a different row. That does seem like a serious issue?
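
Concretely, the RemoveAttrDefaultById() path from the backtraces boils down
to a pattern like this (a simplified sketch modelled on the backtraces above,
not verbatim heap.c code):

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/indexing.h"
#include "catalog/pg_attribute.h"
#include "utils/rel.h"
#include "utils/syscache.h"

/* Simplified sketch of the lookup-then-update pattern from the backtraces. */
static void
clear_atthasdef_sketch(Relation attr_rel, Oid relid, int16 attnum)
{
    HeapTuple   tuple;

    /* The syscache miss scans pg_attribute using the catalog snapshot. */
    tuple = SearchSysCacheCopy2(ATTNUM,
                                ObjectIdGetDatum(relid),
                                Int16GetDatum(attnum));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for attribute %d of relation %u",
             attnum, relid);

    ((Form_pg_attribute) GETSTRUCT(tuple))->atthasdef = false;

    /*
     * Nothing keeps a snapshot registered across this gap.  Opening the
     * catalog indexes inside CatalogTupleUpdate() takes locks, which can
     * process invalidations, which can release the catalog snapshot and
     * reset ->xmin -- exactly the sequence in the second backtrace above.
     */
    CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);

    heap_freetuple(tuple);
}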

I think there are likely a lot of places that can hit this? What makes it
safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
->xmin when there previously has been *any* catalog access? Because in
contrast to normal table modifications, there's currently nothing at all
forcing us to hold a snapshot between catalog lookups and their
modifications?
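
For illustration, the only way a caller could protect itself today would be
something like the following (a sketch of a per-caller band-aid, reusing the
sketch above plus utils/snapmgr.h, and not something I'm proposing to
actually sprinkle everywhere):

static void
clear_atthasdef_pinned_sketch(Relation attr_rel, Oid relid, int16 attnum)
{
    /* Pin a registered snapshot so ->xmin cannot be released in between. */
    Snapshot    snap = RegisterSnapshot(GetCatalogSnapshot(AttributeRelationId));

    clear_atthasdef_sketch(attr_rel, relid, attnum);    /* lookup + update */

    UnregisterSnapshot(snap);
}

Doing that around every catalog lookup/update pair in the backend seems
unappealing, which is why a central fix looks preferable.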

Am I missing something? Or is this a fairly significant hole in our
arrangements?

The easiest way to fix this would probably be to have inval.c call a
version of InvalidateCatalogSnapshot() that leaves the oldest catalog
snapshot around, but sets things up so that GetCatalogSnapshot() will
return a freshly taken snapshot? ISTM that pretty much every
InvalidateCatalogSnapshot() call within a transaction needs that behaviour?
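
Something like this very hand-wavy, snapmgr.c-internal sketch is what I have
in mind. The names are made up, it relies on snapmgr.c's existing
CatalogSnapshot / RegisteredSnapshots statics, and the bookkeeping for
eventually unregistering the retained snapshot (e.g. at transaction end) is
omitted:

/* Hypothetical snapmgr.c-internal sketch; names and details are made up. */
static Snapshot OldestRetainedCatalogSnapshot = NULL;

static void
InvalidateCatalogSnapshotKeepXmin(void)
{
    if (CatalogSnapshot == NULL)
        return;

    if (OldestRetainedCatalogSnapshot == NULL)
    {
        /*
         * Keep the current catalog snapshot registered, so ->xmin stays
         * pinned, but forget it as "the" catalog snapshot so that the next
         * GetCatalogSnapshot() call builds a fresh one.
         */
        OldestRetainedCatalogSnapshot = CatalogSnapshot;
    }
    else
    {
        /* An older catalog snapshot is already retained -- this one can go. */
        pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
    }

    CatalogSnapshot = NULL;
    /* Intentionally no SnapshotResetXmin() here. */
}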

Regards,

Andres
