Re: snapshot too old issues, first around wraparound and then more.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2020-04-01 23:59:51
Message-ID: 20200401235951.vo4ilpdcukdodju2@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-04-01 15:30:39 -0700, Peter Geoghegan wrote:
> On Wed, Apr 1, 2020 at 3:00 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I like that idea. I think that I've spotted what may be an independent
> > bug, but I have to wait around for a minute or two to reproduce it
> > each time. Makes it hard to get to a minimal test case.
>
> I now have simple steps to reproduce a bug when I start Postgres
> master with "--old_snapshot_threshold=1" (1 minute).

Thanks, that's super helpful.

> This example shows wrong answers to queries in session 2:
>
> Session 1:
>
> pg(at)regression:5432 [1444078]=# create table snapshot_bug (col int4);
> CREATE TABLE
> pg(at)regression:5432 [1444078]=# create index on snapshot_bug (col );
> CREATE INDEX
> pg(at)regression:5432 [1444078]=# insert into snapshot_bug select i from
> generate_series(1, 500) i;
> INSERT 0 500
>
> Session 2 starts, and views the data in a serializable transaction:
>
> pg(at)regression:5432 [1444124]=# begin isolation level serializable ;
> BEGIN
> pg(at)regression:5432 [1444124]=*# select col from snapshot_bug where col
> >= 0 order by col limit 14;
> ┌─────┐
> │ col │
> ├─────┤
> │ 1 │
> │ 2 │
> │ 3 │
> │ 4 │
> │ 5 │
> │ 6 │
> │ 7 │
> │ 8 │
> │ 9 │
> │ 10 │
> │ 11 │
> │ 12 │
> │ 13 │
> │ 14 │
> └─────┘
> (14 rows)
>
> So far so good. Now session 2 continues:

> pg(at)regression:5432 [1444078]=# delete from snapshot_bug where col < 15;
> DELETE 14

I got a bit confused here - you seem to have switched sessions 1 and 2
around? It doesn't seem to matter much though; I was able to reproduce this.

This indeed seems to be a separate bug.

The backtrace to the point where the xmin horizon is affected by
TransactionIdLimitedForOldSnapshots() is:

#0 TransactionIdLimitedForOldSnapshots (recentXmin=2082816071, relation=0x7f52ff3b56f8) at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:1870
#1 0x00005567f4cd1a55 in heap_page_prune_opt (relation=0x7f52ff3b56f8, buffer=175) at /home/andres/src/postgresql/src/backend/access/heap/pruneheap.c:106
#2 0x00005567f4cc70e2 in heapam_index_fetch_tuple (scan=0x5567f6db3028, tid=0x5567f6db2e40, snapshot=0x5567f6d67d68, slot=0x5567f6db1b60,
call_again=0x5567f6db2e46, all_dead=0x7ffce13d78de) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:137
#3 0x00005567f4cdf5e6 in table_index_fetch_tuple (scan=0x5567f6db3028, tid=0x5567f6db2e40, snapshot=0x5567f6d67d68, slot=0x5567f6db1b60,
call_again=0x5567f6db2e46, all_dead=0x7ffce13d78de) at /home/andres/src/postgresql/src/include/access/tableam.h:1020
#4 0x00005567f4ce0767 in index_fetch_heap (scan=0x5567f6db2de0, slot=0x5567f6db1b60) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:577
#5 0x00005567f4f19191 in IndexOnlyNext (node=0x5567f6db16a0) at /home/andres/src/postgresql/src/backend/executor/nodeIndexonlyscan.c:169
#6 0x00005567f4ef4bc4 in ExecScanFetch (node=0x5567f6db16a0, accessMtd=0x5567f4f18f20 <IndexOnlyNext>, recheckMtd=0x5567f4f1951c <IndexOnlyRecheck>)
at /home/andres/src/postgresql/src/backend/executor/execScan.c:133
#7 0x00005567f4ef4c39 in ExecScan (node=0x5567f6db16a0, accessMtd=0x5567f4f18f20 <IndexOnlyNext>, recheckMtd=0x5567f4f1951c <IndexOnlyRecheck>)
at /home/andres/src/postgresql/src/backend/executor/execScan.c:182
#8 0x00005567f4f195d4 in ExecIndexOnlyScan (pstate=0x5567f6db16a0) at /home/andres/src/postgresql/src/backend/executor/nodeIndexonlyscan.c:317
#9 0x00005567f4ef0f71 in ExecProcNodeFirst (node=0x5567f6db16a0) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:444
#10 0x00005567f4f1d694 in ExecProcNode (node=0x5567f6db16a0) at /home/andres/src/postgresql/src/include/executor/executor.h:245
#11 0x00005567f4f1d7d2 in ExecLimit (pstate=0x5567f6db14b8) at /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:95
#12 0x00005567f4ef0f71 in ExecProcNodeFirst (node=0x5567f6db14b8) at /home/andres/src/postgresql/src/backend/executor/execProcnode.c:444
#13 0x00005567f4ee57c3 in ExecProcNode (node=0x5567f6db14b8) at /home/andres/src/postgresql/src/include/executor/executor.h:245
#14 0x00005567f4ee83dd in ExecutePlan (estate=0x5567f6db1280, planstate=0x5567f6db14b8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true,
numberTuples=0, direction=ForwardScanDirection, dest=0x5567f6db3c78, execute_once=true)
at /home/andres/src/postgresql/src/backend/executor/execMain.c:1646
#15 0x00005567f4ee5e23 in standard_ExecutorRun (queryDesc=0x5567f6d0c490, direction=ForwardScanDirection, count=0, execute_once=true)
at /home/andres/src/postgresql/src/backend/executor/execMain.c:364
#16 0x00005567f4ee5c35 in ExecutorRun (queryDesc=0x5567f6d0c490, direction=ForwardScanDirection, count=0, execute_once=true)
at /home/andres/src/postgresql/src/backend/executor/execMain.c:308
#17 0x00005567f510c4de in PortalRunSelect (portal=0x5567f6d49260, forward=true, count=0, dest=0x5567f6db3c78)
at /home/andres/src/postgresql/src/backend/tcop/pquery.c:912
#18 0x00005567f510c191 in PortalRun (portal=0x5567f6d49260, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5567f6db3c78,
altdest=0x5567f6db3c78, qc=0x7ffce13d7de0) at /home/andres/src/postgresql/src/backend/tcop/pquery.c:756
#19 0x00005567f5106015 in exec_simple_query (query_string=0x5567f6cdd7a0 "select col from snapshot_bug where col >= 0 order by col limit 14;")
at /home/andres/src/postgresql/src/backend/tcop/postgres.c:1239

which in my tree is the elog() in the block below:
if (!same_ts_as_threshold)
{
    if (ts == update_ts)
    {
        PrintOldSnapshotMapping("non cached limit via update_ts", false);

        xlimit = latest_xmin;
        if (NormalTransactionIdFollows(xlimit, recentXmin))
        {
            elog(LOG, "increasing threshold from %u to %u (via update_ts)",
                 recentXmin, xlimit);
            SetOldSnapshotThresholdTimestamp(ts, xlimit);
        }
    }

the mapping at that point is:

2020-04-01 16:14:00.025 PDT [1272381][4/2:0] WARNING: old snapshot mapping at "non cached limit via update_ts" with head ts: 1, current entries: 2 max entries: 11, offset: 0
entry 0 (ring 0): min 1: xid 2082816067
entry 1 (ring 1): min 2: xid 2082816071

and the resulting xmin change is:
2020-04-01 16:14:00.026 PDT [1272381][4/2:0] LOG: increasing threshold from 2082816071 to 2082816072 (via update_ts)

in the frame of heap_page_prune_opt():
(rr) p snapshot->whenTaken
$5 = 639097973135655
(rr) p snapshot->lsn
$6 = 133951784192
(rr) p MyPgXact->xmin
$7 = 2082816071
(rr) p BufferGetBlockNumber(buffer)
$11 = 0

in the frame for TransactionIdLimitedForOldSnapshots:
(rr) p ts
$8 = 639098040000000
(rr) p latest_xmin
$9 = 2082816072
(rr) p update_ts
$10 = 639098040000000
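
For reference, my arithmetic on those values (assuming the usual microsecond
TimestampTz representation): ts - snapshot->whenTaken = 639098040000000 -
639097973135655 ~= 66.9s, i.e. the snapshot is just past the 1 minute
threshold; and latest_xmin (2082816072) follows recentXmin (2082816071), so
the branch quoted above fires and the prune horizon is raised past this
backend's own snapshot xmin.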

The primary issue here is that there is no TestForOldSnapshot() call in
heap_hot_search_buffer(). Therefore index fetches will never even try to
detect that tuples they need have already been pruned away.
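
For illustration only - a rough, untested sketch on my end, not a proposed
patch (exact placement, locking, and non-MVCC snapshot callers would need
more thought) - the missing check would presumably look something like the
one heapgetpage() already does, near the top of heap_hot_search_buffer():

    Page        dp = BufferGetPage(buffer);

    /*
     * Sketch only: raise "snapshot too old" if this page's LSN is newer
     * than the snapshot's, i.e. if tuples the snapshot should still see
     * may already have been pruned away, instead of the index fetch
     * silently coming back empty.
     */
    TestForOldSnapshot(snapshot, relation, dp);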

The wrong query results I saw took longer to reproduce, so I've not yet been
able to debug the precise cause.

Greetings,

Andres Freund
