Issue with logical replication: MyPgXact->xmin already is valid

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Issue with logical replication: MyPgXact->xmin already is valid
Date: 2017-10-06 10:16:18
Message-ID: 768f2cb6-0e2c-4dcd-33f4-d54c1addb9a3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When creating logical replication slots we quite often get the following
error:

ERROR: cannot build an initial slot snapshot when MyPgXact->xmin
already is valid

which cause restart of WAL sender.
The comment to this line doesn't clarify much:

/* so we don't overwrite the existing value */
if (TransactionIdIsValid(MyPgXact->xmin))
elog(ERROR, "cannot build an initial slot snapshot when
MyPgXact->xmin already is valid");

I have checked that MyPgXact->xmin is set by GetSnapshotData:

#3 0x000000000086025e in GetSnapshotData (snapshot=0xf28040
<CatalogSnapshotData>) at procarray.c:1714
#4 0x0000000000a48523 in GetNonHistoricCatalogSnapshot (relid=2615) at
snapmgr.c:479
#5 0x0000000000a484d3 in GetCatalogSnapshot (relid=2615) at snapmgr.c:452
#6 0x00000000004f15bf in systable_beginscan (heapRelation=0x256bdb0,
indexId=2684, indexOK=1 '\001', snapshot=0x0, nkeys=1,
key=0x7ffdf633c770) at genam.c:353
#7 0x00000000009d676e in SearchCatCache (cache=0x25230a8, v1=39586984,
v2=0, v3=0, v4=0) at catcache.c:1169
#8 0x00000000009ec13d in SearchSysCache (cacheId=35, key1=39586984,
key2=0, key3=0, key4=0) at syscache.c:1109
#9 0x00000000009ec259 in GetSysCacheOid (cacheId=35, key1=39586984,
key2=0, key3=0, key4=0) at syscache.c:1187
#10 0x0000000000574b85 in get_namespace_oid (nspname=0x25c0ca8
"pg_catalog", missing_ok=1 '\001') at namespace.c:3009
#11 0x0000000000574886 in LookupExplicitNamespace (nspname=0x25c0ca8
"pg_catalog", missing_ok=1 '\001') at namespace.c:2871
#12 0x000000000057437d in get_ts_config_oid (names=0x25c2438,
missing_ok=1 '\001') at namespace.c:2653
#13 0x00000000009f56ca in check_TSCurrentConfig (newval=0x7ffdf633cb90,
extra=0x7ffdf633cba0, source=PGC_S_FILE) at ts_cache.c:600
#14 0x0000000000a1bbdc in call_string_check_hook (conf=0xf26870
<ConfigureNamesString+6992>, newval=0x7ffdf633cb90,
extra=0x7ffdf633cba0, source=PGC_S_FILE, elevel=12) at guc.c:9912
#15 0x0000000000a14420 in parse_and_validate_value (record=0xf26870
<ConfigureNamesString+6992>, name=0x25c0620
"default_text_search_config", value=0x25c0658 "pg_catalog.english",
source=PGC_S_FILE, elevel=12,
newval=0x7ffdf633cb90, newextra=0x7ffdf633cba0) at guc.c:5840
#16 0x0000000000a15543 in set_config_option (name=0x25c0620
"default_text_search_config", value=0x25c0658 "pg_catalog.english",
context=PGC_SIGHUP, source=PGC_S_FILE, action=GUC_ACTION_SET,
changeVal=1 '\001', elevel=12,
is_reload=0 '\000') at guc.c:6442
#17 0x0000000000a1eb5c in ProcessConfigFileInternal (context=PGC_SIGHUP,
applySettings=1 '\001', elevel=13) at guc-file.l:439
#18 0x0000000000a1e4ee in ProcessConfigFile (context=PGC_SIGHUP) at
guc-file.l:155
#19 0x000000000082433c in WalSndWaitForWal (loc=25991904) at
walsender.c:1309
#20 0x0000000000823423 in logical_read_xlog_page (state=0x25a4f10,
targetPagePtr=25985024, reqLen=6880, targetRecPtr=25991880,
cur_page=0x25a6c60 "\227\320\005", pageTLI=0x25a57bc) at walsender.c:761
#21 0x0000000000558c3d in ReadPageInternal (state=0x25a4f10,
pageptr=25985024, reqLen=6880) at xlogreader.c:556
#22 0x0000000000558405 in XLogReadRecord (state=0x25a4f10,
RecPtr=25991880, errormsg=0x7ffdf633cea8) at xlogreader.c:255
#23 0x000000000080dda6 in DecodingContextFindStartpoint (ctx=0x25a4c50)
at logical.c:450
#24 0x0000000000823a0c in CreateReplicationSlot (cmd=0x24dc398) at
walsender.c:934
#25 0x00000000008247e4 in exec_replication_command (cmd_string=0x254e8f0
"CREATE_REPLICATION_SLOT \"shardman_copy_t_10_3_4_17307_sync_17302\"
TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1515
#26 0x000000000088eccc in PostgresMain (argc=1, argv=0x24f0b28,
dbname=0x24f0948 "postgres", username=0x24bf7f0 "knizhnik") at
postgres.c:4086
#27 0x00000000007ef9e2 in BackendRun (port=0x24dee00) at postmaster.c:4357
#28 0x00000000007ef10c in BackendStartup (port=0x24dee00) at
postmaster.c:4029
#29 0x00000000007eb6cc in ServerLoop () at postmaster.c:1753
#30 0x00000000007eacb8 in PostmasterMain (argc=3, argv=0x24bd660) at
postmaster.c:1361
#31 0x0000000000728593 in main (argc=3, argv=0x24bd660) at main.c:228

I wonder if it is normal situation or something goes wrong?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-10-06 10:34:04 Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Previous Message Craig Ringer 2017-10-06 09:57:24 Re: Still another race condition in recovery TAP tests