From: Noah Misch <noah(at)leadboat(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A few patches to clarify snapshot management
Date: 2025-08-19 00:14:36
Message-ID: 20250819001436.38.nmisch@google.com
Lists: pgsql-hackers
On Fri, Aug 15, 2025 at 02:12:03PM +0300, Heikki Linnakangas wrote:
> On 10/08/2025 01:23, Noah Misch wrote:
> > On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:
> > > On 07/01/2025 00:00, Andres Freund wrote:
> > > > Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
> > > > code in an output functions calling GetTransactionSnapshot() or such to do
> > > > some internal lookups?
> > >
> > > I haven't seen any. And I don't think that would work correctly while doing
> > > logical decoding anyway, because historical snapshots only track XIDs that
> > > modify catalogs. regclassout and enumout do work because they use the
> > > catalog snapshot rather than GetTransactionSnapshot().
> > >
> > > (I committed that change in commit 1585ff7387 already, but discussion is
> > > still welcome of course)
> >
> > https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
> > for its row_filter feature.
> > So far, I know of these options:
> >
> > 1. Make pglogical block the row_filter feature for any v18+ origin.
> > 2. Revert postgresql.git commit 1585ff7387.
> > 3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
> > snapshots directly. That should keep simple row_filter expressions like
> > "col > 0" functioning. Entering plpgsql or similarly-complex logic will
> > fail with "cannot take query snapshot during logical decoding", and we'll
> > consider that to be working as intended.
> > 4. Fail later and lazily, for just the most-unreasonable cases. For example,
> > fail when HistoricSnapshot applies to a write operation. (Maybe this
> > already fails. I didn't check.)
> >
> > Which of those or other options should we consider?
>
> Hmm, so what snapshot should you use for these row filter expressions
> anyway?
>
> As long as you don't try to access any tables, it doesn't matter
> much. Although, the effective catalog snapshot also affects how any
> functions used in the expression are resolved. If you CREATE OR REPLACE the
> row filter function while logical decoding is active, what version of the
> function do you expect to be used? I think that's a little fuzzy, and you
> might get different answers for the initial sync step and the ongoing
> decoding. We don't necessarily need to solve that here. Nevertheless, what
> would be the least surprising answer to that?
>
> Currently, in v17 and below, we will use the historic snapshot, which
> represents the point in time that we are decoding. Is that the right
> choice? I'm not sure. A historic snapshot is only supposed to be used for
> catalogs, it's not clear if it works correctly for arbitrary queries. And
> it's not clear it's the right choice for resolving the row filter functions
> either.
>
> How about always using a fresh snapshot instead? Instead of pushing the
> historic snapshot as the active snapshot, _disable_ the historic snapshot
> and use GetTransactionSnapshot() to acquire a regular snapshot?
I see advantages in using the historic snapshot:
1. It's the longstanding behavior, and applications aren't complaining.
2. If someone wants "fresh snapshot", they can do that today with a C
extension that provides an execute_at_fresh_snapshot(sql text) SQL
function. If we adopt the fresh snapshot in pglogical or in core, I don't
see a comparably-clean way for the application code to get back to the
historic snapshot. (That's because the historic snapshot lives only in
stack variables at the moment in question.)
3. If an application is relying on the longstanding behavior and needs to
adapt to the proposed "fresh snapshot" behavior, that may be invasive to
implement and harmful to performance. For example, instead of reading from
a user_catalog_table inside the filter, the application may need to
duplicate that table's data into the rows being filtered.
Does the "fresh snapshot" alternative bring strengths to outweigh those?
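To illustrate point 2: such an escape-hatch function might look roughly like
this. This is an untested sketch, not an existing extension;
execute_at_fresh_snapshot is a hypothetical name, and it relies on
GetTransactionSnapshot() being callable in the context in question (which
v17 permits during decoding but post-1585ff7387 does not):

```c
#include "postgres.h"

#include "executor/spi.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/snapmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(execute_at_fresh_snapshot);

/*
 * Hypothetical: run an SQL string under a freshly-taken snapshot,
 * regardless of whatever snapshot is currently active.
 */
Datum
execute_at_fresh_snapshot(PG_FUNCTION_ARGS)
{
	char	   *sql = text_to_cstring(PG_GETARG_TEXT_PP(0));

	if (SPI_connect() != SPI_OK_CONNECT)
		elog(ERROR, "SPI_connect failed");

	/* Take a new snapshot and make it the active one for execution. */
	PushActiveSnapshot(GetTransactionSnapshot());
	SPI_execute(sql, false, 0);
	PopActiveSnapshot();

	SPI_finish();
	PG_RETURN_VOID();
}
```

The key asymmetry is that nothing comparable exists for the reverse
direction: once decoding replaces the historic snapshot with a fresh one,
application code has no handle by which to push the historic snapshot back.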
> Another approach is to continue down the path you attempted. There are many
> places in plpgsql and elsewhere where we call GetTransactionSnapshot(), but
> are they really necessary when you're executing something like the row
> filter expression? I think the row filter expression is supposed to be
> read-only. There are optimizations already to avoid GetTransactionSnapshot()
> calls in read-only functions (i.e. immutable), but we could expand those to
> any function in a read-only transaction, and set XactReadOnly while
> evaluating the row filter expression.
>
> The second attached patch makes that change in PostgreSQL code. With those
> changes, the pglogical change you attempted to do
> "PushActiveSnapshot(GetCatalogSnapshot(InvalidOid))" seems to work. I'm not
> sure it covers all the cases though, there might be more
> GetTransactionSnapshot() calls lurking.
Yep. As you say, the GetTransactionSnapshot() calls probably aren't
necessary, so this could work long-term. That patch's edit in src/pl may
imply similar needs lurking in non-core PLs.
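For context, the pglogical-side change Heikki describes is roughly of this
shape. This is a sketch only, assuming the pglogical_change_filter signature
seen in the backtrace below; it is not the actual patch:

```c
#include "postgres.h"

#include "utils/snapmgr.h"

/*
 * Sketch: evaluate the row filter under the catalog snapshot.
 *
 * Historic snapshots track only catalog-modifying XIDs, so instead of
 * calling GetTransactionSnapshot() -- which post-1585ff7387 forbids
 * during logical decoding -- push the catalog snapshot, as regclassout
 * and enumout effectively do.
 */
static bool
filter_change(PGLogicalOutputData *data, Relation relation,
			  ReorderBufferChange *change, Bitmapset **att_list)
{
	bool		res;

	PushActiveSnapshot(GetCatalogSnapshot(InvalidOid));
	res = pglogical_change_filter(data, relation, change, att_list);
	PopActiveSnapshot();
	return res;
}
```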
> Finally, attached is a pglogical test case to test what happens if you
> change the datatype of the table, while there's decoding active with a
> complex row filter function that also accesses the table. I'm not sure how
> that should behave and I think that falls into the category of "don't do
> that". But FWIW, on v17, when it tries to read, it fails with this:
>
> ERROR: could not read blocks 0..0 in file "base/16384/17512": read only 0
> of 8192 bytes
Reading reliably with a historic snapshot would require adding
user_catalog_table. Ideally, the error message would lead the user to a
conclusion like "you're reading a non-catalog with a historic snapshot; this
is expected after a rewrite of that non-catalog". With user_catalog_table:
- CREATE TABLE public.rowfilter_ddl_tbl (id int primary key);
+ CREATE TABLE public.rowfilter_ddl_tbl (id int primary key) WITH (user_catalog_table = true);
... the v17 ALTER fails with 'ERROR: cannot rewrite table "rowfilter_ddl_tbl"
used as a catalog table'. Not bad. Incidentally, while that's the result
with a production build, a v17 --enable-cassert build crashes earlier (at
today's REL_17_STABLE and today's pglogical head):
#4 0x000055ed959bd415 in ExceptionalCondition (conditionName=conditionName(at)entry=0x55ed95ae46f8 "ActiveSnapshot->as_snap->active_count == 1",
fileName=fileName(at)entry=0x55ed95a49ce1 "snapmgr.c", lineNumber=lineNumber(at)entry=718) at assert.c:66
#5 0x000055ed959ff409 in UpdateActiveSnapshotCommandId () at snapmgr.c:718
#6 0x000055ed956f8342 in _SPI_execute_plan (plan=plan(at)entry=0x55edab044410, options=options(at)entry=0x7ffd8b03f4b0, snapshot=snapshot(at)entry=0x0,
crosscheck_snapshot=crosscheck_snapshot(at)entry=0x0, fire_triggers=fire_triggers(at)entry=true) at spi.c:2668
#7 0x000055ed956f8bc2 in SPI_execute_plan_with_paramlist (plan=0x55edab044410, params=0x55edab021750, read_only=false, tcount=tcount(at)entry=0) at spi.c:751
#8 0x00007f18a25e78a7 in exec_run_select (estate=estate(at)entry=0x7ffd8b03fbd0, expr=expr(at)entry=0x55edab030b58, portalP=portalP(at)entry=0x0, maxtuples=0) at pl_exec.c:5824
#9 0x00007f18a25e7b8a in exec_eval_expr (estate=0x7ffd8b03fbd0, expr=0x55edab030b58, isNull=0x7ffd8b03f597, rettype=0x7ffd8b03f598, rettypmod=0x7ffd8b03f59c)
at pl_exec.c:5714
#10 0x00007f18a25ea7f2 in exec_assign_expr (estate=estate(at)entry=0x7ffd8b03fbd0, target=0x55edab020b50, expr=0x55edab030b58) at pl_exec.c:5039
#11 0x00007f18a25ebe45 in exec_stmt_assign (estate=0x7ffd8b03fbd0, stmt=0x55edab030ac8) at pl_exec.c:2156
#12 exec_stmts (estate=estate(at)entry=0x7ffd8b03fbd0, stmts=0x55edab030c38) at pl_exec.c:2020
#13 0x00007f18a25ebdd3 in exec_stmt_if (estate=0x55edaaf3af40, stmt=<optimized out>) at pl_exec.c:2535
#14 exec_stmts (estate=estate(at)entry=0x7ffd8b03fbd0, stmts=0x55edab030cd8) at pl_exec.c:2036
#15 0x00007f18a25ede6b in exec_stmt_block (estate=estate(at)entry=0x7ffd8b03fbd0, block=block(at)entry=0x55edab0310d0) at pl_exec.c:1943
#16 0x00007f18a25edf6d in exec_toplevel_block (estate=estate(at)entry=0x7ffd8b03fbd0, block=0x55edab0310d0) at pl_exec.c:1634
#17 0x00007f18a25ee7e1 in plpgsql_exec_function (func=func(at)entry=0x55edab024a18, fcinfo=fcinfo(at)entry=0x55edaafd8a18, simple_eval_estate=simple_eval_estate(at)entry=0x0,
simple_eval_resowner=simple_eval_resowner(at)entry=0x0, procedure_resowner=procedure_resowner(at)entry=0x0, atomic=<optimized out>) at pl_exec.c:623
#18 0x00007f18a25f8e43 in plpgsql_call_handler (fcinfo=0x55edaafd8a18) at pl_handler.c:277
#19 0x000055ed956b239f in ExecInterpExpr (state=0x55edaafd83a0, econtext=0x55edab012180, isnull=<optimized out>) at execExprInterp.c:740
#20 0x00007f18a3003bc0 in pglogical_change_filter (data=0x55edaafb7268, relation=0x7f18a2a93ab8, change=0x55edab006dc0, att_list=<synthetic pointer>)
at pglogical_output_plugin.c:656
#21 pg_decode_change (ctx=0x55edaafb5460, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0) at pglogical_output_plugin.c:690
#22 0x000055ed957fe6f9 in change_cb_wrapper (cache=<optimized out>, txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at logical.c:1137
#23 0x000055ed9580a1f8 in ReorderBufferApplyChange (rb=<optimized out>, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0, streaming=false)
at reorderbuffer.c:2019
#24 ReorderBufferProcessTXN (rb=0x55edaafb9ce0, txn=0x55edaafefdb0, commit_lsn=37174848, snapshot_now=<optimized out>, command_id=command_id(at)entry=0,
streaming=streaming(at)entry=false) at reorderbuffer.c:2300
#25 0x000055ed9580a4dc in ReorderBufferReplay (txn=<optimized out>, rb=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=0, xid=<optimized out>) at reorderbuffer.c:2767
#26 0x000055ed9580b041 in ReorderBufferCommit (rb=<optimized out>, xid=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=<optimized out>) at reorderbuffer.c:2791
#27 0x000055ed957fad92 in DecodeCommit (ctx=0x55edaafb5460, buf=0x7ffd8b040500, parsed=0x7ffd8b040370, xid=794, two_phase=false) at decode.c:746
#28 xact_decode (ctx=0x55edaafb5460, buf=0x7ffd8b040500) at decode.c:242
#29 0x000055ed957fa721 in LogicalDecodingProcessRecord (ctx=0x55edaafb5460, record=0x55edaafb5838) at decode.c:116
#30 0x000055ed95825c12 in XLogSendLogical () at walsender.c:3445
#31 0x000055ed95828526 in WalSndLoop (send_data=send_data(at)entry=0x55ed95825bd0 <XLogSendLogical>) at walsender.c:2835
#32 0x000055ed958295bc in StartLogicalReplication (cmd=<optimized out>) at walsender.c:1525
#33 exec_replication_command (
cmd_string=cmd_string(at)entry=0x55edaaed62d0 "START_REPLICATION SLOT \"pgl_postgres_test_provider_test_sube55bf37\" LOGICAL 0/2341A18 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', \"binary.want_i"...) at walsender.c:2160
#34 0x000055ed958811f4 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4763
...
(gdb) p ActiveSnapshot->as_snap->active_count
$1 = 3