From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Proposal: Conflict log history table for Logical Replication |
Date: | 2025-08-07 09:38:20 |
Message-ID: | CAFiTN-v2N256FZAMcgG9f2s7mENC2r0ME9HoKMb+tER7yxUCjQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
Thanks Shveta for your opinion on the design.
> > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured, and queryable
> > > record of all logical replication conflicts. This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs or both.
> > >
> >
> > +1 for the idea.
Thanks
> >
> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > Yes, that is one option. I have not looked into details myself, but
> > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > data values of the appropriate kind'.
I think conversion from row to json and json to row is convenient and
also other extensions like pgactive/bdr also provide as JSON. But we
can explore this alternative options as well, thanks
> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION. Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> > >
> > > A potential drawback is that a user might drop or alter the table.
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> >
> > I believe it makes more sense for this to be a catalog table rather
> > than a user table. I wanted to check if we already have a large
> > catalog table of this kind, and I think pg_statistic could be an
> > example of a sizable catalog table. To get a rough idea of how size
> > scales with data, I ran a quick experiment: I created 1000 tables,
> > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > I inserted 1000 rows into each table and ran ANALYZE to collect
> > statistics. Here’s what I observed on a fresh database before and
> > after:
> >
> > Before:
> > pg_statistic row count: 412
> > Table size: ~256 kB
> >
> > After:
> > pg_statistic row count: 6,412
> > Table size: ~5.3 MB
> >
> > Although it isn’t an exact comparison, this gives us some insight into
> > how the statistics catalog table size grows with the number of rows.
> > It doesn’t seem excessively large with 6k rows, given the fact that
> > pg_statistic itself is a complex table having many 'anyarray'-type
> > columns.
Yeah that's good analysis, apart from this pg_largeobject is also a
catalog which grows with each large object and growth rate for that
will be very high because it stores large object data in catalog.
> >
> > That said, irrespective of what we decide, it would be ideal to offer
> > users an option for automatic purging, perhaps via a retention period
> > parameter like conflict_stats_retention_period (say default to 30
> > days), or a manual purge API such as purge_conflict_stats('older than
> > date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> > stats tables, but Oracle does provide such purging options for some of
> > their statistics tables (not related to conflicts), see [1], [2].
> > And to manage it better, it could be range partitioned on timestamp.
Yeah that's an interesting suggestion to timestamp based partitioning
it for purging.
> It seems BDR also has one such conflict-log table which is a catalog
> table and is also partitioned on time. It has a default retention
> period of 30 days. See 'bdr.conflict_history' mentioned under
> 'catalogs' in [1]
>
> [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views
Actually bdr is an extension and this table is under extension
namespace (bdr.conflict_history) so this is not really a catalog but
its a extension managed table. So logically for PostgreSQL its an
user table but yeah this is created and managed by the extension.
--
Regards,
Dilip Kumar
Google
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey Borodin | 2025-08-07 09:53:12 | Re: Backpatching injection point core facilities to REL_17_STABLE |
Previous Message | Tomas Vondra | 2025-08-07 09:36:10 | Re: Adding basic NUMA awareness |