Re: System column support for partitioned tables using heap

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: System column support for partitioned tables using heap
Date: 2022-07-19 08:43:50
Message-ID: CAKqnccghtbeCv+VszPyih-AgcaQaZ_1g_BC45WuCXKhxZ2yXGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> What is motivating you to want to see the xmax value here? It's not an
> unreasonable thing to want to do, IMHO, but it's a little bit niche so
> I'm just curious what the motivation is.

Yeah, I figured it was niche when I saw so little mention of the issue.

My reason for xmax() in the result is to break down the affected rows count
into an insert count, and a modified estimate. Not super critical, but
helpful. I've built out some simple custom logging table in out system for
this kind of detail, and folks have been wanting to break down rows
submitted, rows inserted, and rows updated a bit better. Rows submitted is
easy and rows inserted is too...update is an estimate as I'm not using
anything fancy with xmax() to sort out what exactly happened.

For clarification, we're not using an ORM, and may need to support
straggling clients, so our push cycle works like this:

* Create a view with the fields expected in the insert. I figured I'd use
CREATE VIEW instead of CREATE TYPE as then I can quickly check out the
details against live data, and I still get a custom compound type.

* Write a function that accepts an array of view_name_type. I *love* Postgres'
typing system, It has spoiled me forever. Can't submit badly formatted
objects from the client, they're rejected automatically.

* Write a client-side routine to package data as an array and push it into
the insert handling function. The function unnests the array, and then the
actual insert code draws from the unpacked values. If I need to extend the
table, I can add a new function that knows about the revised fields, and
revise (when necessary) earlier supported formats to map to new
types/columns/defaults.

There are few CTEs in there, including one that does the main insert and
returns the xmax(). That lets me distinguish xmax = 0 (insert) from xmax <>
0 (not an insert).

> I do agree with you that it would be nice if this worked better than
> it does, but I don't really know exactly how to make that happen. The
> column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.
> I guess we could have tried to work things out so that a 0 value would
> be passed up from children that lack an xmax column, and that would
> allow the parent to have such a column, but I don't feel too bad that
> we didn't do that ... should I?

You should never feel bad about anything ;-) You and others on that thread
contribute so much that I'm getting value out of.

I had it in mind that it would be nice to have some kind of
catalog/abstraction that would make it possible to interrogate what system
columns are available on a table/partition based on access method. In my
vague notion, that might make some of the other ideas from that thread,
such as index-oriented stores with quite different physical layouts, easier
to implement. But, it's all free when you aren't the one who can write the
code.

I've switched the partition-based tables back to returning * on the insert
CTE, and then aggregating that to add to a log table and the client result.
It's fine. A rich result summary would be very nice. As in rows
added/modified/deleted on whatever table(s). If anyone ever decides to
implement such a structure for MERGE, it would be nice to see it
retrofitted to the other data modification commands where RETURNING works.

On Tue, Jul 19, 2022 at 6:13 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
> wrote:
> > This fails on a partitioned table because xmax() may not exist. In fact,
> it does exist in all of those tables, but the system doesn't know how to
> guarantee that. I know which tables are partitioned, and can downgrade the
> result on partitioned tables to the count(*) I've been using to date. But
> now I'm wondering if working with xmax() like this is a poor idea going
> forward. I don't want to lean on a feature/behavior that's likely to
> change. For example, I noticed the other day that MERGE does not support
> RETURNING.
> >
> > I'd appreciate any insight or advice you can offer.
>
> What is motivating you to want to see the xmax value here? It's not an
> unreasonable thing to want to do, IMHO, but it's a little bit niche so
> I'm just curious what the motivation is.
>
> I do agree with you that it would be nice if this worked better than
> it does, but I don't really know exactly how to make that happen. The
> column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.
> I guess we could have tried to work things out so that a 0 value would
> be passed up from children that lack an xmax column, and that would
> allow the parent to have such a column, but I don't feel too bad that
> we didn't do that ... should I?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Morris de Oryx 2022-07-19 08:54:05 Re: System column support for partitioned tables using heap
Previous Message Masahiko Sawada 2022-07-19 08:31:07 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns