Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]
Date: 2022-11-16 13:49:59
Message-ID: CA+144275w8hxENZeaSFtG8qQE5YJY1saVyj70mXQWd-EGKiPjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

I think the discussion went a little sideways, so let me recap what I'm
suggesting:

1. I mentioned that there is a missing callback when the filenode is
unlinked and this is particularly evident when dropping a table.
2. It was correctly pointed out to me that an implementor need to ensure
that dropping a table is transactional.
3. I argued that the callback is still correct and outlined how this can
be handled by a table access method using xact callbacks, if necessary.

I see huge potential in the table access method and would like to do my
part in helping it in succeeding. I noted that the API is biased in how the
existing heap implementation works and is also very focused on
implementations of "local" storage engines. For more novel architectures
(for example, various sorts of distributed architectures) and to be easier
to work with, I think that the API can be improved in a few places. This is
a first step in the direction of making the API both easier to use as well
as enabling more novel use-cases.

Writing implementations with ease is more about having the right callbacks
in the right places and providing the right information, but in some cases
it is just not possible to implement efficient functionality with the
current interface. I think it can be useful to separate these two kinds of
enhancements when discussing the API, but I think both are important for
the table access methods to be practically usable and to leverage the full
power of this concept.

On Tue, Aug 2, 2022 at 1:44 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2021-04-05 21:57:12 +0200, Mats Kindahl wrote:
> > 2. In the storage layer, the function RelationDropStorage is called,
> > which will record the table to be dropped in the pendingDeletes
> >
> > When committing (or aborting) the transaction, there are two calls that
> are
> > interesting, in this order:
> >
> > 1. CallXactCallbacks which calls registered callbacks
> > 2. smgrDoPendingDeletes, which calls the storage layer directly to
> > perform the actual deletion, if necessary.
> >
> > Now, suppose that we want to replace the storage layer with a different
> > one. It is straightforward to replace it by implementing the Table AM
> > methods above, but we are missing a callback on dropping the table. If we
> > have that, we can record the table-to-be-dropped in a similar manner to
> how
> > the heap AM does it and register a transaction callback using
> > RegisterXactCallback.
>
> don't think implementing dropping relation data at-commit/rollback using
> xact callbacks can be correct. The dropping needs to be integrated with the
> commit / abort records, so it is redone during crash recovery - that's not
> possible with xact callbacks.
>

Yes, but this patch is about making the extension aware that a file node is
being unlinked.

> To me it still seems fundamentally the wrong direction to implement a "drop
> relation callback" tableam callback.
>

This is not really a "drop table" callback, it is just the most obvious
case where this is missing. So, just to recap the situation as it looks
right now.

Here is (transactional) truncate table:

1. Allocate a new file node in the same tablespace as the table
2. Add the file node to the list of pending node to delete
3. Overwrite the existing file node in the relation with the new one
4. Call table_relation_set_new_filenode to tell extension that there is
a new filenode for the relation

Here is drop table:

1. Add the existing file node to the list of pending deletes
2. Remove the table from the catalogs

For an extension writer, the disappearance of the old file node is
"invisible" since there is no callback about this, but it is very clear
when a new file node is allocated. In addition to being inconsistent, it
adds an extra burden on the extension writer. To notice that a file node
has been unlinked you can register a transaction handler and investigate
the pending list at commit or abort time. Even though possible, there are
two problems with this: 1) the table access method is notified "late" in
the transaction that the file node is going away, and 2) it is
unnecessarily complicated to register a transaction handler only for
inspecting this.

Telling the access method that the filenode is unlinked by adding a
callback is by far the best solution since it does not affect existing
extensions and will give the table access methods opportunities to act on
it immediately.

I have attached an updated patch that changes the names of the callbacks
since there was a name change. I had also missed the case of unlinking a
file node when tables were truncated, so I have added a callback for this
as well.

Best wishes,
Mats Kindahl

> Greetings,
>
> Andres Freund
>

Attachment Content-Type Size
0001-Add-callback-for-unlinking-file-node-of-table.patch text/x-patch 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ted Yu 2022-11-16 13:51:48 Re: closing file in adjust_data_dir
Previous Message Pavel Borisov 2022-11-16 13:35:50 Re: [PoC] configurable out of disk space elog level