Re: pg_checksums (or checksums in general) vs tableam

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_checksums (or checksums in general) vs tableam
Date: 2019-07-11 09:17:02
Message-ID: CABUevEyvObKHt7JBPL6qT=xN+ycg=S2U85Ug4xUuB+9rjHXV4A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 11, 2019 at 2:30 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Wed, Jul 10, 2019 at 09:19:03AM -0700, Andres Freund wrote:
> > On July 10, 2019 9:12:18 AM PDT, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> >> That would be fine, if we actually knew. Should we (or have we already?)
> >> defined a rule that they are not allowed to use the same naming standard
> >> unless they have the same type of header?
> >
> > No, don't think we have already. There's the related problem of
> > what to include in base backups, too.
>
> Yes. This one needs a careful design and I am not sure exactly what
> that would be. At least one new callback would be needed, called from
> basebackup.c to decide if a given file should be backed up or not
> based on a path.
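
The per-file callback proposed above might look roughly like this. Everything here is hypothetical: `AmFileDecision`, `AmBackupHook`, `should_back_up`, and the imaginary AM's rule that `*.tmp` files are skipped are illustrative names and rules. None of them exist in tableam.h or basebackup.c. The "first AM that claims a file decides" policy is only one arbitrary answer to the conflict question raised further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: nothing like this exists in tableam.h. */
typedef enum
{
	AM_FILE_NOT_MINE,			/* this AM does not own the file */
	AM_FILE_INCLUDE,			/* owned by this AM, copy it into the backup */
	AM_FILE_EXCLUDE				/* owned by this AM, skip it */
} AmFileDecision;

typedef AmFileDecision (*AmBackupFileHook) (const char *rel_path);

typedef struct AmBackupHook
{
	const char *am_name;
	AmBackupFileHook classify;
} AmBackupHook;

static bool
ends_with(const char *s, const char *suffix)
{
	size_t		ls = strlen(s);
	size_t		lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Imaginary AM: owns files under ".../zheap/", never backs up "*.tmp". */
static AmFileDecision
zheap_classify(const char *rel_path)
{
	if (strstr(rel_path, "/zheap/") == NULL)
		return AM_FILE_NOT_MINE;
	return ends_with(rel_path, ".tmp") ? AM_FILE_EXCLUDE : AM_FILE_INCLUDE;
}

static const AmBackupHook am_hooks[] = {
	{"zheap", zheap_classify},
};

/*
 * Per-file check a base backup could make: the first AM that claims the
 * file decides; unclaimed files keep the existing behaviour (copied).
 */
static bool
should_back_up(const char *rel_path)
{
	assert(rel_path != NULL);
	for (size_t i = 0; i < sizeof(am_hooks) / sizeof(am_hooks[0]); i++)
	{
		AmFileDecision d = am_hooks[i].classify(rel_path);

		if (d != AM_FILE_NOT_MINE)
			return d == AM_FILE_INCLUDE;
	}
	return true;
}
```

Note that a hook like this only runs inside the server, so it would only help backups taken through the replication protocol.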

That wouldn't be at all enough, of course. We have to think of everybody
who uses the pg_start_backup/pg_stop_backup functions (including the
deprecated versions we don't want to get rid of :P). So whatever it is, it
has to be externally reachable. And just calling something before you start
your backup won't be enough, as files can show up during the backup, etc.

Having a strict naming standard would help a lot with that; then you'd just
need the metadata. For example, one could say that each non-default storage
engine has to put all its files in a subdirectory, and inside that
subdirectory it can name them whatever it wants. If we do that, then all
a backup tool would need to know about is the set of possible subdirectories
in the current installation (and *that* doesn't change frequently).
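
A minimal sketch of that convention follows. It assumes the "metadata" is nothing more than a list of subdirectory names. The `am_subdirs` array and `is_am_private_file` are hypothetical names. Paths are taken relative to the data directory, and tablespaces under `pg_tblspc/` are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata a tool would fetch from the installation. */
static const char *const am_subdirs[] = {"zheap", "columnar"};

/*
 * Returns true if rel_path (e.g. "base/1234/zheap/x") lives in a storage
 * engine's private subdirectory, i.e. a file whose on-disk format core
 * tools should not assume.
 */
static bool
is_am_private_file(const char *rel_path)
{
	const char *p;
	const char *slash;
	size_t		len;

	assert(rel_path != NULL);
	if (strncmp(rel_path, "base/", 5) != 0)
		return false;
	p = rel_path + 5;

	/* skip the database OID directory */
	slash = strchr(p, '/');
	if (slash == NULL)
		return false;
	p = slash + 1;

	/* the next component must be a directory, i.e. followed by '/' */
	slash = strchr(p, '/');
	if (slash == NULL)
		return false;
	len = (size_t) (slash - p);

	for (size_t i = 0; i < sizeof(am_subdirs) / sizeof(am_subdirs[0]); i++)
	{
		if (strlen(am_subdirs[i]) == len && strncmp(p, am_subdirs[i], len) == 0)
			return true;
	}
	return false;
}
```

With a check like this, a checksum verifier could skip every file it matches, and a backup tool could copy those directories wholesale without knowing what is inside them.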

> But then how do you make sure that a path applies to
> one table AM or another, by using a regex given by all table AMs to
> see if there is a match? How do we handle conflicts? I am not sure
> either that it is a good design to restrict table AMs to have a given
> format for paths as that actually limits the possibilities when it
> comes to splitting data across multiple files for attributes and/or
> tablespaces. (I am a pessimistic guy by nature.)
>

As long as the restriction contains enough wildcards, it should hopefully
be enough :) E.g. data/base/1234/zheap/whatever.they.like.
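
Under such a convention, the conflict question mostly reduces to validating a subdirectory name when an AM is installed. The sketch below assumes a rule that PostgreSQL does not currently have: names must not start with a digit or with `pg_`/`PG_`. That keeps them from being confused with relfilenode files or with the core files kept in each database directory. `check_am_subdir` and `SubdirCheck` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
	SUBDIR_OK,
	SUBDIR_INVALID,				/* empty, or contains a path separator */
	SUBDIR_CORE_CLASH,			/* could be mistaken for a core file name */
	SUBDIR_DUPLICATE			/* already claimed by another AM */
} SubdirCheck;

static SubdirCheck
check_am_subdir(const char *name, const char *const *taken, size_t ntaken)
{
	assert(name != NULL);
	if (name[0] == '\0' || strchr(name, '/') != NULL)
		return SUBDIR_INVALID;

	/* relfilenode files (16384, 16384_fsm, 16384.1) start with a digit */
	if (isdigit((unsigned char) name[0]))
		return SUBDIR_CORE_CLASH;

	/* PG_VERSION, pg_filenode.map, pg_internal.init, ... */
	if (strncmp(name, "pg_", 3) == 0 || strncmp(name, "PG_", 3) == 0)
		return SUBDIR_CORE_CLASH;

	for (size_t i = 0; i < ntaken; i++)
	{
		if (strcmp(name, taken[i]) == 0)
			return SUBDIR_DUPLICATE;
	}
	return SUBDIR_OK;
}
```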

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
