Re: Displaying and dumping of table access methods

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Displaying and dumping of table access methods
Date: 2019-01-08 02:08:58
Message-ID: 20190108020858.GE2528@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > > > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > > > Over in [1] we're discussing the development of the pluggable storage
> > > > > patchset, which allows different ways of storing table data.
> > > > >
> > > > > One thing I'd like to discuss with a wider audience than the
> > > > > implementation details is psql and pg_dump handling of table access
> > > > > methods.
> > > > >
> > > > > Currently the patchset neither dumps nor displays table access
> > > > > methods . That's clearly not right.
> > > >
> > > > Agreed.
> > > >
> > > > > The reason for that however is not that it's hard to dump/display
> > > > > code-wise, but that to me the correct behaviour is not obvious.
> > > >
> > > > While it might be a lot of changes to the regression output results, I
> > > > tend to feel that showng the access method is a very important aspect of
> > > > the relation and therefore should be in \d output.
> > >
> > > I don't see how that'd be feasible. Imo it is/was absolutely crucial
> > > for zheap development to be able to use the existing postgres tests.
> >
> > I don't agree with the general assumption that "we did this for
> > development and therefore it should be done that way forever".
> >
> > Instead, I would look at adding tests where there's a difference
> > between the two, or possibly some difference, and make sure that there
> > isn't, and make sure that the code paths are covered.
>
> I think this approach makes no sense whatsover. It's entirely possible
> to encounter bugs in table AM relevant code in places one would not
> think so. But even if one were, foolishly, to exclude those, the pieces
> of code we know are highly affected by the way the AM works are a a
> significant (at the very least 10-20% of tests) percentage of our
> tests. Duplicating them even just between heap and zheap would be a
> major technical debt. DML, ON CONFLICT, just about all isolationtester
> test, etc all are absolutely crucial to test different AMs for
> correctness.

I am generally on board with minimizing the amount of duplicate code
that we have, but we must run those tests independently, so it's really
a question of if we build a system where we can parametize a set of
tests to run and then run them for every AM and compare the output to
either the generalized output or the per-AM output, or if we build on
the existing system and simply have an independent set of tests. It's
not clear to me, at the current point, which will be the lower level of
ongoing effort, but when it comes to the effort required today, it seems
pretty clear to me that whacking around the current test environment to
rerun tests is a larger amount of effort. If that's the requirement,
then so be it and I'm on-board, but I'm also open to considering a
lesser requirement for a completely new feature.

> > > To me the approach you're suggesting is going to lead to an explosion of
> > > redundant tests, which are really hard to maintain, especially for
> > > out-of-tree AMs. Out of tree AMs with the setup I propose can just
> > > install the AM into the template database and set PGOPTIONS, and they
> > > have pretty good coverage.
> >
> > I'm frankly much less interested in out-of-tree AMs as we aren't going
> > to have in-core regression tests for them anyway, because we can't as
> > they aren't in our tree and, ultimately, I don't find them to have
> > anywhere near the same value that in-core AMs have.
>
> I think you must be missing my point: Adding spurious differences due to
> "Table Access Method: heap" vs "Table Access Method: blarg" makes it
> unnecessarily hard to reuse the in-core tests for any additional AM, be
> it in-core or out of core. I fail to see what us not having explicit
> tests for such AMs in core has to do with my point.

I don't think I'm missing your point. If you believe that we should be
swayed by this argument into agreeing to change what we believe the
user-facing psql \d output should be, then I am very hopeful that you're
completely wrong. The psql \d output should be driven by what will be
best for our users, not by what's best by external AMs or, really,
anything else.

> Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> better from a maintainability POV than including the AM in the output.

I'm pretty sure I said in my last reply that I'm alright with psql and
pg_dump not outputting a result for the default value, provided the
default is understood to always really be the default, but, again, what
we should be concerned about here is what the end user is dealing with
and I'm not particularly incensed to support something different even if
it's around a variable of some kind for external AMs, or external
*whatevers*.

I'm also a bit confused as to why we are spending a good bit of time
arguing about external AMs without any discussion or definition of what
they are or what their requirements are. If such things seriously
exist, then let us talk about them and try to come up with solutions for
them; if they don't, then we can talk about them when they come up.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-01-08 02:10:23 Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
Previous Message Michael Paquier 2019-01-08 02:03:00 Re: A few new options for vacuumdb