Re: Displaying and dumping of table access methods

From: Andres Freund <andres(at)anarazel(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Displaying and dumping of table access methods
Date: 2019-01-08 02:31:52
Message-ID: 20190108023152.updoggny2rzvpcar@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-01-07 21:08:58 -0500, Stephen Frost wrote:
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > > I don't agree with the general assumption that "we did this for
> > > development and therefore it should be done that way forever".
> > >
> > > Instead, I would look at adding tests where there's a difference
> > > between the two, or possibly some difference, and make sure that there
> > > isn't, and make sure that the code paths are covered.
> >
> > I think this approach makes no sense whatsover. It's entirely possible
> > to encounter bugs in table AM relevant code in places one would not
> > think so. But even if one were, foolishly, to exclude those, the pieces
> > of code we know are highly affected by the way the AM works are a a
> > significant (at the very least 10-20% of tests) percentage of our
> > tests. Duplicating them even just between heap and zheap would be a
> > major technical debt. DML, ON CONFLICT, just about all isolationtester
> > test, etc all are absolutely crucial to test different AMs for
> > correctness.
>
> I am generally on board with minimizing the amount of duplicate code
> that we have, but we must run those tests independently, so it's really
> a question of if we build a system where we can parametize a set of
> tests to run and then run them for every AM and compare the output to
> either the generalized output or the per-AM output, or if we build on
> the existing system and simply have an independent set of tests. It's
> not clear to me, at the current point, which will be the lower level of
> ongoing effort

Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
again with a different default AM. That's precisely why I'm talking
about this. Just setting PGOPTIONS='-c
default_table_access_method=zheap' in the new makefile target (the ms
run scripts are similar) is sufficient. And we don't need to force
everyone to constantly run tests with e.g. both heap and zheap, it's
sufficient to do so on a few buildfarm machines, and whenever changing
AM level code. Rerunning all the tests with a different AM is just
setting the same environment variable, but running check-world as the
target.

Obviously that does not preclude a few tests that explicitly test
features specific to an AM. E.g. the zheap branch's tests have an
explicit zheap regression file and a few zheap specific isolationtester
spec files that a) test zheap specific beheaviour b) make sure that the
most basic zheap behaviour is tested even when not running the tests
with zheap as the default AM.

And even if you were to successfully argue that it's sufficient during
normal development to only have a few zheap specific additional tests,
we'd certainly want to make it possible to occasionally explicitly run
the rest of the tests under zheap to see whether additional stuff has
been broken - and that's much harder to sift through if there's a lot of
spurious test failures due to \d[+] outputting additional/differing
data.

> ..., but when it comes to the effort required today, it seems pretty
> clear to me that whacking around the current test environment to rerun
> tests is a larger amount of effort.

How did you come to that conclusion? Adding a makefile and vcregress.pl
target is pretty trivial.

> > Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> > HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> > better from a maintainability POV than including the AM in the output.
>
> I'm pretty sure I said in my last reply that I'm alright with psql and
> pg_dump not outputting a result for the default value

I don't see that anywhere in your replies. Are you referring to:

> I don't have any problem with new tests being added which set the
> default AM to zheap, as long as it's clear that such is happening for
> downstream tests.

? If so, that's not addressing the reason why I'm suggesting to have
something like HIDE_TABLE_AMS. The point is that that'd allow us to
cater the default psql output to humans, while still choosing not to
display the AM for regression tests (thereby allowing to run the tests).

> provided the default is understood to always really be the default

What do you mean by that? Are you arguing that it should be impossible
in test scenarios to override default_table_access_method? Or that
pg_dump/psql should check for a hardcoded 'heap' AM (via a macro or
whatnot)?

> I'm also a bit confused as to why we are spending a good bit of time
> arguing about external AMs without any discussion or definition of what
> they are or what their requirements are. If such things seriously
> exist, then let us talk about them and try to come up with solutions for
> them; if they don't, then we can talk about them when they come up.

We are working seriously hard on making AMs pluggable. Zheap is not yet,
and won't be that soon, part of core. The concerns for an in-core zheap
(which needs to maintain the test infrastructure during the remainder of
its out-of-core development!) and out-of-core AMs are pretty aligned. I
don't get your confusion.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-01-08 03:37:50 Re: could recovery_target_timeline=latest be the default in standby mode?
Previous Message Michael Paquier 2019-01-08 02:10:23 Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table