From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2016-12-04 03:23:40 |
Message-ID: | 1480821820.11897.28.camel@2ndquadrant.com |
Lists: | pgsql-hackers |
On Sat, 2016-12-03 at 18:37 -0800, Peter Geoghegan wrote:
> On Sat, Dec 3, 2016 at 5:45 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> >
> > I don't think a patch must necessarily consider all possible uses that
> > the new feature may have. If we introduce parallel index creation,
> > that's great; if pg_restore doesn't start using it right away, that's
> > okay. You, or somebody else, can still patch it later. The patch is
> > still a step forward.
>
> While I agree, right now pg_restore will tend to use or not use
> parallelism for CREATE INDEX more or less by accident, based on
> whether or not pg_class.reltuples has already been set by something
> else (e.g., an earlier CREATE INDEX against the same table in the
> restoration). That seems unacceptable. I haven't just suppressed the
> use of parallel CREATE INDEX within pg_restore because that would be
> taking a position on something I have a hard time defending any
> particular position on. And so, I am slightly concerned about the
> entire ecosystem of tools that could implicitly use parallel CREATE
> INDEX, with undesirable consequences. Especially pg_restore.
>
> It's not so much a hard question as it is an awkward one. I want to
> handle any possible objection about there being future compatibility
> issues with going one way or the other ("This paints us into a corner
> with..."). And, there is no existing, simple way for pg_restore and
> other tools to disable the use of parallelism due to the cost model
> automatically kicking in, while still allowing the proposed new index
> storage parameter ("parallel_workers") to force the use of
> parallelism, which seems like something that should happen. (I might
> have to add a new GUC like "enable_maintenance_parallelism", since
> "max_parallel_workers_maintenance = 0" disables parallelism no matter
> how it might be invoked).
I do share your concerns about unpredictable behavior - that's
particularly worrying for pg_restore, which may be used for time-
sensitive use cases (DR, migrations between versions), so unpredictable
changes in behavior / duration are unwelcome.
But isn't this more a deficiency in pg_restore than in CREATE INDEX?
The issue seems to be that the reltuples value may or may not get
updated, so maybe forcing ANALYZE (even very low default_statistics_target
values would do the trick, I think) would be a more appropriate solution?
Or maybe it's time to add at least some rudimentary statistics into the
dumps (the reltuples field seems like a good candidate).
Trying to fix this by adding more GUCs seems a bit strange to me.
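
To make the ANALYZE idea a bit more concrete, here is a rough sketch in Python. It is not how pg_restore works today; `post_load_statements` is a hypothetical helper, and the table and index names are made up. The only things it relies on are that ANALYZE updates pg_class.reltuples and that default_statistics_target can be lowered per session to keep the ANALYZE cheap.

```python
def post_load_statements(table, index_ddl, stats_target=1):
    """Build the SQL to run once a table's data has been loaded.

    Hypothetical helper, for illustration only. The point is the ordering:
    ANALYZE runs before any CREATE INDEX, so pg_class.reltuples is always
    populated by the time the index build is planned, instead of depending
    on whether some earlier command happened to set it.

    `table` is assumed to be an already-quoted identifier, and `index_ddl`
    a list of complete CREATE INDEX statements.
    """
    statements = [
        # A very low target keeps the ANALYZE cheap; reltuples is still set.
        f"SET default_statistics_target = {stats_target};",
        f"ANALYZE {table};",
        "RESET default_statistics_target;",
    ]
    # Index builds come last, after reltuples is known.
    statements.extend(index_ddl)
    return statements


if __name__ == "__main__":
    for stmt in post_load_statements(
        "orders",
        ["CREATE INDEX orders_customer_idx ON orders (customer_id);"],
    ):
        print(stmt)
```

With something like this, every index build in a restore would see the same kind of reltuples value, so whether parallelism kicks in would no longer depend on the order of earlier commands.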
>
> In general, I have a positive outlook on this patch, since it appears
> to compete well with similar implementations in other systems
> scalability-wise. It does what it's supposed to do.
>
+1 to that
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services