Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-12-05 05:08:00
Message-ID: CAJrrPGfJZVkZHXrK3T4KocOF1HH9GLROcwC4m4ncw0iuX_OYAA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 5, 2016 at 7:44 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > I do share your concerns about unpredictable behavior - that's
> > particularly worrying for pg_restore, which may be used for time-
> > sensitive use cases (DR, migrations between versions), so unpredictable
> > changes in behavior / duration are unwelcome.
>
> Right.
>
> > But isn't this more a deficiency in pg_restore, than in CREATE INDEX?
> > The issue seems to be that the reltuples value may or may not get
> > updated, so maybe forcing ANALYZE (even very low statistics_target
> > values would do the trick, I think) would be a more appropriate solution?
> > Or maybe it's time to add at least some rudimentary statistics into the
> > dumps (the reltuples field seems like a good candidate).
>
> I think that there are a number of reasonable ways of looking at it. It
> might also be worthwhile to have a minimal ANALYZE performed by CREATE
> INDEX directly, iff there are no preexisting statistics (there is
> definitely going to be something pg_restore-like that we cannot fix --
> some ETL tool, for example). Perhaps, as an additional condition to
> proceeding with such an ANALYZE, it should also only happen when there
> is any chance at all of parallelism being used (but then you get into
> having to establish the relation size reliably in the absence of any
> pg_class.relpages, which isn't very appealing when there are many tiny
> indexes).
>
> In summary, I would really like it if a consensus emerged on how
> parallel CREATE INDEX should handle the ecosystem of tools like
> pg_restore, reindexdb, and so on. Personally, I'm neutral on which
> general approach should be taken. Proposals from other hackers about
> what to do here are particularly welcome.
>
Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia
