Re: Suggestion for concurrent index creation using a single full scan operation

From: Tim Kane <tim(dot)kane(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for concurrent index creation using a single full scan operation
Date: 2013-07-24 13:22:44
Message-ID: CADVWZZJge1BiaBiVV1RAOKA=ygA7Uz_9wyu8WGgm0=nxNraCsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Wow.. thanks guys, really appreciate the detailed analysis.

Tim

On Wed, Jul 24, 2013 at 4:08 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote:
> > I haven't given this a lot of thought, but it struck me that when
> > rebuilding tables (be it for a restore process, or some other operational
> > activity) - there is more often than not a need to build an index or two,
> > sometimes many indexes, against the same relation.
> >
> > It strikes me that in order to build just one index, we probably need to
> > perform a full table scan (in a lot of cases). If we are building
> > multiple indexes sequentially against that same table, then we're
> probably
> > performing multiple sequential scans in succession, once for each index.
>
> Check.
>
> > Could we architect a mechanism that allowed multiple index creation
> > statements to execute concurrently, with all of their inputs fed directly
> > from a single sequential scan against the full relation?
> >
> > From a language construct point of view, this may not be trivial to
> > implement for raw/interactive SQL - but possibly this is a candidate for
> > the custom format restore?
>
> As Greg Stark mentioned, pg_restore can already issue index build commands
> in
> parallel. Where applicable, that's probably superior to having one backend
> build multiple indexes during a single heap scan. Index builds are
> CPU-intensive, and the pg_restore approach takes advantage of additional
> CPU
> cores in addition to possibly saving I/O.
>
> However, the pg_restore method is not applicable if you want CREATE INDEX
> CONCURRENTLY, and it's not applicable for implicit index building such as
> happens for ALTER TABLE rewrites and for VACUUM FULL. Backend-managed
> concurrent index builds could shine there.
>
> > I presume this would substantially increase the memory overhead required
> to
> > build those indexes, though the performance gains may be advantageous.
>
> The multi-index-build should respect maintenance_work_mem overall.
> Avoiding
> cases where that makes concurrent builds slower than sequential builds is a
> key challenge for such a project:
>
> - If the index builds each fit in maintenance_work_mem when run
> sequentially
> and some spill to disk when run concurrently, expect concurrency to lose.
> - If the heap is small enough to stay in cache from one index build to the
> next, performing the builds concurrently is probably a wash or a loss.
> - Concurrency should help when a wide-row table large enough to exhaust OS
> cache has narrow indexes that all fit in maintenance_work_mem. I don't
> know
> whether concurrency would help for a huge-table scenario where the
> indexes
> do overspill maintenance_work_mem. You would have N indexes worth of
> external merge files competing for disk bandwidth; that could cancel out
> heap I/O savings.
>
> Overall, it's easy to end up with a loss. We could punt by having an
> index_build_concurrency GUC, much like pg_restore relies on the user to
> discover a good "-j" value. But if finding cases where concurrency helps
> is
> too hard, leaving the GUC at one would become the standard advice.
>
> > Apologies in advance if this is not the correct forum for suggestions..
>
> It's the right forum.
>
> Thanks,
> nm
>
> --
> Noah Misch
> EnterpriseDB http://www.enterprisedb.com
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-07-24 13:54:48 Re: [bug fix] PITR corrupts the database cluster
Previous Message Heikki Linnakangas 2013-07-24 13:15:59 Re: [bug fix] PITR corrupts the database cluster