Re: Parallel Seq Scan

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-07-06 17:24:52
Message-ID: 1436203492.4369.141.camel@jeff-desktop
Lists: pgsql-hackers

On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote:

> Or the other way to look at it could be separate out fields which are
> required for parallel scan which is done currently by forming a
> separate structure ParallelHeapScanDescData.
>

I was suggesting that you separate out both the normal scan fields and
the partial scan fields; that way we're sure that rs_nblocks is not
accessed during a parallel scan.
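
To make the suggestion concrete, here is a rough sketch in Python rather than C, so it is not patch code. Every name in it (SerialScanState, PartialScanState, SharedScanDesc, ScanDesc, next_block) is made up for illustration; SharedScanDesc only stands in for the role ParallelHeapScanDescData plays. The point is that the partial-scan state has no block-count field of its own, so code handling a parallel scan cannot read it by accident.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SharedScanDesc:
    """Hypothetical stand-in for the shared parallel scan descriptor."""
    nblocks: int
    next_block: int = 0


@dataclass
class SerialScanState:
    """Fields that only a normal (non-parallel) scan may touch."""
    nblocks: int          # plays the role of rs_nblocks
    next_block: int = 0


@dataclass
class PartialScanState:
    """Fields that only a partial (parallel) scan may touch.

    There is deliberately no nblocks field here.
    """
    shared: SharedScanDesc


@dataclass
class ScanDesc:
    relation: str
    state: Union[SerialScanState, PartialScanState]


def next_block(scan: ScanDesc) -> Optional[int]:
    """Return the next block to scan, or None when the scan is done."""
    state = scan.state
    if isinstance(state, SerialScanState):
        counter = state           # private counter and block count
    else:
        counter = state.shared    # counter and block count shared by workers
    if counter.next_block >= counter.nblocks:
        return None
    blk = counter.next_block
    counter.next_block += 1
    return blk
```

The sketch ignores locking on the shared counter, which a real implementation would need.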

Or, you could try wrapping the parts of heapam.c that are affected by
parallelism into new static functions.

> The reason why partial scan can't be mixed with sync scan is that in
> parallel scan, it performs the scan of heap by synchronizing blocks
> (each parallel worker scans a block and then asks for a next block to
> scan) among parallel workers. Now if we try to make sync scans work
> along with it, the synchronization among parallel workers will go for
> a toss. It might not be impossible to make that work in some way, but
> not sure if it is important enough for sync scans to work along with
> parallel scan.

I haven't tested it, but I think it would still be helpful. The block
accesses are still in order even during a partial scan, so why wouldn't
it help?

You might be concerned about the reporting of a block location, which
would become noisier with increased parallelism. But in my original
testing, sync scans weren't very sensitive to slight deviations, because
of caching effects.
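
As a toy model of what I mean (Python, not the patch; simulate and all of its parameters are hypothetical), suppose workers claim blocks from one shared counter that starts at the sync-scan start block and wraps around. The claim order is still sequential no matter how the workers interleave. The only thing that varies is how far a worker's reported location trails the most recently claimed block.

```python
import random


def simulate(nblocks, start_block, nworkers, seed=0):
    """Toy model: workers claim blocks in order from a shared counter.

    Returns the order in which blocks were claimed and the largest gap
    seen between a finished (reported) block and the newest claimed block.
    Assumes nworkers >= 1.
    """
    rng = random.Random(seed)
    handed_out = 0       # number of blocks claimed so far
    in_progress = {}     # worker id -> scan-order index of its current block
    claim_order = []     # block numbers in the order they were claimed
    max_lag = 0

    while handed_out < nblocks or in_progress:
        idle = [w for w in range(nworkers) if w not in in_progress]
        can_claim = bool(idle) and handed_out < nblocks
        if can_claim and (not in_progress or rng.random() < 0.5):
            # An idle worker asks for the next block in scan order.
            w = rng.choice(idle)
            in_progress[w] = handed_out
            claim_order.append((start_block + handed_out) % nblocks)
            handed_out += 1
        else:
            # A busy worker finishes and reports the block it just scanned.
            w = rng.choice(list(in_progress))
            idx = in_progress.pop(w)
            max_lag = max(max_lag, handed_out - 1 - idx)
    return claim_order, max_lag


order, lag = simulate(nblocks=16, start_block=5, nworkers=4)
```

Here order is always blocks 5 through 15 followed by 0 through 4, while lag depends on the interleaving. That lag is the noise in the reported location; whether it is small enough to matter is what would need testing.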

> tqueue.c is mainly designed to pass tuples between parallel workers
> and currently it is used in Funnel operator to gather the tuples
> generated by all the parallel workers. I think we can use it for any
> other operator which needs tuple communication among parallel workers.

Some specifics of the Funnel operator seem to be a part of tqueue, which
doesn't make sense to me. For instance, reading from the set of queues
in a round-robin fashion is part of the Funnel algorithm, and doesn't
seem suitable for a generic tuple communication mechanism (that would
never allow order-sensitive reading, for example).
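
Roughly the separation I have in mind, as a Python sketch rather than anything resembling tqueue.c (TupleQueue, read_round_robin and read_ordered are invented names, and the queues here never block, which real ones would): the queue only moves tuples, and the reading policy lives in the consumer. That leaves room for a reader that merges already-sorted worker output.

```python
from collections import deque
import heapq


class TupleQueue:
    """Generic per-worker queue: it only passes tuples along."""

    def __init__(self, tuples=()):
        self._items = deque(tuples)

    def put(self, tup):
        self._items.append(tup)

    def get(self):
        """Return the next tuple, or None if the queue is drained."""
        return self._items.popleft() if self._items else None


def read_round_robin(queues):
    """Funnel-style policy: visit queues in turn, dropping drained ones."""
    active = list(queues)
    while active:
        still_active = []
        for q in active:
            tup = q.get()
            if tup is not None:
                yield tup
                still_active.append(q)
        active = still_active


def read_ordered(queues, key=lambda t: t):
    """Order-sensitive policy: assumes each queue is already sorted by key."""
    def drain(q):
        while (tup := q.get()) is not None:
            yield tup
    return heapq.merge(*(drain(q) for q in queues), key=key)
```

Both readers sit on top of the same queue type, so neither policy needs to be baked into the communication mechanism.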

Regards,
Jeff Davis
