Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-11 03:39:26
Message-ID: CA+TgmobBZ=0n=JcS28hBxVBaSXeZHBQCnxVzCTUSPMe1zsuGdw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 8, 2015 at 6:42 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Are we sure that in such cases we will consume work_mem during
> execution? In the case of parallel_workers we are sure to an extent
> that if we reserve the workers then we will use them during execution.
> Nonetheless, I have proceeded and integrated the parallel_seq scan
> patch with v0.3 of the parallel_mode patch you posted at the link below:
> http://www.postgresql.org/message-id/CA+TgmoYmp_=XcJEhvJZt9P8drBgW-pDpjHxBhZA79+M4o-CZQA@mail.gmail.com

That depends on the costing model. It makes no sense to do a parallel
sequential scan on a small relation, because the user backend can scan
the whole thing itself faster than the workers can start up. I
suspect it may also be true that the useful amount of parallelism
increases the larger the relation gets (but maybe not).
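
To make the trade-off concrete, here is a minimal sketch of such a costing rule. Everything in it is an assumption for illustration: the struct, the function names, and the linear cost formulas are hypothetical, not the planner's actual model. It only shows the shape of the argument: a fixed per-worker startup cost means zero workers wins on a small relation, while more workers pay off as the page count grows.

```c
#include <assert.h>

/* Hypothetical cost constants; these are not PostgreSQL settings. */
typedef struct ScanCostParams
{
    double per_page_cost;       /* cost for one process to scan one page */
    double worker_startup_cost; /* fixed cost to launch one worker */
} ScanCostParams;

/* Estimated cost of scanning everything in the user backend alone. */
double
serial_scan_cost(double pages, const ScanCostParams *p)
{
    return pages * p->per_page_cost;
}

/*
 * Estimated cost with nworkers helping. Assumes the pages divide evenly
 * across the user backend plus the workers, and that startup costs add
 * up linearly. Both are simplifications.
 */
double
parallel_scan_cost(double pages, int nworkers, const ScanCostParams *p)
{
    assert(nworkers >= 0);
    return nworkers * p->worker_startup_cost
        + pages * p->per_page_cost / (nworkers + 1);
}

/* Pick the worker count in [0, max_workers] with the lowest estimate. */
int
choose_workers(double pages, int max_workers, const ScanCostParams *p)
{
    int     best = 0;
    double  best_cost = serial_scan_cost(pages, p);

    for (int n = 1; n <= max_workers; n++)
    {
        double  c = parallel_scan_cost(pages, n, p);

        if (c < best_cost)
        {
            best_cost = c;
            best = n;
        }
    }
    return best;
}
```

Under this toy model the chosen worker count grows with the relation size, but whether that holds in practice depends on factors the sketch ignores, such as I/O bandwidth.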

> 2. To enable two types of shared memory queues (error queue and
> tuple queue), we need to ensure that we switch to the appropriate queue
> when a parallel worker sends the various messages to the master
> backend. There are two ways to do it:
> a. Save the information about the error queue during startup of the
> parallel worker (ParallelMain()) and then, on error, set the same (switch
> to the error queue in errstart() and switch back to the tuple queue in
> errfinish(), and in errstart() in case errstart() doesn't need to
> propagate the error).
> b. Do something similar to (a) for the tuple queue in printtup or any
> other place that sends non-error messages.
> I think approach (a) is slightly better compared to approach (b), as
> we would need to switch many times for the tuple queue (for each tuple) and
> there could be multiple places where we need to do the same. For now,
> I have used approach (a) in the patch, which needs some more work if we
> agree on the same.

I don't think you should be "switching" queues. The tuples should be
sent to the tuple queue, and errors and notices to the error queue.
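
A rough sketch of what I mean, using a toy in-memory queue rather than the real shared memory queue API. All of the type and function names below are hypothetical. The point is only that the worker holds both queues for its whole lifetime and each kind of message has a fixed destination, so there is no "current queue" state to flip in errstart() or errfinish().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAPACITY 8
#define MSG_MAX 64

/* Toy stand-in for a shared memory queue; hypothetical, for illustration. */
typedef struct MsgQueue
{
    char    msgs[QUEUE_CAPACITY][MSG_MAX];
    size_t  count;
} MsgQueue;

/* A worker keeps both queues attached at all times. */
typedef struct WorkerChannels
{
    MsgQueue *tuple_queue;  /* result tuples only */
    MsgQueue *error_queue;  /* errors and notices only */
} WorkerChannels;

/* Append a message (truncated to fit); returns 0 on success, -1 if full. */
int
queue_send(MsgQueue *q, const char *msg)
{
    assert(q != NULL && msg != NULL);
    if (q->count >= QUEUE_CAPACITY)
        return -1;
    strncpy(q->msgs[q->count], msg, MSG_MAX - 1);
    q->msgs[q->count][MSG_MAX - 1] = '\0';
    q->count++;
    return 0;
}

/* Tuples always go to the tuple queue. */
int
send_tuple(WorkerChannels *ch, const char *tuple)
{
    return queue_send(ch->tuple_queue, tuple);
}

/* Errors and notices always go to the error queue. */
int
send_error(WorkerChannels *ch, const char *message)
{
    return queue_send(ch->error_queue, message);
}
```

With this shape, an error raised in the middle of sending tuples lands in the error queue without touching any tuple-sending state.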

> 3. The current implementation of Parallel_seqscan needs to use
> some information from parallel.c which was not exposed, so I have
> exposed it by moving it to parallel.h. The information that is required
> is as follows:
> ParallelWorkerNumber, FixedParallelState and shm keys -
> This is used to decide the blocks that need to be scanned.
> We might change the way parallel scan/work distribution is done
> in the future, but I don't see any harm in exposing this information.

Hmm. I can see why ParallelWorkerNumber might need to be exposed, but
the other stuff seems like it shouldn't be.
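
For illustration, a worker that knows only its own number and the total worker count can already work out which blocks are its share. The sketch below assumes a static, even split of the relation, which is not necessarily how the patch divides the work; the function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open block range [start, end) assigned to one worker. */
typedef struct BlockRange
{
    uint32_t start;
    uint32_t end;
} BlockRange;

/*
 * Split nblocks as evenly as possible among nworkers and return the
 * share belonging to worker_number (0-based). Uses 64-bit arithmetic
 * so the multiplication cannot overflow.
 */
BlockRange
worker_block_range(uint32_t nblocks, int nworkers, int worker_number)
{
    BlockRange r;

    assert(nworkers > 0);
    assert(worker_number >= 0 && worker_number < nworkers);

    r.start = (uint32_t) (((uint64_t) nblocks * (uint64_t) worker_number)
                          / (uint64_t) nworkers);
    r.end = (uint32_t) (((uint64_t) nblocks * (uint64_t) (worker_number + 1))
                        / (uint64_t) nworkers);
    return r;
}
```

Nothing here needs the fixed parallel state or the shm keys; the worker number and count are enough for this kind of split.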

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
