Re: Parallel Seq Scan

From: Thom Brown <thom(at)linux(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-02 10:39:29
Message-ID: CAA-aLv52KyeSzV+25QsAoCw-CeQMUOFevgCWcUcXodytahYYhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 January 2015 at 10:34, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> > Running it again, I get the same issue. This is with
> parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.
> >
> > This doesn't happen if I set the pgbench scale to 50. I suspect this is
> a OOM issue. My laptop has 16GB RAM, the table is around 13GB at scale
> 100, and I don't have swap enabled. But I'm concerned it crashes the whole
> instance.
> >
>
> Isn't this a backend crash due to OOM?
> And after that server will restart automatically.
>

Yes, I'm fairly sure it is. I guess what I'm confused about is that 8
parallel sequential scans in separate sessions (1 per session) don't cause
the server to crash, but in a single session (8 in 1 session), they do.

>
> > I also notice that requesting BUFFERS in a parallel EXPLAIN output
> yields no such information.
> > --
>
> Yeah and the reason for same is that all the work done related
> to BUFFERS is done by backend workers, master backend
> doesn't read any pages, so it is not able to accumulate this
> information.
>
> > Is that not possible to report?
>
> It is not impossible to report such information, we can develop some
> way to share such information between master backend and workers.
> I think we can do this if required once the patch is more stablized.
>

Ah great, as I think losing such information to this feature would be
unfortunate.

Will there be a GUC to influence parallel scan cost? Or does it take into
account effective_io_concurrency in the costs?

And will the planner be able to decide whether or not it'll choose to use
background workers or not? For example:

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=89584.00..89584.05 rows=5 width=4) (actual
time=228.222..228.224 rows=5 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
Buffers: shared hit=83334
-> Seq Scan on public.pgbench_accounts (cost=0.00..88334.00
rows=500000 width=4) (actual time=0.008..136.522 rows=500000 loops=1)
Output: bid
Buffers: shared hit=83334
Planning time: 0.071 ms
Execution time: 228.265 ms
(9 rows)

This is a quick plan, but if we tell it that it's allowed 8 background
workers:

# set parallel_seqscan_degree = 8;
SET
Time: 0.187 ms

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12291.75..12291.80 rows=5 width=4) (actual
time=603.042..603.042 rows=1 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
-> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..11041.75
rows=500000 width=4) (actual time=2.445..529.284 rows=500000 loops=1)
Output: bid
Number of Workers: 8
Number of Blocks Per Workers: 10416
Planning time: 0.049 ms
Execution time: 663.103 ms
(9 rows)

Time: 663.437 ms

It's significantly slower. I'd hope the planner would anticipate this and
decide, "I'm just gonna perform a single scan in this instance as it'll be
a lot quicker for this simple case." So at the moment
parallel_seqscan_degree seems to mean "You *must* use this many threads if
you can parallelise." Ideally we'd be saying "can use up to if necessary".

Thanks

Thom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-01-02 11:13:54 Re: Parallel Seq Scan
Previous Message Amit Kapila 2015-01-02 10:36:23 Re: Parallel Seq Scan