Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-07-22 15:44:06
Message-ID: CA+TgmoZKn13DzP=p=FmY=L9CcxVeOdnhib5qQ2ZJnCUPOn+KQw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 6, 2015 at 8:49 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> I ran some performance tests on a 16 core machine with large shared
> buffers, so there is no IO involved.
> With the default value of cpu_tuple_comm_cost, a parallel plan is not
> generated even when selecting 100K records out of 40 million. So I
> changed the value to '0' and collected the performance readings.
>
> Here are the performance numbers:
>
> selectivity    Seq scan   Parallel scan (ms)
> (millions)         (ms)   2 workers   4 workers   8 workers
> 0.1            11498.93     4821.40     3305.84     3291.90
> 0.4            10942.98     4967.46     3338.58     3374.00
> 0.8            11619.44     5189.61     3543.86     3534.40
> 1.5            12585.51     5718.07     4162.71     2994.90
> 2.7            14725.66     8346.96    10429.05     8049.11
> 5.4            18719.00    20212.33    21815.19    19026.99
> 7.2            21955.79    28570.74    28217.60    27042.27
>
> The average table row size is around 500 bytes and query selection
> column width is around 36 bytes.
> When the query selectivity goes above 10% of the total table records,
> parallel scan performance drops.

Thanks for doing this testing. I think that is quite valuable. I am
not too concerned about the fact that queries where more than 10% of
records are selected do not speed up. Obviously, it would be nice to
improve that, but I think that can be left as an area for future
improvement.

One thing I noticed that is a bit dismaying is that we don't get a lot
of benefit from having more workers. Look at the 0.1 data. At 2
workers, if we scaled perfectly, we would be 3x faster (since the
master can do work too), but we are actually 2.4x faster. Each
process is on the average 80% efficient. That's respectable. At 4
workers, we would be 5x faster with perfect scaling; here we are 3.5x
faster. So the third and fourth worker were about 50% efficient.
Hmm, not as good. But then going up to 8 workers bought us basically
nothing.
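The efficiency figures above can be reproduced with a short script (a sketch using the 0.1-million-row timings from Haribabu's table; as in the reasoning above, "perfect scaling" counts the master backend as one extra process alongside the workers):

```python
# Per-worker efficiency for the 0.1M-row case, using the timings
# reported in the table above (milliseconds).
seq_ms = 11498.93                       # sequential scan time
parallel_ms = {2: 4821.40, 4: 3305.84, 8: 3291.90}

for workers, ms in sorted(parallel_ms.items()):
    processes = workers + 1             # workers plus the master backend
    speedup = seq_ms / ms               # observed speedup vs. seq scan
    efficiency = speedup / processes    # fraction of perfect scaling
    print(f"{workers} workers: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} of perfect scaling")
```

Running it shows the same pattern: roughly a 2.4x speedup at 2 workers, about 3.5x at 4 workers, and essentially no further gain at 8 workers.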

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
