Re: very slow queries when max_parallel_workers_per_gather is higher than zero

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Guilherme Pereira <guilherme(at)guilherme-pereira(dot)com>, Guilherme Pereira <guilherme(dot)pereira(at)gooddata(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: very slow queries when max_parallel_workers_per_gather is higher than zero
Date: 2018-04-16 19:58:19
Message-ID: 8f788568-cb24-9eaf-6772-3d6a7ce5e02b@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks. So we now have a trivial query demonstrating the issue. IMHO
this is not really a costing issue, but due to a misestimate.

Essentially, the problem is that the two sides of the join mismatch,
causing this:

-> Bitmap Heap Scan on dwh_dm ... d (... rows=7 width=4) (...)

-> Bitmap Heap Scan on f_ticketupdate_aad5jtwal0ayaax f
(cost=1633.90..214617.67 rows=87472 width=4)
(actual time=0.003..0.003 rows=0 loops=7)
Recheck Cond: (dt_event_id = d.id)

-> Bitmap Index Scan on f_ticketupdate_aad5jtwal0ayaa ...
(cost=0.00..1612.03 rows=87472 width=0)
(actual time=0.003..0.003 rows=0 loops=7)
Index Cond: (dt_event_id = d.id)

I.e. the database believes the bitmap index scan will match 87k rows.
But in fact it matches 0, which makes the bitmap heap scan entirely
unnecessary (thus costing nothing, because it's skipped).

Of course, the parallel plan is structured slightly differently, and
does not allow this skipping because it places the f_ table on the outer
side of the join (and scans it using sequential scan).

Now, try changing the parameters (particularly id_euweek) so that the
bitmap index scan actually matches something. I'm pretty sure that will
make the non-parallel case much more expensive.

Increasing the parallel_setup_cost makes the parallel plan a bit more
expensive, enough to switch to the non-parallel plan. But that's mostly
a fluke and not particularly principled way to fix this - if the cost
difference gets a bit larger (or if you increase the number of parallel
workers) it's probably going to use the parallel plan again.

Obviously, PostgreSQL 9.5 doesn't have parallel queries, so it does not
have a chance of making this mistake.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-04-16 20:51:33 Re: Instability in partition_prune test?
Previous Message Tomas Vondra 2018-04-16 19:31:08 Re: Standby corruption after master is restarted