Re: Performance regressions found using sqlfuzz

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: "Jung\, Jinho" <jinho(dot)jung(at)gatech(dot)edu>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-performance\(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance regressions found using sqlfuzz
Date: 2019-02-15 16:20:07
Message-ID: 87o97dkqm7.fsf@news-spur.riddles.org.uk
Lists: pgsql-performance

>>>>> "Jung" == Jung, Jinho <jinho(dot)jung(at)gatech(dot)edu> writes:

Jung> select distinct
Jung>   ref_0.i_im_id as c0,
Jung>   ref_1.ol_dist_info as c1
Jung> from
Jung>   public.item as ref_0 right join
Jung>   public.order_line as ref_1
Jung>   on (ref_0.i_id = 5)

Jung> - Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups())

Jung> - Our analysis: We believe that this regression is related to the
Jung> new logic for estimating the number of distinct values in the
Jung> optimizer. This is affecting even queries with point lookups
Jung> (ref_0.i_id = 5) in the TPC-C benchmark.

So what's happening here is that the old plan was mis-estimating the
result, believed incorrectly that it would fit into work_mem, and
generated a hashaggregate plan accordingly; it ran fast because
hashaggregate doesn't spill to disk but silently overflows work_mem.

The new plan correctly estimates the result size, and therefore is
forbidden from generating the hashaggregate plan at the default work_mem
setting; it generates a sort plan, and the sort of course spills to disk
since work_mem is exceeded.

Had the value of work_mem been set to something appropriate for the
workload, then the query plan would not have changed.
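
For illustration, a rough sketch of how one could verify that (the 256MB
figure below is just a placeholder; size it to the actual result set):

    -- Raise work_mem for the session so the (now correctly estimated) result
    -- fits, which lets the planner choose HashAggregate again.
    SET work_mem = '256MB';
    EXPLAIN (ANALYZE, BUFFERS)
    select distinct
      ref_0.i_im_id as c0,
      ref_1.ol_dist_info as c1
    from
      public.item as ref_0 right join
      public.order_line as ref_1
      on (ref_0.i_id = 5);
    RESET work_mem;

With work_mem set large enough for the workload, old and new versions
should both pick the same hashaggregate plan.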

So the problem (from an automated testing perspective) is that an actual
_improvement_ in the code is being reported as a regression.

Jung> ####### QUERY 3:

Jung> select
Jung>   cast(ref_1.ol_i_id as int4) as c0
Jung> from
Jung>   public.stock as ref_0
Jung>   left join public.order_line as ref_1
Jung>   on (ref_1.ol_number is not null)
Jung> where ref_1.ol_number is null

Jung> - Commit: 77cd477 (Enable parallel query by default.)

Jung> - Our analysis: We believe that this regression is due to
Jung> parallel queries being enabled by default. Surprisingly, we found
Jung> that even on a larger TPC-C database (scale factor of 50, roughly
Jung> 4GB), parallel scan is still slower than the non-parallel one in
Jung> the old version, when the query is not returning any tuples.

The problem here is not actually with parallel scans as such, but rather
the omission of a Materialize node in the parallel plan, and what looks
like some rather serious mis-costing of the nestloop antijoin.
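
A rough way to see this (a sketch only, assuming the same TPC-C data) is to
run the query once with parallelism disabled and once with the defaults, and
compare the plans:

    SET max_parallel_workers_per_gather = 0;   -- forces the old, non-parallel plan
    EXPLAIN (ANALYZE)
    select
      cast(ref_1.ol_i_id as int4) as c0
    from
      public.stock as ref_0
      left join public.order_line as ref_1
      on (ref_1.ol_number is not null)
    where ref_1.ol_number is null;
    RESET max_parallel_workers_per_gather;

The non-parallel plan should include a Materialize node on the inner side of
the nestloop antijoin; the parallel plan omits it and rescans the relation
instead.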

Jung> ####### QUERY 4:

Jung> select
Jung>   ref_0.s_dist_06 as c0
Jung> from
Jung>   public.stock as ref_0
Jung> where (ref_0.s_w_id < cast(least(0, 1) as int8))

Jung> - Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans)

Jung> - Our analysis: We believe that this regression has to do with
Jung> two factors: 1) conditional expressions (e.g., LEAST or NULLIF)
Jung> are not reduced to constants, unlike string functions (e.g.,
Jung> CHAR_LENGTH); 2) the change in the cost estimation function for
Jung> bitmap scans. Execution time grows by 3 orders of magnitude. We note that
Jung> this regression is only observed on large databases (e.g., scale
Jung> factor of 50).

Again, this is showing up because of a large database and a small
work_mem. The bitmap scan on stock only becomes lossy if the number of
rows matched in the index is very large relative to work_mem; the lack
of plan-time evaluation of LEAST means that the planner doesn't have any
good way to estimate the selectivity, so it's taking a default estimate.
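
A quick way to see the estimation side of this (just a sketch) is to fold
the LEAST by hand and compare the planner's row estimates for the two forms:

    EXPLAIN
    select
      ref_0.s_dist_06 as c0
    from
      public.stock as ref_0
    where ref_0.s_w_id < 0::int8;   -- least(0, 1) written out as the constant

With the constant in place the planner can use the s_w_id statistics and
should estimate (close to) zero rows; with least(0, 1) it falls back to the
default selectivity and costs the bitmap scan as though many rows matched.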

Jung> ####### QUERY 1:

Jung> select
Jung>   ref_0.o_d_id as c0
Jung> from
Jung>   public.oorder as ref_0
Jung> where EXISTS (
Jung>   select
Jung>     1
Jung>   from
Jung>     (select distinct
Jung>        ref_0.o_entry_d as c0,
Jung>        ref_1.c_credit as c1
Jung>      from
Jung>        public.customer as ref_1
Jung>      where (false)
Jung>     ) as subq_1
Jung>   );

Jung> - Commit: bf6c614 (Do execGrouping.c via expression eval machinery, take two)

Jung> - Our analysis: We are not sure about the root cause of this
Jung> regression. This might have to do with grouping logic.

What this query is basically exercising is how fast one can do
ExecReScan on a DISTINCT query, without also considering the performance
effects of actually doing the grouping (the constant-false qual here
means that the grouping comparison is never actually performed). An
optimization tradeoff that speeds up comparisons within a scan at the
cost of a fixed overhead for the scan will therefore make this query
slower, but it still seems a good tradeoff to make (of course it would
be even better to make the overhead per-query rather than per-scan, and
there were other issues with this commit that should have been caught at
the time).
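
For anyone reproducing this, the rescan cost is visible directly in EXPLAIN
ANALYZE: the nodes under the SubPlan should report "loops" roughly equal to
the number of rows in oorder, so any per-scan setup overhead is multiplied by
that count. A sketch:

    EXPLAIN (ANALYZE)
    select
      ref_0.o_d_id as c0
    from
      public.oorder as ref_0
    where EXISTS (
      select 1
      from
        (select distinct
           ref_0.o_entry_d as c0,
           ref_1.c_credit as c1
         from
           public.customer as ref_1
         where (false)
        ) as subq_1
    );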

--
Andrew (irc:RhodiumToad)
