Re: BUG #16241: Degraded hash join performance

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Butz <tbutz(at)optitool(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16241: Degraded hash join performance
Date: 2020-02-04 13:44:08
Message-ID: 20200204134408.oing4ypvshgr5ykx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2020-02-04 10:02:45 +0100, Thomas Butz wrote:
> I've executed the query twice and attached the generated reports.

Interesting! The no-children one clearly shows that a lot of the the
time is spent evaluating regular expressions (there's other regex
functions in the profile too):

23.36% postgres postgres [.] subcolor

and that this is inside a stack of plpgsql functions:

| | |--18.95%--exec_stmt_block
| | | exec_stmt
| | | plpgsql_exec_function
| | | plpgsql_call_handler
| | | ExecInterpExpr
| | | ExecEvalExpr (inlined)
| | | exec_eval_simple_expr
| | | (inlined)
[...]
| | | plpgsql_exec_function
| | | plpgsql_call_handler
| | | ExecInterpExpr
| | | ExecEvalExpr (inlined)
| | | exec_eval_simple_expr (inlined)
[...]
| | | plpgsql_exec_function
| | | plpgsql_call_handler
| | | ExecInterpExpr
| | | ExecEvalExpr (inlined)
[...]

I'm not aware of any relevant regular expression evaluation changes
between 11 and 12. Tom, does this trigger anything?

Thomas, could you provide a EXPLAIN (ANALYZE, VERBOSE) for both
versions? Clearly the hash join is where the time is spent:

-> Hash Join (cost=1.000..6436.030 rows=44 width=517) (actual time=5.532..3383.213 rows=2779 loops=1)
Buffers: shared hit=9077
-> Index Scan using planet_osm_hstore_line_way_idx on planet_osm_hstore_line planet_osm_hstore_line (cost=0.550..6419.380 rows=491 width=352) (actual time=0.423..19.147 rows=2779 loops=1)
Index Cond: (planet_osm_hstore_line.way && '01030000A0110F00000100000005000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000AF78D2F4ECA63241E4EED5138D3958410000000000000000D3CF6EE3EBD13241E4EED5138D3958410000000000000000D3CF6EE3EBD132411CD92E58CD2E58410000000000000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000'::geometry)
Filter: ((((planet_osm_hstore_line.tags -> 'name'::text) IS NOT NULL) OR ((planet_osm_hstore_line.tags -> 'oneway'::text) = ANY ('{yes,-1}'::text[])) OR ((planet_osm_hstore_line.tags -> 'junction'::text) = 'roundabout'::text)) AND ((planet_osm_hstore_line.tags -> 'highway'::text) = ANY ('{motorway,motorway_link,trunk,trunk_link,primary,primary_link,secondary,secondary_link,tertiary,tertiary_link,residential,unclassified,road,service,pedestrian,raceway,living_street,construction}'::text[])))
Buffers: shared hit=9077
-> Hash (cost=0.230..0.230 rows=18 width=32) (actual time=0.029..0.030 rows=18 loops=1)
-> Values Scan (cost=0.000..0.230 rows=18 width=32) (actual time=0.003..0.017 rows=18 loops=1)

but I don't think it's the hash condition itself - there's no plpgsql
functions necessary for the join as far as I can tell. A VERBOSE EXPLAIN
might help up with knowing what the function being called actually is.

It seems quite possible that all that time is spent inside plpgsql
functions that are part of postgis. And that the postgis version
difference between 2.5 and 3.0 is the relevant part here, not the
postgres version difference. Or it's just in the application code:

The sort step shows among others the following column to be sorted on:
osml10n_get_streetname_from_tags(planet_osm_hstore_line.tags, true, false, ' - '::text, 'de'::text, planet_osm_hstore_line.way, NULL::text)) DESC
which is actually likely computed as the output of the HashJoin
node. And is the only obvious candidate for a plpgsql function
potentially taking a lot of the time.

Is that function defined exactly the same between the version? Is the
underlying dataset roughly the same size? I suggest running the query
without it, and checking how long it takes.

Regards,

Andres

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Butz 2020-02-04 14:49:10 Re: BUG #16241: Degraded hash join performance
Previous Message Thomas Butz 2020-02-04 09:02:45 Re: BUG #16241: Degraded hash join performance