|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Subject:||Less-silly selectivity for JSONB matching operators|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
While looking at a recent complaint about bad planning, I was
reminded that jsonb's @> and related operators use "contsel"
as their selectivity estimator. This is really bad, because
(a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators
as five times more selective than equality, which is surely pretty
There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values. That code is
actually 100% generic except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.
Hence, the attached draft patch moves that logic into a generic
function in selfuncs.c, and then invents "matchsel" and "matchjoinsel"
generic estimators that have a default estimate of twice DEFAULT_EQ_SEL.
(I'm not especially wedded to that number, but it seemed like a
reasonable starting point.)
There were a couple of other operators that seemed to be inappropriately
using contsel, so I changed all of these to use matchsel:
@>(tsquery,tsquery) | tsq_mcontains
<@(tsquery,tsquery) | tsq_mcontained
@@(text,text) | ts_match_tt
@@(text,tsquery) | ts_match_tq
-|-(anyrange,anyrange) | range_adjacent
@>(jsonb,jsonb) | jsonb_contains
?(jsonb,text) | jsonb_exists
?|(jsonb,text) | jsonb_exists_any
?&(jsonb,text) | jsonb_exists_all
<@(jsonb,jsonb) | jsonb_contained
@?(jsonb,jsonpath) | jsonb_path_exists_opr
@@(jsonb,jsonpath) | jsonb_path_match_opr
Note: you might think that we should just shove this generic logic
into contsel itself, and maybe areasel and patternsel while at it.
However, that would be pretty useless for these functions' intended
usage with the geometric operators, because we collect neither MCV
nor histogram stats for the geometric data types, making the extra
complexity worthless. Pending somebody putting some effort into
estimation for the geometric data types, I think we should just get
out of the business of having non-geometric types relying on these
This patch is not complete, because I didn't look at changing
the contrib modules, and grep says at least some of them are using
contsel for non-geometric data types. But I thought I'd put it up
for discussion at this stage.
regards, tom lane
|Next Message||Andres Freund||2020-02-27 20:26:36||Re: [HACKERS] Doubt in pgbench TPS number|
|Previous Message||Stephen Frost||2020-02-27 19:31:52||Re: [GsoC] Read/write transaction-level routing in Odyssey Project Idea|