Re: Use extended statistics to estimate (Var op Var) clauses

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use extended statistics to estimate (Var op Var) clauses
Date: 2021-12-22 00:28:25
Message-ID: 4DC32175-8CC0-438A-B602-57BBAD7F11DC@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 12, 2021, at 6:21 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> <0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patch>

+ * It it's (variable = variable) with the same variable on both sides, it's

s/It it's/If it's/

0001 lacks regression coverage.

> <0002-simplification.patch>

Changing comments introduced by patch 0001 in patch 0002 just creates git churn:

- * estimate the selectivity as 1.0 (or 0.0 if it's negated).
+ * estimate the selectivity as 1.0 (or 0.0 when it's negated).

and:

* matching_restriction_variable
- * Examine the args of a restriction clause to see if it's of the
- * form (variable op variable) with the same variable on both sides.
+ * Check if the two arguments of a restriction clause refer to the same
+ * variable, i.e. if the condition is of the form (variable op variable).
+ * We can deduce selectivity for such (in)equality clauses.

0002 also lacks regression coverage.

> <0003-relax-the-restrictions.patch>

0003 also lacks regression coverage.

> <0004-main-patch.patch>

Ok.

> <0005-Don-t-treat-Var-op-Var-as-simple-clauses.patch>

0005 again lacks regression coverage.

There might be a problem in how selectivity thinks about comparison between identical columns from the NEW and OLD pseudotables. To show this, add an Assert to see where matching_restriction_variables() might return true:

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4952,6 +4952,8 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
return false;

/* The two variables need to match */
+ Assert(!equal(left, right));
+
return equal(left, right);

This results in the regression tests failing on "update rtest_emp set ename = 'wiecx' where ename = 'wiecc';". It may seem counterintuitive that matching_restriction_variables() would return true for a where-clause with only one occurrence of variable "ename", until you read the rule defined in rules.sql:

create rule rtest_emp_upd as on update to rtest_emp where new.salary != old.salary do
insert into rtest_emplog values (new.ename, current_user,
'honored', new.salary, old.salary);

I think what's really happening here is that "new.salary != old.salary" is being processed by matching_restriction_variables() and returning that the new.salary refers to the same thing that old.salary refers to.

Here is the full stack trace, for reference:

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
frame #0: 0x00007fff6d8fd33a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff6d9b9e60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x00007fff6d884808 libsystem_c.dylib`abort + 120
frame #3: 0x00000001048a6c31 postgres`ExceptionalCondition(conditionName="!equal(left, right)", errorType="FailedAssertion", fileName="selfuncs.c", lineNumber=4955) at assert.c:69:2
frame #4: 0x000000010481e733 postgres`matching_restriction_variables(root=0x00007fe65e02b2d0, args=0x00007fe65e02bf38, varRelid=0) at selfuncs.c:4955:2
frame #5: 0x000000010480f63c postgres`eqsel_internal(fcinfo=0x00007ffeebb9aeb8, negate=true) at selfuncs.c:265:6
frame #6: 0x000000010481040a postgres`neqsel(fcinfo=0x00007ffeebb9aeb8) at selfuncs.c:565:2
frame #7: 0x00000001048b420c postgres`FunctionCall4Coll(flinfo=0x00007ffeebb9af38, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1212:11
frame #8: 0x00000001048b50e6 postgres`OidFunctionCall4Coll(functionId=102, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1448:9
* frame #9: 0x0000000104557e4d postgres`restriction_selectivity(root=0x00007fe65e02b2d0, operatorid=901, args=0x00007fe65e02bf38, inputcollid=0, varRelid=0) at plancat.c:1828:26
frame #10: 0x00000001044d4c76 postgres`clause_selectivity_ext(root=0x00007fe65e02b2d0, clause=0x00007fe65e02bfe8, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000, use_extended_stats=true) at clausesel.c:902:10
frame #11: 0x00000001044d4186 postgres`clauselist_selectivity_ext(root=0x00007fe65e02b2d0, clauses=0x00007fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000, use_extended_stats=true) at clausesel.c:185:8
frame #12: 0x00000001044d3f97 postgres`clauselist_selectivity(root=0x00007fe65e02b2d0, clauses=0x00007fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000) at clausesel.c:108:9
frame #13: 0x00000001044de192 postgres`set_baserel_size_estimates(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640) at costsize.c:4931:3
frame #14: 0x00000001044d16be postgres`set_plain_rel_size(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640, rte=0x00007fe65e02ac40) at allpaths.c:584:2
frame #15: 0x00000001044d0a79 postgres`set_rel_size(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640, rti=1, rte=0x00007fe65e02ac40) at allpaths.c:413:6
frame #16: 0x00000001044cd990 postgres`set_base_rel_sizes(root=0x00007fe65e02b2d0) at allpaths.c:324:3
frame #17: 0x00000001044cd660 postgres`make_one_rel(root=0x00007fe65e02b2d0, joinlist=0x00007fe65e02ccd8) at allpaths.c:186:2
frame #18: 0x0000000104511b3b postgres`query_planner(root=0x00007fe65e02b2d0, qp_callback=(postgres`standard_qp_callback at planner.c:3071), qp_extra=0x00007ffeebb9b648) at planmain.c:276:14
frame #19: 0x0000000104514dfc postgres`grouping_planner(root=0x00007fe65e02b2d0, tuple_fraction=0) at planner.c:1448:17
frame #20: 0x00000001045138fb postgres`subquery_planner(glob=0x00007fe66e010cb0, parse=0x00007fe66e0107a8, parent_root=0x0000000000000000, hasRecursion=false, tuple_fraction=0) at planner.c:1025:2
frame #21: 0x0000000104511faa postgres`standard_planner(parse=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at planner.c:406:9
frame #22: 0x0000000104511d3a postgres`planner(parse=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at planner.c:277:12
frame #23: 0x00000001046957fb postgres`pg_plan_query(querytree=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at postgres.c:848:9
frame #24: 0x00000001046959d2 postgres`pg_plan_queries(querytrees=0x00007fe65e02b170, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at postgres.c:940:11
frame #25: 0x0000000104698d0d postgres`exec_simple_query(query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';") at postgres.c:1134:19
frame #26: 0x0000000104697f4c postgres`PostgresMain(dbname="regression", username="mark.dilger") at postgres.c:4498:7
frame #27: 0x000000010459b859 postgres`BackendRun(port=0x00007fe65dc043e0) at postmaster.c:4594:2
frame #28: 0x000000010459adba postgres`BackendStartup(port=0x00007fe65dc043e0) at postmaster.c:4322:3
frame #29: 0x0000000104599ade postgres`ServerLoop at postmaster.c:1802:7
frame #30: 0x0000000104597206 postgres`PostmasterMain(argc=8, argv=0x00007fe66dc06300) at postmaster.c:1474:11
frame #31: 0x000000010446be4f postgres`main(argc=8, argv=0x00007fe66dc06300) at main.c:198:3
frame #32: 0x00007fff6d7b5cc9 libdyld.dylib`start + 1

In frame 9, operatorid=901. Checking pg_operator.dat, that's:

{ oid => '901', descr => 'not equal',
oprname => '<>', oprleft => 'money', oprright => 'money', oprresult => 'bool',
oprcom => '<>(money,money)', oprnegate => '=(money,money)',
oprcode => 'cash_ne', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },

Looking at frame 10, restrict_selectivity is being called with clause={OPEXPR :opno 901 :opfuncid 889 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 2 :vartype 790 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 3 :varattnosyn 2 :location -1} {VAR :varno 1 :varattno 2 :vartype 790 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location -1}) :location -1}

Maybe there is some reason this is ok. Any thoughts about it?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-12-22 00:38:27 Re: parallel vacuum comments
Previous Message Peter Smith 2021-12-22 00:05:40 Re: row filtering for logical replication