Re: Implement missing join selectivity estimation for range types

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, LUO Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Implement missing join selectivity estimation for range types
Date: 2024-01-05 10:37:17
Message-ID: CALDaNm29NMhpPvoM49qqfYzVcRJq0CPTd-xSUcTF7RZQO2TASQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 21 Nov 2023 at 01:47, Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be> wrote:
>
> On 14/11/2023 20:46, Tom Lane wrote:
> > I took a brief look through this very interesting work. I concur
> > with Tomas that it feels a little odd that range join selectivity
> > would become smarter than scalar inequality join selectivity, and
> > that we really ought to prioritize applying these methods to that
> > case. Still, that's a poor reason to not take the patch.
>
> Indeed, we started with ranges as this was the simpler case (no MCV) and
> was the topic of a course project.
> The idea is to later write a second patch that applies these ideas to
> scalar inequality while also handling MCV's correctly.
>
> > I also agree with the upthread criticism that having two identical
> > functions in different source files will be a maintenance nightmare.
> > Don't do it. When and if there's a reason for the behavior to
> > diverge between the range and multirange cases, it'd likely be
> > better to handle that by passing in a flag to say what to do.
>
> The duplication is indeed not ideal. However, there are already 8 other
> duplicate functions between the two files.
> I would thus suggest to leave the duplication in this patch and create a
> second one that removes all duplication from the two files, instead of
> just removing the duplication for our new function.
> What are your thoughts on this? If we do this, should the function
> definitions go in rangetypes.h or should we create a new
> rangetypes_selfuncs.h header?
>
> > But my real unhappiness with the patch as-submitted is the test cases,
> > which require rowcount estimates to be reproduced exactly.
>
> > We need a more forgiving test method. Usually the
> > approach is to set up a test case where the improved accuracy of
> > the estimate changes the planner's choice of plan compared to what
> > you got before, since that will normally not be too prone to change
> > from variations of a percent or two in the estimates.
>
> I have changed the test method to produce query plans for a 3-way range
> join.
> The plans for the different operators differ due to the computed
> selectivity estimation, which was not the case before this patch.

One of the tests was aborted at [1], kindly post an updated patch for the same:
[04:55:42.797] src/tools/ci/cores_backtrace.sh linux /tmp/cores
[04:56:03.640] dumping /tmp/cores/postgres-6-24094.core for
/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/postgres

[04:57:24.199] Core was generated by `postgres: old_node: postgres
regression [local] EXPLAIN '.
[04:57:24.199] Program terminated with signal SIGABRT, Aborted.
[04:57:24.199] #0 __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
[04:57:24.199] Download failed: Invalid argument. Continuing without
source file ./signal/../sysdeps/unix/sysv/linux/raise.c.
[04:57:26.803]
[04:57:26.803] Thread 1 (Thread 0x7f121d9ec380 (LWP 24094)):
[04:57:26.803] #0 __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
[04:57:26.803] set = {__val = {4194304, 0, 4636737291354636288,
4636737291354636288, 0, 0, 64, 64, 128, 128, 192, 192, 256, 256, 0,
0}}
[04:57:26.803] pid = <optimized out>
[04:57:26.803] tid = <optimized out>
[04:57:26.803] ret = <optimized out>
[04:57:26.803] #1 0x00007f122003d537 in __GI_abort () at abort.c:79
...
...
[04:57:38.774] #6 0x00007f357ad95788 in __asan::__asan_report_load1
(addr=addr(at)entry=107477261711120) at
../../../../src/libsanitizer/asan/asan_rtl.cpp:117
[04:57:38.774] bp = 140731433585840
[04:57:38.774] pc = <optimized out>
[04:57:38.774] local_stack = 139867680821632
[04:57:38.774] sp = 140731433585832
[04:57:38.774] #7 0x000055d5b155c37c in range_cmp_bound_values
(typcache=typcache(at)entry=0x629000030b60, b1=b1(at)entry=0x61c000017708,
b2=b2(at)entry=0x61c0000188b8) at rangetypes.c:2090
[04:57:38.774] No locals.
[04:57:38.774] #8 0x000055d5b1567bb2 in calc_hist_join_selectivity
(typcache=typcache(at)entry=0x629000030b60,
hist1=hist1(at)entry=0x61c0000188b8, nhist1=nhist1(at)entry=101,
hist2=hist2(at)entry=0x61c0000170b8, nhist2=nhist2(at)entry=101) at
rangetypes_selfuncs.c:1298
[04:57:38.774] i = 0
[04:57:38.774] j = 101
[04:57:38.774] selectivity = <optimized out>
[04:57:38.774] cur_sel1 = <optimized out>
[04:57:38.774] cur_sel2 = <optimized out>
[04:57:38.774] prev_sel1 = <optimized out>
[04:57:38.774] prev_sel2 = <optimized out>
[04:57:38.774] cur_sync = {val = <optimized out>, infinite =
<optimized out>, inclusive = <optimized out>, lower = <optimized out>}
[04:57:38.774] #9 0x000055d5b1569190 in rangejoinsel
(fcinfo=<optimized out>) at rangetypes_selfuncs.c:1495

[1] - https://cirrus-ci.com/task/5507789477380096

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-01-05 10:41:59 Re: [17] CREATE SUBSCRIPTION ... SERVER
Previous Message vignesh C 2024-01-05 10:32:24 Re: Make mesage at end-of-recovery less scary.