Re: selectivity function

From: Greg Hennessy <greg(dot)hennessy(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: selectivity function
Date: 2022-05-27 16:04:14
Message-ID: e23e9771-463f-c189-8502-f2c3c748f8fb@gmail.com
Lists: pgsql-hackers

On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?
I may have been going down a bad path before. The function I'm
working to improve has five arguments, the last being "degrees", which
is the match radius. Obviously a larger match radius should cause more
matches.
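
If attaching an estimate to the function is the way to go, I assume
that means a planner support function, something like this sketch
(untested; q3c_join_support is a hypothetical C function handling
SupportRequestSelectivity, not something q3c ships today):

-- hypothetical support function that would estimate selectivity
-- from the radius argument of q3c_join
CREATE FUNCTION q3c_join_support(internal) RETURNS internal
    AS 'q3c', 'q3c_join_support' LANGUAGE C STRICT;

ALTER FUNCTION q3c_join(double precision, double precision,
                        double precision, double precision,
                        double precision)
    SUPPORT q3c_join_support;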

For a small match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as
b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual
time=7.799..10758.566 rows=31 loops=1)
  Buffers: shared hit=8005684
  ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16)
(actual time=0.008..215.570 rows=1000000 loops=1)
        Buffers: shared hit=5406
  ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250
width=16) (actual time=0.009..0.009 rows=0 loops=1000000)

(note: I deleted some of the plan output; I think I've kept the
important bits)

So the cost of the query is calculated as about 2e10; it expects five
rows, finds 31, and with a hot cache touches 8 million buffers.
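
If I recall correctly those buffer counts are in blocks of block_size,
8 kB on a default build, so the first query reads roughly:

-- buffers are counted in blocks (8192 bytes by default), so:
SELECT pg_size_pretty(8005684 * 8192::bigint);  -- about 61 GB, all from cache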

When I run the same sort of query with a much larger match radius
(5 degrees) I get:
q3c_test=# explain (analyze, buffers) select * from test as a, test1 as
b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual
time=0.086..254995.691 rows=38051626 loops=1)
  Buffers: shared hit=104977026
  ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16)
(actual time=0.008..261.425 rows=1000000 loops=1)
        Buffers: shared hit=5406
  ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250
width=16) (actual time=0.053..0.247 rows=38 loops=1000000)

The "total cost" is the same identical 2e10, this time the number of
rows expectd
is 4.7 million, the number of rows delivered is 38 million (so the
calculation is off
by a factor of 8, I'm not sure that is important), but the io is now 104
million units.
So while we are doing a lot more IO, and dealing with a lot more rows, the
calculated cost is identical. That seems strange me me. Is that a normal
thing?
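
If I do the arithmetic on the plan nodes, the total looks like it is
just the outer cost plus outer rows times the inner cost:

    15406.00 + 1000000 * 22785.45 = 22785465406

which is within about 0.01% of the 22787968818 reported for both plans,
regardless of the radius.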
Is it possible that the cost calculation isn't taking the selectivity
estimate into account?

Greg
