From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Sort operation displays more tuples than it contains its subnode
Date: 2024-05-22 20:31:21
Message-ID: 9f4a159b-f527-465f-b82e-38b4b7df812f@postgrespro.ru
Lists: pgsql-hackers
Hi!
I've run into an issue where a Sort node reports more actual tuples than its subnode actually produces, and I cannot understand why.

I attached a dump of my database and ran the query below, which contains an underestimation; the first run works fine. Unfortunately, I could not find a way to improve the statistics here, so I adjusted the cardinalities in the code instead: I needed better estimates in the IndexScan and MergeJoin nodes, which I highlighted in bold (marked with asterisks).
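For reference, the standard knobs one could try to improve these estimates (a sketch against the tables from the dump; higher per-column statistics targets plus extended statistics on the join keys) would look like this:

```sql
-- Raise the per-column statistics target for the merge-join keys, and add
-- extended statistics so the planner can account for the sno/cno correlation.
ALTER TABLE score ALTER COLUMN sno SET STATISTICS 1000;
ALTER TABLE score ALTER COLUMN cno SET STATISTICS 1000;
CREATE STATISTICS score_sno_cno (ndistinct, dependencies) ON sno, cno FROM score;
ANALYZE score;
```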
postgres=# set enable_hashjoin =off;
SET
postgres=# set enable_nestloop =off;
SET
explain analyze select cname, avg(degree) from course, student, score
join broke_down_course on
(score.cno=broke_down_course.cno and score.sno=broke_down_course.sno)
where score.sno = student.sno group by (cname);
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10000001523.70..10000001588.95 rows=10 width=250) (actual time=262.903..322.973 rows=10 loops=1)
   Group Key: course.cname
   ->  Sort  (cost=10000001523.70..10000001545.41 rows=8684 width=222) (actual time=256.221..283.710 rows=77540 loops=1)
         Sort Key: course.cname
         Sort Method: external merge  Disk: 4656kB
         ->  Nested Loop  (cost=10000000614.18..10000000955.58 rows=8684 width=222) (actual time=7.518..160.518 rows=77540 loops=1)
               *->  Merge Join  (cost=614.18..845.96 rows=868 width=4) (actual time=7.497..126.632 rows=7754 loops=1)*
                     Merge Cond: ((score.sno = broke_down_course.sno) AND (score.cno = broke_down_course.cno))
                     *->  Merge Join  (cost=0.70..1297.78 rows=29155 width=16) (actual time=0.099..99.329 rows=29998 loops=1)*
                           Merge Cond: (score.sno = student.sno)
                           *->  Index Scan using score_idx1 on score  (cost=0.42..10125.41 rows=29998 width=12) (actual time=0.045..74.427 rows=29998 loops=1)*
                           ->  Index Only Scan using student_pkey on student  (cost=0.28..89.28 rows=3000 width=4) (actual time=0.045..2.170 rows=3000 loops=1)
                                 Heap Fetches: 0
                     ->  Sort  (cost=613.48..632.86 rows=7754 width=8) (actual time=7.378..9.626 rows=7754 loops=1)
                           Sort Key: broke_down_course.sno, broke_down_course.cno
                           Sort Method: quicksort  Memory: 374kB
                           ->  Seq Scan on broke_down_course  (cost=0.00..112.54 rows=7754 width=8) (actual time=0.028..1.428 rows=7754 loops=1)
               ->  Materialize  (cost=0.00..1.15 rows=10 width=218) (actual time=0.000..0.001 rows=10 loops=7754)
                     ->  Seq Scan on course  (cost=0.00..1.10 rows=10 width=218) (actual time=0.012..0.017 rows=10 loops=1)
 Planning Time: 124.591 ms
 Execution Time: 326.547 ms
When I run the query again, the Sort node reports more actual rows than its SeqScan child produced (highlighted in bold).
postgres=# explain analyze select cname, avg(degree) from course, student, score
join broke_down_course on
(score.cno=broke_down_course.cno and score.sno=broke_down_course.sno)
where score.sno = student.sno group by (cname);
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10000001746.28..10000001811.53 rows=10 width=250) (actual time=553.428..615.028 rows=10 loops=1)
   Group Key: course.cname
   ->  Sort  (cost=10000001746.28..10000001767.99 rows=8684 width=222) (actual time=546.531..574.223 rows=77540 loops=1)
         Sort Key: course.cname
         Sort Method: external merge  Disk: 4656kB
         ->  Merge Join  (cost=10000000614.18..10000001178.16 rows=8684 width=222) (actual time=7.892..448.889 rows=77540 loops=1)
               Merge Cond: ((score.sno = broke_down_course.sno) AND (score.cno = broke_down_course.cno))
               ->  Merge Join  (cost=10000000000.70..10000002146.57 rows=291550 width=234) (actual time=0.137..318.345 rows=299971 loops=1)
                     Merge Cond: (score.sno = student.sno)
                     ->  Index Scan using score_idx1 on score  (cost=0.42..10125.41 rows=29998 width=12) (actual time=0.046..76.505 rows=29998 loops=1)
                     ->  Materialize  (cost=10000000000.28..10000000540.41 rows=30000 width=222) (actual time=0.082..76.345 rows=299964 loops=1)
                           ->  Nested Loop  (cost=10000000000.28..10000000465.41 rows=30000 width=222) (actual time=0.077..16.543 rows=30000 loops=1)
                                 ->  Index Only Scan using student_pkey on student  (cost=0.28..89.28 rows=3000 width=4) (actual time=0.045..2.774 rows=3000 loops=1)
                                       Heap Fetches: 0
                                 ->  Materialize  (cost=0.00..1.15 rows=10 width=218) (actual time=0.000..0.002 rows=10 loops=3000)
                                       ->  Seq Scan on course  (cost=0.00..1.10 rows=10 width=218) (actual time=0.023..0.038 rows=10 loops=1)
               *->  Sort  (cost=613.48..632.86 rows=7754 width=8) (actual time=7.612..21.214 rows=77531 loops=1)*
                     Sort Key: broke_down_course.sno, broke_down_course.cno
                     Sort Method: quicksort  Memory: 374kB
                     ->  Seq Scan on broke_down_course  (cost=0.00..112.54 rows=7754 width=8) (actual time=0.016..1.366 rows=7754 loops=1)
 Planning Time: 96.685 ms
 Execution Time: 618.538 ms
(22 rows)
Maybe my reproduction looks questionable, sorry for that, but I genuinely don't understand why the Sort node ends up with so many tuples here.
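One hypothesis I'd like to check: Merge Join uses mark/restore on its inner side, so when the outer side has duplicate join keys the inner Sort re-emits the same tuples from the marked position, and EXPLAIN ANALYZE counts every tuple returned rather than distinct tuples. A minimal sketch to test that idea (hypothetical table t, not from the dump):

```sql
-- Duplicate keys on both sides force the Merge Join to restore the inner
-- Sort to a marked position and re-read the run of matching tuples, so
-- the Sort's "actual rows" can far exceed the table's row count.
CREATE TABLE t (k int);
INSERT INTO t SELECT i % 10 FROM generate_series(1, 1000) i;
ANALYZE t;
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT * FROM t a JOIN t b USING (k);
```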
Attachment | Content-Type | Size
---|---|---
correct_rows.patch | text/x-patch | 1.2 KB
db_dump.sql | application/sql | 3.3 MB