Re: POC: GROUP BY optimization

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-21 15:08:54
Message-ID: CACG=ezaYM1tr6Lmp8PRH1aeZq=rBKXEoTwgzMcLaD5MPhfW0Lg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Another issue on test introduced in 0452b461bc405. I think it may be
unstable in some circumstances.
For example, if we'll try to use different BLCKSZ. See, I've made a little
change in the number of tuples to be inserted:

$ git diff
diff --git a/src/test/regress/sql/aggregates.sql
b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
i % 100 AS y,
'abc' || i % 10 AS z,
i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
CREATE INDEX btg_x_y_idx ON btg(x,y);
ANALYZE btg;

And the bulk extension is kicked, so we got zeroed pages in the relation.
The plane is also changed,
switched to seq scan from index scan:
@@ -2734,7 +2734,7 @@
i % 100 AS y,
'abc' || i % 10 AS z,
i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
CREATE INDEX btg_x_y_idx ON btg(x,y);
ANALYZE btg;
-- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@

-- Engage incremental sort
explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+------------------------------
Group
Group Key: x, y, z, w
- -> Incremental Sort
+ -> Sort
Sort Key: x, y, z, w
- Presorted Key: x, y
- -> Index Scan using btg_x_y_idx on btg
-(6 rows)
+ -> Seq Scan on btg
+(5 rows)
... and so on.

So, my proposal is simple. I think we need not just "ANALYZE btg", but
"VACUUM ANALYZE btg", to get rid of zeroed pages in this particular
case. PFA corresponding patch.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
0001-Force-VACUUM-to-avoid-zeroed-pages-from-bulk-extensi.patch application/octet-stream 1.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-02-21 15:14:02 Re: Patch: Add parse_type Function
Previous Message Jelte Fennema-Nio 2024-02-21 15:03:31 Improve readability by using designated initializers when possible