Seeking reason behind performance gain in 12 with HashAggregate

From: Shira Bezalel <shira(at)sfei(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Seeking reason behind performance gain in 12 with HashAggregate
Date: 2020-01-13 16:29:05
Message-ID: CAE0KEwGnDspRJjX=cdvGC69=F=76jbhiViDp=T5pPYR0xF2NAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi All,

I'm testing an upgrade from Postgres 9.6.16 to 12.1 and seeing a
significant performance gain in one specific query. This is really great,
but I'm just looking to understand why. Reading through the release notes
across all the new versions (10, 11, 12) hasn't yielded an obvious cause,
but maybe I missed something. Also, I realize it could be related to other
factors (config parameters, physical hosts, etc), but the systems are
pretty similar so just wondering about Postgres changes.

The query is the following:

SELECT pvc.value, SUM(pvc.count) AS sum
FROM
(SELECT (ST_ValueCount(cv.rast, 1)).*
FROM calveg_whrtype_20m AS cv) AS pvc
GROUP BY pvc.value

Here is the EXPLAIN (ANALYZE ON, BUFFERS ON) output from both systems:

9.6 plan <https://explain.depesz.com/s/W8HN>
12.1 plan <https://explain.depesz.com/s/lIRS>

In the 9.6 plan, the Seq Scan node produced 15,812 rows.
In the 12 plan, the Seq Scan produced 2,502 rows, and then the ProjectSet
node produced 15,812 rows.

Note that the table (calveg_whrtype_20m) in the two databases have the same
number of rows (2,502).

So it seems something about the introduction of the ProjectSet node between
the Seq Scan and HashAggregate is optimizing things...? Is this the right
conclusion to draw and if so, why might this be happening? Is there
something that was changed/improved in either 10, 11 or 12 that this
behavior can be attributed to?

Two more notes --

1. If I run the inner subquery without the outer sum/group by, the plans
between the two systems are identical.

2. As the calgeg_whrtype_20m table is a raster, I started my question on
the PostGIS list, but there was no obvious answer that the gain is related
to a change in the PostGIS code so I'm now turning to this list.

Thank you,
Shira

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2020-01-13 16:42:20 Re: Seeking reason behind performance gain in 12 with HashAggregate
Previous Message Jeff Janes 2020-01-10 17:12:52 Re: Bad query plan when you add many OR conditions