RE: Unique index prohibits partial aggregates

From: "Bos, Fred" <fbos(at)huisman-nl(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: Unique index prohibits partial aggregates
Date: 2022-06-28 12:45:42
Message-ID: DB9PR02MB72092BF364375A18F4D6B1B3E7B89@DB9PR02MB7209.eurprd02.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you for your reply.

I turned force_parallel_mode off.

Adding a BRIN index on this expression does indeed improve the performance for
this particular expression (it triggers the faster partial hash aggregate
again). However, my expression is variable, so "t/(1000*3600*24)" actually
is "t/dt", where dt changes frequently. When dt is changed, the query planner
reverts to GroupAggregate again (because the BRIN index is not suitable).

Also, when postgres doesn't know anything about the table, just after
importing, it has shown that it is capable of doing the partial hash
aggregate operation on the same table very quickly.

To elaborate on this:

If I setup the table with this query,

CREATE TABLE IF NOT EXISTS table (
t BIGINT NOT NULL,
v REAL NULL
);

then insert the rows,

INSERT INTO table .....;
about 30million rows,

and then directly execute the query,

SELECT t/dt as time, avg(v)
FROM table
GROUP BY time
ORDER BY time;

the query executes in under 2 seconds for any dt.

However, the query runs in about 20 seconds when I do the any of following:

- VACCUM ANALYZE and, then running the same query,
- or, CREATE UNIQUE INDEX ON table USING btree (t), and then running the
same query.

So it is possible to run the query much faster but postgres won't do it.

Is this because when the statistics are unknown, it expects a relatively low
amount of groups and opts for a partial plan?

Finally, is there a way to force postgres to do the partial hash aggregate,
either by changing a setting or by influencing the expected amount of output
groups for each query?

Regards,
Fred

---------------------------------------------------------------------------
The contents of this e-mail (including any attachments) are for the intended recipients only. If you are not an intended recipient but have received this email in error, we kindly request you to inform the sender of such error and delete this email and any attachments. If you open any attachments of this email, please understand that you do so at your own risk. We have made all reasonable efforts to keep this email and its attachments free from any bugs, viruses or the like, but cannot accept any responsibility for it.
Huisman Equipment B.V. and its affiliated companies cannot take any responsibility with regard to the accuracy or completeness of the content of this email and any attachments. Please note that this email and any attachments may contain information that is considered confidential, privileged and subject to copyright or other intellectual property rights. We kindly request (and insofar legally possible, demand) you to keep the content of this email and any attachments confidential and abide to the restrictions following from such protection.
---------------------------------------------------------------------------

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message vignesh C 2022-06-28 16:47:31 Re: Support logical replication of DDLs
Previous Message Karl Denninger 2022-06-28 12:16:25 Re: Libpq question related to allocated resources