Re: Estimate maintenance_work_mem for CREATE INDEX

From: Alex Shulgin <alex(dot)shulgin(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, pgsql-admin(at)lists(dot)postgresql(dot)org, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Estimate maintenance_work_mem for CREATE INDEX
Date: 2017-12-19 22:32:33
Message-ID: CAM-UEKRo6jSVNMQP=P-O6YsBFfjP9E2AZeOmJrU4GYCYt8Du5Q@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

On Tue, Dec 19, 2017 at 3:15 PM Greg Stark <stark(at)mit(dot)edu> wrote:

> On 19 December 2017 at 10:00, Oleksandr Shulgin
> <oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
> > If there would be an option in the database itself to provide those
> > estimation, we wouldn't even need to figure out estimation queries.
> > "EXPLAIN CREATE INDEX" anyone?
>
> You're not the first to propose something like that. I think an
> EXPLAIN ALTER TABLE would also be very handy -- it's currently
> impossible to tell without carefully reading the source code whether a
> given DDL change will require a full table scan, a full table rewrite,
> or just a quick meta data update (and even in that case what strength
> lock will be required). I think there are other utility statements
> that make interesting heuristic decisions that would be nice to be
> able to have some visibility into -- CLUSTER comes to mind.
>

Yes, that would be pretty handy.

> I'm not clear how you would determine how much memory is needed to
> sort a table without actually doing the sort though. So that would be
> more of an EXPLAIN ANALYZE wouldn't it?
>

My idea would be to use statistics, so that EXPLAIN CREATE INDEX (or
whatever the actual interface would look like) could benefit from the
up-to-date statistics produced by ANALYZE.

Based on the estimated number of rows in the table and the average width of
the column(s) to index, and taking into account the bookkeeping structures,
one should be able to arrive at a good guess for the amount of memory the
backend would end up allocating (assuming it is available).
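As a back-of-the-envelope sketch of that idea (my own illustration; the
per-tuple overhead constant is an assumption, not the server's actual
accounting), the estimate could look like:

```python
def estimate_sort_memory(reltuples, avg_width, tuple_overhead=24):
    """Rough guess at the memory needed to sort all index keys in memory.

    reltuples and avg_width would come from the statistics the planner
    already has (pg_class.reltuples, pg_stats.avg_width); tuple_overhead
    is an assumed per-tuple bookkeeping cost for the sort machinery.
    """
    return int(reltuples * (avg_width + tuple_overhead))

# 10M rows of 8-byte keys -> 320_000_000 bytes (~305 MiB), so a
# maintenance_work_mem below that would imply an external sort
estimate_sort_memory(10_000_000, 8)
```

Comparing such an estimate against the current maintenance_work_mem setting
is what would let EXPLAIN CREATE INDEX predict an internal vs. external sort.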

Having done that as a first step, and using statistics again, we could
also infer (though probably with less accuracy) the memory requirements for
building partial indexes. Functional indexes would be harder to tackle; I
would think this is only possible if the return type(s) of the function(s)
all have fixed width.
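For the partial-index case, the same sketch could be scaled by the
predicate's estimated selectivity (again my own illustration; how the
selectivity is obtained is exactly the planner's existing job):

```python
def estimate_partial_sort_memory(reltuples, avg_width, selectivity,
                                 tuple_overhead=24):
    """Scale the full-table sort estimate by the fraction of rows
    expected to satisfy the partial index predicate.

    selectivity would come from the planner's clause selectivity
    estimation over pg_stats; tuple_overhead is the same assumed
    per-tuple bookkeeping cost as in the full-index sketch.
    """
    return int(reltuples * selectivity * (avg_width + tuple_overhead))

# Predicate matching ~25% of 10M rows of 8-byte keys
estimate_partial_sort_memory(10_000_000, 8, 0.25)
```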

I didn't look in the code, but I imagine the procedure of read -> sort
-> spill to tapes, if needed -> merge the tapes is generic to all
index types, so this shouldn't be a breaking change for any user-defined
indexes (is this already a thing?). OK, maybe it's only generic for B-Tree
and BRIN, but not for GIN and GiST, to name a few. Damn, I gotta look in
the code at some point. ;-)

To let me fantasize a little more, what I would also love to see is the
estimated on-disk size for the resulting index, before starting to create
it. This is obviously dependent on the actual index type and options, such
as fill-factor, etc.
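To sketch what such an on-disk estimate might look like for the B-Tree
leaf level (all constants here, such as the page header, tuple header, line
pointer, and alignment sizes, are assumptions for illustration, not exact
server values):

```python
import math

def estimate_btree_size(reltuples, avg_width, fillfactor=0.90,
                        block_size=8192):
    """Very rough leaf-level size guess for a B-Tree index.

    Assumed constants: 8-byte alignment of the key, 8-byte index tuple
    header, 4-byte line pointer, 24-byte page header. Upper (internal)
    levels are ignored; they are typically a small fraction of the total.
    """
    key_bytes = (avg_width + 7) // 8 * 8      # align the key to 8 bytes
    item = key_bytes + 8 + 4                  # tuple header + line pointer
    usable = (block_size - 24) * fillfactor   # space usable per leaf page
    leaf_pages = math.ceil(reltuples * item / usable)
    return leaf_pages * block_size

# 1M rows of 8-byte keys at the default 90% fillfactor
estimate_btree_size(1_000_000, 8)
```

The fillfactor dependence is visible directly: lowering it inflates the
page count proportionally, which is why the estimate would have to read
the index's actual reloptions.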

Cheers,
--
Alex
