Re: VACUUM's ancillary tasks

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Vik Fearing <vik(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM's ancillary tasks
Date: 2017-01-30 19:16:40
Message-ID: 20170130191640.2johoyume5v2dbbq@alvherre.pgsql
Lists: pgsql-hackers

Thomas Munro wrote:

> About BRIN indexes: I couldn't find an explanation of why BRIN
> indexes don't automatically create new summary tuples when you insert
> a new tuple in an unsummarised page range. Is it deferred until
> VACUUM time in order to untangle some otherwise unresolvable
> interlocking or crash safety problem, or could that one day be done?

The reason is performance for the bulk insert case, which we don't want
to slow down; the range summarization is done at a later time by a
background process so that the inserting process is not slowed down by
having to repeatedly re-compute the summary tuple for each heap
insertion. I think the ideal mechanism would be that a summarization is
signalled somehow (to another process) as soon as an insertion occupies
a block just past the previous unsummarized range. (If there are many
readers, perhaps it's better to summarize when the range is half full or
something like that.)
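
As a rough illustration of the signalling idea above, here is a minimal Python sketch of the trigger condition only. Nothing in it is an existing PostgreSQL API: `range_to_summarize` and `SummarizationSignaller` are hypothetical names, and the range size of 128 pages is assumed to match BRIN's default `pages_per_range`. It reports a range as ready once an insertion lands on the first block of the following range, and it reports each range only once.

```python
from typing import Optional

PAGES_PER_RANGE = 128  # assumed; matches BRIN's default pages_per_range


def range_to_summarize(heap_block: int,
                       pages_per_range: int = PAGES_PER_RANGE) -> Optional[int]:
    """Return the start block of the range that just filled up, or None.

    An insertion into the first block of a range means the range
    immediately before it is now fully occupied.
    """
    if heap_block > 0 and heap_block % pages_per_range == 0:
        return heap_block - pages_per_range
    return None


class SummarizationSignaller:
    """Hypothetical helper that signals each filled range only once."""

    def __init__(self, pages_per_range: int = PAGES_PER_RANGE) -> None:
        self.pages_per_range = pages_per_range
        self.requested = set()  # start blocks already signalled

    def on_insert(self, heap_block: int) -> Optional[int]:
        """Called per heap insertion; returns a range start to summarize."""
        start = range_to_summarize(heap_block, self.pages_per_range)
        if start is None or start in self.requested:
            return None
        self.requested.add(start)
        return start
```

In this sketch the inserting side only does a modulo test and a set lookup; the actual summarization would be left to whichever process receives the returned range start.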

We could have a reloption that switches from this behavior to the other
obvious possibility which is to insert a new summary tuple upon the
first heap insertion to an unsummarized range, but ISTM that that
behavior is pessimal.

> Counting inserts seems slightly bogus because you can't tell whether
> those were inserts into an existing summarised block which is
> self-maintaining or not. At first glance it looks a bit like
> unsummarised ranges can only appear at the end of the table, is that
> right? If so, couldn't you detect the number of unsummarised BRIN
> blocks just by comparing the highest summarised BRIN block and the
> current heap size?

We don't have a mechanism to invalidate the summary of a range thus far,
so yeah we could try to detect it directly as you suggest.
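
A sketch of that direct detection, in Python for brevity. It assumes, as discussed above, that unsummarized ranges can only appear at the end of the table. The function name and its parameters are hypothetical, and how the index would expose its highest summarized range is left open.

```python
def unsummarized_blocks(heap_blocks: int,
                        highest_summarized_range: int,
                        pages_per_range: int = 128) -> int:
    """Count heap blocks not covered by any summary tuple.

    heap_blocks: current heap size in blocks.
    highest_summarized_range: index of the last summarized range,
        or -1 if nothing has been summarized yet.
    Assumes every range up to that index is summarized.
    """
    covered = (highest_summarized_range + 1) * pages_per_range
    # A summarized last range may extend past the end of the heap;
    # inserts into it are self-maintaining, so clamp at zero.
    return max(0, heap_blocks - covered)
```

For instance, with 1000 heap blocks and ranges 0 through 3 summarized, 512 blocks are covered and 488 remain unsummarized.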

I would like to be able to invalidate these tuples though, for the case
where many tuples are removed from a range (a fresh summarization could
produce tighter limits). I think this problem does not necessarily
invalidate the idea above.
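
To make the "tighter limits" point concrete, here is a toy Python model of a min/max style summary. It is an illustration under assumptions, not BRIN code: `summarize` and `widen` are hypothetical helpers, and a range is modelled as a plain list of values.

```python
from typing import List, Tuple

Summary = Tuple[int, int]  # (minimum, maximum) over one page range


def summarize(values: List[int]) -> Summary:
    """Compute a fresh summary from the values currently in the range."""
    return (min(values), max(values))


def widen(summary: Summary, value: int) -> Summary:
    """Maintain the summary on insert; it can only grow, never shrink."""
    lo, hi = summary
    return (min(lo, value), max(hi, value))


# Build a summary, then insert an outlier and delete it again.
values = [10, 20]
summary = summarize(values)
values.append(500)
summary = widen(summary, 500)
values.remove(500)

stale = summary            # still reflects the deleted outlier
fresh = summarize(values)  # what re-summarization would produce
```

Here `stale` is `(10, 500)` while `fresh` is `(10, 20)`: deletions never narrow the stored limits, so only invalidating and re-summarizing the range recovers the tighter bounds.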

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
