Re: BRIN index and aborted transaction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BRIN index and aborted transaction
Date: 2015-07-21 20:47:00
Message-ID: CA+Tgmoa=j9J8gGwbxttuKWk=KOqJNkTCo9djVhbLAmO1t390-g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 18, 2015 at 5:11 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Yeah, that's a bit of an open problem: we don't have any mechanism to
> mark a block range as needing resummarization, yet. I don't have any
> great ideas there, TBH. Some options that were discussed but never led
> anywhere:
>
> 1. Whenever a heap tuple is deleted that's minimum or maximum for a
> column, mark the index tuple as needing resummarization. On a future
> vacuuming pass the index would be updated. (I think this works for
> minmax, but I don't see how to apply it to inclusion).
>
> 2. Have block ranges be resummarized randomly during vacuum.
>
> 3. Have index tuples last for only X number of transactions, marking them
> as needing summarization when that expires.
>
> 4. Have a user-invoked function that re-runs summarization. That way
> the user can implement any of the above policies, or others.

Maybe I'm confused here, but it seems like the only time
re-summarization can be needed is when tuples are pruned. The mere
act of deleting a tuple, even if the delete goes on to commit, doesn't
create a scenario where re-summarization can work out to a win,
because there may still be snapshots that can see it. At the point
where we prune the tuple, though, there might well be a benefit in
re-summarizing, because now a newly-computed summary value won't need
to cover a value that previously had to be there.
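
The distinction between deleting and pruning can be illustrated with a toy
model. This is only a sketch, not PostgreSQL code: the `HeapTuple` class, its
`deleted` and `pruned` flags, and the `summarize` helper are all invented for
illustration, and snapshot visibility is reduced to "the tuple is still
physically present".

```python
from dataclasses import dataclass

@dataclass
class HeapTuple:
    value: int
    deleted: bool = False  # delete committed, but the tuple is still on the page
    pruned: bool = False   # tuple physically removed; no snapshot can see it

def summarize(tuples):
    """Toy minmax summary: must cover every tuple not yet pruned,
    because an old snapshot may still see a deleted tuple."""
    present = [t.value for t in tuples if not t.pruned]
    return (min(present), max(present)) if present else None

block_range = [HeapTuple(10), HeapTuple(20), HeapTuple(99)]
before = summarize(block_range)        # covers 10..99

block_range[2].deleted = True          # delete commits, tuple not yet pruned
after_delete = summarize(block_range)  # still has to cover 99

block_range[2].pruned = True           # only now can the summary tighten
after_prune = summarize(block_range)
```

In this model the summary is unchanged by the delete itself and only narrows
once the tuple holding the maximum has been pruned.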

But it seems obviously impractical to re-summarize when we HOT-prune,
so it seems like the obvious thing to do is make vacuum do it. We
know during phase one of vacuum whether we saw any dead tuples in page
range X-Y; if yes, re-summarize. The only reason not to do this is if
it causes us to do a lot of resummarization that frequently fails to
produce a smaller range. Do you have any experimental data suggesting
that this is or is not a problem?
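
As a rough sketch of the idea, and of what "fails to produce a smaller range"
would mean in practice, the following toy Python model tracks which page
ranges lost tuples during a vacuum-like pass and re-summarizes only those.
Everything here is hypothetical: `PAGES_PER_RANGE`, `summarize`, and
`vacuum_and_resummarize` are invented names, the heap is a list of lists of
integers, and nothing reflects the actual vacuum or BRIN code paths.

```python
PAGES_PER_RANGE = 4  # hypothetical range size for this toy model

def summarize(pages):
    """Toy minmax summary over all values in a list of pages."""
    values = [v for page in pages for v in page]
    return (min(values), max(values)) if values else None

def vacuum_and_resummarize(heap, dead, summaries):
    """Remove dead (page_no, value) entries from heap, then re-summarize
    every page range in which at least one dead tuple was seen.

    Returns (ranges_resummarized, ranges_whose_summary_changed)."""
    dirty_ranges = set()
    for page_no, page in enumerate(heap):
        survivors = [v for v in page if (page_no, v) not in dead]
        if len(survivors) != len(page):
            # "Phase one": remember that this range contained dead tuples.
            dirty_ranges.add(page_no // PAGES_PER_RANGE)
            heap[page_no] = survivors

    changed = 0
    for r in sorted(dirty_ranges):
        start = r * PAGES_PER_RANGE
        new_summary = summarize(heap[start:start + PAGES_PER_RANGE])
        if new_summary != summaries[r]:
            changed += 1  # values were only removed, so a change is a tightening
        summaries[r] = new_summary
    return len(dirty_ranges), changed

heap = [[1, 5], [3, 4], [2, 9], [7, 8],          # range 0: pages 0-3
        [20, 25], [21, 22], [23, 24], [26, 27]]  # range 1: pages 4-7
summaries = [summarize(heap[0:4]), summarize(heap[4:8])]

# 9 is the maximum of range 0; 21 lies strictly inside range 1's bounds.
dead = {(2, 9), (5, 21)}
resummarized, shrank = vacuum_and_resummarize(heap, dead, summaries)
```

Here both ranges are re-summarized, but only range 0 ends up with a tighter
summary; the work done on range 1 is the kind of wasted effort the question
above is asking about.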

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
