Re: Not HOT enough

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Not HOT enough
Date: 2011-11-23 15:15:55
Message-ID: CA+TgmobreJ17SP=GaALYu029MKQ_XcZYkiPhbezLBiXBYUgy7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 23, 2011 at 9:25 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Yeah.  This change would have the disadvantage of disabling HOT
>> cleanup for shared catalogs; I'm not sure whether that's a good
>> decision.
>
> No, it disables cleanup when being read. They are still VACUUMed normally.
>
> Note that non-MVCC snapshots never did run HOT page-level cleanup, so
> this hardly changes anything.
>
> And it effects shared catalogs only, which are all low traffic anyway.

I think "low traffic" is the key point. I understand that you're not
changing the VACUUM behavior, but you are making heap_page_prune_opt()
not do anything when a shared catalog is involved. That would be
unacceptable if we expected shared catalogs to be updated frequently,
either now or in the future, but I guess we don't expect that.

I suppose we could compute two RecentGlobalXmin values, one for all
databases and one for just the current database. But that would make
GetSnapshotData() slower, which would almost certainly be a cure worse
than the disease.

>> But now that you mention it, something seems funky about the other bit
>> you mention, too:
>>
>> +                       /* MVCC snapshots ignore other databases */
>> +                       if (!allDbs &&
>> +                               proc->databaseId != MyDatabaseId &&
>> +                               proc->databaseId != 0)          /* always include WalSender */
>> +                               continue;
>> +
>>
>> It doesn't make sense for the RecentGlobalXmin calculation to depend
>> on whether or not the current snapshot is an MVCC snapshot, because
>> RecentGlobalXmin is a global variable not related to any particular
>> snapshot.  I don't believe it's safe to assume that RecentGlobalXmin
>> will only ever be used in conjunction with the most-recently-taken
>> snapshot.
>
> Why would that matter exactly? RecentGlobalXmin is used in 4 places
> and this works with them all.
>
> This changes the meaning of that variable from what it was previously,
> but so what? It's backend local.

I don't object to changing the meaning of the variable, but I don't
understand how it can be correct to include backends from other
databases in the RecentGlobalXmin calculation when using an MVCC
snapshot, but not otherwise. Come to think of it, when does
GetSnapshotData() get called with a non-MVCC snapshot anyway?

> The huge benefit is that we clean up data in normal tables much better
> than we did before in cases where people use multiple databases, which
> is a common case.

I understand the benefit; I just want to make sure we're not going to
break anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-11-23 15:20:44 Re: Not HOT enough
Previous Message Simon Riggs 2011-11-23 14:48:22 Re: Inlining comparators as a performance optimisation