Re: Freeze avoidance of very large table.

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-04-06 13:17:04
Message-ID: 55228750.9040604@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/6/15 1:46 AM, Sawada Masahiko wrote:
> On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>>>
>>> On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>>>>
>>>> + case HEAPTUPLE_LIVE:
>>>> + case HEAPTUPLE_RECENTLY_DEAD:
>>>> + case HEAPTUPLE_INSERT_IN_PROGRESS:
>>>> + case HEAPTUPLE_DELETE_IN_PROGRESS:
>>>> + if
>>>> (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
>>>> +
>>>> mxactcutoff, &frozen[nfrozen]))
>>>> + frozen[nfrozen++].offset
>>>> = offnum;
>>>> + break;
>>>
>>>
>>> This doesn't seem safe enough to me. Can't there be tuples that are still
>>> new enough that they can't be frozen, and are still live?
>>
>>
>> Yep. I've set a table to read only while it contained unfreezable tuples,
>> and the tuples remain unfrozen yet the read-only action claims to have
>> succeeded.
>>
>>
>>>
>>> Somewhat related... instead of forcing the freeze to happen synchronously,
>>> can't we set this up so a table is in one of three states? Read/Write, Read
>>> Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
>>> appropriate state, and all the vacuum infrastructure would continue to
>>> process those tables as it does today. lazy_vacuum_rel would become
>>> responsible for tracking if there were any non-frozen tuples if it was also
>>> attempting a freeze. If it discovered there were none, AND the table was
>>> marked as ReadOnly, then it would change the table state to Frozen and set
>>> relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
>>> AT_SetReadWrite could change relfrozenxid to it's own Xid as an
>>> optimization. Doing it that way leaves all the complicated vacuum code in
>>> one place, and would eliminate concerns about race conditions with still
>>> running transactions, etc.
>>
>>
>> +1 here as well. I might want to set tables to read only for reasons other
>> than to avoid repeated freezing. (After all, the name of the command
>> suggests it is a general purpose thing) and wouldn't want to automatically
>> trigger a vacuum freeze in the process.
>>
>
> Thank you for comments.
>
>> Somewhat related... instead of forcing the freeze to happen synchronously, can't we set this up so a table is in one of three states? Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to > the appropriate state, and all the vacuum infrastructure would continue to process those tables as it does today. lazy_vacuum_rel would become responsible for tracking if there were any non-frozen tuples if it was also attempting > a freeze. If it discovered there were none, AND the table was marked as ReadOnly, then it would change the table state to Frozen and set relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId. AT_SetReadWrite > could change relfrozenxid to it's own Xid as an optimization. Doing it that way leaves all the complicated vacuum code in one place, and would eliminate concerns about race conditions with still running transactions, etc.
>
> I agree with 3 status, Read/Write, ReadOnly and Frozen.
> But I'm not sure when we should do to freeze tuples, e.g., scan whole tables.
> I think that the any changes to table are completely
> ignored/restricted if table is marked as ReadOnly table,
> and it's accompanied by freezing tuples, just mark as ReadOnly.
> Frozen table ensures that all tuples of its table completely has been
> frozen, so it also needs to scan whole table as well.
> e.g., we should need to scan whole table at two times. right?

No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on
the table.

The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not
change during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table

That does not necessitate 2 scans.

>> +1 here as well. I might want to set tables to read only for reasons other than to avoid repeated freezing. (After all, the name of the command suggests it is a general purpose thing) and wouldn't want to automatically trigger a
>> vacuum freeze in the process.
>>
>> There is another possibility here, too. We can completely divorce a ReadOnly mode (which I think is useful for other things besides freezing) from the question of whether we need to force-freeze a relation if we create a
>> FrozenMap, similar to the visibility map. This has the added advantage of helping freeze scans on relations that are not ReadOnly in the case of tables that are insert-mostly or any other pattern where most pages stay all-frozen.
>> Prior to the visibility map this would have been a rather daunting project, but I believe this could piggyback on the VM code rather nicely. Anytime you clear the VM you clearly must clear the FrozenMap as well. The logic for
>> setting the FM is clearly different, but that would be entirely self-contained to vacuum. Unlike the VM, I don't see any point to marking special bits in the page itself for FM.
>
> I was thinking this idea (FM) to avoid freezing all tuples actually.
> As you said, it might not be good idea (or overkill) that the reason
> why settings table to read only is avoidance repeated freezing.
> I'm attempting to try design FM to avoid freezing relations as well.
> Is it enough that each bit of FM has information that corresponding
> pages are completely frozen on each bit?

If I'm understanding your implied question correctly, I don't think
there would actually be any relationship between FM and marking
ReadOnly. It would come into play if we wanted to do the Frozen state,
but if we have the FM, marking an entire relation as Frozen becomes a
lot less useful. What's going to happen with a VACUUM FREEZE once we
have FM is that vacuum will be able to skip reading pages if they are
all-visible *and* the FM shows them as frozen, whereas today we can't
use the VM to skip pages if scan_all is true.

For simplicity, I would start out with each FM bit representing a single
page. That means the FM would be very similar in operation to the VM;
the only difference would be when a bit in the FM was set. I would
absolutely split this into 2 patches as well; one for ReadOnly (and skip
the Frozen status for now), and one for FM.

When I looked at the VM code briefly it occurred to me that it might be
quite difficult to have 1 FM bit represent multiple pages. The issue is
the locking necessary between VACUUM and clearing a FM bit. In the VM
that's handled by the cleanup lock, but that will only work at a page
level. We'd need something to ensure that nothing came in and performed
DML while the vacuum code was getting ready to set a FM bit. There's
probably several ways this could be accomplished, but I think it would
be foolish to try and do anything about it in the initial patch.
Especially because it's only supposition that there would be much
benefit to having multiple pages per bit.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-04-06 13:21:07 Re: TABLESAMPLE patch
Previous Message Alvaro Herrera 2015-04-06 13:16:36 Re: initdb -S and tablespaces