Re: Freeze avoidance of very large table.

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-07-02 15:30:07
Message-ID: CAD21AoCwt7szDLPnLEyjz+xq0GXsnrmfvPA0Mr-OUdLa+OmUDw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 2, 2015 at 1:06 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Jul 2, 2015 at 12:13 AM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>>>>>> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>>>>>>
>>>>>>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>>>>>>> Agreed, no extra file, and the same write volume as currently. It would
>>>>>>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>>>>>>> reuse some of that code.
>>>>>>>>
>>>>>>>
>>>>>>> Yeah, this approach seems promising. We probably can't reuse code from
>>>>>>> clog because the usage pattern is different (key for clog is xid, while
>>>>>>> for visibility/freeze map ctid is used). But visibility map storage
>>>>>>> layer is pretty simple so it should be easy to extend it for this use.
>>>>>>
>>>>>>
>>>>>> Actually, there may be some bit manipulation functions we could reuse;
>>>>>> things like efficiently counting how many things in a byte are set. Probably
>>>>>> doesn't make sense to fully refactor it, but at least CLOG is a good source
>>>>>> for cut/paste/whack.
>>>>>>
>>>>>
>>>>> I agree with adding a bit to the VM that indicates the corresponding
>>>>> page is all-frozen, just like CLOG.
>>>>> I'll revise the patch accordingly as a second version.
>>>>>
>>>>
>>>> The second patch is attached.
>>>>
>>>> In the second patch, I added a bit to the visibility map that indicates
>>>> all tuples in the page are completely frozen.
>>>> The visibility map becomes a bitmap with two bits per heap page:
>>>> all-visible and all-frozen.
>>>> The logic around vacuum and heap insert/update/delete is almost the
>>>> same as in the previous version.
>>>>
>>>> This patch still lacks some things: documentation, source code comments,
>>>> etc., so it is still a WIP patch,
>>>> but I think it is enough to start a discussion.
>>>>
>>>
>>> The previous patch no longer applies cleanly to HEAD.
>>> The attached v2 patch is the latest version.
>>>
>>> Please review it.
>>
>> Attached is a new rebased version of the patch.
>> Please give me comments!
>
> Now we should review your design and approach rather than code,
> but since I got an assertion error while trying the patch, I report it.
>
> "initdb -D test -k" caused the following assertion failure.
>
> vacuuming database template1 ... TRAP:
> FailedAssertion("!((((PageHeader) (heapPage))->pd_flags & 0x0004))",
> File: "visibilitymap.c", Line: 328)
> sh: line 1: 83785 Abort trap: 6
> "/dav/000_add_frozen_bit_into_visibilitymap_v3/bin/postgres" --single
> -F -O -c search_path=pg_catalog -c exit_on_error=true template1 >
> /dev/null
> child process exited with exit code 134
> initdb: removing data directory "test"

Thank you for the bug report and comments.

A fixed version is attached, and the source code comments are also updated.
Please review it.

Let me explain again what this patch does and its current design.

- An additional bit for the visibility map
I added an additional bit to the visibility map, the all-frozen bit,
which indicates whether all tuples of the corresponding page are frozen.
This structure is similar to CLOG.
As a result, the VM becomes twice as large as it is today.
Also, PD_ALL_FROZEN may be set in the flags of each heap page header,
as well as PD_ALL_VISIBLE.

- Setting and clearing the all-frozen bit
Update, delete, and insert (including multi-insert) operations clear the
bit for that page, and clear the flag in the page header at the same time.
Only a vacuum operation can set the bit, and only if all tuples of the
page are frozen.

- Anti-wraparound vacuum
Today we have to scan the whole table to prevent XID wraparound, and
that is really quite expensive because of disk I/O.
The main benefit of this proposal is to reduce or avoid such
an extremely large amount of I/O even when an anti-wraparound vacuum is
executed.
I added experimental logic for this to the lazy_scan_heap() function.
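
To make the three points above concrete, here is a toy model in Python.
It is only a sketch and is not code from the patch: the class
ToyVisibilityMap, the bit values, the choice to clear both bits on
modification, and pages_to_scan() are all made up for illustration. The
real logic is C code in visibilitymap.c and lazy_scan_heap().

```python
# Toy model of a two-bits-per-page visibility map.
# All names and bit values here are hypothetical, not taken from the patch.
ALL_VISIBLE = 0x01
ALL_FROZEN = 0x02
BITS_PER_PAGE = 2
PAGES_PER_BYTE = 8 // BITS_PER_PAGE  # 4 heap pages per map byte (8 with one bit)


class ToyVisibilityMap:
    def __init__(self, n_pages):
        self.n_pages = n_pages
        # Round up so that the last, partially used byte is allocated too.
        self.bits = bytearray((n_pages + PAGES_PER_BYTE - 1) // PAGES_PER_BYTE)

    def _locate(self, page):
        """Return (byte index, bit shift) of the two bits for a heap page."""
        if not 0 <= page < self.n_pages:
            raise IndexError(page)
        return page // PAGES_PER_BYTE, (page % PAGES_PER_BYTE) * BITS_PER_PAGE

    def get(self, page):
        byte, shift = self._locate(page)
        return (self.bits[byte] >> shift) & 0x03

    def set(self, page, flags):
        """Done by vacuum only, e.g. once every tuple on the page is frozen."""
        byte, shift = self._locate(page)
        self.bits[byte] |= (flags & 0x03) << shift

    def clear(self, page):
        """Done on insert/update/delete; this toy clears both bits at once."""
        byte, shift = self._locate(page)
        self.bits[byte] &= ~(0x03 << shift) & 0xFF


def pages_to_scan(vm):
    """Pages an anti-wraparound vacuum would still have to read."""
    return [p for p in range(vm.n_pages) if not vm.get(p) & ALL_FROZEN]


if __name__ == "__main__":
    vm = ToyVisibilityMap(8)
    for page in range(6):          # vacuum froze pages 0..5
        vm.set(page, ALL_VISIBLE | ALL_FROZEN)
    vm.clear(3)                    # a later UPDATE touched page 3
    print(len(vm.bits))            # 2 bytes for 8 pages, 1 byte with one bit
    print(pages_to_scan(vm))       # [3, 6, 7]
```

In this model an anti-wraparound vacuum only reads the pages whose
all-frozen bit is not set, so the I/O scales with the part of the table
that changed since the last freeze rather than with the whole table.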

Several other ideas came up in the previous discussion, such as a
read-only table or a separate frozen map. The advantage of this direction
is that we don't need an additional heap file and can reuse the mature VM
mechanism.

Regards,

--
Sawada Masahiko

Attachment: 000_add_frozen_bit_into_visibilitymap_v4.patch (text/x-diff, 60.0 KB)
