Re: Reviewing freeze map code

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-09 16:11:15
Message-ID: CA+TgmoZuTo8bZddPHt90paHUZMhHA=T-Xk7VoTVuw8MYMEfK2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 9, 2016 at 5:48 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Attached patch implements the above 2 functions. I have addressed the
> comments by Sawada San and you in latest patch and updated the documentation
> as well.

I made a number of changes to this patch. Here is the new version.

1. The algorithm you were using for growing the array size is unsafe
and can easily overrun the array. Suppose that each of the first two
pages have some corrupt tuples, more than 50% of MaxHeapTuplesPerPage
but less than the full value of MaxTuplesPerPage. Your code will
conclude that the array does need to be enlarged after processing the
first page. I switched this to what I consider the normal coding
pattern for such problems.

2. The all-visible checks seemed to me to be incorrect and incomplete.
I made the check match the logic in lazy_scan_heap.

3. Your 1.0 -> 1.1 upgrade script was missing copies of the REVOKE
statements you added to the 1.1 script. I added them.

4. The tests as written were not safe under concurrency; they could
return spurious results if the page changed between the time you
checked the visibility map and the time you actually examined the
tuples. I think people will try running these functions on live
systems, so I changed the code to recheck the VM bits after locking
the page. Unfortunately, there's either still a concurrency-related
problem here or there's a bug in the all-frozen code itself because I
once managed to get pg_check_frozen('pgbench_accounts') to return a
TID while pgbench was running concurrently. That's a bit alarming,
but since I can't reproduce it I don't really have a clue how to track
down the problem.

5. I made various cosmetic improvements.

If there are not objections, I will go ahead and commit this tomorrow,
because even if there is a bug (see point #4 above) I think it's
better to have this in the tree than not. However, code review and/or
testing with these new functions seems like it would be an extremely
good idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
check-visibility-v3.patch binary/octet-stream 19.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-06-09 16:18:23 Re: Reviewing freeze map code
Previous Message Peter Eisentraut 2016-06-09 16:04:59 Re: parallel workers and client encoding