From: | "Wood, Dan" <hexpert(at)amazon(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple |
Date: | 2017-10-05 01:39:52 |
Message-ID: | 8ABEB00F-E19E-4178-A00A-DDA99EA73D94@amazon.com |
Lists: | pgsql-committers pgsql-hackers |
Whatever you do, make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.
Once nearly all 250 clients have done their updates and everybody is waiting to vacuum, which one by one will take a while, I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that, I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table, and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.
On 10/4/17, 6:31 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> > break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.
Confirmed, the problem goes away with this patch on 9.3.
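For readers following along, the chain-continuity check discussed above can be sketched as a toy model. This is not PostgreSQL source: `ToyTupleHeader`, the `toy_*` accessors and the `chain_continues_*` helpers are hypothetical stand-ins that only mimic how `HeapTupleHeaderGetXmin` reports `FrozenTransactionId` for a frozen tuple in 9.4 and later, while `HeapTupleHeaderGetRawXmin` returns the stored value. The 9.3-style variant is an assumption about the general shape of the explicit test, not the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: simplified stand-ins, not the real PostgreSQL definitions. */
typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

typedef struct ToyTupleHeader
{
    TransactionId raw_xmin;    /* xmin as stored in the tuple */
    bool          xmin_frozen; /* stand-in for the "xmin frozen" infomask bits (9.4+) */
} ToyTupleHeader;

/* Mimics HeapTupleHeaderGetRawXmin: always the stored value. */
static TransactionId
toy_get_raw_xmin(const ToyTupleHeader *tup)
{
    return tup->raw_xmin;
}

/* Mimics HeapTupleHeaderGetXmin in 9.4+: a frozen tuple reports FrozenTransactionId. */
static TransactionId
toy_get_xmin(const ToyTupleHeader *tup)
{
    return tup->xmin_frozen ? FrozenTransactionId : tup->raw_xmin;
}

/* Check written with GetXmin: a frozen chain member looks like a broken chain. */
static bool
chain_continues_xmin(TransactionId priorXmax, const ToyTupleHeader *tup)
{
    if (priorXmax != InvalidTransactionId && priorXmax != toy_get_xmin(tup))
        return false;
    return true;
}

/* Check written with GetRawXmin: freezing no longer hides the original xmin. */
static bool
chain_continues_raw(TransactionId priorXmax, const ToyTupleHeader *tup)
{
    if (priorXmax != InvalidTransactionId && priorXmax != toy_get_raw_xmin(tup))
        return false;
    return true;
}

/*
 * Assumed 9.3-style shape: freezing overwrites the stored xmin with
 * FrozenTransactionId, so that value has to be accepted explicitly.
 */
static bool
chain_continues_93(TransactionId priorXmax, TransactionId stored_xmin)
{
    if (priorXmax != InvalidTransactionId &&
        stored_xmin != FrozenTransactionId &&
        stored_xmin != priorXmax)
        return false;
    return true;
}
```

In this model, a tuple with stored xmin 100 that has been frozen fails `chain_continues_xmin(100, ...)` but passes `chain_continues_raw(100, ...)`, which is the difference between GetXmin and GetRawXmin pointed out above.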
> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.
I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)
> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes? It does change with freezing.
>
> /*
> * If xmin isn't what we're expecting, the slot must have been
> * recycled and reused for an unrelated tuple. This implies that
> * the latest version of the row was deleted, so we need do
> * nothing. (Should be safe to examine xmin without getting
> * buffer's content lock, since xmin never changes in an existing
> * tuple.)
> */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
> priorXmax))
Agreed. That's not good.
--
Michael