Re: [sqlsmith] Short reads in hash indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andreas Seltenreich <seltenreich(at)gmx(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Short reads in hash indexes
Date: 2016-12-11 06:33:28
Message-ID: CAA4eK1KvpTbT-5WSVpOco4fvexwWSgBG0jt48EBLF4m5ocEkWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 8, 2016 at 2:38 AM, Andreas Seltenreich <seltenreich(at)gmx(dot)de> wrote:
> Andreas Seltenreich writes:
>
>> Amit Kapila writes:
>>
>>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich(at)gmx(dot)de> wrote:
>>>> Amit Kapila writes:
>>>>
>>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>>> Ok, I'll do testing with the patch applied.
>>
>> Good news: the assertion hasn't fired since the patch is in.
>
> Meh, it fired again today after being silent for 100e6 queries :-/
> I guess I need to add some confidence qualification on such statements.
> Maybe sigmas as they do at CERN…
>

This assertion can be reproduced with Jeff's test as well and the fix
for the same is posted [1].

>> smith=# select * from state_report where sqlstate = 'XX001';
>> -[ RECORD 1 ]------------------------------------------------------------------------------
>> count | 10
>> sqlstate | XX001
>> sample | ERROR: could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
>> hosts | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>>
>>> Hmm, I am not sure if this is related to previous problem, but it
>>> could be. Is it possible to get the operation and or callstack for
>>> above failure?
>>
>> Ok, will turn the elog into an assertion to get at the backtraces.
>
> Doing so on top of 4212cb7, I caught the backtrace below. Query was:
>
> --8<---------------cut here---------------start------------->8---
> set max_parallel_workers_per_gather = 0;
> select count(1) from
> public.hash_name_heap as ref_2
> join public.rtest_emplog as sample_1
> on (ref_2.random = sample_1.who);
> --8<---------------cut here---------------end--------------->8---
>
> I've put the data directory where it can be reproduced here:
>
> http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)
>

This can happen due to non-marking of the dirty buffer as the index
page where we have deleted the tuples will not be flushed whereas
vacuum would have removed corresponding heap tuples. Next access to
hash index page will bring back the old copy of index page which
contains tuples that were supposed to get deleted by vacuum and
accessing those tuples will give wrong information about heap tuples
and when we try to access deleted heap tuples, it can give us short
reads problem.

Can you please try with the patch posted on hash index thread [1] to
see if you can reproduce any of these problems?

[1] - https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-12-11 07:04:27 Re: Back-patch use of unnamed POSIX semaphores for Linux?
Previous Message Amit Kapila 2016-12-11 06:24:54 Re: Hash Indexes