Re: hash index on unlogged tables doesn't behave as expected

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash index on unlogged tables doesn't behave as expected
Date: 2017-07-03 09:54:30
Message-ID: CAE9k0P=iqPtHzGObikRjhxaz4tWFW+XZmgGnwBwz3=nRgxGSAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> While discussing the behavior of hash indexes with Bruce in the nearby
> thread [1], it has been noticed that hash index on unlogged tables
> doesn't behave as expected. Prior to 10, it has the different set of
> problems (mainly because hash indexes are not WAL-logged) which were
> discussed on that thread [1], however when I checked, it doesn't work
> even for 10. Below are steps to reproduce the problem.
>
> 1. Setup master and standby
> 2. On the master, create unlogged table and hash index.
> 2A. Create unlogged table t1(c1 int);
> 2B. Create hash index idx_t1_hash on t1 using hash(c1);
> 3. On Standby, try selecting data,
> select * from t1;
> ERROR: cannot access temporary or unlogged relations during recovery
> ---Till here everything works as expected
> 4. Promote standby to master (I have just stopped the standby and
> master and removed recovery.conf file from the standby database
> location) and try starting the new master, it
> gives below error and doesn't get started.
> FATAL: could not create file "base/12700/16387": File exists
>
> The basic issue was that the WAL logging for Create Index operation
> was oblivion of the fact that for unlogged tables only INIT forks need
> to be logged. Another point which we need to consider is that while
> replaying the WAL for the create index operation, we need to flush the
> buffer if it is for init fork. This needs to be done only for pages
> that can be part of init fork file (like metapage, bitmappage).
> Attached patch fixes the issue.
>
> Another approach to fix the issue could be that always log complete
> pages for the create index operation on unlogged tables (in
> hashbuildempty). However, the logic for initial hash index pages
> needs us to perform certain other actions (like update metapage after
> the creation of bitmappage) which make it difficult to follow that
> approach.
>

Thanks for sharing the steps to reproduce the issue in detail. I could
easily reproduce this issue. I also had a quick look into the patch and the
fix looks fine to me. However, it would be good if we can add at least one
test for unlogged table with hash index in the regression test suite.

Apart from that i have tested the patch and the patch seems to be working
fine. Here are the steps that i have followed to verify the fix,

With Patch:
========
postgres=# SELECT pg_relation_filepath('t1');
pg_relation_filepath
----------------------
base/14915/16384
(1 row)

postgres=# SELECT pg_relation_filepath('idx_t1_hash');
pg_relation_filepath
----------------------
base/14915/16387
(1 row)

Master:
=====
[ashu(at)localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul 3 14:29 ../master/base/14915/16384
-rw-------. 1 ashu ashu 24576 Jul 3 14:29 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu 0 Jul 3 14:28
../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul 3 14:29 ../master/base/14915/16387
-rw-------. 1 ashu ashu 32768 Jul 3 14:28
../master/base/14915/16387_init

Standby:
======
[ashu(at)localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu 0 Jul 3 14:28 ../standby/base/14915/16384_init
-rw-------. 1 ashu ashu 32768 Jul 3 14:28 ../standby/base/14915/16387_init

Without patch:
==========
postgres=# SELECT pg_relation_filepath('t1');
pg_relation_filepath
----------------------
base/14915/16384
(1 row)

postgres=# SELECT pg_relation_filepath('idx_t1_hash');
pg_relation_filepath
----------------------
base/14915/16387
(1 row)

Master:
=====
[ashu(at)localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul 3 14:36 ../master/base/14915/16384
-rw-------. 1 ashu ashu 24576 Jul 3 14:36 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu 0 Jul 3 14:35
../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul 3 14:36 ../master/base/14915/16387
-rw-------. 1 ashu ashu 32768 Jul 3 14:35
../master/base/14915/16387_init

Standby:
======
[ashu(at)localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu 0 Jul 3 14:35 ../standby/base/14915/16384_init
*-rw-------. 1 ashu ashu 73728 Jul 3 14:35 ../standby/base/14915/16387*
-rw-------. 1 ashu ashu 24576 Jul 3 14:35 ../standby/base/14915/16387_init

As seen from the test results above, it is clear that without patch, both
main fork and init fork were getting WAL logged as the create hash index
WAL logging was being done without checking forkNum type.

_hash_init()
{
................
................
log_newpage(&rel->rd_node,
forkNum,
blkno,
BufferGetPage(buf),
true);
_hash_relbuf(rel, buf);
....................
....................
}

I have also ran the WAL consistency check tool to verify the things and
didn't find any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-07-03 09:54:38 Re: UPDATE of partition key
Previous Message Amit Langote 2017-07-03 09:32:24 Re: Multi column range partition table