_hash_alloc_buckets() safety

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: _hash_alloc_buckets() safety
Date: 2016-09-13 12:21:41
Message-ID: CAA4eK1LMYM8cmdOCgUdz+UzdRs5pe4pq-2BsQpR_BjdN57Mm=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While working on write-ahead-logging of hash indexes, I noticed that
this function allocates buckets in batches and the mechanism it uses
is that it initialize the last page of batch with zeros and expect
that the filesystem will ensure the intervening pages read as zeroes
too.

I think to make it WAL enabled, we need to initialize the page header
(using PageInit() or equivalent) instead of initializing it with
zeroes as some part of our WAL replay machinery expects that the page
should not be new as indicated by me in other thread [1]. I think WAL
consistency check tool [2] also uses same part of replay functions and
will show this as problem, if we don't initialize the page header.

The point which is not clear to me is that whether it is okay as-is or
shall we try to initialize each page of batch during
_hash_alloc_buckets() considering now we are trying to make hash
indexes WAL enabled. Offhand, I don't see any problem with just
initializing the last page and write the WAL for same with
log_newpage(), however if we try to initialize all pages, there could
be some performance penalty on split operation.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com
[2] - https://commitfest.postgresql.org/10/741/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-09-13 12:59:57 Re: cost_sort() may need to be updated
Previous Message Robert Haas 2016-09-13 12:16:44 Re: Hash Indexes