From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] [B-Tree] Retail IndexTuple deletion |
Date: | 2018-06-22 20:14:05 |
Message-ID: | CAH2-Wzm7BS151fccSmeP2D8TBOEc5k6RjKtEM_zTeHJRzXi4qA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> According to your feedback, i develop second version of the patch.
>> In this version:
>> 1. High-level functions index_beginscan(), index_rescan() not used. Tree
>> descent made by _bt_search(). _bt_binsrch() used for positioning on the
>> page.
>> 2. TID list introduced in amtargetdelete() interface. Now only one tree
>> descent needed for deletion all tid's from the list with equal scan key
>> value - logical duplicates deletion problem.
>> 3. Only one WAL record for index tuple deletion per leaf page per
>> amtargetdelete() call.
>
> Cool.
>
> What is this "race" code about?
I noticed another bug in your patch, when running a
"wal_consistency_checking=all" smoke test. I do this simple, generic
test for anything that touches WAL-logging, actually -- it's a good
practice to adopt.
I enable "wal_consistency_checking=all" on the installation, create a
streaming replica with pg_basebackup (which also has
"wal_consistency_checking=all"), and then run "make installcheck"
against the primary. Here is what I see on the standby when I do this
with v2 of your patch applied:
9524/2018-06-22 13:03:12 PDT LOG: entering standby mode
9524/2018-06-22 13:03:12 PDT LOG: consistent recovery state reached
at 0/30000D0
9524/2018-06-22 13:03:12 PDT LOG: invalid record length at 0/30000D0:
wanted 24, got 0
9523/2018-06-22 13:03:12 PDT LOG: database system is ready to accept
read only connections
9528/2018-06-22 13:03:12 PDT LOG: started streaming WAL from primary
at 0/3000000 on timeline 1
9524/2018-06-22 13:03:12 PDT LOG: redo starts at 0/30000D0
9524/2018-06-22 13:03:32 PDT FATAL: inconsistent page found, rel
1663/16384/1259, forknum 0, blkno 0
9524/2018-06-22 13:03:32 PDT CONTEXT: WAL redo at 0/3360B00 for
Heap2/CLEAN: remxid 599
9523/2018-06-22 13:03:32 PDT LOG: startup process (PID 9524) exited
with exit code 1
9523/2018-06-22 13:03:32 PDT LOG: terminating any other active server processes
9523/2018-06-22 13:03:32 PDT LOG: database system is shut down
I haven't investigated this at all, but I assume that the problem is a
simple oversight. The new ItemIdSetDeadRedirect() concept that you've
introduced probably necessitates changes in both the WAL logging
routines and the redo/recovery routines. You need to go make those
changes. (By the way, I don't think you should be using the constant
"3" with the ItemIdIsDeadRedirection() macro definition.)
Let me know if you get stuck on this, or need more direction.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2018-06-22 20:54:52 | Re: bug with expression index on partition |
Previous Message | Alvaro Herrera | 2018-06-22 20:03:51 | Re: [PATCH] Include application_name in "connection authorized" log message |