Re: [WIP] [B-Tree] Retail IndexTuple deletion

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date: 2018-06-29 06:39:49
Message-ID: CAGz5QC+auxOVFwRJBMW6shF0HKy7EKdb2hPKuWLBjNSUbB2nvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 29, 2018 at 11:04 AM, Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 29.06.2018 10:00, Kuntal Ghosh wrote:
>>
>> On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>>
>>> I prepare third version of the patches. Summary:
>>> 1. Mask DEAD tuples at a page during consistency checking (See comments
>>> for
>>> the mask_dead_tuples() function).
>>
>>
>> - ItemIdSetDead(lp);
>> + if (target_index_deletion_factor > 0)
>> + ItemIdMarkDead(lp);
>> + else
>> + ItemIdSetDead(lp);
>> IIUC, you want to hold the storage of DEAD tuples to form the ScanKey
>> which is required for the index scan in the second phase of quick
>> vacuum strategy. To achieve that, you've only marked the tuple as DEAD
>> during pruning. Hence, PageRepairFragmentation cannot claim the space
>> for DEAD tuples since they still have storage associated with them.
>> But, you've WAL-logged it as DEAD tuples having no storage. So, when
>> the WAL record is replayed in standby(or crash recovery), the tuples
>> will be marked as DEAD having no storage and their space may be
>> reclaimed by PageRepairFragmentation. Now, if you do byte-by-byte
>> comparison with wal_consistency tool, it may fail even for normal
>> tuple as well. Please let me know if you feel the same way.
>>
> Thanks for your analysis.
> In this development version of the patch I expect the same prune() strategy
> on a master and standby (i.e. target_index_deletion_factor is equal for
> both).
> In this case storage of a DEAD tuple holds during replay or recovery in the
> same way.
Okay. That's acceptable for now.

> On some future step of development I plan to use more flexible prune()
> strategy. This will require to append a 'isDeadStorageHold' field to the WAL
> record.
That sounds interesting. I'll be waiting for your next patches.

Few minor comments:
+ qsort((void *)vacrelstats->dead_tuples,
vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
tid_comparator);
+ ivinfo.isSorted = true;
My understanding is that vacrelstats->dead_tuples are already sorted
based on their tids. I'm referring to the following part in
lazy_scan_heap(),
for (blk=0 to nblocks)
{
for (offset=1 to max offset)
{
if certain conditions are met store the tuple in
vacrelstats->dead_tuples using lazy_record_dead_tuple();
}
}
So, you don't have to sort them again, right?

+ slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
+ econtext->ecxt_scantuple = slot;
+ ExecDropSingleTupleTableSlot(slot);
You don't have to do this for every tuple. Before storing the tuple,
you can just call MemoryContextReset(econtext->ecxt_per_tuple_memory);
Perhaps, you can check IndexBuildHeapRangeScan for the details.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-06-29 07:05:28 Re: partitioning - changing a slot's descriptor is expensive
Previous Message Daniel Gustafsson 2018-06-29 06:39:01 Re: CREATE TABLE .. LIKE .. EXCLUDING documentation