Re: Vacuum: allow usage of more than 1GB of work mem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2017-01-25 16:54:08
Message-ID: CAD21AoDz3FCdgoLP2HmHzDdgz_sW6eQD7yXSVBDibFCYQptS5w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 24, 2017 at 1:49 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Fri, Jan 20, 2017 at 6:24 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Thu, Jan 19, 2017 at 8:31 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> On Thu, Jan 19, 2017 at 6:33 AM, Anastasia Lubennikova
>>> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>>> 28.12.2016 23:43, Claudio Freire:
>>>>
>>>> Attached v4 patches with the requested fixes.
>>>>
>>>>
>>>> Sorry for being late, but the tests took a lot of time.
>>>
>>> I know. Takes me several days to run my test scripts once.
>>>
>>>> create table t1 as select i, md5(random()::text) from
>>>> generate_series(0,400000000) as i;
>>>> create index md5_idx ON t1(md5);
>>>> update t1 set md5 = md5((random() * (100 + 500))::text);
>>>> vacuum;
>>>>
>>>> Patched vacuum used 2.9GB of memory and vacuumed the index in one pass,
>>>> while the old version took three passes (1GB+1GB+0.9GB).
>>>> Vacuum duration results:
>>>>
>>>> vanilla:
>>>> LOG: duration: 4359006.327 ms statement: vacuum verbose t1;
>>>> patched:
>>>> LOG: duration: 3076827.378 ms statement: vacuum verbose t1;
>>>>
>>>> We can see a 30% vacuum speedup. I should note that this case can be
>>>> considered favorable to vanilla vacuum: the table is not that big, it has
>>>> just one index, and the disk used is a fast FusionIO. We can expect even
>>>> more gain on slower disks.
>>>>
>>>> Thank you again for the patch. Hope to see it in 10.0.
>>>
>>> Cool. Thanks for the review and the tests.
>>>
>>
>> I encountered a bug with the following scenario.
>> 1. Create a table and disable autovacuum on that table.
>> 2. Create about 200,000 dead tuples on the table.
>> 3. SET maintenance_work_mem TO 1024
>> 4. VACUUM
>>
>> @@ -729,7 +759,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
>>               * not to reset latestRemovedXid since we want that value to be
>>               * valid.
>>               */
>> -            vacrelstats->num_dead_tuples = 0;
>> +            lazy_clear_dead_tuples(vacrelstats);
>>              vacrelstats->num_index_scans++;
>>
>>              /* Report that we are once again scanning the heap */
>>
>> I think that we should set vacrelstats->dead_tuples.num_entries = 0 as
>> well in lazy_clear_dead_tuples(). Otherwise, once the number of dead
>> tuples reaches maintenance_work_mem, lazy_scan_heap can never finish.
>
> That's right.
>
> I added a test for it in the attached patch set, which uncovered
> another bug in lazy_clear_dead_tuples, and took the opportunity to
> rebase.
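
For reference, a minimal sketch of the reset I had in mind; the body below is
only my illustration based on the structures in the patch, not the actual
patch code:

static void
lazy_clear_dead_tuples(LVRelStats *vacrelstats)
{
    int     seg;

    /* Reset every segment's entry count... */
    for (seg = 0; seg <= vacrelstats->dead_tuples.last_seg; seg++)
        vacrelstats->dead_tuples.dead_tuples[seg].num_dead_tuples = 0;

    /*
     * ...and also the multiarray's totals; otherwise lazy_scan_heap keeps
     * thinking the array is full and never finishes.
     */
    vacrelstats->dead_tuples.last_seg = 0;
    vacrelstats->dead_tuples.num_entries = 0;
}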
>
> On Mon, Jan 23, 2017 at 1:06 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> I pushed this patch after rewriting it rather completely. I added
>> tracing notices to inspect the blocks it was prefetching and observed
>> that the original coding was failing to prefetch the final streak of
>> blocks in the table, which is an important oversight considering that it
>> may very well be that those are the only blocks to read at all.
>>
>> I timed vacuuming a 4000-block table in my laptop (single SSD disk;
>> dropped FS caches after deleting all rows in table, so that vacuum has
>> to read all blocks from disk); it changes from 387ms without patch to
>> 155ms with patch. I didn't measure how much it takes to run the other
>> steps in the vacuum, but it's clear that the speedup for the truncation
>> phase is considerable.
>>
>> ¡Thanks, Claudio!
>
> Cool.
>
> Though it wasn't the first time this idea has been floating around, I
> can't take all the credit.
>
>
> On Fri, Jan 20, 2017 at 6:25 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> FWIW, I think this patch is completely separate from the maint_work_mem
>> patch and should have had its own thread and its own commitfest entry.
>> I intend to get a look at the other patch next week, after pushing this
>> one.
>
> That's because it did have one, but it was left in limbo due to lack of
> testing on SSDs. I just had to adopt it here because otherwise the tests
> took way too long.

Thank you for updating the patch!

+    /*
+     * Quickly rule out by lower bound (should happen a lot). Upper bound was
+     * already checked by segment search.
+     */
+    if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0)
+        return false;

I think that if the above result is 0, we can return true, as itemptr
matches the lower-bound item pointer in rseg->dead_tuples.
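
To illustrate the shape I have in mind, a rough sketch (the helper name and
the bsearch() fallback here are my own illustration, not code from the patch):

static bool
lazy_tid_reaped_in_segment(ItemPointer itemptr, DeadTuplesSegment *rseg)
{
    int     cmp = vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples);

    if (cmp < 0)
        return false;   /* below the segment's lower bound: not a dead tuple */
    if (cmp == 0)
        return true;    /* exact match with the lower bound: no search needed */

    /* otherwise fall back to a binary search over the segment's array */
    return bsearch((void *) itemptr, (void *) rseg->dead_tuples,
                   rseg->num_dead_tuples, sizeof(ItemPointerData),
                   vac_cmp_itemptr) != NULL;
}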

+typedef struct DeadTuplesSegment
+{
+    int             num_dead_tuples;    /* # of entries in the segment */
+    int             max_dead_tuples;    /* # of entries allocated in the segment */
+    ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
+                                         * until the segment is fully
+                                         * populated) */
+    unsigned short  padding;
+    ItemPointer     dead_tuples;        /* Array of dead tuples */
+} DeadTuplesSegment;
+
+typedef struct DeadTuplesMultiArray
+{
+    int             num_entries;        /* current # of entries */
+    int             max_entries;        /* total # of slots that can be
+                                         * allocated in array */
+    int             num_segs;           /* number of dead tuple segments allocated */
+    int             last_seg;           /* last dead tuple segment with data (or 0) */
+    DeadTuplesSegment *dead_tuples;     /* array of num_segs segments */
+} DeadTuplesMultiArray;

It's a matter of personal preference, but having several variables named
dead_tuples with different meanings confused me. If we want to access the
first dead tuple location of the first segment, we need to write
'vacrelstats->dead_tuples.dead_tuples.dead_tuples'. Something like
'vacrelstats->dead_tuples.dt_segment.dt_array' would be clearer to me.
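
Purely as an illustration of the naming I mean (the dt_* field names below
are just my suggestion, not taken from the patch):

typedef struct DeadTuplesSegment
{
    int             num_dead_tuples;    /* # of entries in the segment */
    int             max_dead_tuples;    /* # of entries allocated */
    ItemPointerData last_dead_tuple;    /* copy of the last dead tuple */
    unsigned short  padding;
    ItemPointer     dt_array;           /* array of dead tuple TIDs */
} DeadTuplesSegment;

typedef struct DeadTuplesMultiArray
{
    int             num_entries;        /* current # of entries */
    int             max_entries;        /* total # of allocatable slots */
    int             num_segs;           /* number of segments allocated */
    int             last_seg;           /* last segment with data (or 0) */
    DeadTuplesSegment *dt_segment;      /* array of num_segs segments */
} DeadTuplesMultiArray;

That way the access above no longer repeats dead_tuples three times.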

+            nseg->num_dead_tuples = 0;
+            nseg->max_dead_tuples = 0;
+            nseg->dead_tuples = NULL;
+            vacrelstats->dead_tuples.num_segs++;
+        }
+        seg = DeadTuplesCurrentSegment(vacrelstats);
+    }
+    vacrelstats->dead_tuples.last_seg++;
+    seg = DeadTuplesCurrentSegment(vacrelstats);

Because seg is always set afterwards, I think the first line starting with
"seg = ..." is not necessary. Thoughts?

--
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
