Re: Small improvement to compactify_tuples

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Small improvement to compactify_tuples
Date: 2017-11-03 14:37:46
Message-ID: CAGTBQpYtaiZfN+=EExrm_oXvypMmKeEjue33ZkzGohDr-jbQyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru> writes:
>> [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ]
>
> I started to review this patch. I spent a fair amount of time on
> beautifying the code, because I found it rather ugly and drastically
> undercommented. Once I had it to the point where it seemed readable,
> I went to check the shellsort algorithm against Wikipedia's entry,
> and found that this appears to be an incorrect implementation of
> shellsort: where pg_shell_sort_pass has
>
> for (_i = off; _i < _n; _i += off) \
>
> it seems to me that we need to have
>
> for (_i = off; _i < _n; _i += 1) \
>
> or maybe just _i++. As-is, this isn't h-sorting the whole file,
> but just the subset of entries that have multiple-of-h indexes
> (ie, the first of the h distinct subfiles that should get sorted).
> The bug is masked by the final pass of plain insertion sort, but
> we are not getting the benefit we should get from the earlier passes.
>
> However, I'm a bit dubious that it's worth fixing that; instead
> my inclination would be to rip out the shellsort implementation
> entirely. The code is only using it for the nitems <= 48 case
> (which makes the first three offset steps certainly no-ops) and
> I am really unconvinced that it's worth expending the code space
> for a shellsort rather than plain insertion sort in that case,
> especially when we have good reason to think that the input data
> is nearly sorted.

I actually noticed that and benchmarked some variants. Neither
made any noticeable difference in performance, so I decided not
to complain about them.

I guess the same case can be made for removing the shell sort.
So I'm inclined to agree.

> BTW, the originally given test case shows no measurable improvement
> on my box.

I did manage to reproduce the original test and got a consistent improvement.

> I was eventually able to convince myself by profiling
> that the patch makes us spend less time in compactify_tuples, but
> this test case isn't a very convincing one.
>
> So, quite aside from the bug, I'm not excited about committing the
> attached as-is. I think we should remove pg_shell_sort and just
> use pg_insertion_sort. If somebody can show a test case that
> provides a measurable speed improvement from the extra code,
> I could be persuaded to reconsider.

My tests modifying the shell sort didn't produce any measurable
difference, but I didn't test removing it altogether.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-11-03 14:53:30 Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Tom Lane 2017-11-03 14:35:20 Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM