Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <heikki(dot)linnakangas(at)enterprisedb(dot)com>,<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 18:27:05
Message-ID: 4E8472290200002500041918@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Tom Lane wrote:

>> Hmm, why is that patch the one posted for review, when several
>> better versions were already discussed? See thread starting here:
>> http://archives.postgresql.org/pgsql-hackers/2011-07/msg00028.php
>
> The patch I reviewed was added to the CF app before the post you
> cite was written. I didn't see it in following the links
> (probably because it crossed a month boundary). Thanks for
> pointing that out; I'll update the CF app and review the later
> versions.

The first patch Tom posted was a clear improvement on Heikki's
original patch, and performed slightly better.

The second patch Tom posted makes the patch more robust in the face
of changes to the structure, but reduces the performance benefit on
the dictionary, and performs very close to the unpatched version for
samples of English text. If anything, it seems to be slower with
real text, but the difference is so small it might be noise. (The
dictionary tests are skewed toward longer average word length than
typically found in actual readable text.) I wonder whether the code
for the leading, unaligned portion of the data could be written such
that it was effectively unrolled and the length resolved at compile
time rather than figured out at run time for every call to the
function. That would solve the robustness issue without hurting
performance. If we don't do something like that, this patch doesn't
seem worth applying.

Heikki's second version, a more radical revision optimized for 64
bit systems, blows up on a 32 bit compile, writing off the end of
the structure. Personally, I'd be OK with sacrificing some
performance for 32 bit systems to get better performance on 64 bit
systems, since people who care about performance generally seem to
be on 64 bit builds these days -- but it has to run. Given Tom's
reservations about this approach, I don't know whether Heikki is
interested in fixing the crash so it can be benchmarked. Heikki?

I will do a set of more carefully controlled performance tests on
whatever versions are still in the running after we sort out the
issues above.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-09-29 20:30:32 Re: Updated version of pg_receivexlog
Previous Message Florian Pflug 2011-09-29 18:22:19 Re: Hot Backup with rsync fails at pg_clog if under load