Quick Links

Re: Aggregating tsqueries

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Alexander Hill <alex(at)hill(dot)net(dot)au>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Aggregating tsqueries
Date:	2014-09-18 07:20:41
Message-ID:	541A87C9.4030002@vmware.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 09/17/2014 07:56 AM, Alexander Hill wrote:
> Hello,
>
> I have a table of tree nodes with a tsquery column. To get a subtree's
> tsquery, I need to OR all of its nodes' tsqueries together.
>
> I defined a custom aggregate using tsquery_or:
>
> CREATE AGGREGATE tsquery_or_agg (tsquery)
> (
> sfunc = tsquery_or,
> stype = tsquery
> );
>
> but I've found that
>
> tsquery_or_agg(query)
>
> is about a hundred times slower than this:
>
> ('(' || string_agg(query::text, ')|(') || ')')::tsquery
>
> That works perfectly so I'm happy to continue doing it, but I'm curious to
> know why the difference is so great and if anything can be done about it?

string_agg's state transition function uses a buffer that's expanded as
needed. At every step, the next string is appended to the buffer. Your
custom aggregate is less efficient, because it constructs a new tsquery
object at every step. In every step, a new tsquery object is allocated
and the old result and the next source tsquery are copied to it. That's
much more expensive.

If you're not shy of writing C code, you could write a more efficient
version of tsquery_or_agg too, using a similar technique.

- Heikki

In response to

Aggregating tsqueries at 2014-09-17 04:56:31 from Alexander Hill

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Mkrtchyan, Tigran	2014-09-18 09:58:21	postgres 9.3 vs. 9.4
Previous Message	Josh Berkus	2014-09-18 00:11:33	Yet another abort-early plan disaster on 9.3