Re: Pattern match against array elements?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Israel Brewster <israel(at)ravnalaska(dot)net>, "pgsql-general(at)postgresql(dot)org general" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Pattern match against array elements?
Date: 2015-10-12 19:50:26
Message-ID: CAMkU=1wi40zSpeVgercthoFDP_ZNxewpAHtd+qkvNUK-UucLYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Oct 12, 2015 at 11:39 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > On Mon, Oct 12, 2015 at 10:58 AM, Israel Brewster <israel(at)ravnalaska(dot)net
> >
> > wrote:
> >> My first thought was to do something like this:
> >>
> >> SELECT * FROM (SELECT lognum,array_agg(flightnum) as flightnums FROM
> logs
> >> GROUP BY lognum) s1 WHERE '8%' like ANY(flightnums);
> >>
> >> But while this doesn't give an error, it also doesn't return any
> results.
> >> I'm guessing that this is because the wildcard is on the left of the
> >> operator, and needs to be on the right.
>
> > Right. The LIKE operator does not have a commutator by default. (And if
> > you created one for it, it could not use an index in this case.)
>
> Well, it couldn't use an index anyway, given that the query as written
> wants to collect groups if *any* member is LIKE '8%', rather than
> restricting the data to such flightnums before aggregation occurs.
>

I was jumping ahead a bit here. I was thinking of the case where the
aggregation query was made a materialized view with a gin index on the
aggregated column. It is not obvious that a "scalar operator
ANY(flightnums)" can't use an index on array_column, but it can't. My
interest was more in the % operator from pg_trgm, but also the normal text
= operator would be nice to use here (as opposed to the much uglier <@ or
@> in which the scalar needs to be wrapped into a degenerate array.)

> Personally I'd suggest building a commutator operator (just need a
> one-liner SQL or plpgsql function as infrastructure) and away you go.
>

Right, something like:

create function like_rev (text, text) returns boolean as $$ select $2 like
$1 $$ language SQL;

create operator ~~~~ (procedure = like_rev, leftarg=text, rightarg=text);

You can explicitly specify the commutator but it doesn't seem to be
necessary to do so:

create operator ~~~~ (procedure = like_rev, leftarg=text,
rightarg=text,commutator = ~~ );

> > I think you're best bet is to do a subquery against the unaggregated
> table.
>
> > select * from aggregated a where exists
> > (select 1 from unaggregated ua where a.lognum=ua.lognum and flightnum
> > like '8%')
>
> That would work too, but not sure about performance relative to the other
> way.
>

In my experience, if the subselect can use an index on the LIKE and if '8%'
is rare, then using the subselect will be vastly better. And if it is
indexable or not rare, it is still likely to be better, or at least not
worse by much. Disaggregating every array for every row to do the ANY is
pretty inefficient. Particularly if you are not using a materialized view,
and so have to first aggregate it.

Of course good enough is good enough, so if scalar ~~~~ ANY(array) is good
enough...

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Grittner 2015-10-12 20:34:59 Re: Understanding "seq scans"
Previous Message Israel Brewster 2015-10-12 19:07:08 Re: Pattern match against array elements?