Re: which is better: using OR clauses or UNION?

From: "Igor Neyman" <ineyman(at)perceptron(dot)com>
To: "adam_pgsql" <adam_pgsql(at)witneyweb(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-sql" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: which is better: using OR clauses or UNION?
Date: 2011-08-17 20:26:59
Message-ID: F4C27E77F7A33E4CA98C19A9DC6722A207F18A52@EXCHANGE.corp.perceptron.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> -----Original Message-----
> From: adam_pgsql [mailto:adam_pgsql(at)witneyweb(dot)org]
> Sent: Tuesday, August 16, 2011 11:56 AM
> To: Tom Lane
> Cc: pgsql-sql
> Subject: Re: which is better: using OR clauses or UNION?
>
>
> On 16 Aug 2011, at 15:09, Tom Lane wrote:
>
> > adam_pgsql <adam_pgsql(at)witneyweb(dot)org> writes:
> >> I have a query hitting a table of 25 million rows. The table has a
> >> text field ('identifier') which i need to query for matching rows.
> The
> >> question is if i have multiple strings to match against this field
I
> >> can use multiple OR sub-statements or multiple statements in a
> >> UNION. The UNION seems to run quicker.... is this to be expected?
> >
> > Your test cases don't seem exactly comparable; in particular I think
> the
> > second one is benefiting from the first one having already read and
> > cached the relevant disk blocks. Notice how you've got, eg,
> >
> >> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347
> rows=318 loops=1)
> >> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> >
> > versus
> >
> >> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.178..0.178 rows=318 loops=1)
> >> Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
> >
> > Those are the exact same subplan, so any honest comparison should be
> > finding them to take the same amount of time. When the actual
> readings
> > are different by a factor of several hundred, there's something
wrong
> > with your measurement process.
> >
> > In the end this comes down to whether duplicates will be eliminated
> more
> > efficiently by a BitmapOr step or by sort/uniq on the resulting
rows.
> > I'd have to bet on the BitmapOr myself, but it's likely that this is
> > down in the noise compared to the actual disk accesses in any
> > not-fully-cached scenario. Also, if you don't expect the sub-
> statements
> > to yield any duplicates, or don't care about seeing the same row
> twice
> > in the output, you should consider UNION ALL instead of UNION.
>
>
> Thanks guys, I'll give some of those options a try and see which ones
> improve performance
>
> (Tom, yes i ran those queries after each other so there was caching
> going on. However, I had noticed a difference in performance when
> spacing the queries before and after a few other big queries to help
> clear the cache).
>
> adam

Adam,

Did you verify that your cache is "cleared"? Like using pg_buffercache
contrib. module?
Besides, there is also OS cache...

Regards,
Igor Neyman

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Kidd, David M 2011-08-18 14:53:18 Cursor names in a self-nested function
Previous Message Jaime Casanova 2011-08-16 23:35:42 Re: [SQL] parsing audit table