Quick Links

Re: improve 'where not exists' query..

From:	Noel Whelan <noel(dot)whelan(at)gmail(dot)com>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	List <pgsql-general(at)postgresql(dot)org>
Subject:	Re: improve 'where not exists' query..
Date:	2005-10-27 16:18:56
Message-ID:	1eb3081c0510270918p14ec35d9oec8e6e7c344663a4@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I've created an index on contacts.cempid (I'd not even checked whether one
existed); and the query is certainly improved. I'm wondering..I get the
impression that the hash index is ideal in this case, technically, because I
only intend to query with '='; but the btree index is a bit faster. I'll
need to look further into that..

Thank you for the input on this.
- Noel

On 10/27/05, Richard Huxton <dev(at)archonet(dot)com> wrote:
>
> Noel Whelan wrote:
> > I executed the following:
> >
> > EXPLAIN ANALYZE SELECT cwit.cempid
> > FROM "cwItems" cwit
> > WHERE (NOT (EXISTS (SELECT con.cempid
> > FROM contacts con
> > WHERE (con.cempid = cwit.cempid))));
> >
> > It comes back with:
> >
> > Seq Scan on "cwItems" cwit (cost=0.00..8929702.11 rows=5132 width=8)
> (actual
> > time=132218.29..148623.27 rows=31 loops=1)
> > Filter: (NOT (subplan))
> > SubPlan
> > -> Seq Scan on contacts con (cost=0.00..870.77 rows=1 width=8) (actual
> time=
> > 11.82..11.82 rows=1 loops=12528)
> > Filter: (cempid = $0)
> > Total runtime: 148623.54 msec
> >
> > I'm not certain it's an issue with the query itself going wrong. I
> consider
> > I'm basically telling it: for each cempid in 'cwItems', check whether or
> not
> > it exists as a cempid in 'contacts', which could be inherently
> inefficient
> > in itself.
>
> Well, as you say it's never going to be possible without checking all
> items. However, you've got two seq-scans there and I'd expect a seq-scan
> and looping over an index (or a hash-based plan perhaps).
>
> Now, looking at the values it's only seeing 31 rows in cwItems and 1 row
> (!) in contacts, so that would explain the scans. However, in that case
> I'd expect it to be much faster than it is.
>
> Hmm - I'd take the following steps:
>
> 1. VACUUM FULL VERBOSE ANALYSE <table> on both those tables and then see
> what happens. Make a note of how many removable/non-removable rows it
> finds. Rerun the explain analyse and see what happens.
>
> 2. Check that you have an index on contact.cempid and that the types of
> cempid match in both tables. Then issue "SET enable_seqscan=true" and
> run the explain analyse again - are things faster?
>
> Let us know what happens, oh and don't forget to cc: the list, you were
> lucky I read this.
>
> > I just wondered if there'd be a way to improve on it or not.
> > Installation is postgres-7.3.4.
>
> Upgrade to the latest 7.3.x version as soon as is convenient - lots of
> bug fixes to be had.
>
> --
> Richard Huxton
> Archonet Ltd
>

In response to

Re: improve 'where not exists' query.. at 2005-10-27 09:10:51 from Richard Huxton

Browse pgsql-general by date

	From	Date	Subject
Next Message	Andrus	2005-10-27 16:30:07	Re: alt+F not working after calling pg_dump
Previous Message	Bruno Wolff III	2005-10-27 16:16:00	Re: SHA1 authentication