Quick Links

Re: slow joins?

From:	Joe Van Dyk <joe(at)tanga(dot)com>
To:	Greg Williamson <gwilliamson39(at)yahoo(dot)com>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: slow joins?
Date:	2013-04-06 02:56:02
Message-ID:	CACfv+pLTFEau=z5SVc1=93KxvjHux3UD2UZK2Cpj3AP8mYvhQw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Fri, Apr 5, 2013 at 6:54 PM, Greg Williamson <gwilliamson39(at)yahoo(dot)com>wrote:

> Joe --
>
> >________________________________
> > From: Joe Van Dyk <joe(at)tanga(dot)com>
> >To: pgsql-performance(at)postgresql(dot)org
> >Sent: Friday, April 5, 2013 6:42 PM
> >Subject: Re: [PERFORM] slow joins?
> >
> >
> >(
> https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txtshows a non-wrapped version of the queries and plan)
> >
> >
> >
> >
> >On Fri, Apr 5, 2013 at 6:38 PM, Joe Van Dyk <joe(at)tanga(dot)com> wrote:
> >
> >On 9.2.4, running two identical queries except for the value of a column
> in the WHERE clause. Postgres is picking very different query plans, the
> first is much slower than the second.
> >>
> >>
> >>Any ideas on how I can speed this up? I have btree indexes for all the
> columns used in the query.
> >>
> >>explain analyze
>
> >>SELECT COUNT(*)
>
> >>FROM purchased_items pi
>
> >>inner join line_items li on li.id = pi.line_item_id
>
> >>inner join products on products.id = li.product_id
>
> >>WHERE products.drop_shipper_id = 221;
> >>
> >> Aggregate (cost=193356.31..193356.32 rows=1 width=0) (actual
> time=2425.225..2425.225 rows=1 loops=1)
> >> -> Hash Join (cost=78864.43..193160.41 rows=78360 width=0) (actual
> time=726.612..2424.206 rows=8413 loops=1)
> >> Hash Cond: (pi.line_item_id = li.id)
> >> -> Seq Scan on purchased_items pi (cost=0.00..60912.39
> rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
> >> -> Hash (cost=77937.19..77937.19 rows=56499 width=4) (actual
> time=726.231..726.231 rows=8178 loops=1)
> >> Buckets: 4096 Batches: 4 Memory Usage: 73kB
> >> -> Hash Join (cost=1684.33..77937.19 rows=56499
> width=4) (actual time=1.270..723.222 rows=8178 loops=1)
> >> Hash Cond: (li.product_id = products.id)
> >> -> Seq Scan on line_items li (cost=0.00..65617.18
> rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
> >> -> Hash (cost=1676.60..1676.60 rows=618 width=4)
> (actual time=0.835..0.835 rows=618 loops=1)
> >> Buckets: 1024 Batches: 1 Memory Usage: 22kB
> >> -> Bitmap Heap Scan on products
> (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618
> loops=1)
> >> Recheck Cond: (drop_shipper_id = 221)
> >> -> Bitmap Index Scan on
> index_products_on_drop_shipper_id (cost=0.00..12.92 rows=618 width=0)
> (actual time=0.125..0.125 rows=618 loops=1)
> >> Index Cond: (drop_shipper_id =
> 221)
> >> Total runtime: 2425.302 ms
> >>
> >>
> >>explain analyze
>
> >>SELECT COUNT(*)
>
> >>FROM purchased_items pi
>
> >>inner join line_items li on li.id = pi.line_item_id
>
> >>inner join products on products.id = li.product_id
>
> >>WHERE products.drop_shipper_id = 2;
>
> >>
>
> >>
> >> Aggregate (cost=29260.40..29260.41 rows=1 width=0) (actual
> time=0.906..0.906 rows=1 loops=1)
> >> -> Nested Loop (cost=0.00..29254.38 rows=2409 width=0) (actual
> time=0.029..0.877 rows=172 loops=1)
> >> -> Nested Loop (cost=0.00..16011.70 rows=1737 width=4)
> (actual time=0.021..0.383 rows=167 loops=1)
> >> -> Index Scan using index_products_on_drop_shipper_id on
> products (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074
> rows=70 loops=1)
> >> Index Cond: (drop_shipper_id = 2)
> >> -> Index Scan using index_line_items_on_product_id on
> line_items li (cost=0.00..835.70 rows=279 width=8) (actual
> time=0.002..0.004 rows=2 loops=70)
> >> Index Cond: (product_id = products.id)
> >> -> Index Only Scan using purchased_items_line_item_id_idx on
> purchased_items pi (cost=0.00..7.60 rows=2 width=4) (actual
> time=0.002..0.003 rows=1 loops=167)
> >> Index Cond: (line_item_id = li.id)
> >> Heap Fetches: 5
> >> Total runtime: 0.955 ms
> >>(11 rows)
> >>
> >
>
>
> Does drop_shipper+id have a much larger number of rows which is making the
> scanner want to avoid an indexed scan or otherwise prefer a sequential scan
> on products and on line_items ?
>

Assuming you mean products.drop_shipper_id? There are more rows matched for
the first one vs the second one.
70 products rows match drop_shipper_id=2, 618 match drop_shipper_id=221.

> What are the stats settings for these tables ?
>

Whatever the defaults are.

>
> HTH,
>
> Greg WIlliamson
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Re: slow joins? at 2013-04-06 01:54:10 from Greg Williamson

Responses

Re: slow joins? at 2013-04-06 03:32:56 from Greg Williamson

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Greg Williamson	2013-04-06 03:32:56	Re: slow joins?
Previous Message	Greg Williamson	2013-04-06 01:54:10	Re: slow joins?