Quick Links

Re: Forcing index scan on query produces 16x faster

From:	Dave Crooke <dcrooke(at)gmail(dot)com>
To:	"Eger, Patrick" <peger(at)automotive(dot)com>
Cc:	pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Forcing index scan on query produces 16x faster
Date:	2010-03-19 00:08:49
Message-ID:	ca24673e1003181708n146cb150n6b7eceb18530193a@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I've also observed the same behaviour on a very large table (200GB data,
170GB for 2 indexes) ....

I have a table which has 6 small columns, let's call them (a, b, c, d, e, f)
and about 1 billion rows. There is an index on (a, b, c, d) - not my idea,
Hibernate requires primary keys for every table.

If I do the following query:

*select max(c) from tbl where a=[constant literal] and b=[other constant
literal];*

.... then with maxed out analysis histograms, and no changes to any of the
page_cost type stuff, it still deparately wants toi do a full table scan,
which is ... kinda slow.

Of course, a billion row table is also rather suboptimal (our app collects a
lot more data than it used to) and so I'm bypassing Hibernate, and sharding
it all by time, so that the tables and indexes will be a manageable size,
and will also be vacuum-free as my aging out process is now DROP TABLE :-)

Cheers
Dave

On Wed, Mar 17, 2010 at 8:01 PM, Eger, Patrick <peger(at)automotive(dot)com> wrote:

> I'm running 8.4.2 and have noticed a similar heavy preference for
> sequential scans and hash joins over index scans and nested loops. Our
> database is can basically fit in cache 100% so this may not be
> applicable to your situation, but the following params seemed to help
> us:
>
> seq_page_cost = 1.0
> random_page_cost = 1.01
> cpu_tuple_cost = 0.0001
> cpu_index_tuple_cost = 0.00005
> cpu_operator_cost = 0.000025
> effective_cache_size = 1000MB
> shared_buffers = 1000MB
>
>
> Might I suggest the Postgres developers reconsider these defaults for
> 9.0 release, or perhaps provide a few sets of tuning params for
> different workloads in the default install/docs? The cpu_*_cost in
> particular seem to be way off afaict. I may be dead wrong though, fwiw
> =)
>
> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Christian
> Brink
> Sent: Wednesday, March 17, 2010 2:26 PM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Forcing index scan on query produces 16x faster
>
> I am running into a problem with a particular query. The execution plan
> cost shows that the Seq Scan is a better bet (cost=54020.49..54020.55)
> over the forced index 'enable_seqscan = false'
> (cost=1589703.87..1589703.93). But when I run the query both ways I get
> a vastly different result (below). It appears not to want to bracket the
>
> salesitems off of the 'id' foreign_key unless I force it.
>
> Is there a way to rewrite or hint the planner to get me the better plan
> without resorting to 'enable_seqscan' manipulation (or am I missing
> something)?
>
> postream=> select version();
> version
> ------------------------------------------------------------------------
> -------------------------------------------------
> PostgreSQL 8.0.3 on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gcc (GCC) 4.0.0 20050505 (Red Hat 4.0.0-4)
>
>
> postream=> SET enable_seqscan = false;
> SET
> postream=> EXPLAIN ANALYZE
> postream-> SELECT si.group1_id as name, sum(si.qty) as count,
> sum(si.amt) as amt
> postream-> FROM salesitems si, sales s, sysstrings
> postream-> WHERE si.id = s.id
> postream-> AND si.group1_id != ''
> postream-> AND si.group1_id IS NOT NULL
> postream-> AND NOT si.void
> postream-> AND NOT s.void
> postream-> AND NOT s.suspended
> postream-> AND s.tranzdate >= (cast('2010-02-15' as date) +
> cast(sysstrings.data as time))
> postream-> AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
> cast(sysstrings.data as time))
> postream-> AND sysstrings.id='net/Console/Employee/Day End Time'
> postream-> GROUP BY name;
>
> QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ----------------------------------------------------
> HashAggregate (cost=1589703.87..1589703.93 rows=13 width=35) (actual
> time=33.414..33.442 rows=12 loops=1)
> -> Nested Loop (cost=0.01..1588978.22 rows=96753 width=35) (actual
>
> time=0.284..22.115 rows=894 loops=1)
> -> Nested Loop (cost=0.01..2394.31 rows=22530 width=4)
> (actual time=0.207..4.671 rows=225 loops=1)
> -> Index Scan using sysstrings_pkey on sysstrings
> (cost=0.00..5.78 rows=1 width=175) (actual time=0.073..0.078 rows=1
> loops=1)
> Index Cond: (id = 'net/Console/Employee/Day End
> Time'::text)
> -> Index Scan using sales_tranzdate_index on sales s
> (cost=0.01..1825.27 rows=22530 width=12) (actual time=0.072..3.464
> rows=225 loops=1)
> Index Cond: ((s.tranzdate >= ('2010-02-15'::date +
>
> ("outer".data)::time without time zone)) AND (s.tranzdate <
> ('2010-02-16'::date + ("outer".data)::time without time zone)))
> Filter: ((NOT void) AND (NOT suspended))
> -> Index Scan using salesitems_pkey on salesitems si
> (cost=0.00..70.05 rows=30 width=39) (actual time=0.026..0.052 rows=4
> loops=225)
> Index Cond: (si.id = "outer".id)
> Filter: ((group1_id <> ''::text) AND (group1_id IS NOT
> NULL) AND (NOT void))
> Total runtime: 33.734 ms
> (12 rows)
>
> postream=> SET enable_seqscan = true;
> SET
> postream=> EXPLAIN ANALYZE
> postream-> SELECT si.group1_id as name, sum(si.qty) as count,
> sum(si.amt) as amt
> postream-> FROM salesitems si, sales s, sysstrings
> postream-> WHERE si.id = s.id
> postream-> AND si.group1_id != ''
> postream-> AND si.group1_id IS NOT NULL
> postream-> AND NOT si.void
> postream-> AND NOT s.void
> postream-> AND NOT s.suspended
> postream-> AND s.tranzdate >= (cast('2010-02-15' as date) +
> cast(sysstrings.data as time))
> postream-> AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
> cast(sysstrings.data as time))
> postream-> AND sysstrings.id='net/Console/Employee/Day End Time'
> postream-> GROUP BY name;
>
> QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ----------------------------------------------------------
> HashAggregate (cost=54020.49..54020.55 rows=13 width=35) (actual
> time=5564.929..5564.957 rows=12 loops=1)
> -> Hash Join (cost=2539.63..53294.84 rows=96753 width=35) (actual
> time=5502.324..5556.262 rows=894 loops=1)
> Hash Cond: ("outer".id = "inner".id)
> -> Seq Scan on salesitems si (cost=0.00..30576.60
> rows=885215 width=39) (actual time=0.089..3099.453 rows=901249 loops=1)
> Filter: ((group1_id <> ''::text) AND (group1_id IS NOT
> NULL) AND (NOT void))
> -> Hash (cost=2394.31..2394.31 rows=22530 width=4) (actual
> time=3.329..3.329 rows=0 loops=1)
> -> Nested Loop (cost=0.01..2394.31 rows=22530 width=4)
>
> (actual time=0.217..2.749 rows=225 loops=1)
> -> Index Scan using sysstrings_pkey on
> sysstrings (cost=0.00..5.78 rows=1 width=175) (actual time=0.077..0.085
>
> rows=1 loops=1)
> Index Cond: (id = 'net/Console/Employee/Day
> End Time'::text)
> -> Index Scan using sales_tranzdate_index on
> sales s (cost=0.01..1825.27 rows=22530 width=12) (actual
> time=0.074..1.945 rows=225 loops=1)
> Index Cond: ((s.tranzdate >=
> ('2010-02-15'::date + ("outer".data)::time without time zone)) AND
> (s.tranzdate < ('2010-02-16'::date + ("outer".data)::time without time
> zone)))
> Filter: ((NOT void) AND (NOT suspended))
> Total runtime: 5565.262 ms
> (13 rows)
>
>
> --
> Christian Brink
>
>
>
> --
> Sent via pgsql-performance mailing list
> (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Re: Forcing index scan on query produces 16x faster at 2010-03-18 01:01:16 from Eger, Patrick

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Ivan Voras	2010-03-19 00:32:11	Re: mysql to postgresql, performance questions
Previous Message	Justin Pitts	2010-03-18 21:45:53	Re: Building multiple indexes concurrently