Re: Index usage with slow query

From: Rebecca Clarke <r(dot)clarke83(at)gmail(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Index usage with slow query
Date: 2014-07-25 16:20:57
Message-ID: CAMChtdctzbRWbrPNdjY0916moXPRdW1_FaOmZWW5YyXZn8Vvfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Bill,

Thanks for the reply. Here's the EXPLAIN output of a couple of the queries:

SELECT the_geom,oid from mytable
where the_geom && ST_GeomFromText('POLYGON((529342.334095833
180696.221733333,529342.334095833 181533.44595,530964.336820833
181533.44595,530964.336820833 180696.221733333,529342.334095833
180696.221733333))',find_srid('','mytable','the_geom'))
and (floor = 'gf' AND source_id = '16701' AND class = 'General')

Bitmap Heap Scan on mytable (cost=1212.62..1580.71 rows=177 width=612)
Recheck Cond: ((source_id = 16701) AND (the_geom &&
'0103000020346C00000100000005000000'::geometry))
Filter: (((floor)::text = 'gf'::text) AND ((class)::text =
'General'::text))
-> BitmapAnd (cost=1212.62..1212.62 rows=184 width=0)
-> Bitmap Index Scan on idx_source_id (cost=0.00..433.25
rows=33149 width=0)
Index Cond: (source_id = 16701)
-> Bitmap Index Scan on idx_the_geom_gist (cost=0.00..779.03
rows=38654 width=0)
Index Cond: (the_geom &&
'0103000020346C00000100000005000000'::geometry)

SELECT the_geom,oid from mytable
where the_geom && ST_GeomFromText('POLYGON((415995.148624997
433101.445479165,415995.148624997 433326.320145835,416466.572625003
433326.320145835,416466.572625003 433101.445479165,415995.148624997
433101.445479165))',find_srid('','mytable','the_geom'))
and (floor = 'gf' AND source_id = '20427' AND class = 'General')

Bitmap Heap Scan on mytable (cost=21.41..23.44 rows=1 width=612)
Recheck Cond: ((the_geom &&
'0103000020346C00000100000005000000'::geometry) AND (source_id = 20427))
Filter: (((floor)::text = 'gf'::text) AND ((class)::text =
'General'::text))
-> BitmapAnd (cost=21.41..21.41 rows=1 width=0)
-> Bitmap Index Scan on idx_the_geom_gist (cost=0.00..4.18
rows=141 width=0)
Index Cond: (the_geom &&
'0103000020346C00000100000005000000'::geometry)
-> Bitmap Index Scan on idx_source_id (cost=0.00..16.97
rows=1112 width=0)
Index Cond: (source_id = 20427)

There is no unique field in the table. And unfortunately the performance is
unacceptable. The logs show that the first query when it was execute took
70466.757 ms where as the second one took 11032.459 ms.

I've begun to create a duplicate environment to play with so hopefully I'm
able to weed out a solution.

On Wed, Jul 23, 2014 at 3:57 PM, Bill Moran <wmoran(at)potentialtech(dot)com>
wrote:

> On Wed, 23 Jul 2014 10:45:56 +0100
> Rebecca Clarke <r(dot)clarke83(at)gmail(dot)com> wrote:
>
> > Hi all,
> >
> > Looking for some advice regarding a slow query I have and indexing.
> >
> > I'm using postgresql 9.1 and this is my table that has around 6800000
> rows:
> >
> > CREATE TABLE mytable
> > (
> > class character varying,
> > floor character varying,
> > source_id integer,
> > the_geom geometry
> > )
> > WITH (
> > OIDS=TRUE
> > );
> >
> >
> > INDEX idx_source_id
> > USING btree
> > (source_id);
> >
> > INDEX idx_the_geom_gist
> > USING gist
> > (the_geom);
> >
> >
> > This table is constantly hit with the below query (not always the same
> > values in the where). The only difference between queries are the values
> in
> > the where clause:
> >
> > SELECT the_geom,oid from mytable
> > WHERE
> > the_geom && ST_GeomFromText('POLYGON((529342.334095833
> > 180696.221733333,529342.334095833 181533.44595,530964.336820833
> > 181533.44595,530964.336820833 180696.221733333,529342.334095833
> > 180696.221733333))',find_srid('','mytable','the_geom'))
> > AND
> > (floor = 'gf' AND source_id = '689' AND class = 'General')
> >
> >
> > As the table has increased in size, this query has become slower, so I
> made
> > this index:
> >
> >
> > INDEX idx_floor_sourceid_class
> > USING btree
> > (floor, source_id, class);
> >
> >
> > When I run an EXPLAIN and EXPLAIN ANALYZE the query isn't using this new
> > index.
> >
> > Sometimes it uses just idx_the_geom_gist
> >
> > other times it uses idx_the_geom_gist and idx_source_id
> >
> >
> > I don't understand why it's inconsistent in its use of indexes when the
> > query is always the same structure, just different where clause values,
> and
> > I don't understand why it's not using the new index either.
>
> It depends on the data. The planner will make estimates on what the
> fastest
> way to execute will be based on a lot of things, one of which is how
> helpful
> an index is really expected to be. Since your indexes aren't arranged to
> allow an index-only scan (although I don't remember if 9.1 had index-only
> scans yet ...) it will have to use the index to narrow down the rows, then
> load up the rows and filter them further (you didn't provide explain
> output,
> but I'll bet a dozen nickels that's what it says). So if the values in
> source_id are unique enough that the planner doesn't think that
> idx_floor_sourceid_class will narrow the results any better than
> idx_source_id, it will use the former because it's a smaller index and will
> require less disk fetches to load it.
>
> Of course, without explain output, I'm assuming a lot. But the basic
> operation
> still stands, indexes aren't always guaranteed to be faster than other
> types of
> access. And depending on the distribution of the data, some indexes might
> be
> faster with some fetches than with others.
>
> The key is not whether it's using the index or not, it's whether it's
> getting the
> fastest plan or not. The first step in ensuring that is to make sure the
> table
> is getting analyzed frequently enough, otherwise the stats that the
> planner uses
> to predict will be off and it will often choose poor plans. The next step
> would
> be to isolate specific instances that you're suspicious of and test to see
> if the
> planner really is getting the best plan. Hopefully you have a test
> database where
> you can copy the data and add/remove indexes at will. That type of easter
> egg
> hunt may not be necessary, though. EXPLAIN ANALYZE can often tell you if
> the plan
> is bad by showing you where estimated times vary wildly from actual times.
>
> Hope this helps, but before you worry too much about it, I'd suggest asking
> yourself 1 question: is the performance at an acceptable level, even if
> you don't
> understand the rational behind the planner's choice? Of course, that may
> not be
> important if you're asking the question just to understand better.
>
> --
> Bill Moran <wmoran(at)potentialtech(dot)com>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vasudevan, Ramya 2014-07-25 16:50:37 Re: event triggers in 9.3.4
Previous Message Sergiy Zuban 2014-07-25 15:51:02 Re: postgres_fdw - push down conditionals for ENUMs