Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3

From: Timothy Garnett <tgarnett(at)panjiva(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3
Date: 2011-03-15 18:39:34
Message-ID: AANLkTikOJ4xqbQD+o28YPTmAD0hRNsuKW7NwCQv5CmWp@mail.gmail.com
Lists: pgsql-performance

Forgot to include our non-default config settings and server info, though it
probably doesn't make a difference for this.

from pg_settings:
name | current_setting
version | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
bytea_output | escape
checkpoint_completion_target | 0.9
checkpoint_segments | 24
effective_cache_size | 24GB
effective_io_concurrency | 2
lc_collate | en_US.utf8
lc_ctype | en_US.utf8
listen_addresses | *
log_checkpoints | on
log_connections | on
log_disconnections | on
log_hostname | on
log_line_prefix | %t
logging_collector | on
maintenance_work_mem | 256MB
max_connections | 120
max_stack_depth | 2MB
port | 5432
server_encoding | UTF8
shared_buffers | 4GB
synchronous_commit | off
tcp_keepalives_idle | 180
TimeZone | US/Eastern
track_activity_query_size | 8192
wal_buffers | 16MB
wal_writer_delay | 330ms
work_mem | 512MB
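
(Judging by the column header, a list like the above comes from something
along these lines against the pg_settings view - a sketch; the exact set of
source labels to exclude may need adjusting:)

select name, current_setting(name)
  from pg_settings
 where source not in ('default', 'override');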

This is a dual dual-core 64-bit Intel machine (hyperthreaded, so 8 logical
CPUs) with 24GB of memory, running basically just the db against a RAID 5
disk array.

Tim

On Tue, Mar 15, 2011 at 2:23 PM, Timothy Garnett <tgarnett(at)panjiva(dot)com> wrote:

> Hi all,
>
> We added an index to a table (to support some different functionality) and
> then ran into cases where the new index (on month, bl_number in the schema
> below) made performance of some existing queries ~20,000 times worse. While
> we do have a workaround (using a CTE to force the proper index to be used)
> that gets us down to ~twice the original runtime (i.e. the runtime without
> the new index), I'm wondering if there's a better workaround that can get us
> closer to the original performance. It also seems like kind of a strange
> case, so I'm wondering if there's something weird going on in the optimizer.
> The row-count estimates are pretty accurate, so it's guessing those about
> right, but the planner seems to be putting way too much weight on scanning
> an index that matches the sort order vs. looking up the matching rows and
> sorting them. This is all after an analyze.
>
> As near as I can guess, the planner is weighing two plans: a backward scan
> of the month index, which it expects to visit about 100k rows before
> finding 100 matches (given an expected selectivity of 48k matching rows out
> of 45 million), though in practice it has to visit about 35 million rows
> because the assumption of independence between the columns is incorrect;
> versus scanning ~48k rows via the buyer_id index and doing a top-N 100 sort
> on them (the actual row count is 43k, so that estimate is pretty close).
> Even granting the optimizer the benefit of column independence, I don't see
> how the first plan could possibly come out ahead. It would really help if
> explain printed the number of rows it expects to scan, and if analyze
> printed the number of rows it actually scanned (instead of just the number
> that matched the filter/limit); see the expensive query's explain analyze
> output below.
>
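> To put a number on the expected scan length (a back-of-the-envelope sketch
> using only the estimates above): with 48,623 expected matches spread
> uniformly through 45 million rows, a backward scan of the month index
> should expect to visit roughly 100 * 45,000,000 / 48,623 ~= 92,500 rows
> before emitting the 100 rows the limit asks for - the ~100k figure above.
>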
> At the bottom I have some info on the table's contents and value
> distributions.
>
>
> ## The original Query:
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> -----------------------------------
> Limit (cost=184626.64..184626.89 rows=100 width=908) (actual
> time=102.630..102.777 rows=100 loops=1)
> -> Sort (cost=184626.64..184748.19 rows=48623 width=908) (actual
> time=102.628..102.683 rows=100 loops=1)
> Sort Key: month
> Sort Method: top-N heapsort Memory: 132kB
> -> Bitmap Heap Scan on customs_records (cost=1054.22..182768.30
> rows=48623 width=908) (actual time=5.809..44.832 rows=43352 loops=1)
> Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY
> ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
> -> Bitmap Index Scan on index_customs_records_on_buyer_id
> (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.588..4.588 rows=43352
> loops=1)
> Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id =
> ANY
> ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
> Total runtime: 102.919 ms
>
>
> ## Same query after adding the new index
> ### NOTE - it would be very useful here if explain printed the number of
> rows it expects to scan in the index, and if analyze dumped the number of
> rows actually scanned. Instead, analyze prints the rows actually output,
> and explain appears to print the number of rows expected to match the
> filter ignoring the limit (it exactly matches the row count in the query
> above).
> ##
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> --------------------------------------------
> Limit (cost=0.00..161295.58 rows=100 width=908) (actual
> time=171344.185..3858893.743 rows=100 loops=1)
> -> Index Scan Backward using
> index_customs_records_on_month_and_bl_number on customs_records
> (cost=0.00..78426750.74 rows=48623 width=908) (actual
> time=171344.182..3858893.588 rows=100 loops=1)
> Filter: ((buyer_id IS NOT NULL) AND (buyer_id = ANY
> ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
> Total runtime: 3858893.908 ms
>
>
> ############################################################
> My workaround is to use a CTE query to force the planner not to use the
> month index for sorting (using a subselect is not enough, since the planner
> is too smart for that). However, this is still twice as slow as the
> original query...
> ############################################################
>
> explain analyze
> with foo as (select customs_records.* FROM "customs_records" WHERE
> (((buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)))))
> select * from foo order by month desc limit 100 ;
> -----------------------------------------------------------
> Limit (cost=185599.10..185599.35 rows=100 width=5325) (actual
> time=196.968..197.105 rows=100 loops=1)
> CTE foo
> -> Bitmap Heap Scan on customs_records (cost=1054.22..182768.30
> rows=48623 width=908) (actual time=5.765..43.489 rows=43352 loops=1)
> Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY
> ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
> -> Bitmap Index Scan on index_customs_records_on_buyer_id
> (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.544..4.544 rows=43352
> loops=1)
> Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY
> ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
> -> Sort (cost=2830.80..2952.35 rows=48623 width=5325) (actual
> time=196.966..197.029 rows=100 loops=1)
> Sort Key: foo.month
> Sort Method: top-N heapsort Memory: 132kB
> -> CTE Scan on foo (cost=0.00..972.46 rows=48623 width=5325)
> (actual time=5.770..153.322 rows=43352 loops=1)
> Total runtime: 207.282 ms
>
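> A lighter-weight workaround that sometimes helps in this situation
> (untested here, just a sketch): disguise the sort key with a no-op
> expression so the planner can no longer match the ORDER BY to the
> (month, bl_number) index ordering, e.g.
>
> SELECT customs_records.* FROM "customs_records"
> WHERE buyer_id IS NOT NULL AND buyer_id IN (...same id list as above...)
> ORDER BY month + 0 DESC  -- date + integer is still a date, but the
>                          -- expression defeats the index's sort order
> LIMIT 100 OFFSET 0;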
>
> #### Table information
>
> Table information - the schema is the table below (with some columns
> removed for succinctness). There are ~45 million rows, and the rows are
> fairly wide, about 80 columns total. buyer_id is null ~30% of the time (as
> is supplier_id). A given buyer_id maps to between 1 and ~100,000 records
> (in a decreasing distribution; there are about 1 million unique buyer_id
> values). supplier_id is similar. Note that the buyer_id and month columns
> are not always independent (for some buyer_ids there is a strong
> correlation, as in this case where the buyer_ids are associated only with
> older months, though for others there isn't); even so, I'm still not clear
> on why it picks the plan that it does. This table can be considered never
> updated or inserted into (updates are done in a new db offline that is
> periodically swapped in).
>
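> For what it's worth, the statistics the planner has here are strictly
> per-column (a sketch against the pg_stats view; there is no cross-column
> statistic in 9.0, so the buyer_id/month dependence is invisible to it):
>
> SELECT attname, null_frac, n_distinct
> FROM pg_stats
> WHERE tablename = 'customs_records' AND attname IN ('buyer_id', 'month');
>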
> Table "public.customs_records"
> Column | Type
> | Modifiers
>
> --------------------------+------------------------+--------------------------------------------------------------
> id | integer | not null default
> nextval('customs_records_id_seq'::regclass)
> ....
> bl_number | character varying(16) |
> ....
> month | date |
> ....
> buyer_id | integer |
> ...
> supplier_id | integer |
> ...
> Indexes:
> "customs_records_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
> "index_customs_records_on_month_and_bl_number" UNIQUE, btree (month,
> bl_number) WITH (fillfactor=100)
> "index_customs_records_on_buyer_id" btree (buyer_id) WITH
> (fillfactor=100) WHERE buyer_id IS NOT NULL
> "index_customs_records_on_supplier_id_and_buyer_id" btree (supplier_id,
> buyer_id) WITH (fillfactor=100) CLUSTER
>
>
> db version =>
> PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
> (EnterpriseDB build)
> Ubuntu 8.04 LTS is the host OS
>
>
