Re: Memory usage - indexes

From: Tobias Brox <tobixen(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Memory usage - indexes
Date: 2010-09-29 12:09:44
Message-ID: AANLkTikOuM9WZVW-nniziaamULCUhpMT8iG1gaOeMc7F@mail.gmail.com
Lists: pgsql-performance

On 29 September 2010 10:03, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> Yeah, I think the idea of trying to have a few smaller indexes for the 'hot'
> customers is a good idea. However I am wondering if just using single column
> indexes and seeing if the bitmap scan/merge of smaller indexes is actually
> more efficient is worth testing - i.e.:
>
> acc_trans(trans_type);
> acc_trans(created);
> acc_trans(customer_id);

My gut feeling tells me that it's not a good idea - consider that we
want to pull out just 20 rows from a 60M-row table. If I'm not
mistaken, a bitmap scan has to build and combine bitmaps covering the
whole table - and 60M bits is still 7.5 megabytes. Well, I suppose
that nowadays it's relatively fast to work through 7.5 MB of memory,
but it's probably still some orders of magnitude slower than the few
milliseconds it takes to pick the 20 rows directly out of the
specialized index.
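
For concreteness, the two layouts being compared would be created
roughly like this (the single-column index names are taken from the
plans below; the name and column order of the three-key index are my
assumption):

  -- the three-key index tailored for this query (name and column order assumed)
  create index acc_trans_cust_type_ts on acc_trans (customer_id, trans_type, created);

  -- versus the three single-column indexes suggested above
  create index account_transaction_on_user on acc_trans (customer_id);
  create index account_transaction_on_type on acc_trans (trans_type);
  create index account_transaction_on_created on acc_trans (created);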

Well, why rely on gut feelings when things can be measured? I didn't
take the figures below from the production database server, but at
least they give a hint of what to expect.
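
For anyone who wants to reproduce this kind of measurement, psql's
\timing together with a plain explain analyze is enough - a minimal
sketch, using one of the actual queries below:

  \timing on
  explain analyze
    select * from acc_trans
    where customer_id=67368 and trans_type=8
    order by created desc limit 20;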

First, using the three-key index for "select * from acc_trans where
customer_id=? and trans_type=? order by created desc limit 20": I
chose one of the users with the most transactions, and I tested with
the most popular transaction type as well as one transaction type
where he has just a handful of transactions. Both queries took
significantly less than 1 ms to run. Then I deleted all the indexes
and created the three suggested single-column ones. Using the popular
transaction type, the query took 123 ms. Well, that's 500 times as
much, but still acceptable. Here is the query plan:

=> explain analyze select * from acc_trans where customer_id=67368
   and trans_type=8 order by created desc limit 20;

                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1486.23..1486.28 rows=20 width=200) (actual time=123.685..123.687 rows=3 loops=1)
   ->  Sort  (cost=1486.23..1486.34 rows=43 width=200) (actual time=123.684..123.685 rows=3 loops=1)
         Sort Key: created
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on acc_trans  (cost=1313.90..1485.08 rows=43 width=200) (actual time=121.350..123.669 rows=3 loops=1)
               Recheck Cond: ((trans_type = 8) AND (customer_id = 67368))
               ->  BitmapAnd  (cost=1313.90..1313.90 rows=43 width=0) (actual time=120.342..120.342 rows=0 loops=1)
                     ->  Bitmap Index Scan on account_transaction_on_type  (cost=0.00..256.31 rows=13614 width=0) (actual time=12.200..12.200 rows=43209 loops=1)
                           Index Cond: (trans_type = 8)
                     ->  Bitmap Index Scan on account_transaction_on_user  (cost=0.00..1057.31 rows=56947 width=0) (actual time=104.578..104.578 rows=59133 loops=1)
                           Index Cond: (users_id = 67368)
 Total runtime: 123.752 ms
(12 rows)
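
The plan also shows where the time goes: the two bitmap index scans
have to chew through 43209 + 59133 index entries just to find 3 rows.
And since this thread is about the memory footprint of indexes, the
sizes of the indexes involved can be checked directly - a sketch (I
haven't included the figures here):

  select relname, pg_size_pretty(pg_relation_size(oid))
  from pg_class
  where relname in ('account_transaction_on_user',
                    'account_transaction_on_type',
                    'account_transaction_on_created');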

With the most popular trans type it chose another plan, and the query
took more than 3 s (totally unacceptable):

=> explain analyze select * from acc_trans where customer_id=67368
   and trans_type=6 order by created desc limit 20;

                                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..44537.82 rows=20 width=200) (actual time=1746.288..3204.029 rows=20 loops=1)
   ->  Index Scan Backward using account_transaction_on_created on acc_trans  (cost=0.00..55402817.90 rows=24879 width=200) (actual time=1746.285..3204.021 rows=20 loops=1)
         Filter: ((customer_id = 67368) AND (trans_type = 6))
 Total runtime: 3204.079 ms
(4 rows)
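
(As an aside - I haven't tested it on this box - the classic way to
talk the planner out of such a backward scan on the created index is
to hide the order by behind an optimization fence, e.g. the well-known
"offset 0" trick:

  select * from (
      select * from acc_trans
      where customer_id=67368 and trans_type=6
      offset 0   -- acts as an optimization fence
  ) as t
  order by created desc limit 20;

The where clause is then resolved first, through whatever index fits,
and only the surviving rows are sorted - whether that actually wins
here would of course need testing.)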

Although this customer has several tens of thousands of transactions,
dropping the three-key index and using an index on (customer_id,
created) instead is clearly a better option than running out of
memory:

=> explain analyze select * from acc_trans where customer_id=67368
   and trans_type=8 order by created desc limit 20;

                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..98524.88 rows=20 width=200) (actual time=0.669..197.012 rows=3 loops=1)
   ->  Index Scan Backward using account_transaction_by_user_ts on acc_trans  (cost=0.00..211828.49 rows=43 width=200) (actual time=0.668..197.006 rows=3 loops=1)
         Index Cond: (customer_id = 67368)
         Filter: (trans_type = 8)
 Total runtime: 197.066 ms
(5 rows)
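
For reference, the index used in that last plan would have been
created along these lines (columns inferred from the index cond and
the order by):

  create index account_transaction_by_user_ts on acc_trans (customer_id, created);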

0.2 s sounds acceptable - it's just that this may be only a small part
of building the web page, so it adds up ... and (I didn't check how
profitable this customer is, but) this is probably exactly the kind of
customer we wouldn't want to annoy with several seconds of page load
time.
