Re: Index scan prefetch?

From: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index scan prefetch?
Date: 2018-03-27 08:31:05
Message-ID: 49BFF604-37DF-4F84-9AA6-BCE30293175E@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote:

> On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
>> Hi, hackers.
>>
>> I was faced with the following bad performance use case with Postgres: there
>> is a huge append-only table with serial key (ID)
>> which is permanently appended using multithreaded pgloader.
>
> I think this could be similar to the problem I reported here:
> https://www.postgresql.org/message-id/flat/20160524173914(dot)GA11880%40telsasoft(dot)com#20160524173914(dot)GA11880(at)telsasoft(dot)com
>
> The analysis at the time was that, due to "repeated keys", a btree index on the
> timestamp column had non-consecutive heap TIDs (btree insertion uses random
> early insertion point to avoid superlinear lookup cost during such insertions).
>
> But, our case also involved multiple child processes simultaneously inserting
> into same table, and I wonder if "repeated keys" were more or less unrelated to
> the problem. The behavior is maybe caused easily by simultaneous insertions
> "clustered" around the same target: for us: now(), for you: nextval().
>
>> But now effective_io_concurrency parameter is applicable only for bitmap
> ...
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
>
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)? It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to choose
> another plan(bitmap) with sequential access to enable read-ahead,
>
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
> its own readahead, negatively affecting some queries:
> https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail(dot)com#20180207054227(dot)GE17521(at)telsasoft(dot)com
>
> What's the pg_stats "correlation" for the table column with index being
> scanned? How many tuples? Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?

Thank you very much for your feedback.

Setting random_page_cost to 10 forces Postgres to use bitmap scan:

paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—---------------------------------------------------------------------------------------------------—
Sort (cost=269989441.71..270965997.99 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Index Scan using "People_pkey" on people_raw (cost=0.58..88742525.10 rows=390622510 width=351)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(4 rows)

paradoxdb=# set random_page_cost to 10;
SET
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—---------------------------------------------------------------------------------------------—
Sort (cost=397271527.57..398248083.84 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Bitmap Heap Scan on people_raw (cost=14905651.30..108450832.96 rows=390622510 width=351)
Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
-> Bitmap Index Scan on "People_pkey" (cost=0.00..14807995.68 rows=390622510 width=0)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(6 rows)

Correlation of "ID" column is 0.978884.

Sorry, it is customer's system and I have limited access to it.
Next load of data is expected during this week, so we will check how bitmap scan can improve performance.
According to IO traffic, I expect up to 100 times improvement.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-03-27 08:45:32 Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
Previous Message Simon Riggs 2018-03-27 08:24:49 Re: [HACKERS] MERGE SQL Statement for PG11