Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache

From: Mark Liberman <mliberman(at)mixedsignals(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Date: 2006-01-06 02:15:36
Message-ID: 200601051815.36181.mliberman@mixedsignals.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman(at)mixedsignals(dot)com> wrote
>
> > First run, after a night of inactivity:
> >
> > -> Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=313.468..313.468 rows=11082
> > loops=1)
> > Index Cond: (file_id = 137271)
> > Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> > -> Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=2.106..2.106 rows=11082 loops=1)
> > Index Cond: (file_id = 137271)
> > Total runtime: 2.276 ms
>
> It is clear that the first query takes longer time because of the IO time
> of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
> currently there is no easy solution for this situation, unless you could
> predicate which part of relation/index your query will use, then you can
> preload or "warm-up" cache for it.
>
> Regards,
> Qingqing

Thanks Qingqing,

this actually helped me determine that the compound index,
1min_events_file_id_begin_idx, is not the proper index to use as it is based
on file_id and begin_time - the later of which is not involved in the where
clause. It is only using that index to "filter" out the listed file_id.

Now, my follow-up question / assumption. I am assuming that the IO time is
so long on that index because it has to read the entire index (for that
file_id) into memory (because it cannot just scan the rows with a certain
date range because we are not using begin_time in the where clause).

But, if I replaced that compound index with the proper compound index of
file_id / end_time, it would give similar performance results to the scan on
1min_events_end_idx (which was < 1 ms). E.g. the latest rows that were
updated are more likely to be in the cache - and it is smart enough to only
read the index rows that it needs.

Alternatively, I could create a single index on file_id (and rely upon the new
bitmap scan capabilities in 1.2). But, I fear that, although this will be
smaller than the erroneous compound index on file_id / begin_time, it will
still display the same behavior in that it will need to read all rows from
that index for the appropriate file_id - and since the data goes back every
minute for 60 days, that IO might be large.

Obviously, I will be testing this - but it might take a few days, as I haven't
figure out how to simulate the "period of inactivity" to get the data flushed
out of the cache ... so I have to run this each morning. But, any
confirmation / corrections to my assumptions are greatly appreciated. E.g. is
the compound index the way to go, or the solo index on file_id?

Thanks,

Mark

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Lang 2006-01-06 02:50:22 Re: Help in avoiding a query 'Warm-Up' period/shared buffer
Previous Message Jim C. Nasby 2006-01-06 01:08:22 Re: improving write performance for logging