Re: Digesting explain analyze

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Digesting explain analyze
Date: 2010-01-07 06:10:28
Message-ID: 4B457AD4.8090405@krogh.cc
Lists: pgsql-performance

Ron Mayer wrote:
>> ...The inner sets are on average 3,000 rows for
>> both id1 and id2, and a typical limit would be 100, so if I could convince
>> postgresql to not fetch all of them then I would reduce the set retrieved
>> by around a factor of 60. The dataset is quite large, so a random query is
>> not very likely to hit the same part of the dataset again, so there is
>> going to be a fair amount of going to disk.
>
> If disk seeks are killing you a kinda crazy idea would be to
> duplicate the table - clustering one by (id1) and
> the other one by an index on (id2) and unioning the
> results of each.

That would double the disk-space requirements for the table. Is it likely
to help when the initial table by itself already significantly exceeds
available memory?

> Since each of these duplicates of the table will be clustered
> by the column you're querying it on, it should just take one
> seek in each table.
>
> Then your query could be something like
>
> select * from (
> select * from t1 where id1=2067 order by evalue limit 100
> union
> select * from t2 where id2=2067 order by evalue limit 100
> ) as foo order by evalue limit 100;

This is actually what I ended up with as the best-performing query, just
still on a single table, because without duplicating it I can add indexes
and optimize this one on (id1,evalue) and (id2,evalue). It is still getting
hit quite hard by disk I/O, so I guess my options are:

1) Buy a better disk (I need to get an estimate of how large the table is
actually going to get).
2) Stick with one table, but make sure there is enough activity to keep a
large part of the index in the OS cache anyway (and add more memory if
necessary).
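Concretely, the single-table variant I ended up with looks roughly like
this (table and column names as discussed in the thread; the index names
are made up):

```sql
-- Two composite indexes so each branch of the UNION can be
-- satisfied by an index scan ordered by evalue:
CREATE INDEX t_id1_evalue_idx ON t (id1, evalue);
CREATE INDEX t_id2_evalue_idx ON t (id2, evalue);

-- Each inner query stops after 100 rows via its index,
-- instead of fetching the full ~3,000-row inner set:
SELECT * FROM (
    SELECT * FROM t WHERE id1 = 2067 ORDER BY evalue LIMIT 100
    UNION
    SELECT * FROM t WHERE id2 = 2067 ORDER BY evalue LIMIT 100
) AS foo ORDER BY evalue LIMIT 100;
```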

The data is seeing a fair amount of growth (it doubles every couple of
years), so I would suspect it is fairly hard to maintain clustering on it.

Is it possible to get PG to tell me how many rows fit in a disk page? All
columns are using "plain" storage according to \d+ on the table.
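For what it's worth, the planner statistics in pg_class give a rough
answer (run ANALYZE first so the estimates are current; the table name
here is hypothetical):

```sql
-- Estimated rows per 8 kB heap page, from planner statistics:
SELECT relname,
       reltuples::bigint AS est_rows,
       relpages,
       (reltuples / NULLIF(relpages, 0))::int AS rows_per_page
FROM pg_class
WHERE relname = 't';
```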

> Hmm.. and I wonder if putting evalue into the criteria to cluster
> the tables too (i.e. cluster on id1,evalue) if you could make it
> so the limit finds the right 100 evalues first for each table....

I didn't cluster it, since clustering locks the whole table.
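That is, something like the following would block all reads and writes on
the table for the duration of the rewrite (index name made up):

```sql
-- CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table.
CLUSTER t USING t_id1_evalue_idx;
-- It is also a one-time rewrite: new rows are not kept in cluster
-- order, so it would have to be re-run periodically as the data grows.
```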

--
Jesper
