Skip site navigation (1) Skip section navigation (2)

half the query time in an unnecessary(?) sort?

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Subject: half the query time in an unnecessary(?) sort?
Date: 2005-04-25 20:14:25
Message-ID: 426D4FA1.1050000@cheapcomplexdevices.com (view raw or flat)
Thread:
Lists: pgsql-performance
If I have a freshly CLUSTERed table and queries that want to do a
merge join, it seems to me that quite a bit of time is spent
unnecessarily sorting the already-sorted table. An example such
query I found in my log files is shown below. If I read the
EXPLAIN ANALYZE output correctly, it's saying that roughly half
the time (570-269 = 300 out of 670 ms) was spent sorting the
already sorted data.

=====================
     \d entity_facids;
        Table "public.entity_facids"
       Column   |   Type    | Modifiers
     -----------+-----------+-----------
      entity_id | integer   |
      fac_ids   | integer[] |
     Indexes:
         "entity_facids__entity_id" btree (entity_id)
     fli=# cluster entity_facids__entity_id on entity_facids;
     CLUSTER
     fli=#
     fli=# explain analyze select *  from userfeatures.point_features join entity_facids using (entity_id) where featureid=118;
                                                                              QUERY PLAN
     -------------------------------------------------------------------------------------------------------------------------------------------------------------
      Merge Join  (cost=9299.37..9738.34 rows=1078 width=117) (actual time=536.989..667.648 rows=2204 loops=1)
        Merge Cond: ("outer".entity_id = "inner".entity_id)
        ->  Sort  (cost=37.27..38.45 rows=471 width=85) (actual time=14.289..16.303 rows=2204 loops=1)
              Sort Key: point_features.entity_id
              ->  Index Scan using point_features__featureid on point_features  (cost=0.00..16.36 rows=471 width=85) (actual time=0.030..9.360 rows=2204 loops=1)
                    Index Cond: (featureid = 118)
        ->  Sort  (cost=9262.10..9475.02 rows=85168 width=36) (actual time=518.471..570.038 rows=59112 loops=1)
              Sort Key: entity_facids.entity_id
              ->  Seq Scan on entity_facids  (cost=0.00..2287.68 rows=85168 width=36) (actual time=0.093..268.679 rows=85168 loops=1)
      Total runtime: 693.161 ms
     (10 rows)
     fli=#

====================

I understand that the optimizer can not in general know that
a CLUSTERed table stays CLUSTERed when inserts or updates happen;
but I was wondering if anyone has any clever ideas on how I can
avoid this sort step.


Perhaps in the future, could the table set a bit to remember it
is freshly clustered, and clear that bit the first time any
changes are even attempted in the table?  Or, if not, would
that be possible if Hannu Krosing's read-only-table idea
    http://archives.postgresql.org/pgsql-hackers/2005-04/msg00660.php
happened?

Responses

pgsql-performance by date

Next:From: Andrew DunstanDate: 2005-04-25 20:43:10
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Previous:From: Josh BerkusDate: 2005-04-25 19:18:26
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group