Re: query performance

From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "pepone(dot) onrez" <pepone(dot)onrez(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: query performance
Date: 2008-01-14 05:53:08
Message-ID: 33c6269f0801132153s7327fd6ft630177a50dea84d4@mail.gmail.com
Lists: pgsql-general

If you combine it with Tom Lane's suggestion, it will work even better;
something like:

select * from t_documentcontent
    where _id in (select _id from overview where _id > x order by _id limit 50);
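
In case it helps, here is a minimal sketch of how the id-only helper
table could be built in the first place (names here are my own invention;
adjust the column type to your schema):

CREATE TABLE overview AS
    SELECT _id FROM t_documentcontent;          -- materialize just the ids

CREATE UNIQUE INDEX i_overview_id
    ON overview (_id);                          -- keeps _id > x and the IN join indexed

ANALYZE overview;                               -- refresh planner statistics for the new table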

Alex

On Jan 13, 2008 11:59 PM, pepone. onrez <pepone(dot)onrez(at)gmail(dot)com> wrote:

>
> Thanks Alex
>
> I tested your solution and it is really much faster.
>
> Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
>   ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
>         ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
>               ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
>                     ->  Seq Scan on overview  (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
>   ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
>         Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
> Total runtime: 106.323 ms
>
> I now need to see what triggers I need to add, and test the insertions.
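>
> A minimal sketch of what those triggers might look like (assuming only
> INSERT and DELETE on t_documentcontent need mirroring; the function and
> trigger names are made up, and an UPDATE of _id would need handling too):
>
> CREATE OR REPLACE FUNCTION sync_overview() RETURNS trigger AS $$
> BEGIN
>     IF TG_OP = 'INSERT' THEN
>         INSERT INTO overview (_id) VALUES (NEW._id);
>         RETURN NEW;
>     ELSE  -- TG_OP = 'DELETE'
>         DELETE FROM overview WHERE _id = OLD._id;
>         RETURN OLD;
>     END IF;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE TRIGGER tg_documentcontent_sync
>     AFTER INSERT OR DELETE ON t_documentcontent
>     FOR EACH ROW EXECUTE PROCEDURE sync_overview();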
> Thanks again
>
>
> On Jan 14, 2008 5:54 AM, Alex Turner <armtuk(at)gmail(dot)com> wrote:
>
> > Here is a table I threw together to demonstrate the approximate speed of
> > a materialized view in this case:
> >
> > trend=# explain analyze select property_id from overview
> >         order by property_id limit 50 offset 50000;
> >                                                         QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
> >    ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
> >          Sort Key: property_id
> >          ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time=0.012..88.691 rows=173409 loops=1)
> >  Total runtime: 447.578 ms
> > (5 rows)
> >
> > trend=# select count(*) from overview;
> > count
> > --------
> > 173409
> > (1 row)
> >
> > trend=#
> >
> > It's not great, but it's better than 47 seconds. (The machine I'm
> > running it on is far from big iron, so these results should be fairly
> > typical for any modern x86 box; also, this materialized view is almost
> > certainly in RAM, so I/O speed is irrelevant.)
> >
> > Tom Lane has already suggested another approach, whereby you order your
> > results, then select the next page from the set where the id is greater
> > than the greatest id of the last page:
> >
> > select id from overview order by id limit 50;
> >
> > x = get row['id']  // for row 1
> > do something
> > x = get row['id']  // for row 2
> > do something
> > ...
> > x = get row['id']  // for row 50
> >
> > select id from overview where id > x order by id limit 50;
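> >
> > For instance, with a hypothetical placeholder standing in for x (the
> > 50th id returned by the previous query):
> >
> > select id from overview where id > 12345 order by id limit 50;  -- 12345 is the remembered x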
> >
> > The order by is relevant, and in fact imperative: you must order your
> > results somehow, otherwise your pagination will produce different
> > results each time you try it, as database updates will affect the order
> > the rows come back by default without an order by clause.
> >
> > Let me say that again to be clear: the order rows come back in, if you
> > don't specify an order by, can change! So pulling rows without an order
> > by is a REALLY bad idea. This will break your pagination if a database
> > update happens between someone viewing a page and hitting next to view
> > the next page.
> >
> > Alex
> >
> >
> > On Jan 13, 2008 11:43 PM, Alex Turner <armtuk(at)gmail(dot)com > wrote:
> >
> > > If you have to access the data this way (with no where clause at all,
> > > which sometimes you do), then I have already provided a solution that
> > > will work reasonably well. If you create what is essentially a
> > > materialized view of just the id field, the sequential scan will
> > > return far fewer pages than when you do it on the main table. Then you
> > > join it to the indexed main table, and page in just the rows you need.
> > > Voila, much faster result. Of course we haven't really talked about
> > > how that will affect insert and delete speed if you trigger them up,
> > > but you haven't really talked about any requirements there.
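> > >
> > > As a sketch, that join is essentially (same shape as the IN query near
> > > the top of this thread; the aliases are made up, and the 50000 offset
> > > just mirrors the earlier examples):
> > >
> > > SELECT d.*
> > > FROM t_documentcontent d
> > > JOIN (SELECT _id FROM overview
> > >       ORDER BY _id LIMIT 50 OFFSET 50000) o ON o._id = d._id;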
> > >
> > > Alex
> > >
> > >
> > > On Jan 13, 2008 11:27 PM, pepone. onrez <pepone(dot)onrez(at)gmail(dot)com >
> > > wrote:
> > >
> > > > Sorry Alex, I forgot to mention that I had seqscan off in my last
> > > > test.
> > > >
> > > > Now I have set seqscan on and indexscan on, and added ORDER BY _id.
> > > >
> > > > The table has an index on the _id field
> > > >
> > > > CREATE INDEX i_documentcontent_document
> > > > ON t_documentcontent
> > > > USING btree
> > > > (_document);
> > > >
> > > > The database recently had VACUUM ANALYZE run on it, but not VACUUM FULL.
> > > >
> > > > Here is the EXPLAIN output of two different queries, when I use a
> > > > large OFFSET:
> > > >
> > > > EXPLAIN ANALYZE SELECT
> > > > t_documentcontent._id AS _id
> > > > FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000
> > > >
> > > > "Limit (cost=137068.24..137068.36 rows=50 width=58) (actual time=
> > > > 41119.702..41119.792 rows=50 loops=1)"
> > > > " -> Sort (cost=136943.24..137320.26 rows=150807 width=58)
> > > > (actual time=41064.802..41100.424 rows=50050 loops=1)"
> > > > " Sort Key: _id"
> > > > " -> Seq Scan on t_documentcontent (cost= 0.00..110772.07rows=150807 width=58) (actual time=
> > > > 106.679..33267.194 rows=150807 loops=1)"
> > > > "Total runtime: 41120.015 ms"
> > > >
> > > > EXPLAIN ANALYZE SELECT
> > > > t_documentcontent._id AS _id
> > > > FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000
> > > >
> > > > "Limit (cost=39839.37..40038.56 rows=50 width=58) (actual time=
> > > > 1172.969..1194.228 rows=50 loops=1)"
> > > > " -> Index Scan using i_documentcontent_id on t_documentcontent
> > > > (cost=0.00..600805.54 rows=150807 width=58) (actual time=
> > > > 0.077..1189.688 rows=10050 loops=1)"
> > > > "Total runtime: 1194.316 ms"
> > > >
> > > > Tom,
> > > > I'm using uuid for the _id field, which is the primary key, so adding
> > > > a WHERE id > ? doesn't apply.
> > > > The cursor approach is also not suitable for some of my queries.
> > > >
> > > > I use this query to paginate the contents of a filesystem with lots
> > > > of documents; avoiding OFFSET is not always possible.
> > >
> > >
> > >
> >
>
