Re: Performance issues when the number of records are around 10 Million

From: venu madhav <venutaurus539(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance issues when the number of records are around 10 Million
Date: 2010-05-12 12:14:55
Message-ID: AANLkTiliM8miKiQsd5eqtw8JhN7wAm7bR0z40zUweI1D@mail.gmail.com
Lists: pgsql-performance

On Wed, May 12, 2010 at 5:25 PM, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> venu madhav wrote:
>
> >>> AND e.timestamp >= '1270449180'
> >>> AND e.timestamp < '1273473180'
> >>> ORDER BY
> >>> e.cid DESC,
> >>> e.cid DESC
> >>> limit 21
> >>> offset 10539780
>
> > The second column acts as a secondary key for sorting if the
> > primary sorting key is a different column. For this query both of
> > them are same.
>
> Any chance you could just leave the second one off in that case?
>
[Venu] Yes, that can be ignored, but I am not sure that removing it would
reduce the time drastically.

>
> > This query is part of an application which allows user to select
> > time ranges and retrieve the data in that interval. Hence the time
> > stamp.
>
> Which, of course, is going to affect the number of rows. Which
> leaves me wondering how you know that once you select and sequence
> the result set you need to read past and ignore exactly 10539780
> rows to get to the last page.
>
[Venu] For example: my database has 10539793 records. The application first
calculates the number of records in that interval. Then, based on the
user's choice to display 10/20/30/40 records per page, it calculates the
offset of the records to show when the "last" link is clicked.
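As a sketch of that bookkeeping (the real table name is not shown upthread, so "event_table" below is an assumed name; only the alias "e" appears in the query):

```sql
-- Sketch only: count the rows in the requested interval first.
SELECT count(*)
FROM event_table e
WHERE e.timestamp >= '1270449180'
  AND e.timestamp < '1273473180';

-- With count = 10539793 and 20 rows per page, the last page starts at
--   ((10539793 - 1) / 20) * 20 = 10539780   (integer division),
-- which is exactly the OFFSET in the problem query.
```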

>
> > To have it in some particular order we're doing order by.
>
> Which will affect which rows are at any particular offset.
>
[Venu]Yes, by default it has the primary key for order by.

>
> > If the records are more in the interval,
>
> How do you know that before you run your query?
>
[Venu] I calculate the count first.

>
> > we display in sets of 20/30 etc. The user also has the option to
> > browse through any of those records hence the limit and offset.
>
> Have you considered alternative techniques for paging? You might
> use values at the edges of the page to run a small query (limit, no
> offset) when they page. You might generate all the pages on the
> first pass and cache them for a while.
>
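For the record, the "values at the edges of the page" idea Kevin describes is usually called keyset (or "seek") paging. A rough sketch, with "event_table" and the :last_cid parameter as assumed names:

```sql
-- Keyset paging sketch: instead of a huge OFFSET, remember the lowest
-- cid shown on the current page (:last_cid) and seek past it. With an
-- index on cid this avoids reading and discarding the skipped rows.
SELECT *
FROM event_table e
WHERE e.timestamp >= '1270449180'
  AND e.timestamp < '1273473180'
  AND e.cid < :last_cid
ORDER BY e.cid DESC
LIMIT 21;
```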
[Venu] If we generate all the pages at once, retrieving all 10 M records
in one request would take much longer, and since the request comes from a
browser, there is a chance of the browser timing out.

> > When the user asks for the last set of 20 records, this query gets
> > executed.
>
> The DESC on the ORDER BY makes it look like you're trying to use the
> ORDER BY to get to the end, but then your offset tells PostgreSQL to
> skip the 10.5 million result rows with the highest keys. Is the
> "last page" the one with the highest or lowest values for cid?
>
[Venu] The last page contains the lowest values of cid. By default we
fetch the records in decreasing order of cid and then show the last 10/20.
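Since the last page holds the lowest cid values, it can be fetched without the huge OFFSET by reversing the sort and flipping the result back in an outer query. A sketch ("event_table" is again an assumed name; 13 is the size of the final partial page in the 10539793-row example):

```sql
-- Fetch the final page directly: take the lowest-cid rows first,
-- then restore the descending display order in the outer query.
SELECT * FROM (
  SELECT *
  FROM event_table e
  WHERE e.timestamp >= '1270449180'
    AND e.timestamp < '1273473180'
  ORDER BY e.cid ASC
  LIMIT 13
) last_page
ORDER BY cid DESC;
```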

Thank you,
Venu.

> -Kevin
