Re: selects from large tables

From: Nikk Anderson <Nikk(dot)Anderson(at)parallel(dot)ltd(dot)uk>
To: 'Stephan Szabo' <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: selects from large tables
Date: 2002-11-18 15:31:34
Message-ID: DA1274E682D3734B8802904A9B36124C298A95@nic-nts1.nic.parallel.ltd.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
Thanks for the reply Stephen, the data is 'somewhat' realistic.....

The data in the table is actually synthetic, but the structure is the same
as our live system, and the queries are similar to those we actually carry
out.

As the data was synthetic there was a bit of repetition (19 million rows of
repetition!! ) of the item used in the where clause, meaning that most of
the table was returned by the queries - oops! So, I have done is some more
realistic queries from our live system, and put the time it takes, and the
explain results. Just to note that the explain's estimated number of rows
is way out - its guesses are way too low.

Typically a normal query on our live system returns between 200 and 30000
rows depending on the reports a user wants to generate. In prior testing,
we noted that using SELECT COUNT( .. was slower than other queries, which
is why we though we would test counts first.

Here are some more realistic results, which still take a fair whack of
time........

Starting query 0
Query 0: SELECT * FROM xx WHERE time BETWEEN '2002-11-17 14:08:58.021' AND
'2002-11-18 14:08:58.021' AND job_id = 335
Time taken = 697 ms
Index Scan using http_timejobid on xx (cost=0.00..17.01 rows=4 width=57)
This query returns 500 rows of data

Starting query 1
Query 1: SELECT * FROM xx WHERE time BETWEEN '2002-11-11 14:08:58.021' AND
'2002-11-18 14:08:58.021' AND job_id = 335
Time taken = 15 seconds
Index Scan using http_timejobid on xx (cost=0.00..705.57 rows=175 width=57)
This query return 3582 rows

Starting query 2
Query 2: SELECT * FROM xx WHERE time BETWEEN '2002-10-19 15:08:58.021' AND
'2002-11-18 14:08:58.021' AND job_id = 335;
Time taken = 65 seconds
Index Scan using http_timejobid on xx (cost=0.00..3327.55 rows=832
width=57)
This query returns 15692 rows

Starting query 3
Query 3: SELECT * FROM xx_result WHERE time BETWEEN '2002-08-20
15:08:58.021' AND '2002-11-18 14:08:58.021' AND job_id = 335;
Time taken = 241 seconds
Index Scan using http_timejobid on xx (cost=0.00..10111.36 rows=2547
width=57)
This query returns 48768 rows

Cheers

Nikk

-----Original Message-----
From: Stephan Szabo [mailto:sszabo(at)megazone23(dot)bigpanda(dot)com]
Sent: 18 November 2002 13:02
To: Nikk Anderson
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] selects from large tables

On Mon, 18 Nov 2002, Nikk Anderson wrote:

> Any ideas on how we can select data more quickly from large tables?

Are these row estimates realistic? It's estimating nearly 20 million rows
to be returned by some of the queries (unless I'm misreading the
number - possible since it's 5am here). At that point you almost
certainly want to be using a cursor rather than plain queries since even a
small width result (say 50 bytes) gives a very large (1 gig) result set.

> - Queries and explain plans
>
> select count(*) from table_name;
> NOTICE: QUERY PLAN:
> Aggregate (cost=488700.65..488700.65 rows=1 width=0)
> -> Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=0)
>
> hawkdb=# explain select count(job_id) from table_name;
> NOTICE: QUERY PLAN:
> Aggregate (cost=488700.65..488700.65 rows=1 width=4)
> -> Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=4)
>
> hawkdb=# explain select * from table_name;
> NOTICE: QUERY PLAN:
> Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=57)
>
> hawkdb=# explain select count(*) from table_name where job_id = 13;
> NOTICE: QUERY PLAN:
> Aggregate (cost=537874.18..537874.18 rows=1 width=0)
> -> Seq Scan on table_name (cost=0.00..488700.65 rows=19669412 width=0)
>
> hawkdb=# explain select * from table_name where job_id = 13;
> NOTICE: QUERY PLAN:
> Seq Scan on http_result (cost=0.00..488700.65 rows=19669412 width=57)
>
> hawkdb=# explain select * from table_name where job_id = 1;
> NOTICE: QUERY PLAN:
> Index Scan using http_result_pk on table_name (cost=0.00..5.01 rows=1
> width=57)
>
> hawkdb=#explain select * from table_name where time > '2002-10-10';
> NOTICE: QUERY PLAN:
> Seq Scan on table_name (cost=0.00..488700.65 rows=19649743 width=57)
>
> hawkdb=# explain select * from http_result where time < '2002-10-10';
> NOTICE: QUERY PLAN:
> Index Scan using table_name_time on table_name (cost=0.00..75879.17
> rows=19669 width=57)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nikk Anderson 2002-11-18 15:36:08 Re: selects from large tables
Previous Message Tom Lane 2002-11-18 15:03:33 Re: selects from large tables