- EXPLAIN ANALYZE provides a more useful analysis of a slow query,
because it gives both the estimate and actual times/rows for each step
in the plan.
- The documentation is right: rows with little variation are pretty
useless to index. Indexing is about "selectivity", reducing the amount
of stuff the database has to read off the the disk.
- You only have two things in your WHERE clause, so that is where the
most important indexes reside. How many of your rows have p1=53? How
many of your rows have happened in the last day? If your answer is "a
lot" then the indexes are not going to help: PostgreSQL will be more
efficient scanning every tuple than it will be jumping around the index
structure for a large number of tuples.
- If neither time nor p1 are particularly selective individually, but
they are selective when taken together, try a multi-key index on them both.
Kjell Tore Fossbakk wrote:
> I use FreeBSD 4.11 with PostGreSQL 7.3.8.
> I got a huge database with roughly 19 million records. There is just one
> table, with a time field, a few ints and a few strings.
> table test
> fields time (timestamp), source (string), destination (string), p1 (int),
> p2 (int)
> I have run VACUUM ANALYZE ;
> I have created indexes on every field, but for some reason my postgre
> server wants to use a seqscan, even tho i know a indexed scan would be
> much faster.
> create index test_time_idx on test (time) ;
> create index test_source_idx on test (source) ;
> create index test_destination_idx on test (destination) ;
> create index test_p1_idx on test (p1) ;
> create index test_p2_idx on test (p2) ;
> What is really strange, is that when i query a count(*) on one of the int
> fields (p1), which has a very low count, postgre uses seqscan. In another
> count on the same int field (p1), i know he is giving about 2.2 million
> hits, but then he suddenly uses seqscan, instead of a indexed one. Isn't
> the whole idea of indexing to increase performance in large queries.. To
> make sort of a phonebook for the values, to make it faster to look up what
> ever you need... This just seems opposite..
> Here is a EXPLAIN of my query
> database=> explain select date_trunc('hour', time),count(*) as total from
> test where p1=53 and time > now() - interval '24 hours' group by
> date_trunc order by date_trunc ;
> QUERY PLAN
> Aggregate (cost=727622.61..733143.23 rows=73608 width=8)
> -> Group (cost=727622.61..731303.02 rows=736083 width=8)
> -> Sort (cost=727622.61..729462.81 rows=736083 width=8)
> Sort Key: date_trunc('hour'::text, "time")
> -> Seq Scan on test (cost=0.00..631133.12 rows=736083
> Filter: ((p1 = 53) AND ("time" > (now() - '1
> (6 rows)
> database=> drop INDEX test_<TABULATOR>
> test_source_idx test_destination_idx test_p1_idx
> test_p2_idx test_time_idx
> After all this, i tried to set enable_seqscan to off and
> enable_nestedloops to on. This didnt help much either. The time to run the
> query is still in minutes. My results are the number of elements for each
> hour, and it gives about 1000-2000 hits per hour. I have read somewhere,
> about PostGreSQL, that it can easily handle 100-200million records. And
> with the right tuned system, have a great performance.. I would like to
> learn how :)
> I also found an article on a page
> ( http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):
> Tip #11: Don't bother indexing columns with huge numbers of records and a
> small range of values, such as BOOLEAN columns.
> This tip, regretfully, is perhaps the only tip where I cannot provide a
> good, real-world example from my work. So I'll give you a hypothetical
> situation instead:
> Imagine that you have a database table with a list of every establishment
> vending ice cream in the US. A simple example might look like:
> Where there were almost 1 million rows, but due to simplistic data entry,
> only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and
> 3-OTHER) which are relatively evenly distributed. In this hypothetical
> situation, you might find (with testing using EXPLAIN) that an index on
> type is ignored and the parser uses a "seq scan" (or table scan) instead.
> This is because a table scan can actually be faster than an index scan in
> this situation. Thus, any index on type should be dropped.
> Certainly, the boolean column (active) requires no indexing as it has only
> two possible values and no index will be faster than a table scan.
> Then I ask, what is useful with indexing, when I can't use it on a VERY
> large database? It is on my 15 million record database it takes for ever
> to do seqscans over and over again... This is probably why, as i mentioned
> earlier, the reason (read the quote) why he chooses a full scan and not a
> indexed one...
> So what do I do? :confused:
> I'v used SQL for years, but never in such a big scale. Thus, not having to
> learn how to deal with large number of records. Usually a maximum of 1000
> records. Now, with millions, I need to learn a way to make my sucky
> queries better.
> Im trying to learn more about tuning my system, makeing better queries and
> such. I'v found some documents on the Internet, but far from the best.
> Feedback most appreciated!
> a learning PostGreSQL user
In response to
pgsql-performance by date
|Next:||From: Tobias Brox||Date: 2005-06-21 20:40:57|
|Subject: Re: Prepared statements vs. Stored Procedures|
|Previous:||From: Tobias Brox||Date: 2005-06-21 19:46:39|
|Subject: Re: Limit clause not using index|