From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query Optimizer makes a poor choice
Date: 2011-11-30 23:41:15
Message-ID: 4ED6BF1B.2080604@fuzzy.cz
Lists: pgsql-general

On 30.11.2011 23:22, Tyler Hains wrote:
>>> I haven't had a chance to experiment with the SET STATISTICS, but
>>> that got me going on something interesting...
>>>
>>> Do these statistics look right?
>>>
>>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>>> pg_stats WHERE tablename = 'cards';
>>>
>> ...
>>> "card_set_id" 905
>>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>>
>> This looks promising, because n_distinct is low enough that you can
>> cover almost all values with statistics.
>> Raising the statistics target and running ANALYZE should help.
>> (NOTE NOTE NOTE: assuming that the distribution is even)
>>
>>
>> ...
>> but one thing we see for sure is that you have not tuned your
>> PostgreSQL instance :-)
>> I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
>> it covers most important stuff, *including* default_statistics_target.
>>
>>
>>
>> Filip
>>
>
> I just tried the set statistics on our test system with essentially the
> same end result.
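
For reference, the per-column statistics change discussed above looks like
this (a sketch using the "cards" table and "card_set_id" column from the
pg_stats output quoted earlier; the target of 1000 is just an illustrative
value chosen to exceed the reported n_distinct of ~905):

```sql
-- Raise the statistics target for the skewed column, then
-- refresh the planner statistics for the table:
ALTER TABLE cards ALTER COLUMN card_set_id SET STATISTICS 1000;
ANALYZE cards;
```

After the ANALYZE, re-run EXPLAIN ANALYZE on the slow query to see whether
the plan changed.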

Can you describe the problem in a bit more detail? Maybe you're hitting
the same issue as the OP.

With the following (very simple) test case it works just fine:

========================================================================
create table test_tab (id int primary key, val int, txtval text);

insert into test_tab select i, mod(i, 10000), md5(i::text) from
generate_series(1,10000000) s(i);

create index test_tab_idx on test_tab (val);

analyze test_tab;
========================================================================

The table is about 730MB, the indexes are about 214MB each.

========================================================================
explain analyze select * from test_tab where val = 500 order by id;

1st execution (not cached): http://explain.depesz.com/s/1VQ (7786 ms)
2nd execution (cached): http://explain.depesz.com/s/cnt (1 ms)

explain analyze select * from test_tab where val = 500 order by id limit 1;

1st execution (not cached): http://explain.depesz.com/s/nlE (66 ms)
2nd execution (cached): http://explain.depesz.com/s/WNa (0.08 ms)
========================================================================

So in both cases the LIMIT (with index scan) is faster. Sure, there may
be cases when this does not work that well - maybe it's not well cached,
maybe there's some other issue.

But it clearly is not true that LIMIT is evil and should be avoided.
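
If you want to see what the planner would do without the index scan, you
can temporarily disable it for the session (a debugging sketch only, not
something to set in production):

```sql
-- Force the planner to consider other plans, compare, then restore:
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT * FROM test_tab WHERE val = 500 ORDER BY id LIMIT 1;
RESET enable_indexscan;
```

Comparing the two EXPLAIN ANALYZE outputs usually makes it obvious whether
the planner's cost estimates, or the statistics feeding them, are off.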

Tomas
