Re: Query optimizer 8.0.1 (and 8.0)

From: pgsql(at)mohawksoft(dot)com
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Ron Mayer" <ron(at)cheapcomplexdevices(dot)com>, "Mark Kirkwood" <markir(at)coretech(dot)co(dot)nz>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query optimizer 8.0.1 (and 8.0)
Date: 2005-02-14 14:11:10
Message-ID: 16718.24.91.171.78.1108390270.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Probably off-topic, but I think it's worth to see what astronomers are
> doing with their very big spatial databases. For example, we are working
> with more than 500,000,000 rows catalog and we use some special
> transformation
> of coordinates to integer numbers with preserving objects closeness.
> I hope we could show postgresql is good enough to be used in astronomy
> for very big catalogs. Currently, MS SQL is in use.
> See http://www.sdss.jhu.edu/htm/ for details. We use another technique.

You know, I don't think a lot of people "get" the issues I was describing,
or maybe they don't believe it, I don't know, but, I think that it would
be a useful contrib project to create an 'analyze_special('table',
'column', 'method')' function that does a better job at calculating the
stats for table that contain multiple trend waveforms. A separate function
will probably work well as the trends within the data probably only apply
to specific rows.

It's interesting, because I don't think it needs to calculate a perfect
representation of the data so much as better clue to its nature for the
optimizer.

When I get the time (or can get someone to pay me to do it) I'm going to
try it.

>
>
> Oleg
> On Wed, 9 Feb 2005 pgsql(at)mohawksoft(dot)com wrote:
>
>> I wrote a message caled "One Big trend vs multiple smaller trends in
>> table
>> statistics" that, I think, explains what we've been seeing.
>>
>>
>>> pgsql(at)mohawksoft(dot)com wrote:
>>>>
>>>> In this case, the behavior observed could be changed by altering the
>>>> sample size for a table. I submit that an arbitrary fixed sample size
>>>> is
>>>> not a good base for the analyzer, but that the sample size should be
>>>> based
>>>> on the size of the table or some calculation of its deviation.
>>>>
>>>
>>> Mark,
>>>
>>> Do you have any evidence that the Sample Size had anything to do
>>> with the performance problem you're seeing?
>>
>> Sample size is only a bandaid for the issue, however, more samples
>> always
>> provide more information.
>>
>>
>>>
>>> I also do a lot with the complete Census/TIGER database.
>>>
>>> Every problem I have with the optimizer comes down to the
>>> fact that the data is loaded (and ordered on disk) by
>>> State/County FIPS codes, and then queried by zip-code
>>> or by city name. Like this:
>>>
>>> Alabama 36101 [hundreds of pages with zip's in 36***]
>>> Alaska 99686 [hundreds of pages with zip's in 9****]
>>> Arizona 85701 [hundreds of pages with zip's in 855**]
>>>
>>> Note that the zip codes are *NOT* sequential.
>>
>> Again, read "One Big Trend..." and let me know what you think. I think
>> it
>> describes exactly the problem that we see.
>>
>> For now, the solution that works for me is to seriously up the value of
>> "targrows" in analyze.c. It makes it take longer, and while the stats
>> are
>> not "correct" because they are not designed to detect these sorts of
>> patterns, a larger sample allows them to be "less wrong" enough to give
>> a
>> better hint to the planner.
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 7: don't forget to increase your free space map settings
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-02-14 16:11:53 Re: Schema name of function
Previous Message Sibtay Abbas 2005-02-14 12:02:19 Re: getting oid of function