On Wed, Mar 31, 2010 at 6:10 AM, Faheem Mitha <faheem(at)email(dot)unc(dot)edu> wrote:
> [If Kevin Grittner reads this, please fix your email address. I am getting
> bounces from your email address.]
> On Tue, 30 Mar 2010, Robert Haas wrote:
>> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem(at)email(dot)unc(dot)edu> wrote:
>>> Sure, but define sane setting, please. I guess part of the point is that I'm
>>> trying to keep memory low, and it seems this is not part of the planner's
>>> priorities. That is, it does not take memory usage into consideration when
>>> choosing a plan. If that is wrong, let me know, but that is my understanding.
>> I don't understand quite why you're confused here. We've already
>> explained to you that the planner will not employ a plan that uses
>> more than the amount of memory defined by work_mem for each sort or hash.
>> Typical settings for work_mem are between 1MB and 64MB. 1GB is enormous.
> I don't think I am confused. To be clear, when I said "it does not take
> memory usage into consideration" I was talking about overall memory usage.
> Let me summarize:
> The planner will choose the plan with the minimum total cost, subject to the
> constraint that the amount of memory used for each of certain steps is less
> than work_mem. In other words, with k such steps it can use at most
> k(plan)*work_mem memory, where k(plan) denotes that k is a function of the plan.
> (I'm assuming here that memory is not shared between the different steps.)
> However, k(plan)*work_mem is not itself bounded. I fail to see how reducing work_mem
> significantly would help me. This would mean that the current plans I am
> using would likely be ruled out, and I would be left with plans which, by
> definition, would have larger cost and so longer run times. The current
> runtimes are already quite long - for the PED query, the best I can do with
> work_mem=1 GB is 2 1/2 hrs, and that is after splitting the query into two parts.
> I might actually be better off *increasing* the memory, since then the
> planner would have more flexibility to choose plans where the individual
> steps might require more memory, but the overall memory sum might be lower.
OK, your understanding is correct.
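
(As an aside, and purely as a sketch with placeholder names: work_mem does not have
to be raised globally in postgresql.conf; it can be raised for just the session that
runs the big query, and EXPLAIN ANALYZE shows whether each sort or hash stayed within
it or spilled to disk.)

    -- raise work_mem for this session only; the server-wide setting is untouched
    SET work_mem = '512MB';

    -- in the EXPLAIN ANALYZE output, a sort that fits in work_mem reports
    -- "Sort Method: quicksort  Memory: ...", while one that spills reports
    -- "Sort Method: external merge  Disk: ..."
    EXPLAIN ANALYZE
    SELECT id, count(*)
    FROM some_table        -- placeholder table, not the actual schema
    GROUP BY id
    ORDER BY id;

    -- return to the configured default afterwards
    RESET work_mem;
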
>>>>>> You might need to create some indices, too.
>>>>> Ok. To what purpose? This query picks up everything from the
>>>>> tables and the planner does table scans, so conventional wisdom,
>>>>> and indeed my experience, says that indexes are not going to be so useful.
>>>> There are situations where scanning the entire table to build up a
>>>> hash table is more expensive than using an index. Why not test it?
>>> Certainly, but I don't know what you and Robert have in mind, and I'm not
>>> experienced enough to make an educated guess. I'm open to specific suggestions.
>> Try creating an index on geno on the columns that are being used for the join.
> Ok, I'll try that. I guess the cols in question on geno are idlink_id and
> anno_id. I thought that I already had indexes on them, but no. Maybe I had
> indexes, but removed them.
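
A rough sketch of what that could look like (the index names here are made up, and
whether single-column or composite indexes fit best depends on how the join actually
uses the columns):

    -- single-column indexes on the join columns of geno
    CREATE INDEX geno_idlink_id_idx ON geno (idlink_id);
    CREATE INDEX geno_anno_id_idx ON geno (anno_id);

    -- or, if both columns participate in the same join, a composite index instead
    CREATE INDEX geno_idlink_anno_idx ON geno (idlink_id, anno_id);

    -- refresh statistics so the planner can weigh the new indexes
    ANALYZE geno;
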
> If I understand the way this works, if you request, say, an INNER JOIN, the
> planner can choose different ways/algorithms to do this, as in
> http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a hash
> join, or a nested loop join, or something else, based on cost. If the
> indexes don't exist, that may make the nested loop join more expensive, and so tip
> the balance in favor of using a hash join. However, I have no way to control
> which option it chooses, short of disabling e.g. the hash join option, which
> is not an option for production usage anyway. Correct?
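
For testing (not production), one way to see which join method the planner picks, and
what it does when a particular method is discouraged, is EXPLAIN together with the
session-level enable_* settings. A rough sketch; the query is a placeholder standing
in for the real one, with assumed table and column names:

    -- what does the planner choose right now?
    EXPLAIN ANALYZE
    SELECT g.*
    FROM geno g
    JOIN idlink i ON g.idlink_id = i.id;

    -- discourage hash joins in this session only, then compare plan and runtime
    SET enable_hashjoin = off;
    EXPLAIN ANALYZE
    SELECT g.*
    FROM geno g
    JOIN idlink i ON g.idlink_id = i.id;

    -- put the planner back to normal
    RESET enable_hashjoin;
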