## Re: experiments in query optimization

From: Robert Haas Faheem Mitha pgsql-performance(at)postgresql(dot)org Re: experiments in query optimization 2010-03-29 19:55:46 603c8f071003291255x151746dif0e33d18124c8bd2@mail.gmail.com (view raw or whole thread) 2010-03-25 19:57:54 from Faheem Mitha  2010-03-29 18:02:03 from Robert Haas   2010-03-29 18:31:44 from Faheem Mitha    2010-03-29 19:55:46 from Robert Haas     2010-03-29 20:22:05 from Faheem Mitha    2010-03-30 16:08:10 from "Kevin Grittner"     2010-03-30 16:30:14 from Faheem Mitha      2010-03-30 19:59:46 from Robert Haas       2010-03-31 10:10:45 from Faheem Mitha        2010-03-31 15:04:13 from Robert Haas        2010-04-01 11:46:40 from Faheem Mitha         2010-04-01 16:31:02 from Eliot Gable          2010-04-01 18:15:13 from Faheem Mitha           2010-04-01 18:50:59 from Robert Haas            2010-04-01 19:01:09 from Faheem Mitha             2010-04-01 19:29:17 from Eliot Gable              2010-04-01 19:32:37 from Faheem Mitha      2010-03-31 09:29:58 from Matthew Wakeling       2010-03-31 09:36:11 from Faheem Mitha     2010-03-30 17:19:40 from Faheem Mitha pgsql-performance
On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <faheem(at)email(dot)unc(dot)edu> wrote:
>> It's not really too clear to me from reading this what specific
>> questions you're trying to answer.
>
> Quote from opt.{tex/pdf}, Section 1:
>
> "If I have to I can use Section~\ref{ped_hybrid} and
> Section~\ref{tped_hybrid}, but I am left wondering why I get the performance
> I do out of the earlier versions. Specifically, why is
> Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and
> why does the memory usage in Section~\ref{ped_phenoout} blow up relative to
> Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

Here and in the document, you refer to section numbers for the
"hybrid" version but I don't see where you define what the "hybrid"
version actually is.  And the differences between your queries are not
real clear either - first you say you took out pheno and sex because
they weren't necessary, but then you decide to put them back.  I don't
know what that means.  If they're not necessary, leave them out.

>> One random thought: WHERE row_number() = 1 is not too efficient.
>> Try using LIMIT or DISTINCT ON instead.
>
> Possibly. However, the CTE that uses
>
> WHERE row_number() = 1
>
> doesn't dominate the runtime or memory usage, so I'm not too concerned

Hmm, you might be right.

>> If you're concerned about memory usage, try reducing work_mem; you've
>> probably got it set to something huge.
>
> work_mem = 1 GB (see diag.{tex/pdf}).
>
> The point isn't that I'm using so much memory. Again, my question is, why
> are these changes affecting memory usage so drastically?

Well each sort or hash can use an amount of memory that is limited
from above by work_mem.  So if you write the query in a way that
involves more sorts or hashes, each one can add up to 1GB to your
memory usage, plus overhead.  However, it doesn't look like any of
your queries including 30 sorts or hashes, so I'm thinking that the
RSS number probably also includes some of the shared memory that has
been mapped into each backend's address space.  RSS is not a terribly
reliable number when dealing with shared memory; it's hard to say what
that really means.

>> You might need to create some indices, too.
>
> Ok. To what purpose? This query picks up everything from the tables and the
> planner does table scans, so conventional wisdom and indeed my experience,
> says that indexes are not going to be so useful.

Well, a hash join is not usually the first thing that pops to mind
when dealing with a table that has 825 million rows (geno).  I don't
know if a nested loop with inner-indexscan would be faster, but it
would almost certainly use less memory.

...Robert



### pgsql-performance by date

 Next: From: Faheem Mitha Date: 2010-03-29 20:22:05 Subject: Re: experiments in query optimization Previous: From: Faheem Mitha Date: 2010-03-29 18:31:44 Subject: Re: experiments in query optimization