Re: [PERFORM] How to read query plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Miroslav Šulc <miroslav(dot)sulc(at)startnet(dot)cz>
Cc: pgsql-performance(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PERFORM] How to read query plan
Date: 2005-03-14 00:08:41
Message-ID: 6715.1110758921@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

=?windows-1250?Q?Miroslav_=8Aulc?= <miroslav(dot)sulc(at)startnet(dot)cz> writes:
>> Is the data confidential? If you'd be willing to send me a pg_dump
>> off-list, I'd like to replicate this test and try to see where the time
>> is going.
>>
> Thank you very much for your offer. The data are partially confidental
> so I hashed some of the text information and changed some values (not
> the values for the JOINs) so I could send it to you. I've checked the
> EXPLAIN ANALYZE if anything changed and the result is merely the same
> (maybe cca 1 sec slower - maybe because the hash caused the text data to
> be longer).

No problem; thank you for supplying the test case. What I find is
rather surprising: most of the runtime can be blamed on disassembling
and reassembling tuples during the join steps. Here are the hot spots
according to gprof:

-----------------------------------------------
1.27 7.38 8277/103737 ExecScan [16]
2.93 17.02 19092/103737 ExecNestLoop <cycle 2> [14]
3.91 22.70 25456/103737 ExecMergeJoin <cycle 2> [13]
7.81 45.40 50912/103737 ExecHashJoin <cycle 2> [12]
[9] 86.3 15.92 92.50 103737 ExecProject [9]
7.65 76.45 8809835/9143692 ExecEvalVar [10]
3.42 4.57 103737/103775 heap_formtuple [17]
0.03 0.24 12726/143737 ExecMakeFunctionResultNoSets [24]
0.02 0.12 103737/290777 ExecStoreTuple [44]
0.01 0.00 2/2 ExecEvalFunc [372]
0.00 0.00 2/22 ExecMakeFunctionResult [166]
-----------------------------------------------
0.00 0.00 42/9143692 ExecEvalFuncArgs [555]
0.05 0.51 59067/9143692 ExecHashGetHashValue [32]
0.24 2.38 274748/9143692 ExecMakeFunctionResultNoSets [24]
7.65 76.45 8809835/9143692 ExecProject [9]
[10] 69.5 7.94 79.34 9143692 ExecEvalVar [10]
79.34 0.00 8750101/9175517 nocachegetattr [11]
-----------------------------------------------

I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above). Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns. So the cost is
roughly O(N^2) in the number of columns.

As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps. In other words make the
later joins add more columns than the earlier, as much as you can.

This is actually good news, because before 8.0 we had much worse
problems than this with extremely wide tuples --- there were O(N^2)
behaviors all over the place due to the old List algorithms. Neil
Conway's rewrite of the List code got rid of those problems, and now
we can see the places that are left to optimize. The fact that there
seems to be just one is very nice indeed.

Since ExecProject operations within a nest of joins are going to be
dealing entirely with Vars, I wonder if we couldn't speed matters up
by having a short-circuit case for a projection that is only Vars.
Essentially it would be a lot like execJunk.c, except able to cope
with two input tuples. Using heap_deformtuple instead of retail
extraction of fields would eliminate the O(N^2) penalty for wide tuples.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-03-14 03:00:37 Re: TODO item: support triggers on columns
Previous Message Kurt Roeckx 2005-03-13 23:54:37 Re: date_trunc problem in HEAD

Browse pgsql-performance by date

  From Date Subject
Next Message Qingqing Zhou 2005-03-14 01:41:30 Re: One tuple per transaction
Previous Message Tom Lane 2005-03-13 20:26:27 Re: How to read query plan