Re: Re: parallel distinct union and aggregate support patch

From: "bucoo(at)sohu(dot)com" <bucoo(at)sohu(dot)com>
To: "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, tgl <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Re: parallel distinct union and aggregate support patch
Date: 2020-11-30 17:27:11
Message-ID: 202011101538306278671@sohu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> 1.
> +#define BATCH_SORT_MAX_BATCHES 512
>
> Did you decide this number based on some experiment or is there some
> analysis behind selecting this number?
When there are too few batches, if a certain process works too slowly, it will cause unbalanced load.
When there are too many batches, FD will open and close files frequently.

> 2.
> +BatchSortState* ExecInitBatchSort(BatchSort *node, EState *estate, int eflags)
> +{
> + BatchSortState *state;
> + TypeCacheEntry *typentry;
> ....
> + for (i=0;i<node->numGroupCols;++i)
> + {
> ...
> + InitFunctionCallInfoData(*fcinfo, flinfo, 1, attr->attcollation, NULL, NULL);
> + fcinfo->args[0].isnull = false;
> + state->groupFuns = lappend(state->groupFuns, fcinfo);
> + }
>
> From the variable naming, it appeared like the batch sort is dependent
> upon the grouping node. I think instead of using the name
> numGroupCols and groupFuns we need to use names that are more relevant
> to the batch sort something like numSortKey.
Not all data types support both sorting and hashing calculations, such as user-defined data types.
We do not need all columns to support hash calculation when we batch, so I used two variables.

> 3.
> + if (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))
> + {
> + /* for now, we only using in group aggregate */
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("not support execute flag(s) %d for group sort", eflags)));
> + }
>
> Instead of ereport, you should just put an Assert for the unsupported
> flag or elog.
In fact, this is an unfinished feature, BatchSort should also support these features, welcome to supplement.

> 4.
> + state = makeNode(BatchSortState);
> + state->ps.plan = (Plan*) node;
> + state->ps.state = estate;
> + state->ps.ExecProcNode = ExecBatchSortPrepare;
>
> I think the main executor entry function should be named ExecBatchSort
> instead of ExecBatchSortPrepare, it will look more consistent with the
> other executor machinery.
The job of the ExecBatchSortPrepare function is to preprocess the data (batch and pre-sort),
and when its work ends, it will call "ExecSetExecProcNode(pstate, ExecBatchSort)" to return the data to the ExecBatchSort function.
There is another advantage of dividing into two functions,
It is not necessary to judge whether tuplesort is now available every time the function is processed to improve the subtle performance.
And I think this code is clearer.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexey Kondratov 2020-11-30 17:34:21 Re: Notes on physical replica failover with logical publisher or subscriber
Previous Message Bossart, Nathan 2020-11-30 17:17:41 Re: A few new options for CHECKPOINT