Re: [patch] Proposal for \crosstabview in psql

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] Proposal for \crosstabview in psql
Date: 2015-12-23 20:36:51
Message-ID: d4186a6f-4e25-4c62-bd65-4cf3929c21c6@mm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Here's an updated patch that replaces sorted arrays by AVL binary trees
when gathering distinct values for the columns involved in the pivot.
The change is essential for large resultsets. For instance,
it allows to process a query like this (10 million rows x 10 columns):

select x,(random()*10)::int, (random()*1000)::int from
generate_series(1,10000000) as x
\crosstabview

which takes about 30 seconds to run and display on my machine with the
attached patch. That puts it seemingly in the same ballpark than
the equivalent test with the server-side crosstab().

With the previous iterations of the patch, this test would never end,
even with much smaller sets, as the execution time of the 1st step
grew exponentially with the number of distinct keys.
The exponential effect starts to be felt at about 10k values on my low-end
CPU,
and from there quickly becomes problematic.

As a client-side display feature, processing millions of rows like in
the query above does not necessarily make sense, it's pushing the
envelope, but stalling way below 100k rows felt lame, so I'm happy to get
rid of that limitation.

However, there is another one. The above example does not need or request
an additional sort step, but if it did, sorting more than 65535 entries in
the vertical header would error out, because values are shipped as
parameters to PQexecParams(), which only accepts that much.
To avoid the problem, when the rows in the output "grid" exceed 2^16 and
they need to be sorted, the user must let the sort being driven by ORDER BY
beforehand in the query, knowing that the pivot will keep the original
ordering intact in the vertical header.

I'm still thinking about extending this based on Pavel's diff for the
"label" column, so that
\crosstabview [+|-]colV[:colSortH] [+|-]colH[:colSortH]
would mean to use colV/H as grid headers but sort them according
to colSortV/H.
I prefer that syntax over adding more parameters, and also I'd like
to have it work in both V and H directions.

Aside from the AVL trees, there are a few other minor changes in that
patch:
- move non-exportable structs from the .h to the .c
- move code in common.c to respect alphabetical ordering
- if vertical sort is requested, add explicit check against more than 65535
params instead of letting the sort query fail
- report all failure cases of the sort query
- rename sortColumns to serverSort and use less the term "columns" in
comments and variables.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Attachment Content-Type Size
psql-crosstabview-v9.diff text/x-patch 34.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-12-23 21:03:21 Re: Using quicksort for every external sort run
Previous Message Jeff Janes 2015-12-23 20:33:42 Re: GIN data corruption bug(s) in 9.6devel