Re: [patch] Proposal for \crosstabview in psql

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] Proposal for \crosstabview in psql
Date: 2015-12-26 05:01:52
Message-ID: CAFj8pRBZiqvQ=4bWKZLRCZWm+HfMNSTFCYJvEOCRJT5mQ1hwOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2015-12-23 21:36 GMT+01:00 Daniel Verite <daniel(at)manitou-mail(dot)org>:

> Hi,
>
> Here's an updated patch that replaces sorted arrays by AVL binary trees
> when gathering distinct values for the columns involved in the pivot.
> The change is essential for large resultsets. For instance,
> it allows to process a query like this (10 million rows x 10 columns):
>
> select x,(random()*10)::int, (random()*1000)::int from
> generate_series(1,10000000) as x
> \crosstabview
>
> which takes about 30 seconds to run and display on my machine with the
> attached patch. That puts it seemingly in the same ballpark than
> the equivalent test with the server-side crosstab().
>
> With the previous iterations of the patch, this test would never end,
> even with much smaller sets, as the execution time of the 1st step
> grew exponentially with the number of distinct keys.
> The exponential effect starts to be felt at about 10k values on my low-end
> CPU,
> and from there quickly becomes problematic.
>
> As a client-side display feature, processing millions of rows like in
> the query above does not necessarily make sense, it's pushing the
> envelope, but stalling way below 100k rows felt lame, so I'm happy to get
> rid of that limitation.
>
> However, there is another one. The above example does not need or request
> an additional sort step, but if it did, sorting more than 65535 entries in
> the vertical header would error out, because values are shipped as
> parameters to PQexecParams(), which only accepts that much.
> To avoid the problem, when the rows in the output "grid" exceed 2^16 and
> they need to be sorted, the user must let the sort being driven by ORDER
> BY
> beforehand in the query, knowing that the pivot will keep the original
> ordering intact in the vertical header.
>
> I'm still thinking about extending this based on Pavel's diff for the
> "label" column, so that
> \crosstabview [+|-]colV[:colSortH] [+|-]colH[:colSortH]
> would mean to use colV/H as grid headers but sort them according
> to colSortV/H.
> I prefer that syntax over adding more parameters, and also I'd like
> to have it work in both V and H directions.
>

This syntax is good - simple, readable

Pavel

>
> Aside from the AVL trees, there are a few other minor changes in that
> patch:
> - move non-exportable structs from the .h to the .c
> - move code in common.c to respect alphabetical ordering
> - if vertical sort is requested, add explicit check against more than 65535
> params instead of letting the sort query fail
> - report all failure cases of the sort query
> - rename sortColumns to serverSort and use less the term "columns" in
> comments and variables.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-12-26 08:17:50 Re: [patch] Proposal for \crosstabview in psql
Previous Message Jeff Janes 2015-12-25 22:10:43 9.5rc1 brin_summarize_new_values