Re: [noob] How to optimize this double pivot query?

From: Robert Buck <buck(dot)robert(dot)j(at)gmail(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Thomas Kellerer <spam_eater(at)gmx(dot)net>, "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [noob] How to optimize this double pivot query?
Date: 2012-10-02 09:45:32
Message-ID: 2E4F715D-9957-4576-94DD-63D42C39ADDC@gmail.com
Lists: pgsql-sql

Hi Samuel,

Thank you. This may be a bit of a stretch to ask, but would it be possible for me to peek at a sanitized version of your crosstab query? A good example of how to do this would go a long way for this noob.

This will be a pretty common pattern in my case. The biggest tables will grow much larger, as they are raw metrics feeds that at some point need to be fed through reporting engines to analyze the data and spot regressions.

Lastly, am I simply using the wrong tech for data feeds and analytics? The first cut of this used flat files and R, and though it scoured thousands of files, it was much faster than the SQL I wrote here. The big goal was to get this off disk and into a database, but because the metric data is highly variable and very sparse, I chose a key-value layout. SQL databases are more politically acceptable internally, though I am personally agnostic on the matter.

In the end it would be nice to report directly off a database, but as long as I can transform to CSV I can always perform reporting and analytics in R, and optionally map and reduce natively in Ruby. Sane? Ideas? This is early on, and I'm willing to adjust course and find a better way if suggestions indicate such. I've heard a couple of options so far.
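For the CSV hand-off to R, a minimal sketch of the export step, assuming a hypothetical key-value table metric_values(test_id, metric_name, value); the table, column names, and file path are illustrative, not from the thread:

    -- Export the raw key-value metrics to CSV for analysis in R.
    -- COPY ... TO writes a file on the *server* and needs superuser
    -- rights; from psql, \copy runs the same statement client-side.
    COPY (
        SELECT test_id, metric_name, value
        FROM metric_values
        ORDER BY test_id, metric_name
    ) TO '/tmp/metrics.csv' WITH (FORMAT csv, HEADER true);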

Best regards,

Bob

On Oct 2, 2012, at 5:21 AM, Samuel Gendler <sgendler(at)ideasculptor(dot)com> wrote:

>
>
> On Mon, Oct 1, 2012 at 11:46 PM, Thomas Kellerer <spam_eater(at)gmx(dot)net> wrote:
>
> That combined with the tablefunc module (which lets you do pivot queries) might
> make your queries substantially more readable (and maybe faster as well).
>
>
> I would think that using the crosstab functions in tablefunc would solve the problem without needing a complete change of structure. I've built crosstabs over a whole lot more than 54K rows (and resulting in more than 35 columns, too) in far, far less time than the 11 seconds quoted here, without feeling the need to deal with hstore or similar. In fact, wouldn't hstore actually make it more difficult to build a crosstab query than the schema he has in place now?
>
> --sam
>
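For reference, a minimal sketch of the crosstab approach described above, assuming the tablefunc extension is installed and reusing the hypothetical metric_values(test_id, metric_name, value) table; the metric names in the column list are placeholders:

    -- Install once per database (PostgreSQL 9.1+):
    --   CREATE EXTENSION tablefunc;
    SELECT *
    FROM crosstab(
        -- Source query: one (row, category, value) triple per row,
        -- ordered by the row key.
        $$ SELECT test_id, metric_name, value
           FROM metric_values
           ORDER BY 1, 2 $$,
        -- Category query: the distinct metric names, in a stable order
        -- matching the output column list below.
        $$ SELECT DISTINCT metric_name
           FROM metric_values
           ORDER BY 1 $$
    ) AS pivoted(test_id  integer,
                 cpu_util numeric,
                 io_wait  numeric,
                 mem_used numeric);

The column definition list must enumerate one typed column per distinct metric, so this form works best when the set of metrics is known and reasonably stable.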
