Re: is there a relationship between indexes and temporary file creation?

From: Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: is there a relationship between indexes and temporary file creation?
Date: 2015-03-15 08:25:30
Message-ID: CA+4ThdoxGuWUPgPArg4SfMxdWx2sFw+FU-Gm=rnGWGrAfWfFdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Tom,
This is helpful. I'm unable to share the queries but your reasoning helps
nonetheless.

Another idea that popped into my head is to consider table partitioning. Am
I correct to assume that (inevitable) joins could benefit from partitions
when sub query criteria constraints results to a particular partition?
I know this is all too vague, but even brief opinions would help. I knew
the current approach would converge to infinity in terms of query time :)
My findings support that but I'm trying to write down some potential
remedies and partial improvements as future directions for research.

Best regards
Seref

On Sat, Mar 14, 2015 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com> writes:
> > I have a table with 10s of millions of rows and I'm running a fairly
> > complex query with a lot of self joins.
>
> > This is an experimental db and the table structure needs to stay as it
> is.
> > I've noticed that some queries are not completed because postgres uses
> all
> > available space on disk for temporary data when doing complex joins.
>
> > The table has no indexes at the moment; I'm measuring performance with
> and
> > without indexes so I don't mind long query times. Running out of disk
> space
> > is a problem though.I can't think of a connection between indexes and the
> > temporary space needed for joins but I hope I can get some input from
> those
> > with (a lot) more knowledge about potgres internals.
>
> > Is there any possibility of indexing decreasing the required temporary
> disk
> > space when performing complex joins?
>
> Hm. Your alternatives don't seem terribly good here. Presumably you're
> running out of space because either a merge or hash join will spill temp
> data to disk if there's too much data. While an indexed nestloop join
> wouldn't have that problem, for the amount of data you're apparently
> dealing with, you might be dead of old age before it finishes :-(.
> A slightly more promising idea is that an indexscan might substitute for
> the sort needed by a mergejoin, so again not needing temp space; but
> this still likely implies a whole lot of random I/O which will take
> forever, unless maybe your data is more or less in order by the join key.
>
> In short I doubt that indexes will be magic bullets for enormous joins.
> You probably need to be smarter about what the queries are asking for.
> But without a lot more detail than you've provided here, we're not going
> to be able to help much about that.
>
> If you'd like additional advice, I'd recommend reviewing
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
> and then taking the discussion to the pgsql-performance list.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Robert James 2015-03-15 15:13:56 How do I calculate the sum of a field filtered by multiple windows defined by another field?
Previous Message David G. Johnston 2015-03-15 06:18:13 Re: regclass and format('%I')