Re: is there a relationship between indexes and temporary file creation?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: is there a relationship between indexes and temporary file creation?
Date: 2015-03-14 19:28:30
Message-ID: 19655.1426361310@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com> writes:
> I have a table with 10s of millions of rows and I'm running a fairly
> complex query with a lot of self joins.

> This is an experimental db and the table structure needs to stay as it is.
> I've noticed that some queries are not completed because postgres uses all
> available space on disk for temporary data when doing complex joins.

> The table has no indexes at the moment; I'm measuring performance with and
> without indexes so I don't mind long query times. Running out of disk space
> is a problem though.I can't think of a connection between indexes and the
> temporary space needed for joins but I hope I can get some input from those
> with (a lot) more knowledge about potgres internals.

> Is there any possibility of indexing decreasing the required temporary disk
> space when performing complex joins?

Hm. Your alternatives don't seem terribly good here. Presumably you're
running out of space because either a merge or hash join will spill temp
data to disk if there's too much data. While an indexed nestloop join
wouldn't have that problem, for the amount of data you're apparently
dealing with, you might be dead of old age before it finishes :-(.
A slightly more promising idea is that an indexscan might substitute for
the sort needed by a mergejoin, so again not needing temp space; but
this still likely implies a whole lot of random I/O which will take
forever, unless maybe your data is more or less in order by the join key.

In short I doubt that indexes will be magic bullets for enormous joins.
You probably need to be smarter about what the queries are asking for.
But without a lot more detail than you've provided here, we're not going
to be able to help much about that.

If you'd like additional advice, I'd recommend reviewing
https://wiki.postgresql.org/wiki/Slow_Query_Questions
and then taking the discussion to the pgsql-performance list.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Denver Timothy 2015-03-15 00:21:27 Unexpected custom type behavior using ROW(NULL)
Previous Message Миша Тюрин 2015-03-14 19:27:18 pitr archive_command cp fsync