Re: Postgresql 12, 512 partition by hash. Slow select

From: Arya F <arya6000(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql 12, 512 partition by hash. Slow select
Date: 2020-04-05 19:50:18
Message-ID: CAFoK1azuLEF=Q_wmFgXcxo5bkiG_06VzD18svk9gBNCR+YEqeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Apr 5, 2020 at 2:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Arya F <arya6000(at)gmail(dot)com> writes:
> > I have a table with 120 million rows of data spread among 512
> > partitioned by hash table. The id column of the table is a uuid, which
> > is what is being used for the partition hash and it's also the PK for
> > the table.
>
> > The table has a text column, which also has a btree index on it. A
> > select query on an identical non-partitioned table takes 0.144
> > seconds, but on the partitioned table it takes 5.689 seconds.
>
> > Am I missing something in my setup? Or is this expected? I do know
> > having more than 100 partitions in prior versions of PostgreSQL 12
> > would cause a major slow down, but from what I read PostgreSQL 12
> > addresses that now?
>
> You have your expectations calibrated wrongly, I suspect.
>
> Your default expectation with a table with many partitions should be
> that queries will have to hit all those partitions and it will take a
> long time. If the query is such that the system can prove that it
> only needs to access one partition, then it can be fast --- but those
> proof rules are not superlatively bright, and they're especially not
> bright for hash partitioning since that has so little relationship
> to WHERE restrictions that practical queries would use. But if the
> query WHERE isn't restricting the partitioning key at all, as I suspect
> is the case for your query, then there's certainly no chance of not
> having to search all the partitions.
>
> If you showed us the specific table declaration and query you're
> working with, it might be possible to offer more than generalities.
>
> In general though, partitioning should be a last resort when you've
> got so much data that you have no other choice. I doubt that you
> are there at all with 100M rows, and you are certainly not at a point
> where using hundreds of partitions is a good idea. They are not
> cost-free, by a very long shot. And when you do partition, you
> typically need to think hard about what the partitioning rule will be.
> I'm afraid that hash partitioning is more of a shiny trap for novices
> than it is a useful tool, because it doesn't organize the data into
> meaningful sub-groups.
>
> regards, tom lane

The table at some point will have more than 1 billion rows, the
information stored is international residential addresses. Trying to
figure out a way of spreading the data fairly evenly thought out
multiple partitions, but I was unable to come up with a way of
splitting the data so that Postgres does not have to search though all
the partitions.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Neil 2020-04-05 20:48:49 Re: Postgresql 12, 512 partition by hash. Slow select
Previous Message Tom Lane 2020-04-05 18:55:00 Re: Postgresql 12, 512 partition by hash. Slow select