Re: How to check a table content efficiently? With LIMIT and OFFSET?

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to check a table content efficiently? With LIMIT and OFFSET?
Date: 2011-05-29 21:55:01
Message-ID: BANLkTiktYM9jt4AFTYNKa6hX3sMQ2nw2rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Alban

On 2011/5/29 Alban Hertroys wrote:
> On 29 May 2011, at 19:45, Stefan Keller wrote:
>
>> But I'm hesitating to use ANALYZE for two reasons:
>> 1. It's very slow: it repeadly takes 59000 ms on my machine.
>
> ANALYZE on a single table takes 59s?!? That's _really_ long. How big is that table (it has about 180k rows, you did provide that information, but that's not much at all) and how many indexes are on it? Are you sure you're not overburdening your hardware in some way?
>
> Or are you in fact talking about a different command? For example, ANALYZE (without specifying a table) or VACUUM ANALYZE <table>?

You are right: I used ANALYZE (without specifying a table). But this
still takes about 1 to 3 sec which is about 100 times slower than

SELECT (count(*) = 1) FROM (SELECT osm_id FROM planet_osm_point LIMIT
1 OFFSET 100000) tmp;
or
SELECT reltuples FROM pg_class WHERE relname = 'planet_osm_point';

>> 2. There's an autovacuum background process which already does the
>> job, doesn't it?
>
> Yes, but in its own time. If you know there has been a batch of inserts/deletes you might as well run analyse immediately on that table.

My table is a read-only table after all.
That's another reason why I'm reluctant using ANALYZE <table>.

> Also, on this mailing-list people don't appreciate it if you top-post. It makes the context
> hard to decipher and sometimes even makes it difficult to give an accurate answer
> because the information people want to refer to is far separated from the bit where
> they're trying to reply to something you said/asked. Remember, people aren't here for your sake.

Thank you for the hint, which I didn't know:
Is this really still part of this elderly USENET netiquette here?

Yours, Stefan

>> 2011/5/29 Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>:
>>> On 05/29/2011 05:45 AM, Stefan Keller wrote:
>>>>
>>>> Hi,
>>>>
>>>> That's my solution candidate:
>>>>
>>>> CREATE OR REPLACE FUNCTION isnotempty() RETURNS boolean AS '
>>>>   SELECT (count(*) = 1)
>>>>   FROM (SELECT osm_id FROM planet_osm_point LIMIT 1 OFFSET 100000) tmp
>>>> ' LANGUAGE SQL;
>>>
>>> LIMIT and OFFSET are often no more efficient than count(*). You're still
>>> likely to need a full table scan.
>>>
>>> Here's how I'd do it: I'd ANALYZE the table, then check the table statistics
>>> to see that they looked to be within reasonable bounds. That way you not
>>> only check the import, but in the process you ensure the statistics used by
>>> the query planner are up to date. Since ANALYZE only tests a sampling of
>>> records it does pretty much what you want, something that it's not so easy
>>> to do in SQL.
>>>
>>> --
>>> Craig Ringer
>>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
> Alban Hertroys
>
> --
> Screwing up is an excellent way to attach something to the ceiling.
>
>
> !DSPAM:1205,4de2b6e411923449910736!
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2011-05-29 23:27:18 Re: max_connections proposal
Previous Message Seb 2011-05-29 21:27:13 Re: database field list