Re: cyclical redundancy checksum algorithm(s)?

From: John Sidney-Woollett <johnsw(at)wardbrook(dot)com>
To: Karen Hill <karen_hill22(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: cyclical redundancy checksum algorithm(s)?
Date: 2006-09-28 06:09:43
Message-ID: 451B6727.30609@wardbrook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Stepping back a bit...

Why not use an update trigger on the affected tables to record a
lastupdated timestamp value when the record is changed.

Surely this is simpler thanks computing some kind of row hash?

John

Karen Hill wrote:
> Tom Lane wrote:
>> "Karen Hill" <karen_hill22(at)yahoo(dot)com> writes:
>>> Ralph Kimball states that this is a way to check for changes. You just
>>> have an extra column for the crc checksum. When you go to update data,
>>> generate a crc checksum and compare it to the one in the crc column.
>>> If they are same, your data has not changed.
>> You sure that's actually what he said? A change in CRC proves the data
>> changed, but lack of a change does not prove it didn't.
>
>
> On page 100 in the book, "The Data Warehouse Toolkit" Second Edition,
> Ralph Kimball writes the following:
>
> "Rather than checking each field to see if something has changed, we
> instead compute a checksum for the entire row all at once. A cyclic
> redundancy checksum (CRC) algorithm helps us quickly recognize that a
> wide messy row has changed without looking at each of its constituent
> fields."
>
> On page 360 he writes:
>
> "To quickly determine if rows have changed, we rely on a cyclic
> redundancy checksum (CRC) algorithm. If the CRC is identical for the
> extracted record and the most recent row in the master table, then we
> ignore the extracted record. We don't need to check every column to be
> certain that the two rows match exactly."
>
>> People do sometimes use this logic in connection with much wider
>> "summary" functions, such as an MD5 hash. I wouldn't trust it at all
>> with a 32-bit CRC, and not much with a 64-bit CRC. Too much risk of
>> collision.
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message smartdude 2006-09-28 06:12:04 Strange pg_ctl behavior: postmaster shuts down on shell interrupt
Previous Message Gene 2006-09-28 04:21:40 grant select on all tables of schema or database