Re: Connection function

From: Bill House <wch-tech(at)house-grp(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Connection function
Date: 2012-03-26 01:00:45
Message-ID: 4F6FBFBD.9060408@house-grp.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: psycopg

On 03/25/2012 02:12 PM, Adrian Klaver wrote:
> On 03/25/2012 08:14 AM, Bill House wrote:
>> On 03/24/2012 08:25 PM, Adrian Klaver wrote:
>>> On 03/24/2012 05:03 PM, Bill House wrote:
>>>> Hello all,
>>>>
>>>> I am very new to postgresql and python, but I have written a
>>>> function to
>>>> retrieve some admin data from the server.
>>>>
>>>> I think this is more of a python question rather than a psycopg2
>>>> question, but the context is psycopg2.
>>>>
>>>> I was wondering if this is the right place to post it and ask for a
>>>> critique?
>>>
>>> I would say yes, though I am not sure what the question is:)?
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bill
>>>>
>>>
>>>
>> Thanks,
>>
>> Sometimes it is helpful to know how many records one is working with.
>>
>> From my reading in the psycopg2 documentation (2.4.4) and
>> experimentation, I think that this information is only available if one
>> uses an unnamed cursor (in which case all the data will have been
>> fetched to the client) or with an named cursor (but only after all the
>> data has been fetched).
>>
>> If there is a large amount of data, this may be problematic.
>>
>> Am I wrong about this? If so, please let me know.
>
> See comments in line.
>
>>
>> Anyway, the way I approached this issue was to count the records I was
>> interested in and return the value.
>>
>> The code below works.
>>
>> I am interested in a critique of it from more experienced persons. Are
>> there more elegant or efficient ways to do what I have done?
>>
>> In particular, I am interested in:
>> 1) the python string substitution technique I used
>
> That works, though it also possible to use dictionaries for the
> Psycopg strings see here:
>
> http://initd.org/psycopg/docs/usage.html#query-parameters
>
> I find the dict form easier to use for many parameters because it is
> self documenting.
>
>
> The rest of the string substitution seems to be tied to using exec
> which I do not think is necessary, see below for more.
>
>
>> 2) the choice of the postgresql source of data I was seeking (the
>> record count)
>> 3) if there was an existing function I overlooked which does the
>> same thing (in psycopg2 or postgresql)
>>
>> Regarding # 2 above, I had been relying on the table:
>> pg_stat_user_tables and the n_live_tup value. This item has shown the
>> correct count for many days. Today, after updating my machines and
>> rebooting, the count is 0 and I am not really sure why. Records are
>> still there; and even though I have selected data from the database, the
>> count is still 0.
>
> The above are system tables/views used by the statistics mechanism of
> Postgres. For a starting point on this look at:
>
> http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html
>
> Short version running Analyze will repopulate the tables.
>
>>
>> So I took the more direct but time consuming approach of just counting
>> the records in the table.
>
> That is the only way to get a 'true' count due to the MVCC nature of
> Postgres. There has been a recent optimization of that, but I cannot
> remember what version of Postgres has it.
>
>>
>> This will be better anyway because many times one will not be selecting
>> all the records in a table, so this function can be expanded to include
>> a parameter for the selection criteria so that that identical criteria
>> may be used again in the work portion of any operation.
>>
>> Thanks in advance for your input.
>>
>> Bill
>>
>> code (indent 2):
>> -------------------------------------------------------
>> #!/usr/bin/python
>> # z_psycopg2_row_cnt7f.py
>> # A program to illustrate the methods of creating or supplying runtime
>> variables
>> # to functions using psycopg2 functions
>>
>> import sys
>> import os
>>
>> import psycopg2
>> import psycopg2.extras
>> import psycopg2.extensions
>>
>> #for string generator
>> import string
>> import random
>>
>> con = None
>>
>> def gen_random_str(str_len):
>> """generates a random string str_len long -
>> randomly chooses and assembles a number of characters from the digits
>> and the upper and lower case alphabet """
>> rand_str = ''.join(random.choice(string.ascii_uppercase +
>> string.digits + string.ascii_lowercase) for x in range(str_len))
>> return rand_str
>>
>> def rec_cnt_func(conn, table_name):
>> """Count the records selected from the table, right now all of
>> them"""
>> #Generate a random string to use as a curson name (short right now
>> for
>> debugging purposes)
>> #The commands following will use this string
>> rand_curs_name = gen_random_str(3)
>> #Substitute data into a command string and execute
>> #create the cursor
>> exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" %
>> (rand_curs_name,)
>> #count the records in the table
>> exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name,
>> table_name)
>
> Not sure why you are using exec? Why not?:
>
> cur =
> conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name)
>
> Same for below.
>
>> #fetch the data
>> exec "row = %s.fetchone()" % (rand_curs_name,)
>> #All done now, throw the cursor away
>> exec "%s.close()" % (rand_curs_name,)
>> #return the data to the calling program
>> return row[0]
>>
>>
>> def main():
>> #supply your connection string here
>> current_dsn = "host=xx"
>>
>> try:
>> con = psycopg2.connect(current_dsn)
>>
>> #supply your table name here
>> table_name_in = "xxxx"
>>
>> #Call the function which will retrieve the data from postgresql
>> rec_cnt_val = rec_cnt_func(con, table_name_in)
>>
>> #print the formatted data in an informative way
>> print "Records in table: {}: {:> 20,}".format(table_name_in,
>> rec_cnt_val)
>>
>> except psycopg2.DatabaseError, e:
>> print "psycopg2 reports error: {}".format(e)
>> sys.exit(1)
>>
>> finally:
>>
>> if con:
>> con.close()
>> return
>>
>> if __name__ == '__main__':
>> main()
>> -----------------------------------------------------
>> end code
>>
>
>
Thanks, I will do the reading you have suggested.

Regarding your question about why the use of exec: I read where psycopg2
has quite a bit of capability for parameter substitution, but that
functionality is limited to just psycopg2.

I grew to rely on runtime construction, substitution and execution of
commands in another environment and I was trying to mimic that behavior
in a more general way in python. This way works, I was just wondering
if there was a better way.

Regarding the statistics, I have since learned that the command:

select reltuples from pg_class where relname = 'your_file_name';

Will give a record count. I don't know how well it will keep up in a
dynamic environment but it's a start.

Thanks,

Bill

In response to

Responses

Browse psycopg by date

  From Date Subject
Next Message Adrian Klaver 2012-03-26 01:58:31 Re: Connection function
Previous Message Adrian Klaver 2012-03-25 19:12:32 Re: Connection function