django and postgres and transactions oh-my

Ok, so I’m having this issue where my django transactions are not working correctly .

Its time to decouple the big-mammoth app and take this down to its simplest form. So lets dismantle this thing and figure out what is going on. For reference I am going to use the models out of the django 1.5 tutorial so if you have done that tutorial a few times you should be able to follow along with no problem.

The stack :

  • django 1.5
  • postgres 9.2

Setup Devel Env

First things first, lets set up a pristine virtualenv in which we will pip install django 1.5 and ipython.
Note: I am using virtualenvwrapper here.

$ cd workspace
$ mkdir debug_transactions
$ mkvirtualenv transdebug

$ pip install django==1.5
$ pip install psycopg2
$ pip install ipython

Now you need to verify that the install looks correct.

$ which django-admin.py
/home/jds/.virtualenvs/transdebug/bin/django-admin.py

$ ipython
In [1]: import django
In [2]: django.get_version()
Out[2]: '1.5'

Start a new Django Project

alright this all looks good. So now lets start a new django project.

$ django-admin.py help
$ django-admin.py startproject transaction_proj
$ tree
.
|- manage.py
`- transaction_proj
   |- __init__.py
   |- settings.py
   |-urls.py
   `-wsgi.py
Summary
$ cd transaction_proj
$ chmod 755 ./manage.py help

Set up git

$ git init
$ git add .
$ git commit -m"initial"

A bit more configuration to get us ready to use PostgreSQL .

Modify ‘ transaction_proj/settings.py ‘ and set

DATABASES = {
    'default': {
        'ENGINE':  'django.db.backends.postgresql_psycopg2',
        'NAME': 'transdemo',
        'OPTIONS': {
                    'autocommit': True
                    },
        'USER':     'postgres',
        'PASSWORD': 'postgres'
    }       
}

Note: This is the django 1.5 way to set autocommit. In newer versions of django a different methodology is used.

When using postgres you have to create the dbms first.

$ sudo -u postgres psql
postgres=# CREATE DATABASE transdemo;
CREATE DATABASE
postgres=#\list
ctrl -d

Now for the test, to see if we wired up or dbms correctly.

$ ./manage.py syncdb

answer the questions for setting up Django’s auth system.

Create the Django Polls application

Now to create our models ( same ones from the django tutorial )

$ ./manage.py startapp polls

modify polls/models.py
create the class Poll and the class Choice. Just like in the django tutorial.

Add ‘polls’ to installed apps .

$ ./manage.py syncdb
$ ./manage.py shell
ctrl-d

Aside: the django-extensions app is cool

Ok so here we are going to diverge for a moment. I like a few handy extensions from django extensions. So lets add that now.

$ pip install django-extensions

Now add django_extensions to your settings.py INSTALLED_APPS tuple.

$ ./manage.py shell_plus

Logging Phase 1: Enable inline Logging

Here we are going to enable logging, with this configuration changes django will log all sql output to stdout. To do this you need to modify the settings.py LOGGING

 
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue'            
        },
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse'
        }
    },
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        },
        'console': {
            'level': 'DEBUG',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
        }
    },
    'loggers': {
        'django.request': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
            'propagate': True,
        },
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },        
    }
}

Now you can see SQL logging when you do your ‘shell_plus’ session. This means that when we do anything that does a database operation django will spit out the sql that it is calling behind the scenes. This is a very powerful way to figure out what is going on ‘behind-the-scenes’ with ORM usage.

Now lets go behind-the curtain and add in some data, and see if django is behaving the way we expect it too.

$ ./manage.py shell_plus

In [1]: Poll.objects.all()

(0.001) SELECT "polls_poll"."id", "polls_poll"."question", "polls_poll"."pub_date" FROM "polls_poll" LIMIT 21; args=()
Out[1]: []

from django.db import transaction
import datetime
from django.utils import timezone

dir(transaction)
In [9]: with transaction.commit_on_success():
...: p = Poll(question="What's new?", pub_date=timezone.now())
...: p.save()
...: p.choice_set.create(choice_text='Not much', votes=0)
...: p.choice_set.create(choice_text='The sky', votes=0)
...: c = p.choice_set.create(choice_text='Just hacking again', votes=0)
...:
(0.001) INSERT INTO "polls_poll" ("question", "pub_date") VALUES ('What''s new?', '2014-07-12 04:11:51.876114+00:00') RETURNING "polls_poll"."id"; args=("What's new?", u'2014-07-12 04:11:51.876114+00:00')
(0.001) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (1, 'Not much', 0) RETURNING "polls_choice"."id"; args=(1, 'Not much', 0)
(0.000) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (1, 'The sky', 0) RETURNING "polls_choice"."id"; args=(1, 'The sky', 0)
(0.000) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (1, 'Just hacking again', 0) RETURNING "polls_choice"."id"; args=(1, 'Just hacking again', 0)

Humm…Ok so this does appear to work. I.E. the p.save() on line 13 does not immediately spit out sql this is a good thing and makes me believe that transactions are working. However:

This concern’s me a bit.. As I expected to see sql transnational output there. i.e. sql BEGIN, or BEGIN TRANSACTION, or SET TRANSACTION, or COMMIT, or SAVEPOINT none of which I saw in the raw output.

After doing quite a bit of fumbling around I found some solid-gold . This was gold was bit misleading because it did not point out where this ‘ from utils.transaction import xact ‘ was coming from, which I later found at this Xof github project .

Lets hand install this lib and test it out.

$ cdsitepackages
$ cd django/utils
$ mkdir transactions
$ cd transactions
$ wget https://raw.githubusercontent.com/Xof/xact/master/__init__.py
$ wget https://raw.githubusercontent.com/Xof/xact/master/xact.py

Now back to the previous job @ hand. Get this to print out the sql for those transactions.

$ cd ~/workspace/debug_transactions/transaction_project
$ python ./manage.py shell_plus
In [1]: import datetime
In [2]: from django.utils import timezone
In [3]: from django.utils.transaction import xact
In [4]: with xact():
...: p = Poll(question="What's new2?", pub_date=timezone.now())
...: p.save()
...: p.choice_set.create(choice_text='Not much2', votes=0)
...: p.choice_set.create(choice_text='The sky', votes=0)
...: p.save()

(0.001) INSERT INTO "polls_poll" ("question", "pub_date") VALUES ('What''s new2?', '2014-07-12 05:18:01.221274+00:00') RETURNING "polls_poll"."id"; args=("What's new2?", u'2014-07-12 05:18:01.221274+00:00')
(0.001) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (6, 'Not much2', 0) RETURNING "polls_choice"."id"; args=(6, 'Not much2', 0)
(0.000) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (6, 'The sky', 0) RETURNING "polls_choice"."id"; args=(6, 'The sky', 0)
(0.001) SELECT (1) AS "a" FROM "polls_poll" WHERE "polls_poll"."id" = 6 LIMIT 1; args=(6,)
(0.000) UPDATE "polls_poll" SET "question" = 'What''s new2?', "pub_date" = '2014-07-12 05:18:01.221274+00:00' WHERE "polls_poll"."id" = 6 ; args=("What's new2?", u'2014-07-12 05:18:01.221274+00:00', 6)

Interesting.. there is now an additional stanza after the inserting section. There is a SELECT line followed by an UPDATE line operating on the polls_poll table. But still not BEGIN in the sql. After a bit more reading on the xact module I found this construct.

In [17]: with xact():
   ....:     with xact():
   ....:         p = Poll(question="What's new3?", pub_date=timezone.now())
   ....:         p.save()
   ....:         p.choice_set.create(choice_text='Not much3', votes=0)
   ....:         
(0.000) SAVEPOINT s1219967296_x1; args=()
(0.000) INSERT INTO "polls_poll" ("question", "pub_date") VALUES ('What''s new3?', '2014-07-14 18:25:03.846435+00:00') RETURNING "polls_poll"."id"; args=("What's new3?", u'2014-07-14 18:25:03.846435+00:00')
(0.000) INSERT INTO "polls_choice" ("poll_id", "choice_text", "votes") VALUES (11, 'Not much3', 0) RETURNING "polls_choice"."id"; args=(11, 'Not much3', 0)
(0.000) RELEASE SAVEPOINT s1219967296_x1; args=()

Humm…Ok so the behavior is exactly the same. I.E. the sql does not spit out after the p.save(). Also, I do see SAVEPOINT in the sql output now. However I still do not see the BEGIN and COMMIT sql that I would have expected.

…I am now befuddled…

PostgreSQL when it’s not your job

the

Here are a few highlights from the talk that I need to revisit:

Configuring postgres


  • Turn off OOM
  • Use EXT4 or XFS
  • be sure to set SHMMAX and SHMALL  kernel options.

Tuning Postgres

  • Logging
  • Resources
  • Checkpoints
  • Planner
  • Your done

Round 2: logging from postgresql

Turns out I really should be logging on the dbms and using that to analyze my troubles. So lets do that:
Logging Setup:

In ubuntu the logfile is in /etc/postgresql/9.2/main/postgresql.conf
In ubuntu the logfile is in
/etc/postgresql/9.2/main/
postgresql.conf

# JDS items from Christophe talk
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S'	# log file name pattern,
log_rotation_age = 1d
log_rotation_size = 1GB
#log_min_duration_statement = 250ms
# JDS set this to 0 to log everything
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0

Restart the postgres service

Can I actually log that this is a transaction?
In ubuntu the csv logfile will be located in the data_directory define. which is set to

/var/lib/postgresql/9.2/main/pg_log
/var/lib/postgresql/9.2/main/
pg_log

In rhel the configuration file is in..?? and the logfile is set to ..??

Re-Test: by reviewing the logs.
After setting up for logging lets retest the ‘with transaction.commit_on_success()’ stanza

In [4]: with transactions.commit_on_success():
...: p = Poll(question="What's new6?", pub_date=timezone.now())
...: p.save()
...: p.choice_set.create(choice_text='Not much6', votes=0)
...: p.choice_set.create(choice_text='The sky6', votes=0)
...: p.save()

Note: this logging output is from the postgresql logfile.

/var/lib/postgresql/9.2/main/pg_log/postgresql-2014-07-14_171003.csv
/var/lib/postgresql/9.2/main/pg_log/
postgresql-2014-07-14_171003.csv

2014-07-14 17:34:39.565 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,19,"BEGIN",2014-07-14 17:26:30 CDT,2/78,0,LOG,00000,"duration: 0.049 ms  statement: BEGIN",,,,,,,,,""
2014-07-14 17:34:39.565 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,20,"INSERT",2014-07-14 17:26:30 CDT,2/78,7975866,LOG,00000,"duration: 0.226 ms  statement: INSERT INTO ""polls_poll"" (""question"", ""pub_date"") VALUES ('What''s new6?', '2014-07-14 22:34:39.564938+00:00') RETURNING ""polls_poll"".""id""",,,,,,,,,""
2014-07-14 17:34:39.567 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,21,"INSERT",2014-07-14 17:26:30 CDT,2/78,7975866,LOG,00000,"duration: 0.209 ms  statement: INSERT INTO ""polls_choice"" (""poll_id"", ""choice_text"", ""votes"") VALUES (17, 'Not much6', 0) RETURNING ""polls_choice"".""id""",,,,,,,,,""
2014-07-14 17:34:39.568 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,22,"INSERT",2014-07-14 17:26:30 CDT,2/78,7975866,LOG,00000,"duration: 0.170 ms  statement: INSERT INTO ""polls_choice"" (""poll_id"", ""choice_text"", ""votes"") VALUES (17, 'The sky6', 0) RETURNING ""polls_choice"".""id""",,,,,,,,,""
2014-07-14 17:34:39.568 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,23,"COMMIT",2014-07-14 17:26:30 CDT,2/0,0,LOG,00000,"duration: 0.171 ms  statement: COMMIT",,,,,,,,,""

This defiantly ‘looks’ like its working. For fun lets test with the other xact module.

In [4]: with xact():
...: p = Poll(question="What's new7?", pub_date=timezone.now())
...: p.save()
...: p.choice_set.create(choice_text='Not much7', votes=0)
...: p.choice_set.create(choice_text='The sky7', votes=0)
...: p.save()

/var/lib/postgresql/9.2/main/pg_log/postgresql-2014-07-14_171003.csv
/var/lib/postgresql/9.2/main/pg_log/
postgresql-2014-07-14_171003.csv

2014-07-14 17:39:45.573 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,24,"BEGIN",2014-07-14 17:26:30 CDT,2/79,0,LOG,00000,"duration: 0.059 ms  statement: BEGIN",,,,,,,,,""
2014-07-14 17:39:45.573 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,25,"INSERT",2014-07-14 17:26:30 CDT,2/79,7975867,LOG,00000,"duration: 0.267 ms  statement: INSERT INTO ""polls_poll"" (""question"", ""pub_date"") VALUES ('What''s new7?', '2014-07-14 22:39:45.572973+00:00') RETURNING ""polls_poll"".""id""",,,,,,,,,""
2014-07-14 17:39:45.575 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,26,"INSERT",2014-07-14 17:26:30 CDT,2/79,7975867,LOG,00000,"duration: 0.207 ms  statement: INSERT INTO ""polls_choice"" (""poll_id"", ""choice_text"", ""votes"") VALUES (18, 'Not much7', 0) RETURNING ""polls_choice"".""id""",,,,,,,,,""
2014-07-14 17:39:45.576 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,27,"INSERT",2014-07-14 17:26:30 CDT,2/79,7975867,LOG,00000,"duration: 0.182 ms  statement: INSERT INTO ""polls_choice"" (""poll_id"", ""choice_text"", ""votes"") VALUES (18, 'The sky7', 0) RETURNING ""polls_choice"".""id""",,,,,,,,,""
2014-07-14 17:39:45.664 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,28,"SELECT",2014-07-14 17:26:30 CDT,2/79,7975867,LOG,00000,"duration: 86.230 ms  statement: SELECT (1) AS ""a"" FROM ""polls_poll"" WHERE ""polls_poll"".""id"" = 18  LIMIT 1",,,,,,,,,""
2014-07-14 17:39:45.665 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,29,"UPDATE",2014-07-14 17:26:30 CDT,2/79,7975867,LOG,00000,"duration: 0.309 ms  statement: UPDATE ""polls_poll"" SET ""question"" = 'What''s new7?', ""pub_date"" = '2014-07-14 22:39:45.572973+00:00' WHERE ""polls_poll"".""id"" = 18 ",,,,,,,,,""
2014-07-14 17:39:45.666 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,30,"COMMIT",2014-07-14 17:26:30 CDT,2/0,0,LOG,00000,"duration: 0.174 ms  statement: COMMIT",,,,,,,,,""
2014-07-14 17:39:45.666 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,31,"SHOW",2014-07-14 17:26:30 CDT,2/0,0,LOG,00000,"duration: 0.046 ms  statement: SHOW default_transaction_isolation",,,,,,,,,""
2014-07-14 17:39:45.666 CDT,"postgres","transdemo",8249,"[local]",53c45916.2039,32,"SET",2014-07-14 17:26:30 CDT,2/0,0,LOG,00000,"duration: 0.032 ms  statement: SET default_transaction_isolation TO DEFAULT",,,,,,,,,""

…Interesting…

You will notice there is an additional update here as well as some modification of the default_transaction_isolation . For the details about what is going on here follow this recipe , and read the readme for the xact module

Quick Diagnosis


Another take away from the Christophe talk is that there is a technique for peeking at a running prod system and debugging transaction locking problems.

There are 2 tables that you need to be looking into for transaction locking woes:

pg_stat_activity
pg_locks

When you get into lock contention. Goto pg_stat_activity and see who is waiting.

sudo -u postgres psql transdemo
transdemo=# select * from pg_stat_activity where waiting='t';

Then go into pg_locks and see who has a lock.

transdemo=#  select * from pg_locks;

Summary

  • Use postgres logging to debug transactions. Django sql logging is handy, buy will not do the trick to debug transactions.
  • if using django 1.5 and postgres, use the python xact module and follow this recipe to have proper functioning transactions.

Reference:

Most of these links are from: Christophe Pettus software development blog. Which is solid gold.

TODO add the other link to stacked overflow for how to enable logging.