Tag: Python

Executing a lot of SQL queries with Django’s db API? Watch out for one thing.

I recently needed to write an indexer that processes a lot of text files and creates a huge number of database rows (19 million+). This is a background script for an internal django website, so I ended up using the standard django psycopg2 backend, but tried to minimize the overhead by executing queries directly through the db cursor. Even with this precaution, and with explicit ‘del’ statements to clean things out at regular intervals, the process was, to my surprise, still taking up a lot of memory. So, I tweeted about this experience. Daniele Varrazzo, one of the authors of psycopg2, tweeted back with more info.

Speeding up django’s development server on Windows

I am very picky about my development environment and I need it to be just right, otherwise the fun part of programming disappears. I have a dedicated Linux server in my office that shares and serves the development files. This is a solid server, fast enough that django’s development server refreshes as soon as I save the files, even before I have switched to my browser; and that’s how I like it! 🙂 Lately I have been on the road quite a bit, so I have had to run the development environment on my tablet (when in Windows 7; it runs great in Linux). The tablet has OK specs: 1.4GHz Core Duo with 2GB RAM and a 7200 RPM drive (generally the bottleneck). But for some reason django’s development server seems especially slow at serving the files. The refreshes after changes are OK, not fast, but OK. It is the media that it is especially slow at serving (understandably so).

I did a lot of research on my options to speed this up. I am using the standard CPython distribution on Windows. I saw a lot of references to unladen-swallow, but there weren’t a lot of benchmarks to prove the speed gain yet. I realize that it is still under very heavy development, but the one benchmark that I found really excited me, so I decided to give it a try. Unfortunately, after hunting for a number of source code packages necessary for compilation and still not succeeding, I concluded that it wasn’t worth the time yet 😐 I decided to rule out pypy because of possible compatibility issues; I wanted something that I could plug into the existing system. For some of my projects I am using external libraries, which might not work with pypy.

Anyways, my solution ended up involving Apache. Based on the console output of django’s dev server I had an idea that it was slow at serving multiple files. So I decided to serve the media, which generally makes up the majority of the files in a given view, using Apache and let django’s server deal only with the views. Microsoft’s IIS is also an option, but I had Apache set up for another project so I decided to use that. Below is the part of my dev_settings.py that makes this change.

import socket
# ...

# System specific dev settings.
if socket.gethostname() == "mystic":
	MEDIA_URL = 'https://thebitguru.com/projectname_media/'
	SERVE_STATIC_FILES = False
else:
	SERVE_STATIC_FILES = True

With this new combination, and using 127.0.0.1 instead of localhost, my dev environment on Windows is now fast enough to keep things interesting.
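If more machines end up needing their own tweaks, the same hostname check can be written as a lookup table. This is only a rough sketch: HOST_OVERRIDES and dev_settings_for are names made up for illustration, and only the "mystic" entry comes from the snippet above.

```python
import socket

# Hypothetical per-host overrides; only the "mystic" entry mirrors the
# dev_settings.py snippet, everything else here is illustrative.
HOST_OVERRIDES = {
    "mystic": {
        "MEDIA_URL": "https://thebitguru.com/projectname_media/",
        "SERVE_STATIC_FILES": False,
    },
}


def dev_settings_for(hostname=None):
    """Return the dev settings that apply on the given host."""
    if hostname is None:
        hostname = socket.gethostname()
    # Default: let the development server serve the static files itself.
    settings = {"SERVE_STATIC_FILES": True}
    settings.update(HOST_OVERRIDES.get(hostname, {}))
    return settings


print(dev_settings_for("mystic"))
print(dev_settings_for("some-other-box"))
```

In a settings module the result could be merged in with something like globals().update(dev_settings_for()), though a plain if/else is just as good while there is only one special host.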

Project logic, coded!

The other day I was wondering whether there was any logic to how I took on new projects and actually carried them out. So, last night I wrote the following python program.

This is some imaginary python code reflecting my current state of mind as it relates to work/side projects. Do you have any advice for me? Code it in the comments! 🙂

gawk script for updating your models.py

Last evening I updated my django source to the latest SVN and found a surprise. The latest set of backwards-incompatible changes was affecting my project. Specifically, the new ModelAdmin moves were making things hard for me. Given that I have a few applications in django, I decided to write a script that would help speed up the process. What I ended up with was a simple gawk script that processes the models.py file and outputs the code for the new ModelAdmin classes. I still have to manually remove the original admin classes because I didn’t want the script to mess things up, but not having to look through the models and copy-paste should save you some time.

For instance, if your models.py has three models (Tag, TaggedItem and NodeGroup) with Admin classes defined, then this script will print out…

$ awk -f admin_model.awk models.py
class NodeGroupAdmin(admin.ModelAdmin):
                date_hierarchy = 'created_on'
                list_per_page = 25
                search_fields = ('title', )
                list_display_links = ('title',)
                list_display = ('id', 'title', 'created_on', 'updated_on', 'published', 'group_type',)
                list_filter = ('published','group_type',)
class TaggedItemAdmin(admin.ModelAdmin):
                pass
class TagAdmin(admin.ModelAdmin):
                pass
admin.site.register(Tag, TagAdmin)
admin.site.register(TaggedItem, TaggedItemAdmin)
admin.site.register(NodeGroup, NodeGroupAdmin)
$

I have added the usage instructions at the top of the script, and here is the script.
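For readers who prefer Python to awk, the same idea can be roughly sketched as below. This is an illustration, not a port of the gawk script: it assumes old-style inner "class Admin:" blocks, top-level model classes, and consistent indentation, and the sample models are made up.

```python
import re

MODEL_RE = re.compile(r"^class (\w+)\(.*\):")   # top-level model class
ADMIN_RE = re.compile(r"^(\s+)class Admin:")    # old-style inner Admin class


def convert_admin_classes(source):
    """Return ModelAdmin code for every model with an inner Admin class."""
    admins = []          # list of (model name, [option lines])
    model = None         # model class currently being scanned
    body = None          # option lines of the Admin block being read
    admin_indent = None  # indentation of the current "class Admin:" line
    for line in source.splitlines():
        found_model = MODEL_RE.match(line)
        if found_model:
            model, body, admin_indent = found_model.group(1), None, None
            continue
        found_admin = ADMIN_RE.match(line)
        if found_admin and model:
            admin_indent = len(found_admin.group(1))
            body = []
            admins.append((model, body))
            continue
        if body is not None and line.strip():
            indent = len(line) - len(line.lstrip())
            if indent <= admin_indent:
                body = None  # dedent: the Admin block has ended
            else:
                body.append(line.strip())
    out = []
    for name, options in admins:
        out.append("class %sAdmin(admin.ModelAdmin):" % name)
        out.extend("    " + option for option in (options or ["pass"]))
    for name, _ in admins:
        out.append("admin.site.register(%s, %sAdmin)" % (name, name))
    return "\n".join(out)


# Made-up sample input in the pre-ModelAdmin style.
SAMPLE = '''
class Tag(models.Model):
    name = models.CharField(max_length=50)

    class Admin:
        pass

class NodeGroup(models.Model):
    title = models.CharField(max_length=100)

    class Admin:
        list_per_page = 25
        search_fields = ('title',)

    def __unicode__(self):
        return self.title
'''

print(convert_admin_classes(SAMPLE))
```

Like the gawk version, this only prints the new classes; removing the old inner Admin classes is still a manual step.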

Getting psycopg2 to work in cygwin

A few months ago I tried to set up django with PostgreSQL in Windows through cygwin. Part of this setup included installing the psycopg2 module for python. Interestingly, everything would compile and install OK, but when it came time to import the module, python would complain that it couldn’t find a file, specifically _psycopg.

>>> import psycopg2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/psycopg2/__init__.py", line 60, in <module>
    from _psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
ImportError: No such file or directory

At first I decided to let it slide, do away with cygwin, and go the Windows python route for normal development, using my “500MHz overclocked to 555MHz linux machine” for the advanced stuff. Lately this solution has been painful because the background jobs show a significant difference in processing time; by significant I mean 22 minutes vs 5 minutes! So, over the past few days I decided that I was going to find the solution to this. I started my investigation the normal way: doing some research on my own, checking in the IRC channels for python, asking on the psycopg mailing list, but no luck!

Quickly running doctests in ipython

Python has a nice module called doctest that lets you embed test cases with the function documentation. This is a really nice time saver because these test cases also serve as examples, which I would have to write anyways. I was working on a function where I decided to write a few doctests, but I did not like waiting for django’s test runner to create the database and then run the tests, which wasn’t necessary for these specific tests. So, I thought why not manually run the doctests in a django shell?

ipython has support for outputting in doctest mode (%doctest_mode magic function), but it doesn’t come with any magic functions that will let you quickly run doctests. So, I ended up writing the following magic function that will let you accomplish this.

To use this, here is what you do.

  1. Edit your ~/.ipython/ipy_user_conf.py file…
    1. Copy and paste the dodoctest function at the bottom (before the main() call).
    2. Copy and paste the ip.expose_magic call all the way at the end of the file (i.e. after the main() call).
  2. Run django shell (or ipython directly): django-admin.py shell
  3. Import your models: import myproject.myapp.models
  4. Call the %dodoctest function with your models as a parameter:
    %dodoctest myproject.myapp.models

The magic function.

def dodoctest(self, arg):
    """Initializes the test runner and runs the doc tests for the specified object."""
    ip = self.api
    ip.ex("""import doctest, unittest;
suite = unittest.TestSuite()
suite.addTest(doctest.DocTestSuite(%s))
runner = unittest.TextTestRunner()
runner.run(suite)
""" % arg)

The call to register the above magic function.

ip.expose_magic('dodoctest', dodoctest)
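The suite-and-runner combination inside the magic function also works outside ipython. Below is a minimal sketch under a few assumptions: the demo_models module and its slugify function are made up for the example, and the module is built in memory only so that the snippet is self-contained. With a real project you would pass the imported models module instead.

```python
import doctest
import io
import types
import unittest

# Source for a throwaway module; slugify is a made-up example function.
SOURCE = '''
def slugify(title):
    """Lower-case a title and join its words with hyphens.

    >>> slugify("Hello World")
    'hello-world'
    """
    return "-".join(title.lower().split())
'''


def run_module_doctests(module):
    """Run every doctest found in the module through unittest's text runner."""
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite(module))
    # Capture the runner's report instead of writing it to stderr.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


# Build the demo module in memory so the example needs no files on disk.
demo = types.ModuleType("demo_models")
exec(SOURCE, demo.__dict__)

result = run_module_doctests(demo)
print(result.testsRun, result.wasSuccessful())
```

No test database is created here, which is the whole point: the doctests run as soon as the module is imported.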

Unicode Fun!

Bottom line: If you keep getting “No JSON object could be decoded” when loading JSON objects in django that look perfectly valid then make sure that you are using ASCII encoding.

I have been experimenting with django’s test framework and I recently hit a wall while creating a JSON fixture. Django didn’t like my JSON file and kept spitting out…

Problem installing fixture 'thebitguru/nodes/fixtures/nofmt_nodes.json': No JSON object could be decoded

The fixture was very simple and for the longest time I couldn’t figure out what was wrong. I had used the dumpdata command to dump the JSON in the first place, but after Magus- on #django (IRC) asked me if that’s what I had done, I decided to do it again. This time, somehow, it magically worked! I used TortoiseMerge to diff the two files and it claimed that they were exactly the same. Obviously, they weren’t!

I, being who I am, wasn’t satisfied without finding out why my initial export was not working. So, off I went to do some investigation. After about fifteen minutes, several different hex dumps and a few google searches, I was looking at this wikipedia entry. Yes, 0xFEFF, the byte order mark! That was the key to this puzzle. So, somehow my initial export had ended up in UTF-16 Big Endian encoding. A quick encoding change in Vim (:set fileencoding=ascii) and everything was back to normal.

I took a trip down memory lane and realized that I was using PowerShell at the time, and considering that it is a fairly new and fancy shell, I guessed that it was probably using Unicode as the default output. If you look through the PowerShell User Guide you will see this specifically mentioned…

By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:

PS> Get-Process | Out-File -FilePath C:\temp\processlist.txt -Encoding ASCII

Another quick test verified that this was in fact the cause. Whew! Finally, another mystery solved!
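A hex dump is not strictly needed to spot this. Here is a small sketch that checks the first bytes of a file’s contents for a byte order mark and decodes accordingly before handing the text to the JSON parser. The function names and the sample fixture are made up, and UTF-32 is left out to keep it short.

```python
import codecs
import json

# Byte order marks to look for, with a codec that consumes each of them.
# UTF-32 is deliberately left out of this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]


def detect_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return encoding
    return None


def load_json_bytes(raw):
    """Decode raw bytes (honouring a BOM if present) and parse them as JSON."""
    encoding = detect_bom(raw) or "utf-8"
    return json.loads(raw.decode(encoding))


# A made-up one-object fixture, once as plain ASCII and once as UTF-16 BE
# with a leading byte order mark.
plain = b'[{"pk": 1, "model": "nodes.node"}]'
utf16 = codecs.BOM_UTF16_BE + plain.decode("ascii").encode("utf-16-be")

print(detect_bom(plain), detect_bom(utf16))
print(load_json_bytes(utf16) == load_json_bytes(plain))
```

Both files look identical in an editor or a diff tool that understands Unicode, which is why the comparison claimed they were the same.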

Lambda functions in Python

The lambda functions in Python always seemed very cool to me, but I never thought that I would actually get to use them in my projects. Guess what? Last night I was working on a side project when I had the perfect opportunity to use and very much appreciate this feature!

As part of my move of this site to django I have to move over the “custom” textile format parsing that I created for this site. Basically, it’s mostly textile with the additional feature of allowing you to easily embed the attached image or file. Below is the Python specific regular expression that defines this syntax.

\[(?P<type>inline|image):(?P<alignment>[<>]?)(?P<positionOrAlias>\S+?)(=(?P<size>.*?))?\]

This allows me to easily reference an attached image and instruct the site to resize it based on my needs. So, after I upload an image for a blog entry, all I have to do is put [image:>0=thumb] (right aligned “standard” thumbnail size, as predefined by me) or [image:0=200x100] (default alignment with a 200×100 resize that keeps the original image in proportion) wherever I want an embedded image to be displayed. With this I don’t have to worry about resizing the image or adding the img tag manually; all of that is done automatically by my Node class.

Ruby has a useful block notation that would let me do this as…

replaced = txt.gsub(/\[(inline|image):([<>]?)(\S+?)(=(.*?))?\]/m) {
  # do all the custom code in this block.
}

Since python doesn’t have the concept of blocks it uses the approach of passing in functions that return the replacement text.

import re

def replacement_function(match):
    # Interpret the match in this function.
    return "...replacement for match."

def interpret_attachments(text):
    attach = re.compile(r'\[(?P<type>inline|image):(?P<alignment>[<>]?)(?P<positionOrAlias>\S+?)(=(?P<size>.*?))?\]', re.MULTILINE | re.IGNORECASE)
    replaced = attach.sub(replacement_function, text)
    return replaced

Now, the problem comes in if replacement_function needs some extra parameter, which in my case was node_id. How can you pass that to replacement_function?

def replacement_function(node_id, match):
    # Interpret the match in this function; node_id is the extra parameter.
    return "...replacement for match."

That is where a lambda function comes in. I ended up wrapping the call in a lambda function and passing that to the sub() function.

def interpret_attachments(text, node_id):
    attach = re.compile(r'\[(?P<type>inline|image):(?P<alignment>[<>]?)(?P<positionOrAlias>\S+?)(=(?P<size>.*?))?\]', re.MULTILINE | re.IGNORECASE)
    # The lambda closes over node_id and forwards each match to replacement_function.
    replaced = attach.sub(lambda match: replacement_function(node_id, match), text)
    return replaced
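To see the whole thing run end to end, here is a self-contained sketch. The img-style replacement string is made up for the demo (the real Node class does much more), and the functools.partial variant is an alternative that is not part of the original code, shown only for comparison.

```python
import re
from functools import partial

ATTACH = re.compile(
    r'\[(?P<type>inline|image):(?P<alignment>[<>]?)'
    r'(?P<positionOrAlias>\S+?)(=(?P<size>.*?))?\]',
    re.MULTILINE | re.IGNORECASE)


def replacement_function(node_id, match):
    # Made-up replacement text; it just echoes what was captured.
    return '<img node="%s" ref="%s" size="%s">' % (
        node_id, match.group("positionOrAlias"), match.group("size"))


def interpret_with_lambda(text, node_id):
    # The lambda closes over node_id, so sub() still sees a one-argument callable.
    return ATTACH.sub(lambda match: replacement_function(node_id, match), text)


def interpret_with_partial(text, node_id):
    # partial() pre-fills node_id as the first positional argument.
    return ATTACH.sub(partial(replacement_function, node_id), text)


text = "Before [image:>0=thumb] after"
print(interpret_with_lambda(text, 7))
print(interpret_with_partial(text, 7) == interpret_with_lambda(text, 7))
```

Either way, sub() only ever calls its argument with the match object; the extra parameter travels along inside the lambda or the partial object.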

I remembered seeing numerous references to lambda functions in the django framework, so now that I understood the need for these, I went back and looked at a few examples. They all make much more sense now!

Here is an excerpt from django/contrib/auth/decorators.py

def permission_required(perm, login_url=None):
    """
    Decorator for views that checks whether a user has a particular permission
    enabled, redirecting to the log-in page if necessary.
    """
    return user_passes_test(lambda u: u.has_perm(perm), login_url=login_url)
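To make it clearer what that lambda is doing, here is a stripped-down, framework-free imitation. None of this is django’s actual implementation: user_passes_test below is a simplified stand-in, FakeUser is made up, and the "redirect" is just a string.

```python
def user_passes_test(test_func, login_url="/login/"):
    """Simplified stand-in: run the view only if test_func(user) is true."""
    def decorator(view):
        def wrapped(user, *args, **kwargs):
            if test_func(user):
                return view(user, *args, **kwargs)
            return "redirect:" + login_url
        return wrapped
    return decorator


def permission_required(perm):
    # The lambda remembers perm, so the test only needs the user.
    return user_passes_test(lambda u: u.has_perm(perm))


class FakeUser:
    """Made-up user object with just enough behaviour for the demo."""

    def __init__(self, perms):
        self.perms = set(perms)

    def has_perm(self, perm):
        return perm in self.perms


@permission_required("polls.can_vote")
def vote(user):
    return "voted"


print(vote(FakeUser(["polls.can_vote"])))
print(vote(FakeUser([])))
```

It is the same trick as with node_id: the caller expects a function of one argument, and the lambda carries the second value along with it.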

…and one more example from django/template/defaultfilters.py

def title(value):
    """Converts a string into titlecase."""
    return re.sub("([a-z])'([A-Z])", lambda m: m.group(0).lower(), value.title())
title.is_safe = True
title = stringfilter(title)
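The regular expression part of that filter can be tried on its own, without django. In this small sketch title_case is just a name picked for the demo; the first print shows what str.title() does to apostrophes, and the second shows the lambda putting the letter after the apostrophe back in lower case.

```python
import re


def title_case(value):
    # str.title() upper-cases the letter after an apostrophe ("It'S");
    # the lambda lower-cases each such match.
    return re.sub("([a-z])'([A-Z])", lambda m: m.group(0).lower(), value.title())


print("it's a dog's life".title())
print(title_case("it's a dog's life"))
```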

Awesome!

If you are confused about lambda functions then I would suggest reading through the filter, map and reduce section on the python site. That section demonstrates good usage of lambda functions.