Benchmarking queries in Django

So this is not a post specific to Django, but the examples I give are.

The problem I was looking for a solution for was that I wanted to benchmark my Django ORM queries to find ways to improve performance. More specifically, I wanted to find the optimum way to count objects returned as querysets.

I stumbled across MAX() and COUNT() and wanted a raw speed performance comparison between the two. Turns out there is a way without leaving the python shell.

This code is attributed to sleepycal on djangosnippets.org. Original post can be found here.

Here it is:


>>> from django.db.models import Max,Count 
>>> _t = time.time(); x = Article.objects.aggregate(Max('id')); "Took %ss"%(time.time() - _t )
'Took 0.00190091133118s'

and using Count():


>>> _t = time.time(); x = Article.objects.aggregate(Count('id')); "Took %ss"%(time.time() - _t )
'Took 1.34142112732s'

As you can see – quite a significant difference on ~108 000 rows. This is a good way to get a quick look at how a queryset call performs when there is more than one way in which it can be constructed.

Advertisements

2 thoughts on “Benchmarking queries in Django

  1. Sometimes following the KISS principle is better.
    Why would you overcomplicate the simplest, fastest, easiest thing that all databases perform at their optimum?

    I am saying so because I “trust” the Django ORM and I “know” how a database works,
    so, if you just like to count how many objects are in the Article model, the quickest way is doing a simple

    Article.objects.count()

    because it performs this query on the database:

    SELECT COUNT(*) FROM article;

    and this is the fastest way to achieve the result wanted.

    PS: Benchmarking is usually composed of more than one test, in fact to appreciate the variability and stability of your goal, you have to repeat the same single atomic procedure more than twice.

  2. @bafio You are quite right. However, I think it is always useful to be aware of the variety of ways you can achieve a result. This post serves more to demonstrate that you can time the results of a query than finding the very best way of doing the query. Of course you should run the query in a loop and rinse and repeat. I really just wanted to communicate the benchmarking code that I stumbled on on my way to finding the best performance solution.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s