A keyword cloud in Django

Today I spent a large amount of time trying to do something which seemed very straightforward at first. I assume anyone who has even a brief acquaintance with writing code is familiar with this experience.

Essentially I had an application which concerned itself with retrieving and displaying scientific articles from a database. Each article had zero or more keywords associated with it, and of course each unique keyword could be associated with one or more articles. This is the basis of a classic many-to-many relationship and so I coded my models as such:

# keywords
class Keyword(models.Model):
    keyword = models.CharField(max_length=355, blank=True)

# article
class Article(models.Model):
    volume = models.ForeignKey(Volume)
    title = models.TextField(blank=True)
    slug = models.SlugField(max_length=100, db_index=True)
    keywords = models.ManyToManyField(Keyword, related_name="keyword_set", null=True, blank=True)
    start_page = models.IntegerField(null=True, blank=True)
    end_page = models.IntegerField(null=True, blank=True)
    authors = models.ManyToManyField(Author, related_name="author_set", null=True, blank=True)
    file = models.CharField(max_length=765, blank=True)

Everything is pretty straightforward at this point. The Django ORM takes care of the association between these tables behind the scenes. It is enough to know that Django creates an intermediate table at the database level to manage the relationship between keywords and articles, because relational databases have no native many-to-many column type.

Now, for my keyword cloud, what I wanted seemed quite simple: I needed a count for each unique keyword (i.e. how many different articles refer to it), turned into a relative weighting (popularity) that can be passed to a CSS class. The CSS class lets me colour and size the keyword according to its relative popularity.

The key is to get the count per keyword. After some experimentation in the shell (insanely useful for this type of stuff) and some discussion on StackOverflow I came up with the two key lines:

    # reverse lookup using the related_name 'keyword_set' manager
    keywords_with_article_counts = Keyword.objects.all().exclude(keyword__in=excludes).annotate(count=Count('keyword_set'))
    # make list of dictionaries using returned values, order by count descending 
    keywords = keywords_with_article_counts.values('keyword', 'count').order_by('-count')
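
To get a feel for what those two lines ask the database to do, here is a rough stand-alone sketch using Python's built-in sqlite3 module. The table and column names are made up for the example (Django derives its own from the app and model names), and the SQL is only an approximation of what the ORM generates: a LEFT JOIN through the intermediate table, grouped by keyword.

```python
import sqlite3

# Hypothetical schema: Django's real table names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    -- the intermediate table that backs the ManyToManyField
    CREATE TABLE article_keywords (article_id INTEGER, keyword_id INTEGER);

    INSERT INTO keyword VALUES (1, 'genetics'), (2, 'ecology'), (3, 'unused');
    INSERT INTO article VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO article_keywords VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")


def keyword_counts(connection, excludes=()):
    """Return (keyword, article_count) rows, most used first."""
    placeholders = ",".join("?" for _ in excludes)
    where = f"WHERE k.keyword NOT IN ({placeholders})" if excludes else ""
    sql = f"""
        SELECT k.keyword, COUNT(ak.article_id) AS article_count
        FROM keyword AS k
        LEFT JOIN article_keywords AS ak ON ak.keyword_id = k.id
        {where}
        GROUP BY k.id, k.keyword
        ORDER BY article_count DESC, k.keyword
    """
    return connection.execute(sql, tuple(excludes)).fetchall()


rows = keyword_counts(conn)
print(rows)  # [('genetics', 3), ('ecology', 1), ('unused', 0)]
```

Because of the LEFT JOIN, a keyword that no article uses still comes back, with a count of zero.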

From here I could construct the rest of my view which would take care of the weightings and return a dictionary of keywords, weights and counts:

from django.db.models import Count

def keyword_cloud(request):
    #: keywords - a function for generating a keyword cloud
    # define maximum rank as weight for CSS tags
    MAX_WEIGHT = 5
    # define number of keywords to display in cloud
    NUMBER_OF_KEYWORDS = 25
    # reverse lookup using the related_name 'keyword_set' manager
    # ('excludes' is assumed to be a list of keyword strings to leave out, defined elsewhere)
    keywords_with_article_counts = Keyword.objects.all().exclude(keyword__in=excludes).annotate(count=Count('keyword_set'))
    # make list of dictionaries using returned values, order by count descending
    keywords = list(keywords_with_article_counts.values('keyword', 'count').order_by('-count')[:NUMBER_OF_KEYWORDS])
    # nothing to weigh if there are no keywords yet
    if not keywords:
        return {'keywords': keywords}
    # find the smallest and largest counts among the selected keywords
    min_count = min(keyword['count'] for keyword in keywords)
    max_count = max(keyword['count'] for keyword in keywords)
    # 'spread' avoids shadowing the built-in range(); fall back to 1.0 to avoid dividing by zero
    spread = float(max_count - min_count) or 1.0
    for keyword in keywords:
        keyword['weight'] = int(
            MAX_WEIGHT * (keyword['count'] - min_count) / spread
        )
    return {'keywords': keywords}
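
The weighting arithmetic is easy to try out without Django or a database. The sketch below pulls it into a plain function; the name add_weights and the sample data are mine, purely for illustration.

```python
def add_weights(keywords, max_weight=5):
    """Return copies of the keyword dicts with an integer 'weight' in 0..max_weight."""
    if not keywords:
        return []
    counts = [kw["count"] for kw in keywords]
    min_count, max_count = min(counts), max(counts)
    # if every keyword has the same count, use 1.0 to avoid dividing by zero
    spread = float(max_count - min_count) or 1.0
    return [
        dict(kw, weight=int(max_weight * (kw["count"] - min_count) / spread))
        for kw in keywords
    ]


sample = [
    {"keyword": "genetics", "count": 12},
    {"keyword": "ecology", "count": 7},
    {"keyword": "taxonomy", "count": 2},
]
weighted = add_weights(sample)
print([kw["weight"] for kw in weighted])  # [5, 2, 0]
```

The most used keyword always gets the top weight and the least used gets 0. Everything in between is scaled linearly and rounded down.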

Then in my template I could display the cloud with just a couple of lines:

<div id="tagCloud">
	<h3>Keywords</h3>
	{% for keyword in keywords|shuffle %}
		<a href="/search/?q={{ keyword.keyword|urlencode }}" class="tagCloud-{{ keyword.weight }}">{{ keyword.keyword }}</a>
	{% endfor %}
</div>
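
The urlencode filter in the link matters because keywords can contain spaces or characters such as "&". As far as I know the filter behaves much like urllib's quote with "/" left unescaped, so the following sketch (the helper name search_link is hypothetical) shows roughly what ends up in the href:

```python
from urllib.parse import quote


def search_link(keyword):
    """Build the search URL used in the template for one keyword."""
    return "/search/?q=" + quote(keyword, safe="/")


print(search_link("gene expression"))  # /search/?q=gene%20expression
print(search_link("R&D"))              # /search/?q=R%26D
```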

And the relevant CSS:

a.tagCloud-0 {
    font-size: x-small;
    color: #669EC2;
}
a.tagCloud-1 {
    font-size: small;
    color: #6B66C2;
}
a.tagCloud-2 {
    font-size: medium;
    color: #A666C2;
}
a.tagCloud-3 {
    font-size: large;
    color: #C167AF;
}
a.tagCloud-4 {
    font-size: larger;
    color: #0765D3;
}
a.tagCloud-5 {
    font-size: x-large;
    color: #0765D3;
}

Almost there. Left like this, the view and template display an ordered cloud of keywords. However, what I really wanted was a fancy randomised list. I could shuffle my keywords in two ways. The first was in the view, by adding these lines at the end:

def keyword_cloud(request):
    ...
    # make sure keywords is a list, since random.shuffle needs item assignment
    # (this needs "import random" at the top of the module)
    keywords = list(keywords)
    # shuffle list
    random.shuffle(keywords)
    # return randomised list
    return { 'keywords': keywords}

Or I could add a custom template filter (in a module called shuffle.py, for example, inside the app's templatetags directory):

import random
from django import template
register = template.Library()

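@register.filter
def shuffle(arg):
    # copy into a list first; list() also works when arg is an unevaluated
    # queryset, where arg[:] would not give back a list
    tmp = list(arg)
    random.shuffle(tmp)
    return tmp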

And then load it in the template:

{% load shuffle %}

Either way worked for me, though I’m sure someone cleverer than I am can comment on the relative merits of each. Personally, I prefer to take care of this stuff in the view and leave the template less busy.
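
One small difference between the two is that random.shuffle works in place, while the filter shuffles a copy and leaves its input alone. A minimal sketch of the copy-then-shuffle idea outside Django (the function name shuffled and the optional rng argument are my own additions, handy for repeatable tests):

```python
import random


def shuffled(items, rng=random):
    """Return a new shuffled list, leaving the original sequence untouched."""
    tmp = list(items)
    rng.shuffle(tmp)
    return tmp


original = ["genetics", "ecology", "taxonomy", "botany"]
result = shuffled(original, random.Random(0))  # seeded generator, so the order is repeatable
print(sorted(result) == sorted(original))      # True: same items, possibly in a new order
```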

And that is my keyword cloud. Of course there are packaged alternatives such as django-tagging, but it was a useful exercise to do it myself and it enabled me to get the exact results I needed.


Setting up Django Debug Toolbar

Django Debug Toolbar is a very useful piece of Django middleware which gives you a side panel displaying useful information about your app, including requests and SQL queries.

As with any new piece of software in the hands of a novice, I had a few teething problems setting the toolbar up on my local machine. To remind me and to help others, I have distilled my morning’s work into the following succinct steps:

  1. Install django-debug-toolbar. I used
    $ sudo easy_install django_debug_toolbar

    though there are various methods. Just make sure the package ends up on your Python path.

  2. Now add the toolbar middleware to MIDDLEWARE_CLASSES in your project's settings.py:
    MIDDLEWARE_CLASSES = (
        'django.middleware.common.CommonMiddleware',
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.middleware.locale.LocaleMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        'django.contrib.auth.middleware.AuthenticationMiddleware',
        'django.contrib.messages.middleware.MessageMiddleware',
        'django.middleware.csrf.CsrfResponseMiddleware',
        'debug_toolbar.middleware.DebugToolbarMiddleware',
    )

    The debug toolbar should be after any other middleware that encodes the response content, so it is best to place it last in the middleware sequence.

  3. We also add the package to INSTALLED_APPS in the project settings.py:
    INSTALLED_APPS = (
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'django.contrib.admin',
        'debug_toolbar',
    )
  4. Finally, make sure that the following appears in the project settings.py:
    INTERNAL_IPS = ('127.0.0.1',)
  5. Now fire up your project and, all being well, the Django Debug Toolbar side panel should appear in the browser window. If it does not, the first thing to check is whether your HTML has <body></body> tags. If the toolbar appears when you switch to your project admin, this is most likely the issue. Another thing to check is that the debug toolbar package is installed and on the Python path:
    >>> import sys
    >>> print(sys.path)
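
A note on step 4: INTERNAL_IPS is meant to be a sequence of addresses, and in Python a string in parentheses without a trailing comma is still just a string. I am assuming here that the toolbar decides whether to show itself with a simple membership test against INTERNAL_IPS; the helper is_internal below is only a stand-in for that check.

```python
not_a_tuple = ('127.0.0.1')    # parentheses alone do not make a tuple
real_tuple = ('127.0.0.1',)    # the trailing comma does


def is_internal(ip, internal_ips):
    """Stand-in for a simple membership test against INTERNAL_IPS."""
    return ip in internal_ips


# With a string, "in" does substring matching, so partial addresses match too.
print(type(not_a_tuple).__name__, is_internal('27.0', not_a_tuple))  # str True
print(type(real_tuple).__name__, is_internal('27.0', real_tuple))    # tuple False
```

With the plain string the full address still happens to match, which is why the mistake is easy to miss.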
    

Setting up Sphinx Documentation with Django

Having generated my first few real applications in Django, it became obvious there was at least one glaring omission from my development process: documentation!

I knew there were various solutions to this problem, but after DjangoCon 2011 there was only one that seemed to be talked about seriously: Sphinx.

Initial reading of the documentation and tutorials left me a little bewildered. However, after I managed to set up and build my first HTML docs, it all seemed a little more straightforward. To help clarify this process for myself and anyone else in a similar situation, here are the steps I took.

Note: I am using Ubuntu Linux. The process may differ on Mac or Windows.

        1. Install sphinx using
          $ easy_install sphinx
        2. Navigate to your project directory and run
          $ sphinx-quickstart
        3. Answer all questions. Importantly, answer yes to
          autodoc: automatically insert docstrings from modules (y/N) [n]:
          
        4. This should produce a directory structure like so:
          • Makefile
          • _build
          • _static
          • _templates
          • conf.py
          • index.rst
        5. Now we can run
          $ make html

          to create the first version of our docs. This will create the _build/html directory containing the HTML files.

        6. Open index.html in the browser and you should see your empty (for the moment) project docs.
        7. Now, to make use of the handy autodoc extension, we need to make sure conf.py is set up properly. First check that the autodoc extension is enabled:
          # Add any Sphinx extension module names here, as strings. They can be extensions
          # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
          extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.pngmath', 'sphinx.ext.viewcode']

          Then edit the file to include the path and Django project settings:

          # documentation root, use os.path.abspath to make it absolute, like shown here
          sys.path.insert(0, os.path.abspath('.'))
          # setup Django
          import settings
          from django.core.management import setup_environ
          setup_environ(settings)
        8. Now we can test this by referencing our first module. Create another directory called modules in the project root and, inside it, create models.rst to reference the models.py module in your app. In this file write:
          Models
          =======
          
          .. automodule:: myapp.models
              :members:
          
          
        9. Now in index.rst we can add a reference to this module:
          Contents:
          =========
          
          .. toctree::
             :maxdepth: 2
          
             modules/models.rst
          
          Indices and tables
          ==================
          
          * :ref:`genindex`
          * :ref:`modindex`
          * :ref:`search`
        10. The autodoc extension should automatically extract information and comments about the models from models.py.
        11. Run
          $ make html

          again and refresh the docs in the browser. Now there should be a section entitled Models and a page listing the classes, fields and comments of your models.py module.

        12. From here you can explore some of the further options in autodoc such as:
          Models
          ======
          
          .. automodule:: myapp.models
              :members:
              :undoc-members:
              :inherited-members:
