A keyword cloud in Django

Today I spent a large amount of time trying to do something which seemed very straightforward at first. I assume anyone who has even a brief acquaintance with writing code is familiar with this experience.

Essentially I had an application which concerned itself with retrieving and displaying scientific articles from a database. Each article had zero or more keywords associated with it, and of course each unique keyword could be associated with one or more articles. This is the basis of a classic many-to-many relationship and so I coded my models as such:

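from django.db import models

# keywords
class Keyword(models.Model):
    keyword = models.CharField(max_length=355, blank=True)

# article (Volume and Author are models defined elsewhere in the app)
class Article(models.Model):
    # on_delete is mandatory from Django 2.0; CASCADE matches the old implicit default
    volume = models.ForeignKey(Volume, on_delete=models.CASCADE)
    title = models.TextField(blank=True)
    slug = models.SlugField(max_length=100, db_index=True)
    # related_name names the reverse manager on Keyword (the articles that use it);
    # null has no effect on a ManyToManyField, so only blank=True is kept
    keywords = models.ManyToManyField(Keyword, related_name="keyword_set", blank=True)
    start_page = models.IntegerField(null=True, blank=True)
    end_page = models.IntegerField(null=True, blank=True)
    authors = models.ManyToManyField(Author, related_name="author_set", blank=True)
    file = models.CharField(max_length=765, blank=True)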

Everything is pretty straightforward at this point. The Django ORM takes care of the association between these tables behind the scenes. It is enough to know that Django creates an intermediate table at the database level to link keywords to articles, because a relational database cannot express a many-to-many relationship directly between two tables.

Now, for my keyword cloud, what I wanted seemed quite simple: I needed a count for each unique keyword (i.e. how many different articles refer to each keyword), turned into a relative weighting (popularity) which can be used to pick a CSS class. The CSS class allows me to colour and size the keyword depending on its relative popularity.

The key is to get the count per keyword. After some experimentation in the shell (insanely useful for this type of stuff) and some discussion on StackOverflow I came up with the two key lines:

    # reverse lookup using the related_name 'keyword_set' manager
    keywords_with_article_counts = Keyword.objects.all().exclude(keyword__in=excludes).annotate(count=Count('keyword_set'))
    # make list of dictionaries using returned values, order by count descending 
    keywords = keywords_with_article_counts.values('keyword', 'count').order_by('-count')
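
To make it clearer what those two lines ask the database for, here is a rough pure-Python sketch of the same counting, with no Django involved. The sample articles, keywords and the contents of excludes are all made up for illustration. One difference to be aware of: this sketch only sees keywords that appear on at least one article, whereas the annotated queryset also returns keywords with a count of zero.

```python
from collections import Counter

# Hypothetical sample data: article title -> keywords attached to it.
articles = {
    "Article A": ["genetics", "ecology"],
    "Article B": ["genetics"],
    "Article C": ["genetics", "ecology", "review"],
}
# Hypothetical list of keywords to leave out of the cloud.
excludes = ["review"]

# Count how many articles mention each keyword, skipping excluded ones.
# set() makes sure a keyword is counted once per article.
counts = Counter(
    keyword
    for article_keywords in articles.values()
    for keyword in set(article_keywords)
    if keyword not in excludes
)

# Same shape as .values('keyword', 'count').order_by('-count'):
# a list of dictionaries, highest count first.
keyword_counts = [
    {"keyword": keyword, "count": count}
    for keyword, count in counts.most_common()
]
print(keyword_counts)
```

In the real application the database does this work through the intermediate table, so none of the articles have to be loaded into Python.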

From here I could construct the rest of my view which would take care of the weightings and return a dictionary of keywords, weights and counts:

from django.db.models import Count

def keyword_cloud(request):
    #: keywords - a function for generating a keyword cloud
    # define maximum rank as weight for CSS classes
    MAX_WEIGHT = 5
    # define number of keywords to display in cloud
    NUMBER_OF_KEYWORDS = 25
    # reverse lookup using the related_name 'keyword_set' manager
    # (excludes is a list of keyword strings to leave out, defined elsewhere)
    keywords_with_article_counts = Keyword.objects.all().exclude(keyword__in=excludes).annotate(count=Count('keyword_set'))
    # make list of dictionaries using returned values, order by count descending
    keywords = list(keywords_with_article_counts.values('keyword', 'count').order_by('-count')[:NUMBER_OF_KEYWORDS])
    # nothing to weight if no keywords came back
    if not keywords:
        return {'keywords': []}
    # set min_count and max_count to highest returned count value initially
    min_count = max_count = keywords[0]['count']
    for keyword in keywords:
        if keyword['count'] < min_count:
            min_count = keyword['count']
        if max_count < keyword['count']:
            max_count = keyword['count']
    # named count_range to avoid shadowing the built-in range();
    # fall back to 1.0 so we never divide by zero when all counts are equal
    count_range = float(max_count - min_count)
    if count_range == 0.0:
        count_range = 1.0
    # scale each count linearly onto 0..MAX_WEIGHT
    for keyword in keywords:
        keyword['weight'] = int(
            MAX_WEIGHT * (keyword['count'] - min_count) / count_range
        )
    return {'keywords': keywords}
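
To see how the weighting behaves on its own, here is a minimal sketch of the same linear scaling as a standalone function. The name assign_weights and the sample counts are mine, purely for illustration; the arithmetic is the same min/max scaling used in the view.

```python
def assign_weights(keywords, max_weight=5):
    """Return copies of the keyword dicts with an integer 'weight' added.

    Weights are scaled linearly so the least used keyword gets 0 and the
    most used keyword gets max_weight.
    """
    if not keywords:
        return []
    counts = [k["count"] for k in keywords]
    min_count, max_count = min(counts), max(counts)
    # Fall back to 1.0 when every count is equal, to avoid dividing by zero.
    spread = float(max_count - min_count) or 1.0
    return [
        dict(k, weight=int(max_weight * (k["count"] - min_count) / spread))
        for k in keywords
    ]


# Hypothetical counts, already ordered by count descending.
sample = [
    {"keyword": "genetics", "count": 10},
    {"keyword": "ecology", "count": 5},
    {"keyword": "review", "count": 1},
]
weighted = assign_weights(sample)
print([(k["keyword"], k["weight"]) for k in weighted])
# prints [('genetics', 5), ('ecology', 2), ('review', 0)]
```

Because int() truncates, only keywords tied for the highest count ever reach the top weight, and if every keyword has the same count they all end up with weight 0.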

Then in my template I could display the cloud with just a couple of lines:

<div id="tagCloud">
	<h3>Keywords</h3>
	{% for keyword in keywords|shuffle %}
		<a href="/search/?q={{ keyword.keyword|urlencode }}" class="tagCloud-{{ keyword.weight }}">{{ keyword.keyword }}</a>
	{% endfor %}
</div>

And the relevant CSS:

a.tagCloud-0 {
    font-size: x-small;
    color: #669EC2;
}
a.tagCloud-1 {
    font-size: small;
    color: #6B66C2;
}
a.tagCloud-2 {
    font-size: medium;
    color: #A666C2;
}
a.tagCloud-3 {
    font-size: large;
    color: #C167AF;
}
a.tagCloud-4 {
    font-size: larger;
    color: #0765D3;
}
a.tagCloud-5 {
    font-size: x-large;
    color: #0765D3;
}

Almost there. Without any shuffling, the view and template display the keywords ordered by count. However, what I really wanted was a fancy randomised cloud. I could shuffle my keywords in two ways. The first was in the view, by adding these lines at the end:

import random

def keyword_cloud(request):
    ...
    # convert keywords to a list so that random.shuffle can reorder it in place
    keywords = list(keywords)
    # shuffle list
    random.shuffle(keywords)
    # return randomised list
    return {'keywords': keywords}

Or I could add a custom template filter (in a module called shuffle.py, for example, inside the app's templatetags package):

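import random
from django import template

register = template.Library()

@register.filter
def shuffle(arg):
    # copy into a plain list first: random.shuffle needs item assignment,
    # and this leaves the original sequence (or queryset) untouched
    tmp = list(arg)
    random.shuffle(tmp)
    return tmp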

And then load it in the template:

{% load shuffle %}
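
The one thing the filter needs to get right is that it shuffles a copy rather than the object it was handed. Here is a small sketch of that behaviour outside Django; the function name shuffled_copy and the sample keywords are made up for the example.

```python
import random


def shuffled_copy(items):
    """Return a new shuffled list, leaving the input untouched."""
    tmp = list(items)  # works for lists, tuples and other iterables
    random.shuffle(tmp)  # shuffles the copy in place
    return tmp


# Hypothetical keywords, in their original (ordered) form.
original = ["genetics", "ecology", "review", "taxonomy"]
result = shuffled_copy(original)

print(original)  # still in the original order
print(result)    # same items, random order
```

Copying first also means the same ordered list could be reused elsewhere on the page, for example in a plain "most popular keywords" list.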

Either way worked for me, though I’m sure someone cleverer than I am can comment on the relative merits of each. Personally, I prefer to take care of this stuff in the view and leave the template less busy.

And that is my keyword cloud. Of course there are packaged alternatives such as django-tagging, but it was a useful exercise to do it myself and it enabled me to get the exact results I needed.
