Unicode is hard

7th October, 2007

Full Unicode support is steadily being fleshed out for rst2a's rendering API.

Since the unicode branch was merged, Unicode support in Django seems to be very solid.

However, when a file is posted as an upload, Django doesn't make its contents available as a unicode string. I'm not sure how, or even whether, this should be handled at the framework level, since you don't want binary files being decoded as text. There's a ticket open regarding content transfer encoding. Perhaps something could be worked in with that.

It's easy to handle, though, with:

from django.utils.encoding import smart_unicode
content = smart_unicode(file['content'])
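
To make it clearer what that call is doing: smart_unicode decodes a byte string to text, assuming UTF-8 unless told otherwise. A rough, stdlib-only sketch of the same idea in modern Python follows. The `decode_upload` name and the Latin-1 fallback are assumptions made for illustration, not anything Django or rst2a provides.

```python
def decode_upload(raw, encodings=("utf-8", "latin-1")):
    """Decode uploaded bytes to text, trying each encoding in turn.

    Hypothetical helper: the fallback list is an assumption, not Django behaviour.
    """
    if isinstance(raw, str):
        # Already text, nothing to decode.
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # Try the next candidate encoding.
            continue
    raise ValueError("could not decode upload with any of: %s" % ", ".join(encodings))


content = decode_upload("caf\u00e9".encode("utf-8"))
```

The one-line smart_unicode call above is all rst2a actually needed.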

This got the unit tests passing, but Duncan could still make PDF rendering blow up by mashing Alt plus random keys on his MacBook :(

Using rst2latex with --input-encoding=UTF-8 and --output-encoding=UTF-8 we were still getting this error:

! Package inputenc Error: Unicode char \u8:? not set up for use with LaTeX.

The fix for this was:

sudo apt-get install latex-ucs

rst2a should now handle most documents you throw at it, as far as Unicode goes. There are still some issues with PDF output when rendering odd characters. For example, I'm trying to work out how to render documents containing the ¢ character. LaTeX gives this error:

! LaTeX Error: Command \textcent unavailable in encoding T1.
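
When hunting for the characters that trip LaTeX up, it can help to list every non-ASCII character in a document before rendering it. Here is a minimal sketch in modern Python. `non_ascii_chars` is a hypothetical helper, not part of rst2a or docutils.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    found = {ch for ch in text if ord(ch) > 127}
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]


for ch, code, name in non_ascii_chars("5\u00a2 for a caf\u00e9"):
    print(ch, code, name)
```

This only reports which characters are present. Whether LaTeX can typeset each one still depends on the packages and font encoding in use.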

Thanks Duncan and Gabriel for your help.

1 Comment


 #1

Forest Bond - 4th January, 2008 at 2:43 p.m.

xetex is latex that handles Unicode input (and TrueType fonts). Have you tried that?
