Python encoding and unicode

Python source code encoding and unicode in 2.x.

Define python encoding

Python will default to ASCII as standard encoding if no other
encoding hints are given.

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:

      # coding=<encoding name>

or (using formats recognized by popular editors)

      # -*- coding: <encoding name> -*-


      # vim: set fileencoding=<encoding name> :

More precisely, the first or second line must match the regular
expression "^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)".

Python 2.x unicode

  • Unicode type:
    • basestring <= str or unicode
    • isinstance(obj, basestring) => isinstance(obj, (str, unicode))
    • unicode(string[, encoding, errors])
  • Unicode literal:
    • u’àôî’
    • U’ÀÔÎ’
    • \x
    • \u