Python encoding and unicode

Python source code encoding and unicode in 2.x.

Define python encoding

Python will default to ASCII as standard encoding if no other
encoding hints are given.

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:

      # coding=

or (using formats recognized by popular editors)

      #!/usr/bin/python
      # -*- coding:  -*-

or

      #!/usr/bin/python
      # vim: set fileencoding= :

More precisely, the first or second line must match the regular
expression "^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)".

Python 2.x unicode

  • Unicode type:
    • basestring <= str or unicode
    • isinstance(obj, basestring) => isinstance(obj, (str, unicode))
    • unicode(string[, encoding, errors])
  • Unicode literal:
    • u'àôî'
    • U'ÀÔÎ'
    • \x
    • \u

References

python