Status of Unicode in Python 3

Summary

Introduced in Python 2.0, Unicode became the default string type in Python 3.0. The switch took 8 years, and many bugs have been fixed since Python 3.0. It also raised many questions, among others: should Python support both bytes and characters for filenames? What should be done with undecodable bytes?

Description

The talk will focus on the recent issues fixed in Python 3.1 and 3.2:

  • Use PEP 383 (the surrogateescape error handler, which stores undecodable bytes) everywhere
  • Encoding of the command line arguments: UTF-8 on Mac OS X, locale encoding on UNIX/BSD, Unicode on Windows
  • Environment variables: creation of os.environb
  • Filenames: extensive work to support PEP 383 everywhere, creation of os.fsencode() and os.fsdecode()
  • Python source code encoding: use tokenize.detect_encoding() instead of the locale encoding
  • Some library examples: email, ftp, ...
  • etc.

The talk will present changes not only in Python itself, but also in the C API.
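
As a rough illustration of several items in the list above, the following sketch shows the PEP 383 round trip, os.fsencode() and os.fsdecode(), os.environb, and tokenize.detect_encoding(). It is not taken from the talk: the sample bytes, the file name and the environment variable name UNICODE_DEMO are made up for this example, and the filesystem and environment parts assume a POSIX system.

```python
import io
import os
import tokenize

# PEP 383: the "surrogateescape" error handler maps each undecodable byte
# 0x80..0xFF to a lone surrogate U+DC80..U+DCFF, so decoding never fails
# and encoding with the same handler restores the original bytes.
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8 (made-up sample)
text_name = raw_name.decode("utf-8", errors="surrogateescape")
restored_name = text_name.encode("utf-8", errors="surrogateescape")

# os.fsdecode() and os.fsencode() apply the filesystem encoding. The round
# trip is only expected to be lossless on POSIX, hence the guard.
if os.name == "posix":
    fs_roundtrip = os.fsencode(os.fsdecode(raw_name))
else:
    fs_roundtrip = None

# os.environb is the bytes view of os.environ. It exists only where
# os.supports_bytes_environ is true. UNICODE_DEMO is a hypothetical name.
if os.supports_bytes_environ:
    os.environb[b"UNICODE_DEMO"] = b"\xff"
    env_text = os.environ["UNICODE_DEMO"]  # str view of the same variable
    env_roundtrip = os.fsencode(env_text)
    del os.environb[b"UNICODE_DEMO"]
else:
    env_roundtrip = None

# tokenize.detect_encoding() reads the BOM or the coding cookie of a source
# file instead of relying on the locale encoding.
source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
decoded_source = source.decode(encoding)
```

On Windows the two guarded parts are skipped, because filenames and environment variables are handled as Unicode there.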