Unicode and new style string formatting ~ The Voidspace Techie Blog - 0 views
-
Unicode and new style string formatting Python 2.6 and Python 3 gain a new style of string formatting, which is apparently based on the string formatting in C#. I wasn't a big fan of the string formatting in C# and so wasn't very excited about it moving into Python, but as is to be expected it has grown a bit on me.
-
As always, the best solution is to not mix Unicode and byte-strings but to keep all strings in Unicode and only perform the encode when actually needed.
-
So why does this behaviour matter? Well it particularly matters for framework authors formatting messages based on 'user' input. This is the case with unittest, which creates error messages when tests fail. The error messages internally in unittest are byte-strings and they are often mixed with user supplied messages using string formatting. We use old-style (% based) formatting, so if the user supplies byte-strings then the resulting messages will be byte-strings. If the user supplies Unicode strings then the resulting messages will be in Unicode. Because all the internal unittest messages are ascii only we can guarantee than an implicit decode to Unicode will succeed - so the user can choose the output type by varying the type of the messages they provide.