Most of the time, I don't need to deal with different encodings at all. And when there is a little processing of Chinese characters or other Unicode characters, I use .NET languages or JVM languages, in which every string is Unicode and characters are displayed as characters (not as unreadable escaped strings or Unicode code points).

However, recently I have been working on projects with Chinese Weibo data, and I ran into some Unicode problems when using Python and R. (I use Python for data crawling and processing, and R for modeling and visualization.) This post is a note I am starting today, and I will update it when I encounter new Unicode problems…

Python 2.7

Lesson 1: Normal string and Unicode string are two types.

When writing Unicode string literals, we put a prefix u before the string:

> type(u'')

Notice that the types of the two strings are different: a literal with the u prefix is ‘unicode’, and a literal without it is ‘str’. Python is a dynamic language and will sometimes convert between the two types implicitly. But we should use Python as if it were a strict, statically typed language: never mix the two types, and when needed, do conversions explicitly. See the following example:

> u'abcd'.find('b')

The first find operation works fine because it does not involve any non-ASCII characters. When the ‘str’ argument does contain non-ASCII bytes, Python tries to decode it with the ascii codec and the same kind of call fails:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 0: ordinal not in range(128)
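To make the "convert explicitly" advice concrete, here is a minimal sketch that decodes byte strings to Unicode before calling find. The helper name to_text is hypothetical, and the GBK bytes are only an assumed illustration of input that starts with byte 0xba like the one in the traceback; the real encoding depends on where the data comes from. It should run on Python 2.7 and on Python 3.3+.

```python
# Sketch: keep byte strings and Unicode strings apart, converting explicitly.
# to_text is a hypothetical helper name, not part of any library.

def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding byte strings explicitly."""
    if isinstance(value, bytes):      # 'bytes' is an alias of 'str' in Python 2.7
        return value.decode(encoding)
    return value                      # already a Unicode string

haystack = u'abcd'
needle = b'b'                         # a byte string ('str' in Python 2.7)
print(haystack.find(to_text(needle)))

# Assumption for illustration: these two bytes are GBK-encoded text.
gbk_bytes = b'\xba\xc3'
word = u'\u4f60\u597d'                # two Chinese characters, written as escapes
print(word.find(to_text(gbk_bytes, 'gbk')))
```

Because the decode step names its encoding, a wrong guess fails at the conversion, where it is easy to spot, instead of deep inside a string operation.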