特殊字符

  1. 合并字符

教程

  1. wiki 资料汇总: Category:Unicode - Wikipedia
  2. Unicode — pysheeet

    • 一个 python unicode 使用介绍,类似 cookbook
    • encode, decode, 错误处理

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      
      >> u = b"\xff"
      u.decode('utf-8', 'strict')
          Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
      # use U+FFFD, REPLACEMENT CHARACTER
      >> u.decode('utf-8', "replace")
      '\ufffd'
      # inserts a \xNN escape sequence
      >> u.decode('utf-8', "backslashreplace")
      '\xff'
      # leave the character out of the Unicode result
      >> u.decode('utf-8', "ignore")
      ''
  3. Python Unicode HowTo: Unicode HOWTO — Python 3.11.1 documentation

添加声调 accent 等 combing character (合并字符)

参考:

解说:

  • 手动合并字符
  • 不同的合并字符组合可能无效,或者达不到预期效果的奇怪符号

    • 比如 "1" 上面加横线字符,变成:
1
2
In [40]: 'fue'+ u'\u0301'
Out[40]: 'fué'