regex ---- python regex

Tutorial

Official Python docs
- re — Regular expression operations — Python 3.11.1 documentation
Regular Expressions | Regex Examples | Regexp Tutorials | OCPsoft
- Walkthroughs of common examples
Cover - Understanding Python re(gex)?
- Reads like a manual and also covers many advanced usages, e.g. Perl-style regex syntax

Tools

Visual testing tool
- regex101
Generating random strings from a regex
- exrex
  - exrex github
The regex package on pip
- GitHub - mrabarnett/mrab-regex
Python's built-in re module
- Note: re and regex are two different tools
Python's built-in string module
- Character-set constants and helper functions
- eg: string.digits, string.punctuation
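
As a rough illustration of how the string constants can feed a regex, the sketch below builds character classes from string.punctuation and string.digits. The sample inputs and variable names are made up for the example.

```python
import re
import string

# string.punctuation holds the ASCII punctuation characters.
# re.escape makes each of them safe to place inside a character class.
punct_class = re.compile('[' + re.escape(string.punctuation) + ']')

cleaned = punct_class.sub('', 'Hello, world!')
print(cleaned)  # Hello world

# string.digits is '0123456789'.
digit_runs = re.compile('[' + string.digits + ']+').findall('a12b345')
print(digit_runs)  # ['12', '345']
```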

Speed

A pattern is faster once compiled

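import re

pattern = r'\d+'

re.match(pattern, '123')  # slow

compiled = re.compile(pattern)
compiled.match('123')     # fast

Typically about three to four times faster for repeated calls; the exact gain varies, because re caches recently compiled patterns internally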
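
To check the speed claim on a given machine, a small timeit comparison along these lines may help. It is only a sketch: the iteration count is arbitrary, and the measured ratio will vary with the Python version and the pattern.

```python
import re
import timeit

pattern = r'\d+'
compiled = re.compile(pattern)

# Both calls find the same match; only the per-call overhead differs.
same_result = re.match(pattern, '123').group() == compiled.match('123').group()

n = 100_000  # arbitrary number of repetitions
t_module = timeit.timeit(lambda: re.match(pattern, '123'), number=n)
t_compiled = timeit.timeit(lambda: compiled.match('123'), number=n)

print(f're.match:       {t_module:.3f}s')
print(f'compiled.match: {t_compiled:.3f}s')
```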

Substitution: re.sub

ref: https://docs.python.org/3/library/re.html#re.sub
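Interface
re.sub(pattern, repl, string)
repl
- How to substitute
  - Using groups
    - Unnamed groups: r'(hello) (\w+)'
      - \1 refers to group 1
        - Drawback: if the next character is a digit, the reference becomes ambiguous and has to be separated from it
        - eg: '\1 1' means \g<1> followed by ' 1', while '\11' is read as \g<11>
      - \g<1> refers to group 1 without that ambiguity
    - Named groups: r'(?P<name>Lucy-\w+) repeat: (?P=name)'
      - \g<name> refers to the group called name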
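
The reference forms above can be tried out with a short script like the following. The input strings are invented for the example, and the try/except only shows that r'\11' is parsed as a reference to group 11.

```python
import re

text = 'hello world'
pattern = r'(hello) (\w+)'

# \1 and \2 refer to the first and second unnamed group.
swapped = re.sub(pattern, r'\2 \1', text)      # 'world hello'

# \g<1> stays unambiguous when a literal digit follows the reference.
suffixed = re.sub(pattern, r'\g<1>1', text)    # 'hello1'

# r'\11' is read as group 11, which this pattern does not have.
try:
    re.sub(pattern, r'\11', text)
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = str(exc)

# Named group: \g<name> in repl, (?P=name) as a backreference in the pattern.
named = re.sub(r'(?P<name>Lucy-\w+) repeat: (?P=name)', r'[\g<name>]',
               'Lucy-ann repeat: Lucy-ann')    # '[Lucy-ann]'

print(swapped, suffixed, named, ambiguous_error, sep='\n')
```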

Finding all matches: re.findall

Grouping
- i.e. what the () capture

Special behavior

No capturing () used

The whole pattern is treated as a single capture

In [74]: re.findall(r'[A-Z][a-z]?[a-z]?\d*(?:\.\d+)?', 'H2O')
Out[74]: ['H2', 'O']

Capturing () used

The number of () groups determines how many items each find result contains

# * one ()
In [76]: re.findall(r'([A-Z][a-z]?[a-z]?)\d*(?:\.\d+)?', 'H2O')
Out[76]: ['H', 'O']

# * two ()
In [77]: re.findall(r'([A-Z][a-z]?[a-z]?)(\d*(?:\.\d+)?)', 'H2O')
Out[77]: [('H', '2'), ('O', '')]
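
Because findall returns only the groups once () are present, one possible way to keep the whole match as well is re.finditer, which yields match objects. This is a sketch of that alternative, not something the notes above prescribe.

```python
import re

pattern = re.compile(r'([A-Z][a-z]?[a-z]?)(\d*(?:\.\d+)?)')

# group(0) is the whole match; group(1) and group(2) are the captures.
rows = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer('H2O')]
print(rows)  # [('H2', 'H', '2'), ('O', 'O', '')]
```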

Faster alternatives to regex

Tools

flashtext
- Tutorials
  - https://mp.weixin.qq.com/s/ImtSR6VwYalN6gTgjVJOsQ
  - https://github.com/vi3k6i5/flashtext

AND operation

References
- https://www.ocpsoft.org/tutorials/regular-expressions/and-in-regex/

Examples

(?=.*word1)(?=.*word2)(?=.*word3)

# * contains both kind and good
^Start (?=.*kind)(?=.*good).* deed$
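
A quick way to see the lookahead-based AND in action is the sketch below. The test sentences are invented; note that the lookaheads check for substrings, so a word such as kindness would also satisfy (?=.*kind).

```python
import re

# Each lookahead asserts that one word occurs somewhere after the current position.
both = re.compile(r'^(?=.*kind)(?=.*good).*$')
has_both = bool(both.search('a good and kind deed'))   # True
only_one = bool(both.search('a good deed'))            # False

anchored = re.compile(r'^Start (?=.*kind)(?=.*good).* deed$')
full_ok = bool(anchored.search('Start a kind and good deed'))  # True
full_missing = bool(anchored.search('Start a kind deed'))      # False

print(has_both, only_one, full_ok, full_missing)
```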

Fuzzy matching and derived words

Defining a within-word match
word = r'(?:\b\w*?(interested_part)\w*?\b)'
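- Put the fragment to search for in place of interested_part
- Benefit
  - This approach searches for a component inside a word, which allows fuzzy matching and matching of derived words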
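
As an illustration of the template above, the sketch below fills interested_part with a concrete fragment. The helper name within_word and the sample sentence are hypothetical; matching is case-sensitive unless re.IGNORECASE is added.

```python
import re

def within_word(part: str) -> re.Pattern:
    # Hypothetical helper: drops an escaped fragment into the template above.
    return re.compile(r'(?:\b\w*?(' + re.escape(part) + r')\w*?\b)')

pattern = within_word('interest')
text = 'she is uninterested in interesting interest rates'

# group(0) is the whole word; group(1) is the fragment that was searched for.
words = [m.group(0) for m in pattern.finditer(text)]
print(words)  # ['uninterested', 'interesting', 'interest']
```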

Matching the empty string

^(?!.)
- Explanation: the negative lookahead asserts that no character follows the start of the string
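
A small check of this pattern is sketched below. One caveat worth seeing in practice: "." does not match a newline by default, so a string that begins with a newline also passes unless re.DOTALL is set.

```python
import re

empty = re.compile(r'^(?!.)')

on_empty = bool(empty.match(''))        # True
on_text = bool(empty.match('abc'))      # False

# "." skips newlines by default, so the lookahead still succeeds here.
on_newline = bool(empty.match('\nabc'))                             # True
# With re.DOTALL the lookahead rejects any first character.
on_newline_dotall = bool(re.match(r'^(?!.)', '\nabc', re.DOTALL))   # False

print(on_empty, on_text, on_newline, on_newline_dotall)
```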

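| (or): a common pitfall

It does not apply only to the two characters next to it; it applies to the entire expression on each side

Example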

r'(today|no|yes)'
# matches the same strings as r'((today)|(no)|(yes))'
# NOT equivalent to r'(toda(y|n)(o|y)es)'

# strings that match
'today'
'no'
'yes'

# string that does not match (with re.fullmatch)
'todanoes'
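
The difference between the two readings can be checked with re.fullmatch, as in this sketch. The name misread is only a label for the mistaken interpretation; with re.search, 'todanoes' would still contain the substring 'no', which is why whole-string matching is used here.

```python
import re

alt = re.compile(r'(today|no|yes)')
misread = re.compile(r'(toda(y|n)(o|y)es)')

results = {}
for s in ['today', 'no', 'yes', 'todanoes']:
    # Record (actual behavior, behavior under the mistaken reading).
    results[s] = (bool(alt.fullmatch(s)), bool(misread.fullmatch(s)))
    print(s, results[s])
```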

Special behavior of \

Only r"\\" matches a literal \
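
The sketch below shows the backslash rule on an invented Windows-style path. It also compares the raw-string form with the non-raw form, which needs four backslashes in source code.

```python
import re

path = 'C:\\temp'  # the actual text is C:\temp, with one backslash

# In a raw string, two backslashes form the regex for one literal backslash.
found = re.findall(r'\\', path)
print(len(found))  # 1

# Without the r prefix, the same regex needs four backslashes in source.
same = re.findall('\\\\', path) == found
print(same)  # True

# re.escape produces the escaped form automatically.
escaped = re.escape('\\')
print(escaped == r'\\')  # True
```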

Special behavior of /

r"/" and r"\/" behave identically
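
A minimal check that the escaped and unescaped slash behave the same in Python's re; the sample string is arbitrary.

```python
import re

plain = re.findall(r'/', 'a/b/c')
escaped = re.findall(r'\/', 'a/b/c')
print(plain == escaped)  # True
```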

How to get the hand-written source of a regex as Python stores it internally

print(your_regex_pattern)
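
To make the difference visible, the sketch below compares repr() with print() for a pattern string. It also assumes the pattern may already be compiled, in which case the .pattern attribute holds the source text. The variable names are illustrative.

```python
import re

source = r'\d+\\'
compiled = re.compile(source)

print(repr(source))      # '\\d+\\\\'  (Python literal with doubled backslashes)
print(source)            # \d+\\      (the text the regex engine receives)
print(compiled.pattern)  # \d+\\      (source text kept on the compiled object)
```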
