The confusion is due to the fact that the backslash character (\) is used as an escape at two different levels. First, the Python interpreter itself performs substitutions for \ sequences before the re module ever sees your string. For instance, \n is converted to a newline character, \t is converted to a tab character, etc. To get an actual \ character, you can escape it as well, so \\ gives a single \ character. If the character following the \ isn't a recognized escape character, then the \ is treated like any other character and passed through, but I don't recommend depending on this, and newer Python versions emit a warning for unrecognized escapes. Instead, always escape your \ characters by doubling them, i.e. \\.
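As a quick check of the first level of escaping, here is a minimal sketch (the variable names are only illustrative) showing that each escape sequence in a quoted literal becomes exactly one character in the resulting string:

```python
# Each escape sequence collapses to one character in the resulting string.
newline = '\n'      # one character: a line feed
tab = '\t'          # one character: a horizontal tab
backslash = '\\'    # one character: a literal backslash

lengths = [len(newline), len(tab), len(backslash)]
print(lengths)  # [1, 1, 1]

# ord() confirms which characters were actually stored.
codes = [ord(newline), ord(tab), ord(backslash)]
print(codes)  # [10, 9, 92]
```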
If you want to see how Python is expanding your string escapes, just print out the string. For example:
s = 'a\\b\tc'
print(s)  # shows a\b, then a tab, then c
If s is part of an aggregate data type, e.g. a list or a tuple, and you print that aggregate, Python will enclose the string in single quotes and will include the \ escapes (in a canonical form), so be aware of how your string is being printed. If you just type a quoted string into the interpreter, it will also display it enclosed in quotes with \ escapes.
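To make the difference between the two displays concrete, here is a small sketch (names such as shown and canonical are just for illustration) comparing what print writes with the quoted, escaped form used for containers and at the interactive prompt:

```python
s = 'a\\b\tc'   # characters: a, backslash, b, tab, c

shown = str(s)        # what print(s) writes
canonical = repr(s)   # what the interactive prompt or a containing list shows

print(len(s))     # 5
print(shown)      # a\b, then a tab, then c
print(canonical)  # 'a\\b\tc'
print([s])        # ['a\\b\tc']
```

The doubled backslash in the last two lines is only a display convention; the string itself still holds a single backslash.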
Once you know how your string is being encoded, you can then think about what the re module will do with it. For instance, if you want to escape \ in a string you pass to the re module, you will need to pass \\ to re, which means you will need to use \\\\ in your quoted Python string. The Python string will end up with \\ and the re module will treat this as a single literal \ character.
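The two levels can be seen together in a short sketch. The sample text and variable names are made up for illustration; re.search is the standard library call:

```python
import re

text = 'C:\\temp'      # the actual characters are C:\temp (one backslash)
pattern = '\\\\'       # the Python string holds two backslashes; re reads them as one literal backslash

print(len(pattern))    # 2
match = re.search(pattern, text)
print(match.start())   # 2
print(match.group() == '\\')  # True
```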
An alternative way to include \ characters in Python strings is to use raw strings, e.g. r'a\b' is equivalent to "a\\b".
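A brief sketch of the raw-string form, assuming a made-up path-like string as input. With a raw string, only the re-level escaping remains, so a literal backslash is written as r'\\':

```python
import re

print(r'a\b' == 'a\\b')   # True
print(len(r'a\b'))        # 3

raw_pattern = r'\\'       # same two characters as '\\\\'
same = raw_pattern == '\\\\'
print(same)               # True

# Split on a literal backslash using the raw-string pattern.
parts = re.split(raw_pattern, r'usr\local\bin')
print(parts)              # ['usr', 'local', 'bin']
```

One caveat: a raw string literal cannot end in a single backslash, so a pattern that must end with one still needs the doubled form.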