# remove duplicate words (and triplicate (and quadruplicate...))
1 while s/\b(\w+) \1\b/$1/gi;
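A minimal sketch of why the substitution sits inside an empty "1 while" loop: a single s///g pass only collapses adjacent pairs, so a triplicate needs a second pass. The sample text and the variable names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text containing a triplicate and a duplicate.
my $text = "Paris in the the the spring is is lovely";

# One pass of s///g collapses each adjacent pair it finds, then moves on,
# so "the the the" is only reduced to "the the".
my $single_pass = $text;
$single_pass =~ s/\b(\w+) \1\b/$1/gi;

# "1 while ..." repeats the substitution until a pass changes nothing.
my $clean = $text;
1 while $clean =~ s/\b(\w+) \1\b/$1/gi;

print "$single_pass\n";   # Paris in the the spring is lovely
print "$clean\n";         # Paris in the spring is lovely
```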
7.3.3 The tr/// operator (transliteration)
LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
tr/SEARCHLIST/REPLACEMENTLIST/cds
Usage notes:
- The modifiers of tr/// are as follows:
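Modifier  Meaning
/c        Complement SEARCHLIST.
/d        Delete found but unreplaced characters (those present in SEARCHLIST with no counterpart in REPLACEMENTLIST).
/s        Squeeze each run of identical replaced characters into a single character.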
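- If the /d modifier is used, REPLACEMENTLIST is always interpreted exactly as written. Otherwise, if REPLACEMENTLIST is shorter than SEARCHLIST, its final character is replicated until it is long enough. If REPLACEMENTLIST is empty, SEARCHLIST is used in its place. This last form is useful for counting characters without changing them, and for squeezing runs of characters with the /s modifier.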
tr/aeiou/!/; # change any vowel into !
tr{/\\\r\n\b\f. }{_}; # change strange chars into an underscore
tr/A-Z/a-z/ for @ARGV; # canonicalize to lowercase ASCII
$count = ($para =~ tr/\n//); # count the newlines in $para
$count = tr/0-9//; # count the digits in $_
$word =~ tr/a-zA-Z//s; # bookkeeper -> bokeper
tr/@$%*//d; # delete any of those
tr#A-Za-z0-9+/##cd; # remove non-base64 chars
# change en passant
($HOST = $host) =~ tr/a-z/A-Z/;
$pathname =~ tr/a-zA-Z/_/cs; # change non-(ASCII) alphas to single underbar
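The rules above for /c, /d, /s and for short or empty replacement lists can be checked with a small runnable sketch. The sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $s = "bookkeeper 2024";   # hypothetical sample string

# Empty REPLACEMENTLIST without /d: nothing changes; tr returns the match count.
my $digits = ($s =~ tr/0-9//);

# /s squeezes each run of the same translated character into one.
(my $squeezed = $s) =~ tr/a-z//s;

# Short REPLACEMENTLIST: the final character is replicated, so every vowel maps to "!".
(my $bang = $s) =~ tr/aeiou/!/;

# /d: a-z maps onto A-Z, while 0-9 has no counterpart and is deleted.
(my $upper = $s) =~ tr/a-z0-9/A-Z/d;

# /c with /d: delete every character that is NOT in SEARCHLIST.
(my $letters = $s) =~ tr/a-z//cd;

print "$digits\n";      # 4
print "$squeezed\n";    # bokeper 2024
print "$bang\n";        # b!!kk!!p!r 2024
print "[$upper]\n";     # [BOOKKEEPER ]
print "$letters\n";     # bookkeeper
```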
Symbol    Atomic  Meaning
\cX       Yes     Match the control character Control-X (\cZ, \c[, etc.).
\C        Yes     Match one byte (C char) even in utf8 (dangerous).
\d        Yes     Match any digit character.
\D        Yes     Match any nondigit character.
\e        Yes     Match the escape character (ASCII ESC, not backslash).
\E        --      End case (\L, \U) or metaquote (\Q) translation.
\f        Yes     Match the form feed character (FF).
\G        No      True at end-of-match position of prior m//g.
\l        --      Lowercase the next character only.
\L        --      Lowercase till \E.
\n        Yes     Match the newline character (usually NL, but CR on Macs).
\N{NAME}  Yes     Match the named char (\N{greek:Sigma}).
\p{PROP}  Yes     Match any character with the named property.
\P{PROP}  Yes     Match any character without the named property.
\Q        --      Quote (de-meta) metacharacters till \E.
\r        Yes     Match the return character (usually CR, but NL on Macs).
\s        Yes     Match any whitespace character.
\S        Yes     Match any nonwhitespace character.
\t        Yes     Match the tab character (HT).
\u        --      Titlecase next character only.
\U        --      Uppercase (not titlecase) till \E.
\w        Yes     Match any "word" character (alphanumerics plus "_").
\W        Yes     Match any nonword character.
\x{abcd}  Yes     Match the character given in hexadecimal.
\X        Yes     Match Unicode "combining character sequence" string.
\z        No      True at end of string only.
\Z        No      True at end of string or before optional newline.
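A few of the metasymbols listed above are easier to tell apart in a runnable sketch. The following is only an illustration with made-up sample strings and variable names; it covers \Q...\E, \z versus \Z, \G, and the \u and \L case escapes.

```perl
use strict;
use warnings;

# \Q...\E: treat an interpolated string literally, so "." is not a wildcard.
my $literal = "a.b";
my $q_hit  = ("a.b" =~ /^\Q$literal\E$/) ? 1 : 0;
my $q_miss = ("axb" =~ /^\Q$literal\E$/) ? 1 : 0;

# \Z tolerates one trailing newline; \z demands the true end of string.
my $line  = "end\n";
my $big_z = ($line =~ /end\Z/) ? 1 : 0;
my $low_z = ($line =~ /end\z/) ? 1 : 0;

# \G anchors at the position where the previous m//g on the string stopped.
my $str = "123abc";
$str =~ /\d+/g;                               # leaves pos($str) at 3
my $g_hit = ($str =~ /\G[a-z]+/g) ? 1 : 0;

# \L lowercases until \E; \u titlecases the next character only.
my $name = "\u\LPERL\E rocks";

print "$q_hit $q_miss $big_z $low_z $g_hit\n";   # 1 0 1 0 1
print "$name\n";                                 # Perl rocks
```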
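(The table above is copied directly from "Programming Perl", as is the untranslated material below.)
Among these, note the following classic character classes: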
Symbol  Meaning               As Bytes        As utf8
\d      Digit                 [0-9]           \p{IsDigit}
\D      Nondigit              [^0-9]          \P{IsDigit}
\s      Whitespace            [ \t\n\r\f]     \p{IsSpace}
\S      Nonwhitespace         [^ \t\n\r\f]    \P{IsSpace}
\w      Word character        [a-zA-Z0-9_]    \p{IsWord}
\W      Non-(word character)  [^a-zA-Z0-9_]   \P{IsWord}
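As a quick illustration of the classic classes in the table above, the sketch below applies \s, \w, \d, and \W to one ASCII-only record. The record contents and variable names are hypothetical.

```perl
use strict;
use warnings;

my $rec = "id_42\t3.14  ok";   # hypothetical tab/space separated record

# \s+ splits on any run of whitespace (tab or spaces here).
my @fields = split /\s+/, $rec;

# \w+ grabs runs of word characters; the "." in 3.14 is not a word character.
my @words = $rec =~ /(\w+)/g;

# Count digit characters by forcing list context on a global match.
my $digit_count = () = $rec =~ /\d/g;

# \W+ removes every run of nonword characters.
(my $stripped = $rec) =~ s/\W+//g;

print join("|", @fields), "\n";   # id_42|3.14|ok
print join("|", @words), "\n";    # id_42|3|14|ok
print "$digit_count\n";           # 5
print "$stripped\n";              # id_42314ok
```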
The POSIX-style character classes are as follows:
Class   Meaning
alnum   Any alphanumeric, that is, an alpha or a digit.
alpha   Any letter. (That's a lot more letters than you think, unless you're thinking Unicode, in which case it's still a lot.)
ascii   Any character with an ordinal value between 0 and 127.
cntrl   Any control character. Usually characters that don't produce output as such, but instead control the terminal somehow; for example, newline, form feed, and backspace are all control characters. Characters with an ord value less than 32 are most often classified as control characters.
digit   A character representing a decimal digit, such as 0 to 9. (Includes other characters under Unicode.) Equivalent to \d.
graph   Any alphanumeric or punctuation character.
lower   A lowercase letter.
print   Any alphanumeric or punctuation character or space.
punct   Any punctuation character.
space   Any space character. Includes tab, newline, form feed, and carriage return (and a lot more under Unicode.) Equivalent to \s.
upper   Any uppercase (or titlecase) letter.
word    Any identifier character, either an alnum or underline.
xdigit  Any hexadecimal digit. Though this may seem silly ([0-9a-fA-F] works just fine), it is included for completeness.
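POSIX classes are written as [:class:] and are only valid inside a bracketed character class, which is why the sketch below uses doubled brackets such as [[:upper:]]. The sample string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $s = "Perl 5, rev 0x1F!";   # hypothetical ASCII sample

# [:upper:] must sit inside an enclosing bracketed class.
my @upper = $s =~ /([[:upper:]])/g;

# Count punctuation characters (the comma and the exclamation mark).
my $punct_count = () = $s =~ /[[:punct:]]/g;

# [:xdigit:] picks up the hexadecimal digits after the 0x prefix.
my ($hex) = $s =~ /0x([[:xdigit:]]+)/;

# A leading ^ inside the colons negates the class: count non-alphanumerics.
my $non_alnum = () = $s =~ /[[:^alnum:]]/g;

print "@upper\n";         # P F
print "$punct_count\n";   # 2
print "$hex\n";           # 1F
print "$non_alnum\n";     # 5
```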
my $ip_region = Ipregion->new("new_ip_region.dat");
my @search_result = $ip_region->get_area_isp_id(974173694);
Perl special variables
Variable (name)                     Meaning
$a                                  Used by the sort function to hold the first of the two values being compared.
$b                                  Used by the sort function to hold the second of the two values being compared.
$_ ($ARG)                           The default input and pattern-search space.
@_ (@ARG)                           Within a subroutine, holds the arguments passed in.
ARGV                                The special filehandle that iterates over command-line filenames in @ARGV.
$ARGV                               Contains the name of the current file when reading from the ARGV filehandle.
@ARGV                               The array containing the command-line arguments intended for the script.
$^T ($BASETIME)                     The time at which the script began running, in seconds since the epoch.
$? ($CHILD_ERROR)                   The status returned by the last pipe close, backtick (``) command, or wait, waitpid, or system functions.
DATA                                This special filehandle refers to anything following the __END__ or the __DATA__ token in the current file.
$) ($EGID, $EFFECTIVE_GROUP_ID)     The effective GID of this process.
$> ($EUID, $EFFECTIVE_USER_ID)      The effective UID of this process as returned by the geteuid(2) syscall.
%ENV                                The hash containing your current environment variables.
$@ ($EVAL_ERROR)                    The currently raised exception or the Perl syntax error message from the last eval operation.
@EXPORT                             Used by the import method of the Exporter module.
@EXPORT_OK                          Used by the import method of the Exporter module.
%EXPORT_TAGS                        Used by the import method of the Exporter module.
%INC                                The hash containing entries for the filename of each Perl file loaded via do FILE, require, or use.
@INC                                The array containing the list of directories where Perl modules may be found by do FILE, require, or use.
$. ($NR, $INPUT_LINE_NUMBER)        The current record number (usually the line number) for the last filehandle you read from.
$/ ($RS, $INPUT_RECORD_SEPARATOR)   The input record separator, newline by default, which is consulted by the readline function, the <FH> operator, and the chomp function.
                                    $/ = ""; treats blank lines as the record separator (paragraph mode), which is not the same as "\n\n".
                                    undef $/; reads all remaining lines of the file in one go.
                                    $/ = \$number; reads $number bytes at a time.
@ISA                                This array contains names of other packages to look through when a method call cannot be found in the current package.
@+ @- $` $' $& $1 $2 $3             Match-related variables.
$^ $~ $|                            Filehandle-related variables.
$" ($LIST_SEPARATOR)                When an array or slice is interpolated into a double-quoted string, this variable specifies the string to put between individual elements. Default is space.
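Several of the variables above are easiest to understand by changing them inside a small scope. The sketch below is illustrative only: it reads from in-memory filehandles (opened on a scalar reference) so that it runs without external files, and all data and variable names are made up.

```perl
use strict;
use warnings;

# $a and $b are set by sort for each comparison.
my @sorted = sort { $a <=> $b } (10, 9, 100, 1);

# $" is placed between elements when an array is interpolated.
my @parts = qw(a b c);
my $joined;
{
    local $" = "-";
    $joined = "@parts";
}

# $/ = "" selects paragraph mode: runs of blank lines separate records.
my $text = "p1 line1\np1 line2\n\n\np2 line1\n";
my @paras;
{
    local $/ = "";
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    @paras = <$fh>;
    close $fh;
}

# $/ set to a reference to an integer reads fixed-size records of that many bytes.
my $bytes = "abcdefghij";
my @chunks;
{
    local $/ = \4;
    open my $fh, '<', \$bytes or die "cannot open in-memory file: $!";
    @chunks = <$fh>;
    close $fh;
}

# $@ holds the exception raised inside the last eval.
eval { die "boom\n" };
my $err = $@;

print "@sorted\n";                 # 1 9 10 100
print "$joined\n";                 # a-b-c
print scalar(@paras), "\n";        # 2
print join("|", @chunks), "\n";    # abcd|efgh|ij
print $err;                        # boom
```

Using local keeps each change confined to its block, so code outside the block still sees the default values of $/ and $".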