raku - Regex speed in Perl 6

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Until now I have only worked with regular expressions in bash, grep, sed, awk and similar tools. After trying Perl 6 regexes, I got the impression that they run slower than I would expect, but the reason is probably that I am using them incorrectly. I wrote a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

my @search = <abcde cdeff fabcd>;

my token search {
    @search
}

my @new_array = @array.grep({/ <search> /});
say @new_array;

Then I printed @array into a file named array (with 7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd) and made a simple grep search.

$ grep -f search array

After confirming that both programs produced the same result, as expected, I measured how long each one took.

$ time perl6 search.p6
real    0m6,683s
user    0m6,724s
sys     0m0,044s
$ time grep -f search array
real    0m0,009s
user    0m0,008s
sys     0m0,000s

So, what am I doing wrong in my Perl 6 code?

Update: If I loop through the @search array and pass each search token to grep separately, the program runs much faster:

my @array = "aaaaa" .. "fffff";
say +@array;

my @search = <abcde cdeff fabcd>;

for @search -> $token {
  say ~@array.grep({/$token/});
}

$ time perl6 search.p6
real    0m1,378s
user    0m1,400s
sys     0m0,052s

And if I define each search pattern manually, it works even faster:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});

$ time perl6 search.p6
real    0m0,587s
user    0m0,632s
sys     0m0,036s

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:27:39+0000

The grep command is much simpler than Perl 6's regular expressions, and it has had many more years of optimization. Regexes are also one of the areas that hasn't seen as much optimization work in Rakudo, partly because they are seen as difficult to work on.

For a more performant example, you could pre-compile the regex:

my $search = "/@search.join('|')/".EVAL;
#  $search =  /abcde|cdeff|fabcd/;
say ~@array.grep($search);

That change causes it to run in about half a second.
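
To check the difference on your own machine, a minimal sketch along these lines could time both forms side by side. It reuses @array and @search from the question; the use of `now` for timing and the names $start, @fast and @slow are assumptions added for illustration, and the absolute numbers will vary with the Rakudo version and hardware.

```raku
use MONKEY-SEE-NO-EVAL;   # harmless here; makes the use of EVAL explicit

my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;

# Pre-compiled form: the alternation is built once from the joined strings.
my $start  = now;
my $search = "/@search.join('|')/".EVAL;
my @fast   = @array.grep($search);
say "pre-compiled: {now - $start}s, {+@fast} matches";

# Interpolated form: @search is looked up while the regex runs.
$start   = now;
my @slow = @array.grep(/ @search /);
say "interpolated: {now - $start}s, {+@slow} matches";
```

Both runs should report the same matches; only the elapsed time is expected to differ.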

If there is any chance of malicious data in @search and you have to do this, it may be safer to use:

"/@search».Str».perl.join('|')/".EVAL
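
To see why the quoted form is safer, it can help to print the regex source before it is evaluated. The sketch below is illustrative only: the list @words (with a deliberately awkward entry, a|b) and the variable names are assumptions, and it builds the string with map rather than the hyper operator purely for readability.

```raku
use MONKEY-SEE-NO-EVAL;

# Hypothetical search list; 'a|b' contains a regex metacharacter.
my @words = <abcde a|b>;

# .perl wraps each string in quotes, so the regex treats it as a literal.
my $source = '/' ~ @words.map(*.Str.perl).join('|') ~ '/';
say $source;                     # /"abcde"|"a|b"/

my $re = $source.EVAL;
say so 'xa|bx' ~~ $re;           # True:  the literal text a|b is present
say so 'a'     ~~ $re;           # False: a alone is not one of the words
```

Without the quoting, the `|` inside the second word would be read as an extra alternation, and a lone a would match.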

The compiler can't quite generate that optimized code for /@search/, because @search could change after the regex has been compiled. What could happen is that the regex gets recompiled into the better form the first time it is used, and that form is then cached for as long as @search isn't modified.
(I think Perl 5 does something similar.)
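
Until the compiler does something like that, the same idea can be approximated by hand. The following is only a sketch under stated assumptions: the helper alternation-regex and the %compiled hash are hypothetical names, not part of Rakudo, and the cache is keyed on the generated regex source so that a modified @search produces a fresh regex.

```raku
use MONKEY-SEE-NO-EVAL;

my %compiled;   # hypothetical cache: regex source text => compiled Regex

# Hypothetical helper: compile the alternation only if this exact
# list of words has not been compiled before.
sub alternation-regex(@words) {
    my $source = '/' ~ @words.map(*.Str.perl).join('|') ~ '/';
    %compiled{$source} //= $source.EVAL;
}

my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;

say @array.grep(alternation-regex(@search));   # compiles once
say @array.grep(alternation-regex(@search));   # reuses the cached regex

@search.push('aaaaa');                         # the list changed, so the
say @array.grep(alternation-regex(@search));   # next call compiles again
```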

One important fact to keep in mind is that a regex in Perl 6 is just a method written in a domain-specific sub-language.
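
A quick way to see this for yourself is to inspect a regex object; the variable name $re is just for illustration.

```raku
my $re = / abc /;

say $re.WHAT;        # (Regex)
say $re ~~ Method;   # True: Regex is a subclass of Method
```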
