There are multiple reasons why the Swift code is slower than the Objective-C code.
I made a very simple test case by comparing two fixed strings 100 times.
- Objective-C code: 0.026 seconds
- Swift code: 3.14 seconds
The first reason is that a Swift Character
represents an "extended grapheme cluster",
which can contain several Unicode code points (e.g. "flags"). This makes the
decomposition of a string into characters slow. On the other hand, Objective-C
NSString
stores the strings as a sequence of UTF-16 code points.
If you replace
let a = Array(aStr)
let b = Array(bStr)
by
let a = Array(aStr.utf16)
let b = Array(bStr.utf16)
so that the Swift code works on UTF-16 sequences as well then the time goes down
to 1.88 seconds.
The allocation of the 2-dimensional array is also slow. It is faster to allocate
a single one-dimensional array. I found a simple Array2D
class here:
http://blog.trolieb.com/trouble-multidimensional-arrays-swift/
class Array2D {
var cols:Int, rows:Int
var matrix: [Int]
init(cols:Int, rows:Int) {
self.cols = cols
self.rows = rows
matrix = Array(count:cols*rows, repeatedValue:0)
}
subscript(col:Int, row:Int) -> Int {
get {
return matrix[cols * row + col]
}
set {
matrix[cols*row+col] = newValue
}
}
func colCount() -> Int {
return self.cols
}
func rowCount() -> Int {
return self.rows
}
}
Using that class in your code
func levenshtein(aStr: String, bStr: String) -> Int {
let a = Array(aStr.utf16)
let b = Array(bStr.utf16)
var dist = Array2D(cols: a.count + 1, rows: b.count + 1)
for i in 1...a.count {
dist[i, 0] = i
}
for j in 1...b.count {
dist[0, j] = j
}
for i in 1...a.count {
for j in 1...b.count {
if a[i-1] == b[j-1] {
dist[i, j] = dist[i-1, j-1] // noop
} else {
dist[i, j] = min(
dist[i-1, j] + 1, // deletion
dist[i, j-1] + 1, // insertion
dist[i-1, j-1] + 1 // substitution
)
}
}
}
return dist[a.count, b.count]
}
the time in the test case goes down to 0.84 seconds.
The last bottleneck that I found in the Swift code is the min()
function.
The Swift library has a built-in min()
function which is faster. So just removing
the custom function from the Swift code reduces the time for the test case to
0.04 seconds, which is almost as good as the Objective-C version.
Addendum: Using Unicode scalars seems to be even slightly faster:
let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)
and has the advantage that it works correctly with surrogate pairs such
as Emojis.