In response to some of the other answers and comments:
It is true that as a programmer you should generally avoid premature optimization. However, this applies less to scripting languages, where the compiler optimizes little or not at all.
So whenever you write Lua code that is executed very often, runs in a time-critical environment, or could run for a while, it is good to know what to avoid (and to avoid it).
This is a collection of what I found out over time. Some of it I found on the net, but being of a suspicious nature where the interwebs are concerned, I tested all of it myself. I have also read the Lua performance paper at Lua.org.
Some reference:
Avoid globals
This is one of the most common hints, but stating it once more can't hurt.
Globals are stored in a hash table under their name, so accessing one means performing a table lookup. While Lua has a pretty good hash table implementation, that is still a lot slower than accessing a local variable. If you have to use globals, assign their value to a local variable; this pays off from the second access onward.
do
    x = gFoo + gFoo;
end
do -- this actually performs better.
    local lFoo = gFoo;
    x = lFoo + lFoo;
end
(Note that simple tests may yield misleading results. For example, in local x; for i=1, 1000 do x=i; end the for loop header actually takes more time than the loop body, so profiling results can be distorted.)
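The same idea applies to library functions such as math.sin, which are reached through a global table on every call. Here is a minimal sketch, assuming a hypothetical helper sum_sines that I made up for illustration; how much it gains depends on your Lua version and workload, so measure it yourself.

```lua
-- Cache the global lookup once, outside the hot loop.
-- Without this, every iteration looks up "math" and then "sin".
local sin = math.sin

-- Hypothetical helper: sums sin(1) .. sin(n).
local function sum_sines(n)
    local total = 0
    for i = 1, n do
        total = total + sin(i)  -- local access, no table lookup
    end
    return total
end

print(sum_sines(1000))
```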
Avoid string creation
Lua hashes all strings on creation. This makes comparing them and using them as table keys very fast, and it reduces memory use since every string is stored internally only once. But it makes string creation more expensive.
A popular way to avoid excessive string creation is to use tables. For example, if you have to assemble a long string, create a table, put the individual strings in it, and then use table.concat to join them once.
-- do NOT do something like this
local ret = "";
for i=1, C do
    ret = ret..foo();
end
If foo() returned only the character A, this loop would create a series of strings like "", "A", "AA", "AAA", etc. Each string would be hashed and would stay in memory until the garbage collector reclaims it. See the problem here?
-- this is a lot faster
local ret = {};
for i=1, C do
    ret[#ret+1] = foo();
end
ret = table.concat(ret);
This method does not create any intermediate strings during the loop itself: the string is created in the function foo, and only references are copied into the table. Afterwards, concat creates a second string "AAAAAA..." (depending on how large C is). Note that you could use i instead of #ret+1, but often you don't have such a convenient loop and there is no loop counter you can use.
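When the loop has no numeric counter, you can keep your own. A minimal sketch, assuming a hypothetical helper shout that I made up for illustration; it collects pieces from a gmatch iterator and joins them once at the end.

```lua
-- Hypothetical helper: upper-cases every word and joins the words with single spaces.
local function shout(text)
    local parts, n = {}, 0
    for word in text:gmatch("%S+") do
        n = n + 1                -- own counter, so #parts is not evaluated per iteration
        parts[n] = word:upper()
    end
    return table.concat(parts, " ")  -- one join instead of repeated ".."
end

print(shout("avoid  string   creation"))  --> AVOID STRING CREATION
```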
Another trick I found somewhere on lua-users.org is to use gsub if you have to parse a string
some_string:gsub(".", function(m)
    return "A";
end);
This looks odd at first. The benefit is that gsub builds the result string "at once" in C, and it is only hashed once it is passed back to Lua when gsub returns. This avoids table creation, but possibly has more function call overhead (not if you call foo() anyway, but if foo() is actually an expression).
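To make the gsub trick a bit more concrete, here is a small sketch, assuming a hypothetical helper to_hex that I made up for illustration. The callback runs once per character and gsub assembles the result; I have not benchmarked this particular function.

```lua
-- Hypothetical helper: encodes every byte of s as two upper-case hex digits.
local function to_hex(s)
    -- The extra parentheses drop gsub's second return value (the match count).
    return (s:gsub(".", function(c)
        return string.format("%02X", c:byte())
    end))
end

print(to_hex("Lua"))  --> 4C7561
```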
Avoid function overhead
Use language constructs instead of functions where possible
function ipairs
When iterating over a table, the function call overhead of ipairs does not justify its use. To iterate over a table, instead use
for k=1, #tbl do
    local v = tbl[k];
    ...
end
For sequences without holes it does exactly the same without the function call overhead (ipairs actually returns another function, which is then called for every element in the table, while #tbl is only evaluated once). It's a lot faster, even if you need the value. And if you don't...
Note for Lua 5.2: In 5.2 you can actually define an __ipairs field in the metatable, which does make ipairs useful in some cases. However, Lua 5.2 also makes the __len field work for tables, so you might still prefer the above code to ipairs, as then the __len metamethod is only called once, while for ipairs you would get an additional function call per iteration.
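For a plain sequence the two loop styles give the same result, which the following sketch checks. The function names sum_numeric and sum_ipairs are hypothetical and only serve the comparison; which one is faster on your Lua version is something to measure, not to take from this snippet.

```lua
-- Numeric for loop: #tbl is evaluated once, no iterator function is called.
local function sum_numeric(tbl)
    local total = 0
    for k = 1, #tbl do
        total = total + tbl[k]
    end
    return total
end

-- ipairs version: the iterator function is called once per element.
local function sum_ipairs(tbl)
    local total = 0
    for _, v in ipairs(tbl) do
        total = total + v
    end
    return total
end

local data = { 3, 1, 4, 1, 5 }
assert(sum_numeric(data) == sum_ipairs(data))
print(sum_numeric(data))  --> 14
```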
functions table.insert, table.remove
Simple uses of table.insert and table.remove can be replaced by using the # operator instead. Basically this is for simple push and pop operations. Here are some examples:
table.insert(foo, bar);
-- does the same as
foo[#foo+1] = bar;
local x = table.remove(foo);
-- does the same as
local x = foo[#foo];
foo[#foo] = nil;
For shifts (e.g. table.remove(foo, 1)), and if ending up with a sparse table is not desirable, it is of course still better to use the table functions.
Use tables for SQL-IN-like comparisons
You might - or might not - have decisions in your code like the following
if a == "C" or a == "D" or a == "E" or a == "F" then
...
end
Now this is perfectly valid. However, in my own testing, starting at four comparisons and excluding the cost of building the table, this is actually faster:
local compares = { C = true, D = true, E = true, F = true };
if compares[a] then
...
end
And since hash tables have constant lookup time, the performance gain increases with every additional comparison. On the other hand, if one of the first one or two comparisons matches "most of the time", you might be better off with the Boolean way or a combination.
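If the list of accepted values is longer, or comes from somewhere else, the lookup table can be built once from a list. A minimal sketch; make_set, ALLOWED and is_allowed are hypothetical names I chose for illustration.

```lua
-- Hypothetical helper: turns a list of values into a lookup table (a "set").
local function make_set(list)
    local set = {}
    for i = 1, #list do
        set[list[i]] = true
    end
    return set
end

-- Build the set once, outside any hot loop.
local ALLOWED = make_set({ "C", "D", "E", "F" })

local function is_allowed(a)
    return ALLOWED[a] == true
end

print(is_allowed("D"), is_allowed("Z"))
```

Building the table is the expensive part, so this only pays off if the set is reused across many checks.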
Avoid frequent table creation
This is discussed thoroughly in Lua Performance Tips. Basically, the problem is that Lua allocates your table on demand, and doing so actually takes more time than clearing its contents and filling it again.
However, this is a bit of a problem, since Lua itself does not provide a function for removing all elements from a table, and pairs() is no performance beast itself. I have not done any performance testing on this problem myself yet.
If you can, define a C function that clears a table; this should be a good solution for table reuse.
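If a C function is not an option, a pure Lua version is possible too. This is only a sketch, with a hypothetical function name clear, and as said above I have not measured whether it beats allocating a fresh table; that likely depends on the table size.

```lua
-- Hypothetical helper: removes all entries from t so the table can be reused.
local function clear(t)
    for k in pairs(t) do
        t[k] = nil  -- assigning nil to existing fields during traversal is allowed
    end
    return t
end

-- Reuse one buffer across several rounds instead of creating a new table each time.
local buffer = {}
for round = 1, 3 do
    clear(buffer)
    for i = 1, round do
        buffer[i] = i * round
    end
    print(round, #buffer)
end
```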
Avoid doing the same over and over
This is the biggest problem, I think. While a compiler in a non-interpreted language can easily optimize away a lot of redundancies, Lua will not.
Memoize
Using tables, this can be done quite easily in Lua. For single-argument functions you can even replace them with a table and an __index metamethod. Even though this destroys transparency, performance is better on cached values due to one less function call.
Here is an implementation of memoization for a single argument using a metatable. (Important: This variant does not support a nil value argument, but is pretty damn fast for existing values.)
function tmemoize(func)
    return setmetatable({}, {
        __index = function(self, k)
            local v = func(k);
            self[k] = v
            return v;
        end
    });
end
-- usage (does not support nil values!)
local mf = tmemoize(myfunc);
local v = mf[x];
You could actually modify this pattern for multiple input values.
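One way to do that for two arguments is a nested cache table, one level per argument. This is a sketch of my own, not a tested drop-in: memoize2 is a hypothetical name, it uses a closure rather than a metatable, and like the variant above it does not support nil arguments or cache nil results.

```lua
-- Hypothetical helper: memoizes a function of two non-nil arguments.
local function memoize2(func)
    local cache = {}
    return function(a, b)
        local row = cache[a]
        if row == nil then
            row = {}
            cache[a] = row
        end
        local v = row[b]
        if v == nil then
            v = func(a, b)  -- computed only on the first call for (a, b)
            row[b] = v
        end
        return v
    end
end

-- usage: count how often the wrapped function really runs
local calls = 0
local add = memoize2(function(a, b)
    calls = calls + 1
    return a + b
end)

local first = add(2, 3)
local second = add(2, 3)
print(first, second, calls)
```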
Partial application
The idea is similar to memoization, namely to "cache" results. But instead of caching the results of the function, you cache intermediate values by putting their calculation in a constructor function that defines the calculation function in its block. In reality I would just call it clever use of closures.
-- Normal function
function foo(a, b, x)
    return cheaper_expression(expensive_expression(a,b), x);
end
-- foo(a,b,x1);
-- foo(a,b,x2);
-- ...
-- Partial application
function foo(a, b)
    local C = expensive_expression(a,b);
    return function(x)
        return cheaper_expression(C, x);
    end
end
-- local f = foo(a,b);
-- f(x1);
-- f(x2);
-- ...
This way it is possible to easily create flexible functions that cache some of their work without too much impact on program flow.
An extreme variant of this would be Currying, but that is actually more a way to mimic functional programming than anything else.
Here is a more extensive ("real world") example with some code omitted, since otherwise it would easily take up the whole page (namely, get_color_values actually does a lot of value checking and accepts mixed values):
function LinearColorBlender(col_from, col_to)
    local cfr, cfg, cfb, cfa = get_color_values(col_from);
    local ctr, ctg, ctb, cta = get_color_values(col_to);
    -- validate before doing arithmetic, otherwise subtracting nil raises a less helpful error first
    if not cfr or not ctr then
        error("One of given arguments is not a color.");
    end
    local cdr, cdg, cdb, cda = ctr-cfr, ctg-cfg, ctb-cfb, cta-cfa;
    return function(pos)
        if type(pos) ~= "number" then
            error("arg1 (pos) must be a number in range 0..1");
        end
        -- clamp pos to 0..1
        if pos < 0 then pos = 0; end;
        if pos > 1 then pos = 1; end;
        return cfr + cdr*pos, cfg + cdg*pos, cfb + cdb*pos, cfa + cda*pos;
    end
end
-- Call
local blender = LinearColorBlender({1,1,1,1},{0,0,0,1});
object:SetColor(blender(0.1));
object:SetColor(blender(0.3));
object:SetColor(blender(0.7));
You can see that once the blender has been created, the function only has to sanity-check a single value instead of up to eight. I even extracted the difference calculation; though it probably does not improve much, I hope it shows what this pattern tries to achieve.