Reposted from: http://magicpanda.net/2010/10/lua%E6%9E%B6%E6%9E%84%E6%96%87%E6%A1%A3/
Posted on 10 October 2010
A while ago I translated an official Lua document on the architecture of Lua 5, and I am sharing it here. Note: all rights belong to the Lua team; I have only translated the text into Chinese for the convenience of Chinese readers, and I accept no responsibility for any consequences of errors in the translation. If you find translation errors or have comments or suggestions, please email me at [email protected]. When reposting, please credit the original authors, the translator, and the source.
The Lua Architecture
Advanced Topics in Software Engineering
Mark Stroetzel Glasberg, Jim Bresler, Yongmin ‘Kevin’ Cho
Introduction
Lua is a powerful light-weight programming language designed to extend applications.
Lua started as a small language with modest goals. The language has grown incrementally. As a result, Lua has been re-architected and rewritten several times in the past ten years. The original functional requirements and motivation of the architecture were documented in the paper “Lua-an extensible extension language” [1]. A few other versions of the language were briefly described in “The Evolution of an Extension Language: A History of Lua” [3].
The architecture and implementation were created and are maintained by Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo at the Computer Graphics Technology Group of the Pontifical Catholic University of Rio de Janeiro in Brazil (PUC-Rio).
Lua’s implementation consists of a small library of ANSI C functions that compiles unmodified on all known platforms. The implementation goals are simplicity, efficiency, portability, and the ability to run on small devices with limited capabilities. These implementation goals resulted in a fast language engine with a small footprint, making it ideal for embedded systems.
This paper reconstructs and documents the architecture of Lua version 5.0.2. Lua 5.0.2 contains approximately 25,000 lines of source code. The code base uses an instance of the compiler reference model, has several identifiable patterns, and is divided into clearly defined modules such as the code interpreter, parser and virtual machine.
Translator's note (YoungMan): Lua 5.1.3 has roughly 17,000 lines.
The language is being used in several projects at Tecgraf, PUC-Rio, the University of Illinois at Urbana-Champaign, and in several companies such as Microsoft, LucasArts Entertainment, and others.
Architectural Requirements
Lua’s main quality attributes, which are simplicity, performance, flexibility, and portability, were driven by the business requirements of projects developed at Tecgraf. Tecgraf is a Computer Graphics Technology Group created in May 1987 in a partnership between PETROBRAS (the main Brazilian oil company) and the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). Its purpose is to develop, establish, and maintain computer graphics and user interface software for technical and scientific applications.
Business Drivers
The first project in which Lua was used needed to represent data for graphic simulators. This rudimentary ancestor of Lua was not a script language, but a data representation language. Because graphic information is naturally large, high performance was the first attribute to be considered.
When users began to demand more power from this data representation language, such as the use of boolean expressions, conditional control and loops, it was clear that a true programming language was needed.
Data description played an important role in the evolution of Lua as different projects at Tecgraf were adopting Lua. Therefore, Lua had to be easily customizable for each application. Lua adopted the use of semi-structured data in the form of associative arrays. Moreover, metamethods (explained later) were introduced in the language to provide additional flexibility to data access.
Translator's note (YoungMan): the associative array is the table in Lua.
Portability was an important issue for Lua. At the time, Tecgraf’s main client, PETROBRAS, required all developed systems to be capable of running on a wide range of computer systems.
Finally, the business drivers also gave rise to other functional requirements such as concurrency support, string pattern matching, garbage collection, and namespaces for modules. Most of this functionality was added at a later point in the evolution of the language.
Quality Attributes and Tactics
As a result of its business drivers, Lua’s main quality attributes were not only performance, portability, and extensibility, but also simplicity, availability, and compactness.
The development process of Lua played an important role in assuring that the quality requirements were met. New features could be added to the language only when all committee members reached unanimity. This helped keep Lua’s simplicity, which is generally considered Lua’s most important asset. Lua’s compactness is a consequence of its simplicity. As a result, extensibility became essential to the language.
Translator's note (YoungMan): in other words, to keep Lua simple and compact, the core is kept very small and nothing is added to it lightly; so that this does not limit what the language can do, extensibility becomes essential, letting users supply the functionality they need.
Extensibility has been achieved by adopting several mechanisms such as data representation of C/C++ pointers (called userdata), dynamic loading of external modules, namespaces, and metamethods. Metamethods allow user-defined functions to be invoked when an associative array or a userdata variable is accessed. This allows customization of data accessing through the definition of user-defined operations. This mechanism provides flexibility to the language without making it complex.
Thanks to Lua’s flexibility, several individual libraries add functionality to Lua. Some examples include: user interface support [14], socket support [15], and middleware support [16].
One tactic used to improve Lua’s performance was the use of a virtual machine. A virtual machine provides a way to speed execution time because it allows precompiled code to be forwarded directly to it, therefore bypassing the lexical and syntactical analysis. This architecture was pioneered in Smalltalk (Goldberg–Robson 1983; Budd 1987) from which the term bytecode is borrowed.
Lua’s portability is achieved by using only ANSI C code. But even though the standard was followed closely, great care had to be taken to avoid portability problems. The heterogeneous community of users has been helpful in identifying compiler discrepancies and solving compilation issues throughout the language’s evolution.
Availability has been achieved as a consequence of compactness, the use of good implementation standards, and a big set of tests to catch bugs.
Lua Features
Lua offers a wide variety of programming functionality such as loops, scope, function calls, arithmetic calculations, string pattern matching, error handling (exception-like), coroutines, garbage collection, debugging mechanisms, a comprehensive C API, OS facilities, input and output functions, and more.
Following its objective of simplicity, Lua defines a reduced number of data types. Although the number of data types is reduced, they are powerful enough to accomplish all of Lua’s objectives. Below we present and describe Lua’s data types.
One important design consideration is Lua’s ability to classify and represent data. Lua is a dynamically typed language; this means that variables do not have types, but the values they hold do.
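A minimal sketch of what it means for types to belong to values rather than variables. It uses only the standard `type` function; the variable name `v` is just for illustration.

```lua
-- The same variable can hold values of different types over time;
-- type() inspects the value currently stored, not the variable.
local v = 42
assert(type(v) == "number")
v = "forty-two"
assert(type(v) == "string")
v = { answer = 42 }
assert(type(v) == "table")
v = print
assert(type(v) == "function")
v = nil
assert(type(v) == "nil")
```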
Among Lua’s data types are tables and userdata. Lua uses the table and userdata types to support application specific structures. Tables are really associative arrays [6], an abstract data type that behaves similarly to arrays but allows anything to be a key or a value. This structure allows several data structures to be represented such as trees, graphs, or even XML files.
In addition, Lua allows the semantics for both userdata accesses and table accesses to be altered through metamethods. Metamethods enable applications to override the default logic of data access, supporting for example the implementation of object oriented mechanisms [18].
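As an illustration of overriding the default access logic, the sketch below uses the `__index` metamethod in two ways: as a fallback table and as a function. It assumes Lua 5.1 or later (for the `#` operator); the names `defaults`, `with_defaults`, and `traced` are hypothetical.

```lua
-- A table of default settings, consulted through the __index metamethod.
local defaults = { width = 80, height = 24 }

local function with_defaults(t)
  -- When a key is missing from t, Lua looks it up in `defaults` instead.
  return setmetatable(t, { __index = defaults })
end

local window = with_defaults({ width = 120 })
assert(window.width == 120)              -- found in the table itself
assert(window.height == 24)              -- supplied by the metamethod
assert(rawget(window, "height") == nil)  -- rawget bypasses metamethods

-- __index can also be a function, which runs on every missing-key access.
local log = {}
local traced = setmetatable({}, {
  __index = function(_, key)
    log[#log + 1] = key
    return "<" .. tostring(key) .. ">"
  end,
})
assert(traced.foo == "<foo>")
assert(log[1] == "foo")
```

The fallback-table form is the usual basis for class-like inheritance: an instance delegates missing fields to its class table.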
Changing table access policies enables the user to reflect a C++ variable in Lua and translate Lua method calls to C++ method calls. This mechanism is used extensively by tolua [13], a tool that helps create C and C++ bindings to Lua. Finally, Lua allows metamethods to be changed during execution, adding even more flexibility to the mechanism.
Another characteristic of Lua function calls is that arguments are cheap to pass: simple values such as numbers are copied, while tables, functions, and userdata are passed as references, so no large structure is duplicated and function calls are fast. Nonetheless, assigning to a parameter inside a function does not change the caller’s variable, so a function cannot hand back extra results through output parameters. Lua solves this issue by allowing multiple values to be returned from a function call.
Translator's note (YoungMan): that is, Lua cannot use the C idiom of obtaining more than one result through parameters, but it can return several results directly.
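A small sketch of how argument passing and multiple results behave in practice, assuming Lua 5.1 or later: assigning to a parameter does not affect the caller's variable, a table argument is shared rather than copied, and several results come back directly. The function names are hypothetical.

```lua
-- Rebinding a parameter only changes the function's own local.
local function rebind(n)
  n = n + 1
  return n
end

-- Tables are reference values: the callee sees the caller's table.
local function append(list, item)
  list[#list + 1] = item
end

-- Several results are returned directly instead of via output parameters.
local function divmod(a, b)
  local q = math.floor(a / b)
  return q, a - q * b
end

local x = 10
local y = rebind(x)
assert(x == 10 and y == 11)

local items = {}
append(items, "a")
assert(items[1] == "a")

local q, r = divmod(17, 5)
assert(q == 3 and r == 2)
```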
Architectural Solution
In this section we describe the architecture of Lua. This text describes first how Lua receives and interprets a script file. Next, we present the internal module decomposition of Lua and how the modules interact with each other. Finally, we describe most of Lua’s subsystems.
Lua: An Embedded Script Language
Although Lua offers a stand-alone command line interpreter, Lua is designed to be embedded in an application. Applications can control when a script is interpreted, loaded, and executed. They can also catch errors, handle multiple Lua contexts, and extend Lua’s capabilities.
The process of initializing Lua and loading a script is depicted in Figure 1.
Figure 1: process of initializing Lua and loading a script file
Four steps are necessary to load and execute a Lua script. First, a state of the Lua interpreter must be created. This state is passed on to all functions of Lua’s C API, including the calls done in the following steps. Second, the application embedding Lua registers all libraries that extend Lua. Next, scripts provided by the application are parsed and instructions that the virtual machine can execute are generated. These instructions are referred to as bytecodes. Finally, the bytecodes are forwarded to the virtual machine for execution.
The first step to loading and executing a script is creating a Lua interpreter reference. This step consists of initializing a lua_State structure by calling lua_open. The lua_State structure is necessary because Lua offers a reentrant API and does not use any global variable. As a result, an application may create multiple instances of the Lua interpreter.
Next, the application needs to register libraries available to Lua programs. Lua supports a default set of libraries for target applications. Applications may expand or contract the list of libraries available to Lua programs by controlling which libraries are registered. This allows applications to customize the library functions available to Lua programs.
Afterwards, the Lua interpreter needs to obtain bytecodes to execute. At this point, there are two possible scenarios: precompiled Lua bytecodes are loaded or a Lua script is loaded. When loading a script, Lua uses standard lexer, parser, and bytecode generation components to precompile the program. These components behave like Pipes and Filters [11] by passing data to each other sequentially and incrementally. Because each of these components has a significant impact on performance, Lua needs to execute these components as quickly as possible. Therefore, Lua does not use automated code generation tools such as lex or yacc [17]; instead, the Lua implementation has a hand-written parser and lexer.
Finally, Lua needs to execute the bytecodes. The virtual machine kernel contains a loop that reads and executes a virtual machine instruction.
Module Decomposition
Lua is divided into subsystems to support its requirements. These modules include an application loader, a library loader, a public API, auxiliary libraries, several modules to support the virtual machine, and several modules to support translating Lua scripts into bytecode. A static view of Lua’s modules and the relationships between them is depicted in Figure 2. In addition, the picture shows the implementation file of each module.
Figure 2: Lua Module Decomposition
Lua’s module decomposition helps Lua address its quality and functionality requirements. In particular, the module decomposition helps Lua maintain its compactness goals because it allows detachment of modules from the normal distribution. For example, Lua separates core platform code and auxiliary code by placing them into different libraries. As a result, applications embedding Lua are not required to link the auxiliary library. The module decomposition also allows applications to reduce Lua’s footprint by removing the parser subsystem if the application only needs to execute precompiled programs.
Translator's note (YoungMan): in other words, an application can include only the Lua modules it actually needs.
Another advantage of Lua’s module decomposition is that it minimizes the dependencies between external applications and Lua. The public API presents a facade [11] that allows applications to use Lua without knowledge of its internal decomposition. This allows the Lua team to make several types of changes without breaking compatibility with existing applications. As a result, applications can often upgrade the version of the Lua interpreter without making any source code changes.
Translator's note (YoungMan): put simply, Lua is well encapsulated, so the implementation can change while the interface stays the same, and changes to the internals do not affect existing external code.
However, the public API is difficult to use. The auxiliary library exposes a simplified API to applications that is easier to use than the public API. For example, the auxiliary library provides functions that load a Lua script or precompiled file from disk. The auxiliary library is powerful enough to address the requirements of most applications that embed Lua. However, applications still have the ability to use the core API when the auxiliary library is not sufficiently powerful.
The public API interacts with subsystems to perform operations requested by the application. For example, the public API interacts with the parser subsystem to convert Lua scripts to bytecodes. The parser subsystem consists of a lexer, parser, and bytecode generator.
Another important subsystem is the virtual machine. The virtual machine helps Lua achieve its performance requirements. Its modules consist of a virtual machine kernel, garbage collector, debugging interface, and others.
The virtual machine increases performance by decoupling language syntax from application execution semantics, allowing faster loading of pre-compiled scripts. For this purpose, the Lua distribution offers an external compiler that translates Lua scripts to bytecode form. This bytecode form also serves as a kind of code obfuscation, hiding the source code in the application’s final distribution.
Lua also uses several utility modules to meet its requirements. For example, Lua has an error detection module that allows errors to be handled in a centralized manner. The module allows C functions to be executed in protected mode. A function running in protected mode can throw an exception if it encounters an error. When a function throws an exception, control returns immediately to the caller of the protected mode function.
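The paragraph above describes protected mode for C functions. The same mechanism is visible from scripts through the standard `pcall` and `error` functions, which the sketch below uses as a script-side counterpart. The function `parse_port` is hypothetical.

```lua
-- A function that signals failure by raising an error.
local function parse_port(text)
  local port = tonumber(text)
  if not port or port < 1 or port > 65535 then
    error("invalid port: " .. tostring(text))
  end
  return port
end

-- pcall runs the function in protected mode: it returns true plus the
-- results on success, or false plus the error value on failure.
local ok, value = pcall(parse_port, "8080")
assert(ok and value == 8080)

local failed_ok, message = pcall(parse_port, "eighty")
assert(failed_ok == false)
-- The message carries a position prefix, so search for the text plainly.
assert(string.find(message, "invalid port: eighty", 1, true))
```

Control returns to the line after `pcall` in both cases, which is what lets a host handle script errors in one place.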
Subsystems
Lua is divided into subsystems that separate functionality. We have explained how the subsystems work together; now we explore some of Lua’s subsystems in more detail.
Parser
The objective of the Lua parser subsystem is to convert Lua scripts to bytecodes that the virtual machine will eventually execute. To do so, the parser subsystem consists of a lexer, a parser, and a bytecode generator.
The lexer and parser follow the compiler reference model. In this model, the lexer is responsible for obtaining tokens, which are the separately identifiable parts of the script file such as constants, operators, keywords, and others. The parser is responsible for analyzing the structure in which the tokens are arranged in the file in order to compose commands. In addition, the parser can generate error messages for invalid command constructions.
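To make the lexer's job concrete, here is a toy hand-written tokenizer, assuming Lua 5.1 or later. It is not Lua's lexer: the token kinds, the shortened keyword list, and the name `tokenize` are simplifications chosen for illustration.

```lua
-- Splits source text into tokens: names/keywords, numbers, and
-- single-character symbols. Whitespace is skipped.
local keywords = { ["local"] = true, ["if"] = true, ["then"] = true,
                   ["end"] = true, ["return"] = true }

local function tokenize(src)
  local tokens, pos = {}, 1
  while true do
    -- "^%s*" always matches at pos (possibly the empty string).
    local _, ws_end = string.find(src, "^%s*", pos)
    pos = ws_end + 1
    if pos > string.len(src) then break end

    local kind
    local s, e = string.find(src, "^[%a_][%w_]*", pos)
    if s then
      local word = string.sub(src, s, e)
      kind = keywords[word] and "keyword" or "name"
    else
      s, e = string.find(src, "^%d+", pos)
      if s then
        kind = "number"
      else
        s, e, kind = pos, pos, "symbol"
      end
    end
    tokens[#tokens + 1] = { kind = kind, text = string.sub(src, s, e) }
    pos = e + 1
  end
  return tokens
end

local tokens = tokenize("local x = 10")
assert(#tokens == 4)
assert(tokens[1].kind == "keyword" and tokens[2].kind == "name")
assert(tokens[3].text == "=" and tokens[4].kind == "number")
```

A parser would then consume this token list to recognise statements such as a local declaration.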
The Lua parser makes only one pass over the source to maximize performance. This is viable because variables are untyped, so Lua performs no type checking at compile time. In addition, the existence of called functions and the arguments passed to them are not checked during parsing. However, this solution causes problems during the programming process because many errors are only caught at run-time.
The parser subsystem defines the concept of a chunk. A chunk is a unit of execution which consists of a series of command statements. The parser is designed to work on one chunk at a time. Chunks can be provided by the host program in the form of a file or a string. When Lua script is supplied incrementally (for example by providing a sequence of separate strings to be interpreted), a new chunk is created every time. In each of the chunks the user can specify direct statements, functions, local variables, and return values.
Translator's note (YoungMan): Lua is embedded in an application, so the application can be called the host program and Lua the guest program.
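A short sketch of chunks supplied as strings. It relies on the standard `loadstring` (Lua 5.0/5.1) or `load` (Lua 5.2 and later); the global name `chunk_demo_total` is hypothetical and is cleaned up at the end.

```lua
-- loadstring exists in Lua 5.0/5.1; later versions fold it into load.
local compile = loadstring or load

-- Each string is compiled as its own chunk and becomes a function.
local first = compile(
  "local step = 1; chunk_demo_total = (chunk_demo_total or 0) + step; return chunk_demo_total")
local second = compile("return step")

assert(first() == 1)
assert(first() == 2)     -- the global persists between runs of the chunk
assert(second() == nil)  -- `step` was local to the first chunk

-- A chunk that fails to parse yields nil plus an error message.
local bad, err = compile("return +")
assert(bad == nil and type(err) == "string")

chunk_demo_total = nil   -- remove the demo global
```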
The parser generates bytecode for each chunk that it processes. Chunks can then be passed on to the virtual machine for execution. The bytecode can be handed over directly in memory, or each chunk can be saved to disk as precompiled bytecode for later execution in the virtual machine.
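Both routes can be tried from a script. The sketch below compiles a chunk in memory, serialises it with the standard `string.dump`, and loads the binary form again. It assumes an interpreter that accepts binary chunks through `loadstring`/`load`, which is the default in the stock interpreters; the binary format is specific to a Lua version and build.

```lua
local compile = loadstring or load

local chunk = compile("local a, b = 6, 7; return a * b")
-- string.dump serialises the compiled function as a binary string...
local bytecode = string.dump(chunk)
-- ...which can be loaded later without running the lexer and parser again.
local reloaded = compile(bytecode)

assert(type(bytecode) == "string")
assert(reloaded() == 42)
```

Writing `bytecode` to a file opened in binary mode gives roughly what the stand-alone compiler (`luac`) produces ahead of time.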
Virtual Machine
The virtual machine subsystem is responsible for executing bytecodes generated by the parser subsystem. Because most processor time is spent in the virtual machine subsystem after the Lua script is loaded, the design of the virtual machine subsystem has a large impact on overall system performance. Furthermore, the virtual machine subsystem cannot be removed from a minimal Lua interpreter. As a result, the virtual machine subsystem needs to be compact and have good performance.
The virtual machine kernel can read and execute bytecodes. To support this functionality, the virtual machine kernel continuously loops for the next operation to execute. The loop identifies the type of operation and performs an instruction-specific task to execute it. Several modules, such as the garbage collector, table, and closure modules, are invoked by this loop to assist the virtual machine kernel.
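The fetch, decode, and dispatch loop can be sketched with a toy register machine. This is a model only: the opcode names echo Lua's, but the three instructions, their semantics, and the table-based instruction format are simplified assumptions.

```lua
-- Runs a list of instructions against a constant table and returns
-- the value named by the RETURN instruction.
local function run(code, constants)
  local registers = {}
  local pc = 1
  while true do
    local ins = code[pc]            -- fetch
    pc = pc + 1
    local op = ins.op               -- decode
    if op == "LOADK" then           -- R[a] = K[b]
      registers[ins.a] = constants[ins.b]
    elseif op == "ADD" then         -- R[a] = R[b] + R[c]
      registers[ins.a] = registers[ins.b] + registers[ins.c]
    elseif op == "RETURN" then      -- stop and hand back R[a]
      return registers[ins.a]
    else
      error("unknown opcode: " .. tostring(op))
    end
  end
end

local program = {
  { op = "LOADK", a = 0, b = 1 },
  { op = "LOADK", a = 1, b = 2 },
  { op = "ADD", a = 2, b = 0, c = 1 },
  { op = "RETURN", a = 2 },
}
assert(run(program, { 40, 2 }) == 42)
```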
Bytecode operations have been carefully designed to ensure subsystem compactness and performance. All instructions are 32 bits long and can contain from one to three arguments. The arguments are typically denoted as A, B, and C. Some instructions combine B and C into a bigger argument called Bx. All opcodes have a constant length.
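A sketch of packing such an instruction into one 32-bit word using plain arithmetic, assuming Lua 5.1 or later for the `%` operator. The field widths and order used here (a 6-bit opcode in the low bits, then 9-bit C, 9-bit B, and 8-bit A) are an assumption for illustration and should be checked against the opcode header of the Lua version in question.

```lua
-- Assumed layout, from least to most significant bit:
-- opcode (6 bits), C (9 bits), B (9 bits), A (8 bits). Bx overlays B and C.
local OP, C, B = 2^6, 2^9, 2^9

local function encode_abc(op, a, b, c)
  return op + c * OP + b * OP * C + a * OP * C * B
end

local function decode(word)
  local op = word % OP
  local rest = (word - op) / OP
  local c = rest % C
  rest = (rest - c) / C
  local b = rest % B
  local a = (rest - b) / B
  -- Bx is the 18-bit field formed by B (high part) and C (low part).
  return { op = op, a = a, b = b, c = c, bx = b * C + c }
end

local word = encode_abc(5, 3, 10, 20)
local fields = decode(word)
assert(fields.op == 5 and fields.a == 3)
assert(fields.b == 10 and fields.c == 20)
assert(fields.bx == 10 * 512 + 20)
```

Fixed-width fields are what allow the dispatch loop to decode every instruction with the same few arithmetic steps.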
The virtual machine organizes the data that it must handle in a global table, a local constant table, and several registers. The purpose of the global table is to store values that can be accessed anywhere in a Lua script. The constant table stores constant data (such as strings, numbers, etc.) so that virtual machine operations do not need to embed constant values directly. Data in the constant table is indexed by position. Multiple references to the same constant share the same position in the constant table.
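The position-indexed, de-duplicated constant table can be modelled in a few lines, assuming Lua 5.1 or later. The names `new_constant_table` and `add` are hypothetical, and the model ignores details a real compiler must handle.

```lua
-- Returns the list of constants plus a function that interns a value
-- and reports its position in that list.
local function new_constant_table()
  local values, index_of = {}, {}
  local function add(value)
    local index = index_of[value]
    if not index then
      index = #values + 1
      values[index] = value
      index_of[value] = index
    end
    return index
  end
  return values, add
end

local constants, add = new_constant_table()
local a = add("print")
local b = add(3.14)
local c = add("print")        -- second reference to the same constant
assert(a == 1 and b == 2 and c == a)
assert(#constants == 2)
```

An instruction then only needs to carry the small index, not the constant itself.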
Stacks are used during function calls and are created for each closure. A closure is basically an instance of a function. Local variables defined in a closure reside in the closure’s corresponding stack for the duration of the closure. Moreover, designated variables that reside in the outer scope (usually called upvalues) can be accessed. When the function is completed, the closure is finished and the corresponding stack is cleaned. In the next section we explain some of the problems that need to be addressed when a function finishes.
Finally, in appendix A we describe two examples of how a Lua script is translated into virtual machine instructions. Additional information about the Lua virtual machine and its bytecodes is presented in the document “A No-Frills Introduction to Lua 5 VM Instructions” [20].
Closures
Lua defines the scope of variables as global or local. Global variables can be accessed anywhere during the execution of a script. The scope of local variables begins at the first statement after their declaration and lasts until the end of the innermost block that includes the declaration. Variables that are in the outer scope can always be accessed from the inner scope.
Functions in Lua are first class values. This means that functions can be assigned to variables just like any other value. In addition, functions can be defined inside other functions and returned as values. As a result, nested functions can be executed even after their containing block has finished executing. Because the nested function can access data outside of its local scope, Lua cannot discard a block’s variables after it finishes executing.
Usually, a stack holds local variables; when the function call ends, the values are removed from the stack and destroyed. Because of Lua’s scope visibility rules, these variables cannot be immediately released. Lua solves the problem by using a stack to maintain local variables when the function is executing. In addition, Lua keeps a linked list of pending variables that will be checked for possible references after the block executes. After the block executes, Lua checks which local variables are still needed. Finally, variables that are still referenced are moved from the stack to the heap.
The left side of Figure 3 presents the internal Lua organization during a function call. Both nested and enclosing functions access local variables from the stack to guarantee consistency. The right side of Figure 3 shows what happens after the function ends: the variable is removed from the stack and moved to the heap. This allows more than one nested function to keep accessing the same local variable of an outer scope.
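The observable effect of this mechanism is easy to show from a script: two nested functions keep sharing one local variable after the enclosing function has returned. The names below are hypothetical.

```lua
local function make_counter()
  local count = 0                          -- local to make_counter
  local function increment() count = count + 1 end
  local function current() return count end
  return increment, current
end

local inc, get = make_counter()            -- make_counter has returned here
inc(); inc()
assert(get() == 2)                         -- both closures see the same variable

local inc2, get2 = make_counter()          -- a new call gets a fresh variable
inc2()
assert(get2() == 1 and get() == 2)
```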
Figure 3: Lua Closures
The fact that Lua allows local function variables to persist through nested functions (even after the outer function has ended) provides Lua with an additional benefit. Lua can act in an object-oriented style by allowing applications to restrict variable accesses to specific functions.
Translator's note (YoungMan): these persisting variables are the non-local variables (upvalues).
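A minimal sketch of that object-oriented style: state lives in a local variable and is reachable only through the functions that close over it. `new_account` and its fields are hypothetical names.

```lua
local function new_account(opening_balance)
  local balance = opening_balance   -- reachable only through the closures below
  local account = {}
  function account.deposit(amount)
    assert(amount > 0, "deposit must be positive")
    balance = balance + amount
  end
  function account.balance()
    return balance
  end
  return account
end

local acct = new_account(100)
acct.deposit(50)
assert(acct.balance() == 150)
-- The table exposes only functions; the number itself is not a field.
for _, member in pairs(acct) do
  assert(type(member) == "function")
end
```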
Garbage Collector
Lua uses a generic mark and sweep algorithm to implement garbage collection. The amount of data consumed during the execution of a Lua script is tracked. When a threshold value is reached, the garbage collector frees all data that is no longer referenced. However, this approach has a drawback. The garbage collection operation can occur at any time, creating problems when real-time applications use Lua. Furthermore, the garbage collector is executed very frequently for applications that use a lot of memory. For these reasons, major changes are expected for the garbage collector in Lua version 5.1.
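The mark and sweep idea can be modelled over plain tables, assuming Lua 5.1 or later. This is a toy model, not Lua's collector: `heap`, `roots`, the `refs` field, and `collect` are hypothetical names, and the allocation threshold is left out.

```lua
-- heap: a set of objects; each object is a table with a `refs` list.
-- Returns the number of objects freed.
local function collect(heap, roots)
  -- Mark phase: follow references outward from the roots.
  local marked, pending = {}, {}
  for _, obj in ipairs(roots) do pending[#pending + 1] = obj end
  while #pending > 0 do
    local obj = pending[#pending]
    pending[#pending] = nil
    if not marked[obj] then
      marked[obj] = true
      for _, child in ipairs(obj.refs) do
        pending[#pending + 1] = child
      end
    end
  end
  -- Sweep phase: drop every heap object that was not marked.
  local freed = 0
  for obj in pairs(heap) do
    if not marked[obj] then
      heap[obj] = nil
      freed = freed + 1
    end
  end
  return freed
end

local a, b, c, d = { refs = {} }, { refs = {} }, { refs = {} }, { refs = {} }
a.refs[1] = b                    -- a -> b
c.refs[1] = d; d.refs[1] = c     -- c <-> d: a cycle nothing else points to
local heap = { [a] = true, [b] = true, [c] = true, [d] = true }
assert(collect(heap, { a }) == 2)
assert(heap[a] and heap[b] and not heap[c] and not heap[d])
```

Because reachability is traced from the roots, the unreachable cycle is reclaimed, which a simple reference count would miss.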
Table
Tables are the only data structure in the Lua language. Tables are a powerful construct that maintains language simplicity and generality. Internally, a table is implemented as two separate parts: a hash table and an array.
Non-negative integer keys are all candidates to be kept in the array part. The actual size of the array is the largest index value such that at least half the slots between zero and this value are in use.
The hash part uses a chained scatter table with Brent’s variation [21]. If a table element is not in its main position (i.e., the position given by the hash function), then the colliding element is in its own main position. In other words, there are collisions only when two elements have the same main position. Because of this, the load factor of these tables can be 100% without performance penalties.
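The sizing rule for the array part can be sketched as follows. This is a simplified model rather than the reference implementation: it assumes candidate sizes are powers of two, counts only keys from 1 upward, and the function name `array_part_size` is hypothetical.

```lua
-- Returns the largest power of two n such that at least half of the
-- slots 1..n would be occupied by the given integer keys (0 if none).
local function array_part_size(keys)
  local max_key = 0
  for _, k in ipairs(keys) do
    if k > max_key then max_key = k end
  end
  local best, n = 0, 1
  -- Try n = 1, 2, 4, ... up to the first power of two >= max_key.
  while n < 2 * max_key do
    local used = 0
    for _, k in ipairs(keys) do
      if k >= 1 and k <= n then used = used + 1 end
    end
    if used >= n / 2 then best = n end
    n = n * 2
  end
  return best
end

assert(array_part_size({ 1, 2, 3, 4, 5 }) == 8)
-- Key 100 is too sparse to justify a larger array, so in this model
-- it would be stored in the hash part instead.
assert(array_part_size({ 1, 2, 3, 100 }) == 4)
assert(array_part_size({}) == 0)
```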
The data structure is very efficient. Nevertheless, like all hash tables, its performance depends on the distribution of hashes. So far, the Lua community has not reported any problems with string hashes.
Figure 4 shows an example of a Lua table which is composed of a hash table and an array. As shown in Figure 4, the fifth numeric value of the array can either be stored in the hash table or in the array. Numeric keys which do not fall within the capacity of the array part are handled normally. This mechanism is transparent to other Lua modules.
Figure 4: Example of a Lua table internal organization
Standard Library
Lua includes a standard library containing functionality needed by many applications, such as string pattern matching, file I/O, and basic math support. In addition, the standard library interfaces with the Lua runtime to help support several language features such as coroutines and tables. For more information, see the Lua reference manual [7].
Dynamic Libraries
Dynamic library support provides a key mechanism to extend Lua. It allows applications that embed Lua to expand and reduce the functionality available to scripts. As a result, dynamic libraries help Lua achieve its compactness and flexibility requirements.
Applications embedding Lua initiate the process of loading libraries. Libraries registered by the embedding application can be linked statically or dynamically. This allows an application to link only the subset of libraries it needs. Usually, the application initializes an extension library by calling an initialization function that registers global tables, variables and/or functions in Lua’s script environment.
Dynamic libraries can also be loaded directly from the Lua environment. This is done by calling the loadlib function with the filename of the dynamic library and the name of its initialization function. The use of this function allows the user to locate the library and call its initialization function.
动态库也可以直接从lua环境中直接加载。这通过调用loadlib函数,传入动态库的文件名与其初始化函数名称来完成。
Lua’s library architecture also enables loading parts of a library. For instance, the standard Lua functions include separate calls to initialize the coroutine library, the auxiliary library, the string matching library, the library loader, etc. This gives an application greater control over which functions are available to the scripts it embeds.
lua的库架构也允许仅加载部分库。例如,标准的lua函数会各自独立地初始化协程库、辅助库、字符串匹配库、库加载器等等。这允许应用程序从很大程度上控制哪些函数是其嵌入的脚本可用的。
As an example, consider an application that wants to allow its scripts to access every part of the standard Lua libraries except the dynamic library loader. The application should call every library initialization function in the Lua standard library except luaopen_loadlib.
举个例子,如果一个应用用程序需要允许其脚本访问标准的任何部分,除开动态库加载器,那么它应该调用除luaopen_loadlib以外所有lua标准库的初始化函数。
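The selective initialization described above can be modeled in a few lines. This is a Python sketch of the idea only: the opener functions below are hypothetical stand-ins for the C-level initialization functions, the script environment is modeled as a plain dictionary, and `new_environment` is an invented helper, not part of any Lua API.

```python
import math

# Hypothetical stand-ins for library initialization functions. Each one
# registers a few names in the script environment (modeled as a dict).
def open_base(env):
    env["print"] = print

def open_string(env):
    env["string"] = {"upper": str.upper}

def open_math(env):
    env["math"] = {"sqrt": math.sqrt}

def open_loadlib(env):
    # Placeholder only: a real loader would locate and initialize a library.
    env["loadlib"] = lambda path, init_name: None

STANDARD_OPENERS = {
    "base": open_base,
    "string": open_string,
    "math": open_math,
    "loadlib": open_loadlib,
}

def new_environment(excluded=()):
    """Build a script environment, skipping the libraries named in `excluded`."""
    env = {}
    for name, opener in STANDARD_OPENERS.items():
        if name not in excluded:
            opener(env)
    return env

# The embedding application opens everything except the dynamic library loader.
sandbox = new_environment(excluded={"loadlib"})
```

Scripts running against `sandbox` can still reach `print`, `string`, and `math`, but they have no name through which to load additional native code.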
Coroutines 协程
Lua supports asymmetric coroutines [19]. The virtual machine kernel is responsible for maintaining the state associated with the virtual machine subsystem. This state consists of one or more execution states, each of which fully specifies the state of the processor. The design allows coroutines to be implemented by creating a new execution state each time a new coroutine is created and by switching the active execution state when a coroutine yields.
lua支持非对称协成。虚拟机内核负责维持虚拟机子系统的相关状态。状态由虚拟机内核管理,包括1个或多个执行态,每一个执行态完全指定处理器的状态。这种设计允许在新协程加载或活动执行态改变时(当其他协程让出时)通过创建新执行态来实现协程。
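The control flow of asymmetric coroutines can be sketched with Python generators, used here purely as an analogy: each generator object carries its own suspended execution state, and a yield always returns control to whoever resumed it. The names `producer` and `resume` are assumptions for this illustration and do not mirror Lua's virtual machine internals.

```python
def producer(limit):
    """Coroutine body: hands one value back to its resumer per step."""
    for n in range(1, limit + 1):
        yield n * n   # suspend here; control returns to the resumer

def resume(co):
    """Resume a coroutine; return (True, value), or (False, None) once it has finished."""
    try:
        return True, next(co)
    except StopIteration:
        return False, None

# Two coroutines, hence two independent execution states.
a = producer(3)
b = producer(2)

# Interleaving resumes shows that each state is preserved across yields.
trace = [resume(a), resume(b), resume(a), resume(b),
         resume(a), resume(a), resume(b)]
```

Resuming `b` never disturbs the position `a` was suspended at, which corresponds to switching the active execution state rather than sharing one.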
Conclusion 结束语
Lua has achieved its goals of simplicity, extensibility, portability, reliability, and performance.
Lua已经将达到其简明性、扩展性、轻便性、可靠性与性能方面的目标(YoungMan注:archived…无论如何也不可能是归档的意思,应该是achieve,达到,难道单词写错了?)。
Most goals derive from the fact that the language is simple. Simplicity was achieved by allowing the user to create language features through meta-mechanisms instead of implementing those features directly in the language. Meta-mechanisms allow users to implement inheritance, namespaces, and debugging. Other features, such as type checking, are not included in Lua.
Lua的大部分设计目标能实现是因为Lua本身十分简单。Lua允许用户通过使用元表或者元方法来创建语言特性而不是直接在语言中实现这些特性,这样便有了其简明性。元表或者元方法允许用户实现继承、名字空间,以及debug。lua并不包含其他诸如类型检查之类的特性。
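To make the inheritance claim concrete, here is a minimal Python model of lookup delegation through a metatable's `__index` entry. It is a sketch under simplifying assumptions: the `Table` class and `get` function are invented for illustration, and only the case where `__index` refers to another table is modeled.

```python
class Table:
    """Toy table: a field dictionary plus an optional metatable."""

    def __init__(self, fields=None):
        self.fields = dict(fields or {})
        self.metatable = None

def get(table, key):
    """Look up `key`; on a miss, follow the metatable's "__index" entry."""
    while table is not None:
        if key in table.fields:
            return table.fields[key]
        meta = table.metatable
        # Delegate to the table named by "__index", if there is one.
        table = meta.fields.get("__index") if meta is not None else None
    return None

# "Classes" are ordinary tables; inheritance is only a chain of delegations.
Animal = Table({"sound": "...", "legs": 4})
Dog = Table({"sound": "woof"})
Dog.metatable = Table({"__index": Animal})

rex = Table({"name": "Rex"})
rex.metatable = Table({"__index": Dog})
```

Nothing in the lookup rule mentions classes or inheritance; the behavior emerges from the delegation chain, which is the sense in which the feature is built by the user rather than by the language.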
Extensibility was accomplished by a comprehensive interface between Lua and C/C++. The API allows Lua to call functions in C, create objects in C++, and call methods in C++ objects. Furthermore, C/C++ applications can easily call Lua functions and access data in variables, tables, and metamethods. The success of extending the Lua library is demonstrated by the number of libraries that extend Lua’s functionalities [13,14,15,16].
扩展性通过在Lua与C/C++之间的综合性接口完成。这些API允许lua在C中调用函数,在C++中创建对象,并在C++对象中调用方法。 更棒的是,C/C++应用程序可以轻松地调用lua函数,并访问存储在变量、table中的数据与元方法。数学库对lua的功能性扩展说明了扩展lua库 的成功之处。
Reliability was achieved by extensively using tests to certify language stability. As described in History of Lua [3], “to keep a language is much more than to design it. Complete attention to detail is essential in all aspects.” In addition, reliability was achieved by using an established reference model of a compiler.
可靠性则是通过广泛的测试来证明其稳定性来做到的。如Lua发展史(文档)中所述,”比起设计一门语言,更重要的是保持好它。不论就哪方面来说,对细节的全面关注都是必须的。”。此外,可靠性还通过使用编译器的创建引用模型实现。
The Lua architecture places a significant emphasis on performance. Performance tactics include using a virtual machine that can receive precompiled opcodes and implementing the lexer and parser by hand. The virtual machine allows applications to completely bypass the normal parse sequence when the Lua scripts have been precompiled. Implementing the lexer and parser by hand improves performance when a script is loaded by providing more efficient components than automated tools could provide.
Lua的架构非常强调高性能。高性能策略包括使用虚拟机来接收预编译操作码以及手工实现词法、语法分析器。虚拟机使得在脚本预编译后,应用程序能完全绕开词法分析的过程。与自动化工具相比,手工实现的词法、语法分析器组件性能高效得多,这样在脚本加载时性能有很大改善。
Appendix A: VM Code Generation Examples 附录 A:虚拟机代码生成示例
The virtual machine supports 51 operations, although most programs use a small subset of these operations. Lua VM operations can be generated in a human readable form by running the external bytecode generator (called luac) with the “-p” and “-l” options.
虚拟机支持51种操作,尽管大部分程序仅使用这些操作的一个微小子集。Lua虚拟机操作通过使用”-p”和”-l”选项运行外部字节码生成器生成,生成结果是可读的。
First, consider the simple Lua script below:
首先,考虑下面这个简单的lua脚本:
    y = 5
    print(y)
Running this script through luac with the “-p” and “-l” options generates the following bytecode:
以”-p”和”-l”选项在luac(YoungMan注:即lua compiler。编译lua源码会得到两个exe,这是其中之一。)中执行脚本会生成以下字节码:
    LOADK     0 1   ; 5
    SETGLOBAL 0 0   ; y
    GETGLOBAL 0 2   ; print
    GETGLOBAL 1 0   ; y
    CALL      0 2 1
    RETURN    0 1 0
The LOADK operation loads a value from the constants table and places it in a register. Its first argument is the stack register to store the result in and its second argument is an index into the global constants table. The GETGLOBAL operation reads a value from the globals table. Its result is stored in the register specified by the first argument. The second argument is a reference to a constant table entry that has a pointer to a global value. The SETGLOBAL operation sets a value in the globals table. Its first argument is a register that has the value to be saved in the globals table. The second argument is an index into the constant table that has a pointer to a value in the globals table. The CALL operation calls a function. Its first argument gives the register that has a pointer to the function. The function arguments are placed sequentially in registers after the function pointer. Its second argument contains the number of function arguments plus one. Its third argument contains the number of values that the function returns plus one. The return values are placed sequentially in the registers starting at the register specified by the first argument. Finally, RETURN is executed when a function is finished. The first argument is the stack position of the first returned value; the other returned values are in the following positions. The second argument of the RETURN operation is the number of returned values plus one.
LOADK操作从常量表中加载变量并放入到寄存器中。它的第一个参数是要存放结果的栈寄存器,第二个参数则是全局常量表中的索引位置。 GETGLOBAL操作从全局表中读取值。其结果存放在第一个参数指定的寄存器中。该操作的第二个参数是指向常量表入口(entry)的引用,常量表中有 指向全局值的指针。SETGLOBAL操作将值存入全局表中,第一个参数是一个寄存器,其中存放需要被存放到全局表的值,第二个是常量表中索引位置,常量 表中有指向全局表中值的指针。CALL操作调用函数。第一个参数指定了存放函数指针的寄存器。函数参数则按序放置于函数指针之后的寄存器中。第二个参数则 是函数参数个数加1。第三个参数是函数需要返回的值的个数(YoungMan注:此处应该是错误,从后面来看,应当是个数再加1)。返回值按序放置于第一 个参数指定的及其之后的寄存器中。最后,函数完成时RETURN被调用。第一个参数是第一个返回值在栈中的位置;其他返回值则紧随之后。RETURN表达 式的第二个参数则是返回值个数加1.
In the example, the first two operations are responsible for setting y to 5. The first operation loads constant #1 and places it in register #0. The second operation takes register #0 and places it in a list of globally accessible values indexed by constant #0. From these opcodes, we can determine that constant 0 contains the global value index to y and constant 1 contains the number 5.
在这个例子中,先两步操作负责将y赋值为5。第一步加载常量#1并放入寄存器#0,第二步将#0寄存器的值放入全局表中,索引位置由常量#0指 定(YoungMan注:#0是指命令中的”0″,参看上面的字节码命令的图示)。通过这些操作码,我们可以确定常量#0中存放对全局值Y的索引,而常 量#1中存放数值5。
The next three operations call the print function, and the final RETURN ends the script. The first GETGLOBAL operation retrieves the print function from the globals table and places it in register #0. The second places y in register #1. Finally, the CALL operation executes. CALL expects the called function and all of its arguments to be in sequential registers: because print is stored in register #0, its first argument must be stored in register #1. The second argument of CALL is 2, the number of arguments (one) plus one, and the third argument is 1, the number of expected return values (zero) plus one.
示例中接下来的四步操作用来调用print函数。第一个GETGLOBAL操作从全局表中获取print函数,并放入寄存器#0。第二个 GETGLOBAL则将y放入寄存器#1。最后,执行CALL操作。CALL调用函数及放置于一列寄存器中的参数。由于print函数存放在#0寄存器 中,所以第一个参数只能存放在#1中。CALL的第三个参数是返回值个数加1。
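The walkthrough above can be replayed with a tiny interpreter. This Python sketch models only the five operations in the listing and rests on several assumptions: instructions are written as tuples, the constant entry for a global simply holds the global's name, host functions return a tuple of results, and the variable-count encodings of CALL and RETURN are ignored. The names `run` and `lua_print` are invented for the example.

```python
def run(code, constants, globals_table, num_registers=8):
    """Execute a list of (op, a, b, c) tuples on a small register machine."""
    registers = [None] * num_registers
    for op, a, b, c in code:
        if op == "LOADK":
            registers[a] = constants[b]                  # register <- constant
        elif op == "SETGLOBAL":
            globals_table[constants[b]] = registers[a]   # global <- register
        elif op == "GETGLOBAL":
            registers[a] = globals_table[constants[b]]   # register <- global
        elif op == "CALL":
            nargs = b - 1       # second argument is the argument count plus one
            nresults = c - 1    # third argument is the result count plus one
            args = registers[a + 1 : a + 1 + nargs]
            results = registers[a](*args)
            for i in range(nresults):
                # Results overwrite the registers starting at the function's register.
                registers[a + i] = results[i] if i < len(results) else None
        elif op == "RETURN":
            return tuple(registers[a : a + b - 1])       # b is the value count plus one
        else:
            raise ValueError("unsupported operation: " + op)
    return ()

printed = []

def lua_print(*args):
    """Stand-in for print: records its arguments and returns no values."""
    printed.append(args)
    return ()

constants = ["y", 5, "print"]   # constants #0, #1 and #2 from the listing
code = [
    ("LOADK", 0, 1, None),
    ("SETGLOBAL", 0, 0, None),
    ("GETGLOBAL", 0, 2, None),
    ("GETGLOBAL", 1, 0, None),
    ("CALL", 0, 2, 1),
    ("RETURN", 0, 1, 0),
]
globals_table = {"print": lua_print}
returned = run(code, constants, globals_table)
```

After the run, the globals table maps `y` to 5, `lua_print` has been called once with the single argument 5, and the script itself returns no values.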
Now, let’s consider a more complex example:
现在在看个更复杂的例子:
    local x = 10
    function test()
      x = 2
    end
luac generates the following bytecodes for this example:
luac为示例生成如下字节码:
    main:
    LOADK     0 0   ; 10
    CLOSURE   1 0   ; 0xa051940
    MOVE      0 0 0
    SETGLOBAL 1 1   ; test
    RETURN    0 1 0
    test:
    LOADK     0 0   ; 2
    SETUPVAL  0 0 0 ; x
    RETURN    0 1 0
This example uses three opcodes that the previous example did not: CLOSURE, SETUPVAL, and MOVE. The objective of the CLOSURE instruction is to create an instance of a function. Its first argument specifies the base register, which is used to store a reference to the instantiated function. If the instantiated function has upvalues, one register for each upvalue is reserved sequentially after the base register. The CLOSURE operation’s second argument is the index, in the table of function prototypes, of the function we want to instantiate.
这个例子中使用了前一例中没有用到的三个操作码:CLOSURE,SETUPVAL以及MOVE。CLOSURE指令的目的是创建函数实例。其 第一个参数指定实例化函数存放的基准寄存器。如果函数有非局部的变量,各个非局部变量将按序创建于基准寄存器之后的寄存器中。CLOSURE操作的第二个 参数是函数原型表中的索引,该索引处的函数正是我们想要访问的(YoungMan注:应该各个函数也是存放在一个函数列表中)。
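What it means for the instantiated function to share x with its creator can be shown with a small model of the script's behavior. This Python sketch does not reproduce the register layout: the `Cell` class standing in for an upvalue and the `make_test` helper are assumptions introduced for illustration.

```python
class Cell:
    """A shared slot; stands in for an upvalue in this model."""

    def __init__(self, value):
        self.value = value

def make_test(x_cell):
    """Instantiate the function `test`, capturing the cell that holds x."""
    def test():
        x_cell.value = 2   # the write that SETUPVAL performs in the listing
    return test

x = Cell(10)           # local x = 10
test = make_test(x)    # CLOSURE: create the function instance

before = x.value       # the value of x seen by the main chunk before the call
test()
after = x.value        # the main chunk observes the write made inside test
```

Because the closure and the enclosing chunk refer to the same cell, the assignment inside `test` is visible outside it, which is the behavior the upvalue operations exist to provide.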
The SETUPVAL instruction sets the value of a variable in |