Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - How to parse text for a DSL at compile time?

Yes. That's right. I want to be able to paste an expression like:

"a && b || c"

directly into source code as a string:

const std::string expression_text("a && b || c");

Create a lazily evaluated structure with it:

Expr expr(magical_function(expression_text));

then later on evaluate substituting in known values:

evaluate(expr, a, b, c);

I'd want to expand this little DSL later so that it does something a little more complicated using some non-C++ syntax, so I can't simply hardcode my expression the simple way. The use case is that I'll be able to copy and paste the same logic from another module, used in a different development area for another language, rather than having to adapt it each time to follow C++ syntax.

If someone can get me started on at least how to do the above simple concept of 1 expression and 2 boolean operators that would be really appreciated.

Note: I posted this question due to feedback from another question I posted: How to parse DSL input to high performance expression template. Here I actually wanted an answer to a slightly different problem, but the comments provoked this specific question that I thought was worth posting as the potential answers are really worth documenting.



1 Answer


Disclaimer: I know nothing about metaparse, and very little about proto. The following code is my attempt (mostly via trial and error) to modify this example to do something similar to what you want.

The code divides naturally into several parts:

1. The grammar


1.1 Token definitions

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;
  • token<Parser>:
    token is a parser combinator that uses Parser to parse the input and then consumes (and discards) all whitespace that follows. The result of the parsing is the result of Parser.
  • lit_c<char>:
    lit_c matches the specific char and the result of the parsing is that same char. In the grammar this result is overridden by the use of always.
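As a rough illustration of what these two combinators do, here is a minimal sketch in plain C++14 constexpr code. It is not how metaparse implements them (metaparse works on types, not on C strings), and the names Parsed, lit_c_sketch, token_sketch and arg1_token_sketch are made up for this example.

```cpp
// Hypothetical stand-ins for metaparse's lit_c and token, written as
// constexpr functions over a C string instead of type-level parsers.

// Outcome of one parse step.
struct Parsed {
    bool ok;     // did the parser match?
    char value;  // the matched character (the "result" of the parse)
    int next;    // index of the first character not yet consumed
};

// Like lit_c<C>: matches exactly the character C and yields it.
template <char C>
struct lit_c_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        return s[pos] == C ? Parsed{true, C, pos + 1} : Parsed{false, '\0', pos};
    }
};

// Like token<P>: runs P, then consumes and discards trailing whitespace.
// The result is still P's result.
template <class P>
struct token_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed r = P::parse(s, pos);
        if (!r.ok) return r;
        while (s[r.next] == ' ' || s[r.next] == '\t') ++r.next;
        return r;
    }
};

typedef token_sketch<lit_c_sketch<'a'> > arg1_token_sketch;

static_assert(arg1_token_sketch::parse("a   && b", 0).value == 'a',
              "the result is the matched character");
static_assert(arg1_token_sketch::parse("a   && b", 0).next == 4,
              "the spaces after 'a' are consumed as well");
```

The point of wrapping every terminal in token is that the rest of the grammar never has to mention whitespace again.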
typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;
  • keyword<metaparse_string,result_type=undefined>:
    keyword matches the specific metaparse_string and the result of the parsing is result_type. Here _S("true") yields metaparse::string<'t','r','u','e'>, the compile-time string type that metaparse works on internally.
typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

In the case of and_token and or_token the result is undefined and in the grammar below it is ignored.
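To show why the keyword text is carried as a type such as metaparse::string<'t','r','u','e'>, here is a small sketch of matching a character pack against the input. string_sketch and keyword_sketch are hypothetical names, the code only reports where the keyword ends (it does not model result_type), and it is not metaparse's own implementation.

```cpp
// Hypothetical stand-in for metaparse::string<...>: a string carried as a type.
template <char... Cs>
struct string_sketch {};

// Hypothetical stand-in for keyword<S>: end(s, pos) returns the index just
// past the keyword if the input contains it at pos, and -1 otherwise.
template <class S>
struct keyword_sketch;

// Every character of the keyword has been matched.
template <>
struct keyword_sketch<string_sketch<> > {
    static constexpr int end(const char*, int pos) { return pos; }
};

// Match the first character, then recurse on the remaining ones.
template <char C, char... Rest>
struct keyword_sketch<string_sketch<C, Rest...> > {
    static constexpr int end(const char* s, int pos) {
        return s[pos] == C
            ? keyword_sketch<string_sketch<Rest...> >::end(s, pos + 1)
            : -1;
    }
};

typedef keyword_sketch<string_sketch<'t', 'r', 'u', 'e'> > true_keyword;
typedef keyword_sketch<string_sketch<'&', '&'> > and_keyword;

static_assert(true_keyword::end("true || a", 0) == 4, "matches the whole word");
static_assert(and_keyword::end("a && b", 2) == 4, "matches at an offset");
static_assert(true_keyword::end("tru", 0) == -1, "truncated input fails");
```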


1.2 "Rules" of the grammar

struct paren_exp;

First paren_exp is forward-declared.

typedef one_of< 
        paren_exp, 
        transform<true_token, build_value>,
        transform<false_token, build_value>, 
        always<arg1_token, arg<0> >,
        always<arg2_token, arg<1> >, 
        always<arg3_token, arg<2> > 
    >
    value_exp;
  • one_of<Parsers...>:
    one_of is a parser combinator that tries to match the input to one of its parameters. The result is what the first parser that matches returns.
  • transform<Parser,SemanticAction>:
    transform is a parser combinator that matches Parser. The result type is the result type of Parser transformed by SemanticAction.
  • always<Parser,NewResultType>:
    matches Parser, returns NewResultType.

    The equivalent spirit rule would be:

    value_exp = paren_exp [ _val=_1 ]
        | true_token      [ _val=build_value(_1) ]
        | false_token     [ _val=build_value(_1) ]
        | argN_token      [ _val=phx::construct<arg<N>>() ];
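The ordered-choice behaviour of one_of and the result replacement done by always can be sketched in a few lines of C++14. This is only an illustration under simplifying assumptions: the parsers work on a C string, the result is an int holding the argument index instead of a type like arg<N>, transform is left out, and all names ending in _sketch are hypothetical.

```cpp
struct Parsed {
    bool ok;
    int value;  // here: the index of the argument that was recognised
    int next;   // index of the first unconsumed character
};

template <char C>
struct lit_c_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        return s[pos] == C ? Parsed{true, C, pos + 1} : Parsed{false, 0, pos};
    }
};

// Like always<P, N>: if P matches, discard its result and yield N instead.
template <class P, int N>
struct always_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed r = P::parse(s, pos);
        if (r.ok) r.value = N;
        return r;
    }
};

// Like one_of<Ps...>: try the alternatives in order, first success wins.
// The primary template is only used when no alternatives are left.
template <class... Ps>
struct one_of_sketch {
    static constexpr Parsed parse(const char*, int pos) {
        return Parsed{false, 0, pos};
    }
};

template <class P, class... Rest>
struct one_of_sketch<P, Rest...> {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed r = P::parse(s, pos);
        return r.ok ? r : one_of_sketch<Rest...>::parse(s, pos);
    }
};

typedef one_of_sketch<
        always_sketch<lit_c_sketch<'a'>, 0>,
        always_sketch<lit_c_sketch<'b'>, 1>,
        always_sketch<lit_c_sketch<'c'>, 2>
    >
    value_exp_sketch;

static_assert(value_exp_sketch::parse("b", 0).value == 1, "b maps to index 1");
static_assert(!value_exp_sketch::parse("x", 0).ok, "no alternative matches x");
```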
    
typedef one_of< 
        transform<last_of<not_token, value_exp>, build_not>, 
        value_exp
    >
    not_exp;
  • last_of<Parsers...>:
    last_of matches every one of the Parsers in sequence and its result type is the result type of the last parser.

    The equivalent spirit rule would be:

    not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] 
        | value_exp                          [ _val=_1 ];
    
typedef foldl_start_with_parser<
        last_of<and_token, not_exp>,
        not_exp,
        build_and
    >
    and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp);

typedef foldl_start_with_parser<
        last_of<or_token, and_exp>,
        and_exp,
        build_or
    >
    or_exp; // or_exp = and_exp >> *(omit[or_token] >> and_exp);
  • foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>:
    this parser combinator matches InitialParser and then RepeatingParser multiple times until it fails. The result type is the result of mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>, where RepeatingParserSequence is a sequence of the result types of every application of RepeatingParser. If RepeatingParser never succeeds the result type is simply InitialParserResult.

    I believe (xd) that the equivalent spirit rule would be:

    or_exp = and_exp[_a=_1] 
        >> *( omit[or_token] >> and_exp [ _val = build_or(_1,_a), _a = _val ]);  
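The "match InitialParser once, then fold every RepeatingParser result into the state" behaviour is easier to see on values than on types. The sketch below uses a toy grammar of digits separated by '-' because subtraction makes the left-to-right order visible. It assumes C++14, the parsers digit, minus_then_digit and the action subtract are invented for the example, and the (state, item) argument order of apply is a choice of this sketch, not a statement about metaparse.

```cpp
struct Parsed {
    bool ok;
    int value;
    int next;  // index of the first unconsumed character
};

// Toy InitialParser: a single decimal digit.
struct digit {
    static constexpr Parsed parse(const char* s, int pos) {
        return (s[pos] >= '0' && s[pos] <= '9')
            ? Parsed{true, s[pos] - '0', pos + 1}
            : Parsed{false, 0, pos};
    }
};

// Toy RepeatingParser: '-' followed by a digit, keeping only the digit
// (the role last_of<and_token, not_exp> plays in the grammar).
struct minus_then_digit {
    static constexpr Parsed parse(const char* s, int pos) {
        return s[pos] == '-' ? digit::parse(s, pos + 1) : Parsed{false, 0, pos};
    }
};

// Toy SemanticAction: combines the accumulated state with one new item.
struct subtract {
    static constexpr int apply(int state, int item) { return state - item; }
};

template <class Repeating, class Initial, class Action>
struct foldl_start_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed state = Initial::parse(s, pos);
        if (!state.ok) return state;
        for (;;) {
            Parsed step = Repeating::parse(s, state.next);
            if (!step.ok) return state;  // first failure ends the repetition
            state = Parsed{true, Action::apply(state.value, step.value), step.next};
        }
    }
};

typedef foldl_start_sketch<minus_then_digit, digit, subtract> difference;

static_assert(difference::parse("9-3-2", 0).value == 4, "folds left: (9 - 3) - 2");
static_assert(difference::parse("7", 0).value == 7, "no repetition: initial result");
```

If the repeating parser never succeeds, the result is just the initial parser's result, which is why a lone value is a valid and_exp and a valid or_exp.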
    
struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; 
   // paren_exp = '(' >> or_exp >> ')';
  • middle_of<Parsers...>:
    this matches the sequence of Parsers and the result type is the result of the parser that is in the middle.
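A sketch of the two sequencing combinators used so far, last_of and middle_of, may help. It assumes C++14, fixes the arity at two and three parsers (the real last_of takes any number), and uses made-up names (Parsed, lit_c_sketch, last_of_sketch, middle_of_sketch).

```cpp
struct Parsed {
    bool ok;
    char value;
    int next;  // index of the first unconsumed character
};

template <char C>
struct lit_c_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        return s[pos] == C ? Parsed{true, C, pos + 1} : Parsed{false, '\0', pos};
    }
};

// Two-parser version of last_of: both must match in sequence,
// and only the second result is kept.
template <class First, class Last>
struct last_of_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed first = First::parse(s, pos);
        if (!first.ok) return Parsed{false, '\0', pos};
        Parsed last = Last::parse(s, first.next);
        if (!last.ok) return Parsed{false, '\0', pos};
        return last;
    }
};

// Like middle_of: all three must match in sequence,
// and only the middle result is kept.
template <class Open, class Middle, class Close>
struct middle_of_sketch {
    static constexpr Parsed parse(const char* s, int pos) {
        Parsed open = Open::parse(s, pos);
        if (!open.ok) return Parsed{false, '\0', pos};
        Parsed middle = Middle::parse(s, open.next);
        if (!middle.ok) return Parsed{false, '\0', pos};
        Parsed close = Close::parse(s, middle.next);
        if (!close.ok) return Parsed{false, '\0', pos};
        return Parsed{true, middle.value, close.next};
    }
};

typedef last_of_sketch<lit_c_sketch<'!'>, lit_c_sketch<'a'> > not_a;
typedef middle_of_sketch<lit_c_sketch<'('>, lit_c_sketch<'b'>, lit_c_sketch<')'> > paren_b;

static_assert(not_a::parse("!a", 0).value == 'a', "the '!' is dropped");
static_assert(paren_b::parse("(b)", 0).value == 'b', "the parentheses are dropped");
```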
typedef last_of<repeated<space>, or_exp> expression; 
   //expression = omit[*space] >> or_exp;
  • repeated<Parser>:
    this parser combinator tries to match Parser multiple times. The result is a sequence of the result types of every application of the parser, if the parser fails on its first try the result is an empty sequence. This rule simply removes any leading whitespace.
typedef build_parser<entire_input<expression> > function_parser;

This line creates a metafunction that accepts an input string and returns the result of parsing.


2. Construction of the expression

Let's look at an example walkthrough of the building of an expression. This is done in two steps: first the grammar constructs a tree that depends on build_or, build_and, build_value, build_not and arg<N>. Once you get that type, you can get the proto expression using the proto_type typedef.

"a || !b"

We start on or_exp:

  • or_exp: We try its InitialParser, which is and_exp.
    • and_exp: We try its InitialParser, which is not_exp.
      • not_exp: not_token fails, so we try value_exp.
        • value_exp: arg1_token succeeds. The return type is arg<0> and we go back to not_exp.
      • not_exp: the return type is not modified at this step. We go back to and_exp.
    • and_exp: We try its RepeatingParser, which fails. and_exp succeeds and its return type is the return type of its InitialParser: arg<0>. We go back to or_exp.
  • or_exp: We try its RepeatingParser; or_token matches, so we try and_exp.
    • and_exp: We try its InitialParser, not_exp.
      • not_exp: not_token succeeds, so we try value_exp.
        • value_exp: arg2_token succeeds. The return type is arg<1> and we go back to not_exp.
      • not_exp: the return type is modified by transform using build_not: build_not::apply< arg<1> >. We go back to and_exp.
    • and_exp: We try its RepeatingParser, which fails. and_exp succeeds and returns build_not::apply< arg<1> >. We go back to or_exp.
  • or_exp: RepeatingParser has succeeded, so foldl_start_with_parser uses build_or on build_not::apply< arg<1> > and arg<0>, obtaining build_or::apply< build_not::apply< arg<1> >, arg<0> >.
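To make the descent order of this walkthrough concrete, here is a minimal sketch of the same grammar as hand-written constexpr functions (C++14), without metaparse or proto. It is a different technique from the answer's: it evaluates the text directly instead of building a type tree, it leaves out the true and false literals, and every name in it (Args, Parsed, parse_or, evaluate and so on) is invented for this illustration.

```cpp
struct Args { bool a, b, c; };                      // values bound to a, b and c
struct Parsed { bool ok; bool value; int next; };   // next: first unconsumed index

constexpr int skip_spaces(const char* s, int pos) {
    while (s[pos] == ' ') ++pos;
    return pos;
}

// Forward declaration, playing the role of "struct paren_exp;" in the grammar.
constexpr Parsed parse_or(const char* s, int pos, Args args);

// value_exp = '(' or_exp ')' | 'a' | 'b' | 'c'
constexpr Parsed parse_value(const char* s, int pos, Args args) {
    if (s[pos] == '(') {
        Parsed inner = parse_or(s, skip_spaces(s, pos + 1), args);
        if (!inner.ok || s[inner.next] != ')') return Parsed{false, false, pos};
        return Parsed{true, inner.value, skip_spaces(s, inner.next + 1)};
    }
    if (s[pos] == 'a') return Parsed{true, args.a, skip_spaces(s, pos + 1)};
    if (s[pos] == 'b') return Parsed{true, args.b, skip_spaces(s, pos + 1)};
    if (s[pos] == 'c') return Parsed{true, args.c, skip_spaces(s, pos + 1)};
    return Parsed{false, false, pos};
}

// not_exp = '!' value_exp | value_exp
constexpr Parsed parse_not(const char* s, int pos, Args args) {
    if (s[pos] != '!') return parse_value(s, pos, args);
    Parsed operand = parse_value(s, skip_spaces(s, pos + 1), args);
    if (operand.ok) operand.value = !operand.value;
    return operand;
}

// and_exp = not_exp ("&&" not_exp)*
constexpr Parsed parse_and(const char* s, int pos, Args args) {
    Parsed state = parse_not(s, pos, args);
    while (state.ok && s[state.next] == '&' && s[state.next + 1] == '&') {
        Parsed rhs = parse_not(s, skip_spaces(s, state.next + 2), args);
        if (!rhs.ok) return rhs;
        state = Parsed{true, state.value && rhs.value, rhs.next};
    }
    return state;
}

// or_exp = and_exp ("||" and_exp)*
constexpr Parsed parse_or(const char* s, int pos, Args args) {
    Parsed state = parse_and(s, pos, args);
    while (state.ok && s[state.next] == '|' && s[state.next + 1] == '|') {
        Parsed rhs = parse_and(s, skip_spaces(s, state.next + 2), args);
        if (!rhs.ok) return rhs;
        state = Parsed{true, state.value || rhs.value, rhs.next};
    }
    return state;
}

// Like entire_input: leftover characters count as a parse error. Inside a
// constant expression the throw turns a malformed string into a compile error.
constexpr bool evaluate(const char* s, bool a, bool b, bool c) {
    Parsed r = parse_or(s, skip_spaces(s, 0), Args{a, b, c});
    return (r.ok && s[r.next] == '\0') ? r.value : throw "parse error";
}

static_assert(evaluate("a || !b", false, false, false), "false || !false");
static_assert(evaluate("a || b && c", true, false, false), "&& binds tighter than ||");
```

Because or calls and, which calls not, which calls value, the operator precedence falls out of the call order, just as it falls out of the nesting of the typedefs in the metaparse grammar.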

Once we have this tree constructed we get its proto_type. Each line below is the line above it with one more nested proto_type expanded:

build_or::apply< build_not::apply< arg<1> >, arg<0> >::proto_type;
proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;
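The same expansion can be reproduced with plain templates if proto is not at hand. The sketch below is only an analogy: placeholder_node, not_node and or_node stand in for the proto nodes, tree_type stands in for proto_type, and all of these names are hypothetical. The argument order of build_or_sketch::apply (new item first, state second) follows the walkthrough above.

```cpp
#include <type_traits>

// Plain-template stand-ins for the proto nodes. Each node can evaluate
// itself against an array holding the values of a, b and c.
template <int N>
struct placeholder_node {
    static constexpr bool eval(const bool* values) { return values[N]; }
};

template <class Operand>
struct not_node {
    static constexpr bool eval(const bool* values) { return !Operand::eval(values); }
};

template <class Left, class Right>
struct or_node {
    static constexpr bool eval(const bool* values) {
        return Left::eval(values) || Right::eval(values);
    }
};

// Stand-ins for arg<N>, build_not and build_or.
template <int N>
struct arg_sketch {
    typedef placeholder_node<N> tree_type;
};

struct build_not_sketch {
    template <class Value>
    struct apply {
        typedef not_node<typename Value::tree_type> tree_type;
    };
};

struct build_or_sketch {
    template <class C, class State>
    struct apply {
        typedef or_node<typename State::tree_type, typename C::tree_type> tree_type;
    };
};

// The tree the grammar produces for "a || !b".
typedef build_or_sketch::apply<
        build_not_sketch::apply<arg_sketch<1> >,
        arg_sketch<0>
    >::tree_type
    tree;

static_assert(std::is_same<
                  tree,
                  or_node<placeholder_node<0>, not_node<placeholder_node<1> > >
              >::value,
              "state on the left, new item on the right");

// Values for a, b and c, in that order.
constexpr bool all_false[] = {false, false, false};
constexpr bool only_b_true[] = {false, true, false};

static_assert(tree::eval(all_false), "false || !false");
static_assert(!tree::eval(only_b_true), "false || !true");
```

The parse happens once, when the typedef is resolved; eval can then be called with as many different value sets as needed, which is the lazily evaluated structure the question asks for.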

Full Sample Code (Running on Wandbox)

#include <iostream>
#include <vector>

#include <boost/metaparse/repeated.hpp>
#include <boost/metaparse/sequence.hpp>
#include <boost/metaparse/lit_c.hpp>
#include <boost/metaparse/last_of.hpp>
#include <boost/metaparse/middle_of.hpp>
#include <boost/metaparse/space.hpp>
#include <boost/metaparse/foldl_start_with_parser.hpp>
#include <boost/metaparse/one_of.hpp>
#include <boost/metaparse/token.hpp>
#include <boost/metaparse/entire_input.hpp>
#include <boost/metaparse/string.hpp>
#include <boost/metaparse/transform.hpp>
#include <boost/metaparse/always.hpp>
#include <boost/metaparse/build_parser.hpp>
#include <boost/metaparse/keyword.hpp>

#include <boost/mpl/apply_wrap.hpp>
#include <boost/mpl/front.hpp>
#include <boost/mpl/back.hpp>
#include <boost/mpl/bool.hpp>

#include <boost/proto/proto.hpp>
#include <boost/fusion/include/at.hpp>
#include <boost/fusion/include/make_vector.hpp>

using boost::metaparse::sequence;
using boost::metaparse::lit_c;
using boost::metaparse::last_of;
using boost::metaparse::middle_of;
using boost::metaparse::space;
using boost::metaparse::repeated;
using boost::metaparse::build_parser;
using boost::metaparse::foldl_start_with_parser;
using boost::metaparse::one_of;
using boost::metaparse::token;
using boost::metaparse::entire_input;
using boost::metaparse::transform;
using boost::metaparse::always;
using boost::metaparse::keyword;

using boost::mpl::apply_wrap1;
using boost::mpl::front;
using boost::mpl::back;
using boost::mpl::bool_;


struct build_or
{
    typedef build_or type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type>::type proto_type;
    };
};
