We offer a template engine to our customers on our portal, in which attributes are replaced with information from a data source.
Here's an example of what such a template might look like, with different variations of quotes and whitespace, and some nesting here and there:
<html>
<body>
<h1 class="title">Some Title</h1>
<div id="output">
[%if findthis1='123']
Bla bla bla ["findthis2"] Bla Bla Bla
[%elseif (findthis3 = "123")]
Bla Bla Bla ['findthis4'] Bla Bla Bla
[%elseif ( findthis5 = "123" )]
Bla Bla Bla [findthis6] Bla Bla Bla
[%elseif ( findthis7 = "123" OR findthis8 = 123 ) AND findthis9='123']
[findthis10] Bla Bla Bla
[%elseif ( findthis11 = "123" OR ( findthis12=123 AND findthis13='123' ) ]
Bla Bla Bla [findthis14]
[%endif]
[%uppercase findthis15]
[%lowercase findthis16 ]
</div>
</body>
</html>
Our goal is to get all words before the character = between [% and ], where whitespace might occur.
We stumbled upon this thread and this answer, but since it is designed to find HTML attributes, we couldn't manage to reduce the pattern to the parts between [% and ]. Also, once there is whitespace between the attribute and the =, it no longer matches.
How should we modify the regular expression from that thread/answer to get attributes like findthis1/3/5/7/8/9/11/12/13 without getting class and id, considering only what is between [% and ] and allowing for possible whitespace? As for the attributes findthis15 and findthis16, where there is no =, we would like to find a separate regular expression.
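For the tags without an = (findthis15 and findthis16), a separate pattern could look roughly like the sketch below. It is written in Python only for illustration; the pattern uses nothing beyond \w, \s and capture groups, so it should carry over to PCRE. It assumes such a tag consists of exactly one keyword followed by one bare name, and the names FUNCTION_TAG and function_tag_names are made up for this example.

```python
import re

# Assumed shape of a tag without an operator:
# "[%", a keyword, whitespace, one bare name, optional whitespace, "]".
FUNCTION_TAG = re.compile(r"\[%\s*(\w+)\s+(\w+)\s*\]")

def function_tag_names(template):
    """Return the argument name of every keyword-plus-name tag."""
    return [m.group(2) for m in FUNCTION_TAG.finditer(template)]

sample = """[%if findthis1='123']
Bla ["findthis2"] Bla
[%endif]
[%uppercase findthis15]
[%lowercase findthis16 ]"""

print(function_tag_names(sample))  # ['findthis15', 'findthis16']
```

Conditional tags are skipped because the name is followed by an operator instead of the closing ]. A tag such as [%if foo] without any operator would also match, so the keyword in group 1 may need checking if that form exists.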
EDIT: I forgot to mention two things:
- The findthis attributes can be anything, like "email" or "firstname"
- There are also operators like <=, >= and !=
EDIT 2: Right now, I am thinking about using multiple regular expressions. The first one would be \[\%(.)*\], which would get me all lines starting with [% and ending with ]. I am trying to figure out the next regular expression to check whether there are operators in it, or whether it is one of those lines like [%uppercase findthis15].
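The two-step idea could be sketched as follows: first isolate every [% ... ] tag, then sort the tags by keyword so that only conditions are inspected for operators. This is a Python illustration under assumptions, not a tested solution for the real engine: it assumes a tag never contains a literal ] inside a quoted value and never spans several lines, and the names TAG, CONDITIONAL_KEYWORDS and split_tags are invented for the example.

```python
import re

# "[%", keyword, then lazily everything up to the first "]".
TAG = re.compile(r"\[%\s*(\w+)\s*(.*?)\s*\]")
CONDITIONAL_KEYWORDS = {"if", "elseif"}  # assumed set of condition keywords

def split_tags(template):
    """Return (condition bodies, other tags as (keyword, rest) pairs)."""
    conditions, others = [], []
    for m in TAG.finditer(template):
        keyword, rest = m.group(1), m.group(2)
        if keyword.lower() in CONDITIONAL_KEYWORDS:
            conditions.append(rest)
        else:
            others.append((keyword, rest))
    return conditions, others

sample = """<div id="output">
[%if findthis1='123']
Bla ['findthis4'] Bla
[%elseif ( findthis5 = "123" )]
[%endif]
[%uppercase findthis15]
</div>"""

conditions, others = split_tags(sample)
print(conditions)  # ["findthis1='123'", '( findthis5 = "123" )']
print(others)      # [('endif', ''), ('uppercase', 'findthis15')]
```

Because only text between [% and ] is ever captured, HTML attributes such as class and id, and plain placeholders such as ['findthis4'], never reach the second step.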
EDIT 3: The second regular expression of three would look like this:
(\S+)+[ ]*((=|<>|!=))
EDIT 4: Okay, after some experimenting, we still couldn't manage to improve the regular expression to achieve our goals.
By using /\[\%(if|elseif)(.)*?(\])/, we are getting something like this (please ignore the fact that I am using a different line from the ones above):
[%if hello="abc" OR ( (stack=123 AND overflow = "bla") OR (how= 'bla' AND are ='bla') AND you = 'xyz' )]
But now, the final step is to get the words "hello", "stack", "overflow", "how", "are" and "you" by using PHP's preg_match function.
The following (wrong) regular expression is way too greedy:
( |\()+(?:(?!(=|<|<|>|>|<=|>=|<>|<>|!=)).)*
What are we missing in this final regular expression?
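One possible direction, instead of matching "everything that is not an operator", is to match a name directly followed by optional whitespace and an operator, and to consume quoted values first so that an = inside a string is never mistaken for an operator. The sketch below is Python for illustration only; the constructs used (alternation, non-capturing groups, \w, \s) are also available in PCRE, so a port to preg_match_all should be feasible, though that is untested here. It assumes names start with a letter or underscore, and NAME_BEFORE_OPERATOR and names_before_operators are hypothetical names.

```python
import re

NAME_BEFORE_OPERATOR = re.compile(r"""
    "[^"]*" | '[^']*'            # consume quoted values so nothing inside them is captured
  | ([A-Za-z_]\w*)               # group 1: a name ...
    \s*(?:<=|>=|<>|!=|=|<|>)     # ... then optional whitespace and a comparison operator
""", re.VERBOSE)

def names_before_operators(tag):
    """Return every name that directly precedes a comparison operator."""
    # Matches of a quoted value leave group 1 empty, so they are filtered out.
    return [m.group(1) for m in NAME_BEFORE_OPERATOR.finditer(tag) if m.group(1)]

tag = """[%if hello="abc" OR ( (stack=123 AND overflow = "bla") OR (how= 'bla' AND are ='bla') AND you = 'xyz' )]"""

print(names_before_operators(tag))
# ['hello', 'stack', 'overflow', 'how', 'are', 'you']
```

The two-character operators are listed before = so that <= and != are consumed as a whole. Run against the full template instead of a single tag, this pattern would also pick up class and id, so it is meant to be applied to the text of an already isolated [%if ...] or [%elseif ...] tag.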
question from:
https://stackoverflow.com/questions/65643254/regular-expression-to-get-all-words-before-a-single-certain-character-with-o