
How can I create my own string patterns for string manipulation?

Asked 6 years ago

I have been working on a parser, and one issue I have run into is string patterns (I am not even sure that is what they are called). Here is a link to my other question, where they came up. That question contains s:gsub("%(([^()]+)%)", "substitute"), and the pattern %(([^()]+)%) was given to me. My current question is: how do I construct things like %(([^()]+)%)? My Lua 5.2 reference manual says that %a represents all letters, %c represents all control characters, %d represents all digits, and so on. How do I create my own patterns that look like %(([^()]+)%) and are more complex than just %w? I would appreciate any help. Thanks!

Note: I am not even sure if %(([^()]+)%) is what I want, because I don't know how it works.

0
for reference they're called "regular expressions" i believe User#22604 1 — 6y
0
Ok, thanks! User#21908 42 — 6y

2 answers

Answered 6 years ago
Edited 6 years ago

The Lua site has the information you need! Documentation, More documentation, Tutorial

Since the documentation link lists all the pattern items, I'll just go over what your pattern means. The %( means a literal opening parenthesis rather than the start of a capture. The next parenthesis opens a capture: the text it matches is what string.match returns, what string.find returns as extra value(s) after the start and end indices, and what you can refer to as %1 through %9 in the replacement string of string.gsub, which allows some pretty cool effects.

Lastly, the part inside your capture, [^()]+, makes use of the "not" character ^. The square brackets define a set that matches a single character, and the leading ^ negates that set, so [^()] matches any one character that is neither ( nor ). The + then repeats it, so the capture holds one or more non-parenthesis characters.
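As a rough sketch of how the capture behaves in each of the three functions mentioned above, the snippet below runs the same pattern through string.match, string.find and string.gsub. The sample string and the variable names are made up for illustration.

```lua
-- Hypothetical input, chosen only to show the pattern at work.
local s = "sum(1, 2) and max(3, 4)"
local pattern = "%(([^()]+)%)"

-- string.match returns the capture: the text between the first pair of parentheses.
local inner = string.match(s, pattern)
print(inner) -- 1, 2

-- string.find returns the start and end index of the whole match, then the capture.
local first, last, found = string.find(s, pattern)
print(first, last, found) -- 4    9    1, 2

-- In a gsub replacement string, %1 stands for the first capture.
local replaced, count = string.gsub(s, pattern, "[%1]")
print(replaced, count) -- sum[1, 2] and max[3, 4]    2
```

Note that the indices from string.find cover the literal parentheses too, while the capture does not include them.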

Edit: The last %) is another literal parenthesis, this time a closing one, so overall your pattern matches a pair of parentheses with one or more non-parenthesis characters between them. An alternative worth looking into is %b(), which matches a balanced pair of parentheses together with everything between them, nested parentheses included (e.g. it would match all of (guys()whatsup)).
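To make the difference concrete, here is a small sketch comparing %b() with %(([^()]+)%) on nested parentheses. The first input string is a hypothetical variation on the example above.

```lua
-- Hypothetical input with one nested pair of parentheses.
local text = "say (guys(hey)whatsup) now"

-- %b() matches from the first "(" to its balancing ")", parentheses included.
local balanced = string.match(text, "%b()")
print(balanced) -- (guys(hey)whatsup)

-- %(([^()]+)%) only matches a pair with no parentheses inside it,
-- so it skips the outer pair and captures the innermost text.
local innermost = string.match(text, "%(([^()]+)%)")
print(innermost) -- hey

-- With an empty inner pair, the + has nothing to match and no pair qualifies.
local none = string.match("(guys()whatsup)", "%(([^()]+)%)")
print(none) -- nil
```

Which one fits depends on whether the parser wants the outermost group as a whole or wants to work from the innermost group outward.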

0
Thank you! User#21908 42 — 6y
Answered by
ABK2017 406 Moderation Voter
6 years ago
Edited 6 years ago

This is what I have in my notes file. I don't know if it's quite what you're looking for, but I think string captures are relevant. This is more of a comment than an answer, and it's also way above my pay grade.

http://lua-users.org/wiki/StringLibraryTutorial

Just like string.find() we can use patterns to search in strings. Patterns are covered in the PatternsTutorial. If a capture is used this can be referenced in the replacement string using the notation %capture_index, e.g.,

= string.gsub("banana", "(an)", "%1-") -- capture any occurrences of "an" and replace
ban-an-a 2
= string.gsub("banana", "a(n)", "a(%1)") -- brackets around n's which follow a's
ba(n)a(n)a 2
= string.gsub("banana", "(a)(n)", "%2%1") -- reverse any "an"s
bnanaa 2

If the replacement is a function, not a string, the arguments passed to the function are any captures that are made. If the function returns a string, the value returned is substituted back into the string.

= string.gsub("Hello Lua user", "(%w+)", print) -- print any words found
Hello Lua user 3
= string.gsub("Hello Lua user", "(%w+)", function(w) return string.len(w) end) -- replace with lengths
5 3 4 3
= string.gsub("banana", "(a)", string.upper) -- make all "a"s found uppercase
bAnAnA 3
= string.gsub("banana", "(a)(n)", function(a,b) return b..a end) -- reverse any "an"s
bnanaa 2

Pattern capture: The most commonly seen pattern capture instances could be

"(.-)", e.g. "{(.-)}" means capture any characters between the curly braces {} (lazy match, i.e. as few characters as possible)
"(.*)", e.g. "{(.*)}" means capture any characters between the curly braces {} (greedy match, i.e. as many characters as possible)

= string.gsub("The big {brown} fox jumped {over} the lazy {dog}.", "{(.-)}", function(a) print(a) end)
brown over dog

= string.gsub("The big {brown} fox jumped {over} the lazy {dog}.", "{(.*)}", function(a) print(a) end)
brown} fox jumped {over} the lazy {dog
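The same lazy versus greedy contrast can also be checked without printing from inside gsub. The sketch below is just one way to do it: it collects the lazy captures with string.gmatch and takes the greedy capture with string.match. The table and variable names are illustrative.

```lua
local s = "The big {brown} fox jumped {over} the lazy {dog}."

-- Lazy: .- stops at the first closing brace, so each pair is matched on its own.
local lazy = {}
for word in string.gmatch(s, "{(.-)}") do
    lazy[#lazy + 1] = word
end
print(table.concat(lazy, ", ")) -- brown, over, dog

-- Greedy: .* runs on to the last closing brace in the string.
local greedy = string.match(s, "{(.*)}")
print(greedy) -- brown} fox jumped {over} the lazy {dog
```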

1
did you copy again without giving credit User#19524 175 — 6y
0
This doesn't really answer his question and you didn't explain why it does User#22604 1 — 6y
0
I thought it was clear that I didn’t write that, but perhaps not, I found where I got it and edited my answer to include the address ABK2017 406 — 6y
0
I agree, I said it was really a comment, suggesting looking at the string captures as that was part of his question regarding patterns ABK2017 406 — 6y
