Regex for nested XML attributes
Let's say we have the following string:
"<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>"
How can I write a general-purpose regex (tag names change, attribute names change) that matches the content inside {}, i.e. either <dd>sop</dd>
or <bb y={ <cc x={st}>abc</cc> }></bb>?
The regex I wrote,
"(\s*\w*=\s*\{)\s*(<.*>)\s*(\})"
matches
"<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb>"
which is not correct.
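The overmatch can be reproduced with a short check. This is only a sketch in Python (my choice of language; the question does not name an engine), and the variable names are made up for illustration. The greedy `.*` runs to the end of the string and then backtracks to the last `>` that is followed by a `}`, which is why both attributes end up in one match.

```python
import re

# Sample string from the question.
sample = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

# The regex from the question, unchanged.
greedy = re.compile(r"(\s*\w*=\s*\{)\s*(<.*>)\s*(\})")

m = greedy.search(sample)
overmatched = m.group(2)  # the part captured by (<.*>)
print(overmatched)
# <dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb>
```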
In generic regex there's no way to handle nesting in a general way. Hence the whining whenever this question comes up: never use regex to parse XML/HTML.
In simple cases it might be advantageous, though. If, as in your example, there's a limited number of levels of nesting, you can quite easily add one regex for each level.
Now let's take it in steps. To handle the first, un-nested attribute you can use
{[^}]*}
This matches a starting brace, followed by any number of anything but a closing brace, followed by a closing brace. For simplicity I'm gonna put the heart of it in a non-capturing group, like
{(?:[^}])*}
This is because the group is needed when inserting the alternatives later on.
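To see what the flat version does on the question's string, here is a small Python sketch (Python and the variable names are my assumptions; the braces are escaped, which is equivalent here and a bit more portable across engines). It gets the un-nested attribute right but cuts the nested one short at the first closing brace.

```python
import re

sample = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

# Flat version: opening brace, anything but a closing brace, closing brace.
flat = re.compile(r"\{(?:[^}])*\}")

flat_matches = flat.findall(sample)
for match in flat_matches:
    print(match)
# {<dd>sop</dd>}
# { <bb y={ <cc x={st}
```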
If we allow the "anything but a closing brace" ([^}]) to also be a nested level of braces, we can simply join in the first regex, like
{(?:{[^}]*}|[^}])*}
    ^^^^^^^ original regex inserted as an alternative (to itself)
It allows one level of nesting. Doing the same again, joining the regex as an alternative to itself, like
{(?:{(?:{[^}]*}|[^}])*}|{[^}]*}|[^}])*}
    ^^^^^^^^^^^^^^^^^^^ previous level repeated
will allow two levels of nesting. This can be repeated for more levels if wanted.
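Since each level is just the previous regex wrapped once more, the pattern can also be generated instead of typed out. A minimal sketch in Python, assuming a helper I'm calling `brace_pattern` (a made-up name, not part of any library). It escapes the braces and leaves out the middle `{[^}]*}` alternative from the hand-written two-level regex, since the nested alternative already covers that case.

```python
import re

def brace_pattern(levels: int) -> str:
    """Build a regex for {...} that tolerates up to `levels` nested brace pairs."""
    pattern = r"\{[^}]*\}"  # level 0: the un-nested regex
    for _ in range(levels):
        # Wrap the previous level as the first alternative inside a new outer pair.
        pattern = r"\{(?:" + pattern + r"|[^}])*\}"
    return pattern

sample = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

nested_matches = re.findall(brace_pattern(2), sample)
for match in nested_matches:
    print(match)
# {<dd>sop</dd>}
# { <bb y={ <cc x={st}>abc</cc> }></bb> }
```

Input nested deeper than `levels` will still be cut short, so the depth has to be picked for the data at hand.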
This doesn't handle the capture of attribute names and stuff though, because your question isn't quite clear on what you want there, but it shows one way (i.m.o. the easiest to understand, or... :p) to handle nesting in regex.
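If the attribute names are wanted too, one possible extension is to put a name in front of the two-level brace regex and use named groups. This is only a guess at what is being asked for; the sketch is in Python, and `top_level_attributes`, `name` and `value` are names I made up for illustration.

```python
import re

# Two-level brace regex from above, with the braces escaped.
BRACES = r"\{(?:\{(?:\{[^}]*\}|[^}])*\}|[^}])*\}"

# An attribute name, "=", then a balanced {...} value (up to two nested levels).
ATTR = re.compile(r"(?P<name>\w+)\s*=\s*(?P<value>" + BRACES + r")")

def top_level_attributes(text):
    """Return (name, content) pairs, with the outer braces and padding removed."""
    return [(m.group("name"), m.group("value")[1:-1].strip())
            for m in ATTR.finditer(text)]

sample = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

attributes = top_level_attributes(sample)
for name, content in attributes:
    print(name, "->", content)
# v -> <dd>sop</dd>
# z -> <bb y={ <cc x={st}>abc</cc> }></bb>
```

Because matches don't overlap, the inner attributes (y and x) stay inside the value of z and are not reported on their own.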
You can see the regex handle your example here at regex101.
regards