Regex for nested XML attributes


Let's say we have the following string:

"<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>"

How can I write a general-purpose regex (tag names and attribute names vary) that matches the content inside {}, that is, either <dd>sop</dd> or <bb y={ <cc x={st}>abc</cc> }></bb>?

The regex I wrote, "(\s*\w*=\s*\{)\s*(<.*>)\s*(\})", matches

"<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb>", which is not correct.
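To see where the attempt overshoots, here is a minimal sketch in Python (Python is an assumption here; the question does not name a language). The greedy .* runs on to the last > that is still followed by a closing brace, so the second group swallows the end of the first attribute and most of the second one.

```python
import re

text = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

# The regex from the question, unchanged.
attempt = re.compile(r'(\s*\w*=\s*\{)\s*(<.*>)\s*(\})')

m = attempt.search(text)
prefix, body, closing = m.groups()

print(repr(prefix))  # the attribute name and opening brace
print(body)          # what the greedy middle group captured
```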

With a generic regex there's no way to handle arbitrary nesting like this. Hence the whining whenever a question like this comes up: never use regex to parse XML/HTML.

In simple cases it might be advantageous, though. If, as in your example, there's a limited number of levels of nesting, you can quite easily add one regex for each level.

Now let's take it in steps. To handle the first, un-nested attribute you can use

{[^}]*} 

This matches a starting brace, followed by any number of anything but a closing brace, followed by a closing brace. For simplicity I'm going to put the heart of it in a non-capturing group, like

{(?:[^}])*} 

This is because the group is needed when inserting the alternatives.
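A quick check of this step, sketched in Python (my choice of language; the braces are written as \{ and \} only to keep them unambiguous). It suggests that the two forms behave the same, and shows what happens on the nested attribute from the question.

```python
import re

text = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

plain = re.compile(r'\{[^}]*\}')
# Same pattern with its heart wrapped in a non-capturing group.
grouped = re.compile(r'\{(?:[^}])*\}')

plain_matches = plain.findall(text)
grouped_matches = grouped.findall(text)

for match in plain_matches:
    print(match)
```

The first attribute comes back whole, while the second is cut off at the closing brace of {st}, which is why the nested levels need the extra alternatives.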

If we then allow the "anything but a closing brace" part ([^}]) to also be a nested level of braces, we simply join in the first regex as an alternative, like

{(?:{[^}]*}|[^}])*}
    ^^^^^^^
    original regex inserted as an alternative (to itself)

It allows one level of nesting. Doing the same again, joining in the regex as an alternative to itself, like

{(?:{(?:{[^}]*}|[^}])*}|{[^}]*}|[^}])*}
    ^^^^^^^^^^^^^^^^^^^
    previous level repeated

will allow another level of nesting. This can be repeated for more levels if wanted.
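Since each level is just the previous pattern wrapped in one more alternation, the regex can be generated instead of typed out. Below is a minimal sketch in Python with two liberties that are mine and not part of the steps above: the helper name nested_brace_pattern is hypothetical, and the character class is [^{}] rather than [^}], so that an opening brace can only be consumed as the start of a complete inner group.

```python
import re

def nested_brace_pattern(depth):
    """Build a regex source for {...} with up to `depth` nested levels."""
    # Level 0: a brace pair with no braces inside.
    pattern = r'\{[^{}]*\}'
    for _ in range(depth):
        # One more level: the body is either a complete group of the
        # previous level or any single non-brace character.
        pattern = r'\{(?:' + pattern + r'|[^{}])*\}'
    return pattern

text = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

for depth in range(3):
    print(depth, re.findall(nested_brace_pattern(depth), text))
```

With this variant a pattern that is too shallow skips the outer attribute and matches only the inner groups it can balance, and depth 2 is the first one that returns the whole z attribute from the question.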

This doesn't handle the capture of attribute names and such, though, because your question isn't quite clear on what you want there, but it shows one way (i.m.o. the easiest to understand, or... :p) to handle nesting in regex.
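If the attribute name is wanted as well, one possibility (an assumption on my part, since the question doesn't say what should be captured) is to put a (\w+)\s*=\s* prefix in front of the two-level regex and wrap that regex in a capturing group. A rough Python sketch:

```python
import re

text = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'

# The two-level regex from the steps above, with the braces escaped.
two_levels = r'\{(?:\{(?:\{[^}]*\}|[^}])*\}|\{[^}]*\}|[^}])*\}'

# Group 1: attribute name. Group 2: the braced value, braces included.
attr = re.compile(r'(\w+)\s*=\s*(' + two_levels + r')')

pairs = attr.findall(text)
for name, value in pairs:
    print(name, '->', value)
```

Because matches don't overlap, only the outermost attributes (v and z) are reported; the inner y and x would need a second pass over the captured value.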

You can see the regex handle your example at regex101.
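For input where the nesting depth isn't known up front, the usual way out is to drop the regex and count braces instead. This is not part of the regex approach above, just a small Python sketch of that alternative; the function name top_level_brace_groups is made up for the illustration.

```python
def top_level_brace_groups(text):
    """Collect every outermost {...} span by tracking brace depth."""
    groups = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i  # remember where the outermost group opens
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
            if depth == 0:
                # Back at the top level: the group is complete.
                groups.append(text[start:i + 1])
    return groups

text = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>abc</cc> }></bb> }></aa>'
for group in top_level_brace_groups(text):
    print(group)
```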

regards

