How to refactor regex in Perl
I have the following sentences:
text <mir-1> ggg-33 <exp-v-3> text text <vaccvirus-prop-1> other.
text <mir-1> text <assc-phrase-1> text <vaccvirus-prop-1> other <pattern-1> other.
What I want is to create a single regular expression (regex) that can match the two sentences above. Note that the only part of the pattern that differs between the sentences is the middle factor: <exp-v-3> versus <assc-phrase-1>.
I'm stuck with my current attempt, which matches them with two redundant regexes. What's the right way to do it?
    use Data::Dumper;

    my @sent = (
        "text <mir-1> ggg-33 <exp-v-3> text text <vaccvirus-prop-1> other.",
        " text <mir-1> text <assc-phrase-1> text <vaccvirus-prop-1> other <pattern-1> other.",
    );

    foreach my $sent (@sent) {
        # First regex: handles the <exp-v-N> variant.
        if ( $sent =~ /.*<mir-\d+>.*<exp-v-\d+>.*<vaccvirus-prop-\d+>.*/gi ) {
            print "$sent\n";
        }
        # Second, nearly identical regex: handles the <assc-phrase-N> variant.
        elsif ( $sent =~ /.*<mir-\d+>.*<assc-phrase-\d+>.*<vaccvirus-prop-\d+>/gi ) {
            print "$sent\n";
        }
    }
(?:xxx|yyy)\s*<mir-1>\s*(?:xxx|yyy)\s*(?:<exp-v-3>|<assc-phrase-1>)\s*(?:xxxx|yyy)\s*<vaccvirus-prop-1>
Maybe this regexp is not optimized, but it works.
OK, here is how it works:
First magic:
(?:expr) - a non-capturing group: it groups like (expr), but the <?:> prefix keeps it from being captured.
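The difference between a capturing and a non-capturing group can be seen in what a list-context match returns. This is only a minimal sketch; the sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = 'text <mir-1> other';   # hypothetical sample string

# Capturing group: the tag name lands in $1 and the number in $2.
my @captured = $text =~ /<(mir|exp-v)-(\d+)>/;

# Non-capturing group: only the number is captured, so it lands in $1.
my @grouped_only = $text =~ /<(?:mir|exp-v)-(\d+)>/;

print scalar(@captured), " captures: @captured\n";
print scalar(@grouped_only), " capture: @grouped_only\n";
```

With (?:...) the numbering of the remaining captures stays stable even when more alternatives are added inside the group.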
Second magic:
(a|b|c) - the alternation metasymbol at work: it chooses between <a>, <b>, or <c>.
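Applied to the question, alternation is what lets one pattern accept either middle tag. A small sketch, assuming the middle part is tested on its own; the @samples list is made up for the demonstration.

```perl
use strict;
use warnings;

# Either middle tag is acceptable; the group is non-capturing.
my $middle = qr/(?:<exp-v-\d+>|<assc-phrase-\d+>)/;

# Hypothetical inputs: two valid middle tags and one unrelated tag.
my @samples = ('<exp-v-3>', '<assc-phrase-1>', '<pattern-1>');
my @results = map { /$middle/ ? 'match' : 'no match' } @samples;

print "$samples[$_]: $results[$_]\n" for 0 .. $#samples;
```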
Third magic:
Generalization: replace the literal placeholders with .+? and the fixed numbers with \d+:
.+?\s*<mir-\d+>\s*.+?\s*(?:<exp-v-\d+>|<assc-phrase-\d+>)\s*.+?\s*<vaccvirus-prop-\d+>.+
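To check the generalized regex against the question's data, it could be dropped into the original loop in place of the if/elsif pair. The sketch below uses the two sentences from the question plus a third, hypothetical sentence (no middle tag) that should be skipped.

```perl
use strict;
use warnings;

my @sent = (
    'text <mir-1> ggg-33 <exp-v-3> text text <vaccvirus-prop-1> other.',
    ' text <mir-1> text <assc-phrase-1> text <vaccvirus-prop-1> other <pattern-1> other.',
    # Hypothetical counter-example: neither <exp-v-N> nor <assc-phrase-N>.
    'text <mir-1> text <pattern-1> text <vaccvirus-prop-1> other.',
);

# The generalized regex from the answer, compiled once.
my $pattern = qr/.+?\s*<mir-\d+>\s*.+?\s*(?:<exp-v-\d+>|<assc-phrase-\d+>)\s*.+?\s*<vaccvirus-prop-\d+>.+/i;

# A single test replaces the two redundant regexes.
my @matched = grep { /$pattern/ } @sent;
print "$_\n" for @matched;
```

Note that the leading .+? and the trailing .+ each require at least one character before <mir-N> and after <vaccvirus-prop-N>, which both original sentences satisfy.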
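And an example that rejects a string containing <[> between <mir-N> and the middle tag, or <]> between the middle tag and <vaccvirus-prop-N>: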
.+?\s*<mir-\d+>\s*[^\[]+?\s*(?:<exp-v-\d+>|<assc-phrase-\d+>)\s*[^\]]+?\s*<vaccvirus-prop-\d+>.+
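A quick way to see the rejection at work is to run the regex above over one sentence from the question and two bracketed variants. The bracketed sentences are hypothetical inputs invented for this sketch.

```perl
use strict;
use warnings;

# The stricter regex from the answer: no "[" before the middle tag,
# no "]" after it.
my $strict = qr/.+?\s*<mir-\d+>\s*[^\[]+?\s*(?:<exp-v-\d+>|<assc-phrase-\d+>)\s*[^\]]+?\s*<vaccvirus-prop-\d+>.+/i;

my @samples = (
    'text <mir-1> ggg-33 <exp-v-3> text text <vaccvirus-prop-1> other.',   # from the question
    'text <mir-1> [ggg-33 <exp-v-3> text text <vaccvirus-prop-1> other.',  # hypothetical: "[" before the middle tag
    'text <mir-1> ggg-33 <exp-v-3> text] text <vaccvirus-prop-1> other.',  # hypothetical: "]" after the middle tag
);

my @verdicts = map { $_ =~ $strict ? 'accepted' : 'rejected' } @samples;
print "$verdicts[$_]: $samples[$_]\n" for 0 .. $#samples;
```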
Fourth magic:
[^symbols] - a negated character class. The <^> at the beginning means 'I don't want to match these symbols'.
Here is an example:
[abc]{1} - matches one character that is <a>, <b>, or <c>
[^abc]{1} - matches one character that is not <a>, <b>, or <c>
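The two classes can be compared side by side on single characters. A minimal sketch with an assumed list of test characters; note that {1} is the default repetition, so [abc]{1} behaves the same as [abc].

```perl
use strict;
use warnings;

my @chars = qw(a b c d);   # hypothetical test characters

# Anchored so that each string must be exactly one character of the class.
my @in_class  = grep { /^[abc]{1}$/  } @chars;
my @not_class = grep { /^[^abc]{1}$/ } @chars;

# {1} is redundant: a class without a quantifier matches exactly once.
my $same_without_quantifier = ('b' =~ /^[abc]$/) ? 1 : 0;

print "in class: @in_class\n";
print "not in class: @not_class\n";
```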