python - re.findall not returning full match?
I have a file that includes a bunch of strings like "size=xxx;". I am trying Python's re module for the first time and am a bit mystified by the following behavior: if I use a pipe for 'or' in a regular expression, I only see a bit of the match returned. E.g.:
>>> import re
>>> myfile = open('testfile.txt', 'r').read()
>>> print(re.findall('size=50;', myfile))
['size=50;', 'size=50;', 'size=50;', 'size=50;']
>>> print(re.findall('size=51;', myfile))
['size=51;', 'size=51;', 'size=51;']
>>> print(re.findall('size=(50|51);', myfile))
['51', '51', '51', '50', '50', '50', '50']
>>> print(re.findall(r'size=(50|51);', myfile))
['51', '51', '51', '50', '50', '50', '50']
The "size=" part of the match is gone (yet it is certainly used in the search, otherwise there would be more results). What am I doing wrong?
The problem you have is that if the regex that re.findall tries to match captures groups (i.e. the portions of the regex that are enclosed in parentheses), then it is the groups that are returned, rather than the matched string.
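A minimal sketch of this rule, using a made-up sample string in place of the asker's testfile.txt: with no groups findall returns whole matches, with one group it returns only that group's text, and with more than one group it returns a tuple of groups per match.

```python
import re

# Hypothetical sample text standing in for the contents of testfile.txt
s = 'size=50;size=51;size=52;'

# No groups: each item is the whole matched string
no_groups = re.findall(r'size=\d+;', s)

# One capturing group: each item is only the text captured by that group
one_group = re.findall(r'size=(50|51);', s)

# Two capturing groups: each item is a tuple with one entry per group
two_groups = re.findall(r'(size)=(50|51);', s)

print(no_groups)   # ['size=50;', 'size=51;', 'size=52;']
print(one_group)   # ['50', '51']
print(two_groups)  # [('size', '50'), ('size', '51')]
```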
One way to solve this issue is to use non-capturing groups (prefixed with ?:).
>>> import re
>>> s = 'size=50;size=51;'
>>> re.findall('size=(?:50|51);', s)
['size=50;', 'size=51;']
If the regex that re.findall tries to match does not capture anything, it returns the whole matched string.
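If you would rather keep the capturing group (say, to use the number later), a sketch using re.finditer instead of re.findall may help: it yields match objects, so group(0) gives the whole match and group(1) gives the captured part. The sample string is the one from the session above.

```python
import re

s = 'size=50;size=51;'  # sample string from the session above

# finditer yields match objects, so the capturing group can stay in the pattern
matches = list(re.finditer(r'size=(50|51);', s))

# group(0) is the entire match; group(1) is what the parentheses captured
whole = [m.group(0) for m in matches]
numbers = [m.group(1) for m in matches]

print(whole)    # ['size=50;', 'size=51;']
print(numbers)  # ['50', '51']
```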
Although using a character class (e.g. size=5[01];) might be the simplest option in this particular case, non-capturing groups provide a more general solution.
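A short comparison of the two options, on hypothetical sample strings: a character class matches exactly one character from a set, so it needs no group at all, while a non-capturing group can also hold alternatives of different lengths (here 100 is an illustrative extra value not in the question).

```python
import re

s = 'size=50;size=51;size=52;'  # hypothetical sample text

# Character class: [01] matches a single '0' or '1', so no group is needed
char_class = re.findall(r'size=5[01];', s)

# Non-capturing group: also returns whole matches, and can hold alternatives
# of different lengths, which a single character class cannot express
non_capturing = re.findall(r'size=(?:50|51|100);', 'size=50;size=100;size=7;')

print(char_class)     # ['size=50;', 'size=51;']
print(non_capturing)  # ['size=50;', 'size=100;']
```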