James Housden
2014-02-14 13:43:13 UTC
Hello,
I am trying to parse a block of text lines where the lines of interest
always begin with a keyword from a set of known keywords. All other
lines can be ignored, even if they contain one of the keywords somewhere
other than at the start of the line.
The code that follows almost works, but it stops when it reaches an
unwanted line that contains one of the known keywords ('kw1' and 'kw2').
from pyparsing import *

def main():
    # A test string where I want to match all lines starting with
    # a keyword 'kw1' or 'kw2'.
    # Other lines should not be matched.
    test_string_1 = """
An unwanted line can contain anything
kw2 par1
kw1 par1 2
another unwanted line
kw1 opt 1
another unwanted line that contains a kw1
kw2 h1
yet another unwanted line
"""

    kw1 = Literal("kw1")
    kw2 = Literal("kw2")
    keywords = (kw1 | kw2)

    kw1_record = (kw1 + Word(alphanums) + Word(nums) +
                  restOfLine.suppress() + LineEnd().suppress())
    kw2_record = (kw2 + Word(alphanums) +
                  restOfLine.suppress() + LineEnd().suppress())
    valid_records = (kw1_record | kw2_record)

    # Skip over unwanted text up to the next keyword, then expect a record.
    record = Group(SkipTo(keywords, include=False,
                          ignore=None, failOn=None).suppress() +
                   valid_records)
    all_records = ZeroOrMore(record)

    res = all_records.parseString(test_string_1)
    for entry in res:
        print(entry)

if __name__ == '__main__':
    main()
The output from this code is
['kw2', 'par1']
['kw1', 'par1', '2']
['kw1', 'opt', '1']
What is missing from the output is
['kw2', 'h1']
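A short check seems to show where the missing record goes. The sketch below assumes pyparsing's `scanString` (present in both 2.x and 3.x); `text` and `hits` are illustrative names, not part of the original code. It lists every place the `keywords` expression matches in the relevant part of the test string and whether each match is at the start of a line. `SkipTo(keywords)` appears to stop at the mid-line `kw1` as well. From there, `kw1_record` reads `kw2` on the next line as its alphanumeric word and then fails on `h1`, which is not all digits. `kw2_record` fails immediately, so the `Group` fails and `ZeroOrMore` ends.

```python
from pyparsing import Literal

# Excerpt of the test string from the question, around the point
# where parsing stops (illustrative only).
text = """
kw1 opt 1
another unwanted line that contains a kw1
kw2 h1
"""

keywords = Literal("kw1") | Literal("kw2")

# scanString yields (tokens, start, end) for every keyword match,
# wherever it occurs in the text, including mid-line matches.
hits = []
for tokens, start, end in keywords.scanString(text):
    # Index of the first character of the line containing this match.
    line_start = text.rfind("\n", 0, start) + 1
    hits.append((tokens[0], start == line_start))

for keyword, at_line_start in hits:
    print(keyword, "at start of line" if at_line_start else "in the middle of a line")
```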
I am new to pyparsing and so I am probably missing something obvious. Is
there a way to correct my code so that it does what I want? Or is there
a better way to achieve my aims?
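One possible alternative avoids `SkipTo` by splitting the text into lines and trying the record grammar on each line separately. This sketch assumes that every record fits on a single line. Because `parseString` only tries to match at the beginning of the string it is given (after any leading whitespace), a keyword in the middle of a line can no longer start a record. `parse_records` is a hypothetical helper name. `Keyword` is used here instead of `Literal` so that, for example, a word such as `kw10` would not be taken for `kw1`.

```python
from pyparsing import Keyword, ParseException, Word, alphanums, nums

# Record layouts as in the question: kw1 takes a word and an integer,
# kw2 takes a single word. Anything after these fields is ignored.
kw1_record = Keyword("kw1") + Word(alphanums) + Word(nums)
kw2_record = Keyword("kw2") + Word(alphanums)
valid_record = kw1_record | kw2_record


def parse_records(text):
    """Return token lists for the lines that begin with a valid record."""
    records = []
    for line in text.splitlines():
        try:
            # Matching is anchored at the start of the line; trailing
            # text is ignored because parseAll defaults to False.
            tokens = valid_record.parseString(line)
        except ParseException:
            continue  # not a record line, so skip it
        records.append(tokens.asList())
    return records


sample = """
An unwanted line can contain anything
kw2 par1
kw1 par1 2
another unwanted line
kw1 opt 1
another unwanted line that contains a kw1
kw2 h1
yet another unwanted line
"""

records = parse_records(sample)
for record in records:
    print(record)
```

With this input the loop should print all four records, including ['kw2', 'h1']. The trade-off is that a record can no longer span several lines. If that were ever needed, the whole-text grammar would instead have to tie each keyword to the start of a line.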
I would be grateful for any suggestions.
Thanks in advance,
James