You might look at one of the variations on parsing that pyparsing
expressions support.

The typical case is a parser that handles all of the input text. This takes
the most work to write, because the grammar has to account for everything in
the input.

You can also write a pyparsing parser that matches only part of the input
file, and then scan or search for just those parts. I think this may be
suitable for your case. Look over the following code and see how
searchString and scanString return the matching lines. scanString returns a
Python generator (if you're not familiar with these, look them up) that
yields not only the matched tokens but also the start and end locations of
each match, so you can pull out the text between matches.
-- Paul
from pyparsing import OneOrMore, Word, alphas
line_of_words = OneOrMore(Word(alphas))
inputText = """\
sldjf lskjflsja lasdfljsdf owiuerowue ndf
122
1203 080182 0123 1023021 013802
02108
aslkjweoiur olsuaperu lsfiwuer kfdsldf
293749237
029 927397 2979 29793732974
9237
82739
sjfdhhwl oewr lwkejrlj wlehrnmb
34982 9392
"""
# find all groups of words using searchString
for line in line_of_words.searchString(inputText):
    print(line)
# prints:
# ['sldjf', 'lskjflsja', 'lasdfljsdf', 'owiuerowue', 'ndf']
# ['aslkjweoiur', 'olsuaperu', 'lsfiwuer', 'kfdsldf']
# ['o']
# ['sjfdhhwl', 'oewr', 'lwkejrlj', 'wlehrnmb']
# find all groups and their start/end locations using scanString
for line, start, end in line_of_words.scanString(inputText):
    print(line)
# prints:
# ['sldjf', 'lskjflsja', 'lasdfljsdf', 'owiuerowue', 'ndf']
# ['aslkjweoiur', 'olsuaperu', 'lsfiwuer', 'kfdsldf']
# ['o']
# ['sjfdhhwl', 'oewr', 'lwkejrlj', 'wlehrnmb']
# use scanString to associate intervening text with matched line
parsedData = []
scanner = line_of_words.scanString(inputText)
# prime the loop with the first match; lastEnd marks where its text ends
lastLine, lastStart, lastEnd = next(scanner)
for line, start, end in scanner:
    # text between the previous match and this one goes with the previous match
    parsedData.append((lastLine, inputText[lastEnd:start].splitlines()))
    lastLine, lastEnd = line, end
# add final group after last parsed line
parsedData.append((lastLine, inputText[lastEnd:].splitlines()))
for line, data in parsedData:
    print('-', ' '.join(line))
    for d in data:
        print(' ', d)
# prints
#- sldjf lskjflsja lasdfljsdf owiuerowue ndf
#
# 122
# 1203 080182 0123 1023021 013802
# 02108
#
#- aslkjweoiur olsuaperu lsfiwuer kfdsldf
#
# 293749237
# 029 927397 2979 29793732974
# 9237
#- o
# 82739
#
#- sjfdhhwl oewr lwkejrlj wlehrnmb
#
# 34982 9392
#
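
If you would rather keep the line-by-line loop from your message, here is a
minimal sketch of that approach: try the grammar on each line, and treat a
ParseException as "this line does not match". Note that the header grammar
below (a word, "=", a number) and the group_by_header function are made-up
placeholders for illustration, not your real grammar, so substitute your
own expression.

```python
from pyparsing import ParseException, Word, alphas, nums

# Hypothetical stand-in for the real grammar: a line like "name = 123"
header = Word(alphas) + "=" + Word(nums)

def group_by_header(lines, grammar):
    """Collect each grammar-matching line with the lines that follow it."""
    groups = []
    for line in lines:
        try:
            # parseAll=True requires the grammar to consume the whole line
            tokens = grammar.parseString(line, parseAll=True)
        except ParseException:
            # not a header line: attach it to the most recent header, if any
            if groups:
                groups[-1][1].append(line)
        else:
            # a header line starts a new group
            groups.append((tokens.asList(), []))
    return groups

sample = """\
alpha = 1
#one
#two
three
beta = 2
#four
"""

groups = group_by_header(sample.splitlines(), header)
for head, body in groups:
    print(head, body)
```

Lines that appear before the first matching line are skipped in this sketch;
change the except branch if you need to keep them.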
-----Original Message-----
From: Hanchel Cheng [mailto:***@broadcom.com]
Sent: Tuesday, November 05, 2013 7:15 PM
To: pyparsing-***@lists.sourceforge.net
Subject: [Pyparsing] Using grammar as a condition for loop
Hello!
I have a text file in a structure like this:
######start#######
[line1 matching grammar]
#[text]
#[text]
[text]
[line2 matching grammar]
#[text]
[etc.]
#######end#######
There can be any number of lines, with or without the leading #, indented
under each line that matches the grammar.
After I find a line that matches the grammar, I would like to check every
line up to the next line that matches it.
Something like...
for line in text_file:
    if not(line matches grammar):
        do something
Can pyparsing do this? If not, any suggestions? I can give more info if
necessary.
I really appreciate the help!
Kind regards,
Hanchel