[Pyparsing] Problem with Or Longest Match (I think)...

Shane Magrath

2016-07-09 09:45:27 UTC

I'm developing a parser for the Graphviz DOT language and am having
problems with my STMT expression in the grammar fragment below.

In this simplified grammar a STMT can be either a SUBGRAPH or a NODE_STMT.
An example SUBGRAPH expression is "subgraph cluster01 { n003 ; n004 ; }",
which, as you can see, is a composite statement.

My problem is that while the SUBGRAPH expression happily accepts the
test example, the STMT expression does not, even though it is defined as:

STMT = SUBGRAPH("SUBGRAPH") ^ NODE_STMT("NODE")

and the test code prints:

Testing subgraph statements
Match SUBGRAPH at loc 0(1,1)
Match STMT_LIST at loc 20(1,21)
Matched STMT_LIST -> ['n003', 'n004']
Matched SUBGRAPH -> [['subgraph', 'cluster01'], ['n003', 'n004']]
([(['subgraph', 'cluster01'], {'SUBGRAPHNAME': [('cluster01', 1)]}),
(['n003', 'n004'], {'NODE': [('n003', 0), ('n004', 1)], 'STMT': [('n003',
0), ('n004', 1)]})], {})

Match STMT at loc 0(1,1)
Matched STMT -> ['subgraph']
Problem Test Sample: LINE= 1 COL= 10
subgraph cluster01 { n003 ; n004 ; }
ERROR: Expected end of text (at char 9), (line:1, col:10)

My belief is that the STMT expression should prefer the SUBGRAPH
alternative over NODE_STMT, since SUBGRAPH is the longer match, but
clearly it does not.
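
For reference, here is a minimal sketch of the longest-match behaviour I expect from the ^ operator, kept separate from my real grammar. The names ident, block and stmt are made up for this illustration, and the block expression is deliberately non-recursive, so it is only an approximation of SUBGRAPH:

```python
from pyparsing import Keyword, Optional, Suppress, Word, ZeroOrMore, alphanums, alphas

# Hypothetical stand-ins for NODE_STMT and SUBGRAPH (not the real grammar)
ident = Word(alphas, alphanums + "_")
block = (
    Keyword("subgraph")
    + ident
    + Suppress("{")
    + ZeroOrMore(ident + Optional(Suppress(";")))
    + Suppress("}")
)

# Or (^) tries every alternative and keeps the one that consumes the most text
stmt = block ^ ident

text = "subgraph cluster01 { n003 ; n004 ; }"
tokens = stmt.parseString(text).asList()
print(tokens)
```

In this isolated form the whole statement is consumed by the block alternative, even though ident on its own could match the leading word. That is the behaviour I am expecting from STMT.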

What am I missing?

BTW, StackOverflow points are available:
http://stackoverflow.com/questions/38258218/suspected-pyparsing-longest-match-error

Thanks :-)

Grammar below:

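import pprint

from pyparsing import (Forward, Group, Keyword, Literal, Optional,
                       ParseException, StringEnd, Word, ZeroOrMore,
                       alphanums, alphas)

LCURL = Literal("{").suppress()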
RCURL = Literal("}").suppress()
STMTSEP = Literal(";").suppress()
ID = Word(alphas, alphanums + "_")
SUBGRAPH_KW = Keyword("subgraph", caseless=True)
SUBGRAPH = Forward("SUBGRAPH")

NODE_ID = ID("NODE_ID")
NODE_STMT = NODE_ID("NODE")

STMT = SUBGRAPH("SUBGRAPH") ^ NODE_STMT("NODE")

STMT_LIST = ZeroOrMore(STMT("STMT") + Optional(STMTSEP))

SUBGRAPH << Group(SUBGRAPH_KW + ID("SUBGRAPHNAME")) + Group(LCURL +
STMT_LIST + RCURL)

######################################################
SUBGRAPH.setName("SUBGRAPH")
STMT.setName("STMT")
STMT_LIST.setName("STMT_LIST")
NODE_STMT.setName("NODE_STMT")
ID.setName("ID")
######################################################

print("Testing subgraph statements")
test_ids = [
'''subgraph cluster01 { n003 ; n004 ; }'''
]
################
FRAG_1 = STMT + StringEnd()
################
NODE_STMT.setDebug(True)
SUBGRAPH.setDebug(True)
ID.setDebug(True)
STMT.setDebug(True)
STMT_LIST.setDebug(True)

for test in test_ids:
    try:
        result = FRAG_1.parseString(test)
        pprint.pprint(result)
    except ParseException as e:
        # Report the failing line with a caret under the error column
        print("Problem Test Sample: LINE= %s COL= %s" % (e.lineno, e.col))
        print(e.line)
        print(" " * (e.column - 1) + "^")
        print("ERROR: %s" % str(e))
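
A small standalone check that may help narrow this down. It is a sketch resting on two assumptions about pyparsing, not a confirmed diagnosis: that a string passed to Forward() is treated as an expression (a literal to match) rather than as a display name, and that calling an expression with a results name returns a copy. The names named, plain, and accepts are made up for this check:

```python
from pyparsing import Forward, ParseException, Word, alphanums, alphas

ident = Word(alphas, alphanums + "_")

def accepts(expr, text):
    """Return True if expr parses text without raising ParseException."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False

# Case 1: Forward built with a string argument, as in the grammar above
named = Forward("SUBGRAPH")
print(repr(named.expr))            # shows what the string argument became
named_copy = named("SUBGRAPH")     # results-name call, made before the definition
named << (ident + ident)           # definition assigned afterwards

# Case 2 (control): Forward built with no argument
plain = Forward()
plain_copy = plain("SUBGRAPH")
plain << (ident + ident)

sample = "subgraph cluster01"
results = {
    "named": accepts(named, sample),
    "named_copy": accepts(named_copy, sample),
    "plain_copy": accepts(plain_copy, sample),
}
print(results)
```

If named_copy rejects the sample while plain_copy accepts it, that would suggest the copy used inside STMT never sees the definition assigned later with <<. That would fit STMT matching only 'subgraph' through the NODE_STMT alternative.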