[Pyparsing] Memory issues with 2.0.6

Discussion:

Will McGugan

2015-11-20 00:42:53 UTC

Hi,

My app has a fairly complex grammar to parse expressions. Up to 2.0.5 it
was working well. But in 2.0.6 it gets stuck parsing and starts eating
memory until the process is killed by the OS.

I've pinned pyparsing to 2.0.5 for now. I haven't done any debugging yet,
but I was wondering what had changed since 2.0.5 that could trigger this
kind of behaviour? Any known bugs?

Thanks in advance,

Will McGugan
------------------------------------------------------------------------------

Andrea Censi

2015-11-20 01:03:10 UTC

starts eating memory until the process is killed by the OS.

Me too!

I have been wondering why all of a sudden my Travis unit tests were
failing, with the processes being killed.

~

[I also have a case in which a grammar worked (defined as: "parses
string s") in Python 2.7, but doesn't (syntax error with the same
string s) in >=3.3, and the specific errors changed with the latest
update of PyParsing.

http://github.com/AndreaCensi/contracts

So I guess something substantial changed in the latest release.

]

_______________________________________________
Pyparsing-users mailing list
https://lists.sourceforge.net/lists/listinfo/pyparsing-users

--
Andrea Censi | http://andrea.caltech.edu | "Not all those who wander are lost."
research scientist @ LIDS / Massachusetts Institute of Technology

------------------------------------------------------------------------------

Paul McGuire

2015-11-20 09:16:09 UTC

There were two logic changes made in 2.0.6:

- a bug in Or (operator ^) was fixed which handles the case where the
longest match fails because of a parse action raising an exception, but an
alternative shorter match succeeds; previously this would erroneously fail
to match, but now successfully returns the shorter alternative match

- a bug in Each (operator &) was fixed that would erroneously return
multiple matches of Optional expressions

There was one additional change that introduced a bug that only affects
users with unicode in their expressions.

If your grammar has complex expressions (especially recursive expressions)
using ^ or & operators, these new bugfixes may be the problem.
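
A minimal sketch of the alternation behaviour described above, assuming pyparsing 2.0.6 or later is installed. The tiny grammar and the names `integer`, `real` and `reject` are invented for illustration and do not come from any grammar discussed in this thread:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
real = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))

# MatchFirst (operator |) takes the first alternative that matches.
first_match = (integer | real).parseString("3.14").asList()    # ['3']

# Or (operator ^) tries every alternative and keeps the longest match.
longest_match = (integer ^ real).parseString("3.14").asList()  # ['3.14']


def reject(s, loc, toks):
    # Hypothetical parse action that always vetoes its match.
    raise pp.ParseException(s, loc, "reals not allowed here")


picky_real = real.copy().setParseAction(reject)

# The longest alternative matches, but its parse action raises, so Or
# is expected to fall back to the shorter alternative instead of failing.
fallback = (integer ^ picky_real).parseString("3.14").asList()

print(first_match, longest_match, fallback)
```

With the Or fix in place, `fallback` should come back as the shorter integer match rather than a ParseException. A grammar that relied on the old failure may therefore take different paths after the upgrade.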

Andrea, I'll try to take a look at the PyContracts code that you posted and
see if any glaring areas jump out. Will, I hope these descriptions will give
you some clues where to start looking in your grammar. I can also post some
before-after snippets that you can patch into your versions of pyparsing and
rerun your tests.

-- Paul

------------------------------------------------------------------------------

Paul McGuire

2015-11-20 14:41:12 UTC

The problem is definitely in the "cosmetic-only" change to the returned
error message for MatchFirst and Or (which also manifests as a Unicode
error), and does not even require calling parseString, just streamline().
Thanks for the test case Will, I can repro the problem with it, but am
trying to distill it down to a smaller case to add to my unit tests, and to
work with in fixing the bug.

For now, impatient users can comment out line 2354 in pyparsing.py:

self.errmsg = "Expected " + str(self)
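
A toy model of why eagerly building that string can get expensive. This is not pyparsing's code: `Node`, `build` and `eager_errmsg_length` are invented names, and the assumption that each level of the grammar refers to the level below it twice is only there to mimic a nested, self-referencing grammar. Under that assumption the text roughly doubles with every level of nesting:

```python
class Node:
    """Toy stand-in for a parser expression (hypothetical, not a pyparsing class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def __str__(self):
        # Leaves print their label; composites print all of their children.
        if not self.children:
            return self.label
        return "{" + " | ".join(str(child) for child in self.children) + "}"


def build(depth):
    """Nest alternations so that each level refers to the previous level twice."""
    node = Node('"*"')
    for _ in range(depth):
        node = Node("alt", [node, node])
    return node


def eager_errmsg_length(depth):
    """Length of a message built the same way as the line quoted above."""
    errmsg = "Expected " + str(build(depth))
    return len(errmsg)


for depth in (5, 10, 15):
    print(depth, eager_errmsg_length(depth))
```

If every composite expression stores such a string during streamline(), the total grows faster still. Whether this is exactly what happens inside 2.0.6 is an assumption here, not something confirmed in this thread.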

-- Paul

------------------------------------------------------------------------------

Andrea Censi

2015-11-20 16:22:00 UTC

Post by Paul McGuire
Andrea, I'll try to take a look at the PyContracts code that you posted and
see if any glaring areas jump out. Will, I hope these descriptions will give
you some clues where to start looking in your grammar. I can also post some
before-after snippets that you can patch into your versions of pyparsing and
rerun your tests.

Thanks Paul!

1) As for the PyContracts project, where I get syntax errors, this is
my travis project:

https://travis-ci.org/AndreaCensi/contracts

This currently shows the same code working for 2.7, 3.2, 3.3 and pypy, but
failing for 3.4 and 3.5.

Currently the version that is installed by pip is pyparsing-2.0.6-py2.py3.

I'm not an expert at Travis. I wish there was a way to run different
builds with multiple versions of pyparsing.

To look into the grammar, start here:
https://github.com/AndreaCensi/contracts/blob/master/src/contracts/syntax.py

Other parts are in other files. It is fairly complex - I was using
almost all the features of PyParsing.

In the past, what was failing in >=3.4 were tests related to the unary
operator "-". It didn't recognize things like ">=-1". Now the problem
is errors like this:

'array(=4|>=2,<=0)'
=>
pyparsing.ParseException: Expected {{FollowedBy:({{Forward:
{{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({...
{{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward:
{{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}
{{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*"
...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}} "+"} Forward:
{{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({...
{{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward:
{{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}
{{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*"
...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}}) Group:({Forward:
{{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({...
{{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward:
{{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}
{{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*"
...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}} {{"+" Forward:
{{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({...
{{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward:
{{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}
{{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*"
...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}}}...})} | Forward:
{{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({...
{{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward:
{{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}
{{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*"
...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...})
Group:({... {{"*" ...}}...})} | ...}}} (at char 10), (line:1, col:11)

2) As for the other project, the one with out of memory errors, I do
have unicode expressions in literals.
Could the changes in unicode expressions lead to out of memory errors?

~

Just read your last message about the cosmetic bug. When you push out
the fix, I will let you know if the problems above disappear.

------------------------------------------------------------------------------

Martijn Vermaat

2015-11-24 16:32:08 UTC

Dear Paul and others,

(Sorry for replying out of thread, I only just subscribed.)

I can report the same problem with Pyparsing 2.0.6 on this grammar:

https://github.com/mutalyzer/mutalyzer/blob/master/mutalyzer/grammar.py

Memory usage continues to increase.

best,
Martijn


Paul McGuire

2015-11-25 20:00:40 UTC

For those who reported having memory and Unicode issues, if possible, please
download the latest committed version of pyparsing from the SourceForge SVN
repo, and see if this resolves your issues.

For the memory problem, it is probably not necessary to actually parse any
text, simply invoke the streamline() method on your top-level grammar
instance:

parser.streamline()
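
One cautious way to run that check is in a child interpreter with a time limit, so that a runaway streamline() cannot take the calling process down with it. A minimal sketch using only the standard library, assuming Python 3.5 or later; `runs_within`, `CHECK` and the module name `mygrammar` are placeholders, not part of pyparsing:

```python
import subprocess
import sys


def runs_within(snippet, seconds):
    """Run `snippet` in a fresh interpreter.

    Returns True only if it exits cleanly before the time limit.
    """
    try:
        completed = subprocess.run([sys.executable, "-c", snippet], timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return False
    return completed.returncode == 0


# Hypothetical: `mygrammar` is your own module and `parser` is its
# top-level grammar expression.
CHECK = "from mygrammar import parser; parser.streamline()"
```

Calling `runs_within(CHECK, 30)` and getting False would mean the snippet either raised or did not finish within 30 seconds. The timeout bounds time only, not memory, so a short limit is the safer choice.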

For the Unicode problems, you should stop seeing the UnicodeEncodeError
exceptions when creating your parser.

Thanks for the feedback, everyone - if these changes work out, I'll follow
up with an actual 2.0.7 release in the next day or so.

-- Paul


Will McGugan

2015-11-26 10:57:40 UTC

Hi Paul,

Works for me. Thanks!

Will

--
Will McGugan
http://www.willmcgugan.com

Martijn Vermaat

2015-11-26 12:10:34 UTC

Dear Paul,

Thanks, the memory issue seems to be resolved.

I do have another problem with 2.0.6 and up which I think is due to the
same change.

On parsing something that's not accepted by the grammar, I get a parse
exception, for example:

ParseException: Expected "IVS" (at char 7), (line:1, col:8)

With the latest SVN I get the same thing, but the exception message
contains a huge expected string (more than 600,000 characters), for
example:

ParseException: Expected {{{{{{Suppress:({["GI"] ^ ["GI:"] ^ ["gi"] ^
["gi:"]}) W:(0123...)} ^ {~{"LRG_"} Combine:({W:(abcd...) W:(0123...)})
[{Suppress:(".") W:(0123...)}]} ^ Combine:({"UD_" W:(abcd...) {{"_"
W:(0123...)}}...})} [{Suppress:("(") Group:({W:(abcd...)
[{{Suppress:("_v") W:(0123...)} ^ {Suppress:("_i") W:(0123...)}}]})
Suppress:(")")}]} ^ {Combine:({"LRG_" W:(0123...)}) [{{Suppress:("t")
W:(0123...)} ^ {Suppress:("p") W:(0123...)}}]}} Suppress:(":")
[{W:(cgmn...) Suppress:(".")}] {Empty {Gro ...

etcetera, you get the idea.

I'd say that this is undesirable.

This is with the same HGVS grammar I linked earlier. Just try parsing
the empty string for example.

https://github.com/mutalyzer/mutalyzer/blob/master/mutalyzer/grammar.py
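
Until the message itself is shortened upstream, a caller could trim it before logging or displaying it. A small sketch, assuming the message ends with the "(at char N), (line:L, col:C)" suffix seen in the examples above; `compact_error` is a made-up helper, not a pyparsing function:

```python
def compact_error(message, limit=200):
    """Shorten a very long parse error message, keeping the position suffix."""
    if len(message) <= limit:
        return message
    # Split off the trailing "(at char N), (line:L, col:C)" part, if present.
    head, sep, tail = message.rpartition(" (at char ")
    if not sep:
        # No recognisable suffix: plain truncation.
        return message[:limit] + " ..."
    return head[:limit] + " ..." + sep + tail


short = 'Expected "IVS" (at char 7), (line:1, col:8)'
print(compact_error(short))  # short messages pass through unchanged
```

In practice this would be applied as `compact_error(str(err))` inside an `except ParseException as err:` block. It hides the symptom on the caller's side and does nothing about the cost of building the long string in the first place.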

best,
Martijn
