GitMensch left a comment:
Just a quick note for the first iteration: Chuck's changes are based on mlio from August 2025, so the "real" version may be easier to get by checking out a previous commit, replacing the file and committing locally, then fetching the newer commit with rebase-merging.
@chuck-haatvedt passed me the newest file (you may do a diff to add a changelog entry) which looks much better concerning libxml version compat. It is from November 14th: mlio.c with a note
GitMensch left a comment:
Thanks for inspecting / working on necessary changes.
I think we can have those in at least a second commit :-)
This is work by Chuck Haatvedt edited by David Declerck.
* mlio.c: modified to support XML PARSE with XMLSS. Eliminated the xml_event_data structure and moved that data into the xml_event structure. Created a new enum cob_xml_registers and added it to the add_xml_event_data function. This function was modified to update the xml_event structure. All of the context parser callback functions were modified to use the add_xml_event_data function. The cob_xml_parse and xml_parse functions were modified to support the new end_of_input event required by XMLSS. A new eof variable was added to the xml_state structure so that the endDocument callback function could be triggered by the parser in the xml_parse function. TODO ==> logic needs to be added to support returning NATIONAL data; this needs to support the RETURNING NATIONAL phrase.
* common.h: rename COB_XML_PARSE_XMLNSS into COB_XML_PARSE_XMLSS to match the IBM option name
* mlio.c [WITH_XML2]: fix issues in XML PARSE handling, most notably a use-after-free error if the internal buffer needs to grow during the parsing. Respect the high-order half-word for exception XML-CODE. Reduce the number of parsing states by removing useless ones, and encode eof in these states. Handle XML chunks with more than one recoverable error. Trigger ON EXCEPTION code after EXCEPTION XML events.
* parser.y: remove the CB_PENDING warning on XML PARSE but still warn for untested XML PARSE RETURNING NATIONAL and XML PARSE VALIDATING.
* typeck.c: remove invalid call to cob_check_based for XML-* builtin variable length registers (like XML-TEXT)
* codegen.c: remove the uninitialized and unused b_* field for XML-* builtin variable length registers
I am taking responsibility for this PR on OCamlPro's behalf. I applied changes according to your comments and fixed several issues. @GitMensch: is this new version more satisfying?
EXCEPTION +000262345||||
START-OF-ELEMENT +000000000|root|pfx0||
NAMESPACE-DECLARATION +000000000||pfx1|http://whatever|
START-OF-ELEMENT +000000000|localElName1|pfx1|http://whatever|
EXCEPTION +000262345||||
START-OF-ELEMENT +000000000|localElName2|pfx2||
END-OF-ELEMENT +000000000|localElName2|pfx2||
EXCEPTION +000262345||||
EXCEPTION +000262345||||
START-OF-ELEMENT +000000000|localElName3|pfx3||
ATTRIBUTE-NAME +000000000|localAtName4|pfx4||
ATTRIBUTE-CHARACTERS +000000000||||
CONTENT-CHARACTERS +000000000|c1|||
EXCEPTION +000262345||||
EXCEPTION +000262345||||
The exceptions should have the data from the exception in the register; this is the IBM output (with XMLSS):
EXCEPTION 000264193|pfx0:root|||
START-OF-ELEMENT 000000000|root|pfx0||
NAMESPACE-DECLARATION 000000000||pfx1|http://whatever|
START-OF-ELEMENT 000000000|localElName1|pfx1|http://whatever|
EXCEPTION 000264193|pfx2:localElName2|||
START-OF-ELEMENT 000000000|localElName2|pfx2||
END-OF-ELEMENT 000000000|localElName2|pfx2||
EXCEPTION 000264193|pfx3:localElName3|||
START-OF-ELEMENT 000000000|localElName3|pfx3||
EXCEPTION 000264192|pfx4:localAtName4|||
ATTRIBUTE-NAME 000000000|localAtName4|pfx4||
ATTRIBUTE-CHARACTERS 000000000||||
CONTENT-CHARACTERS 000000000|c1|||
EXCEPTION 000264193|pfx5:localElName5|||
START-OF-ELEMENT 000000000|localElName5|pfx5||
EXCEPTION 000264192|pfx6:localAtName6|||
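For illustration, the IBM output above puts the qualified name of the offending element or attribute (for example pfx2:localElName2) into XML-TEXT of a namespace EXCEPTION. A minimal C sketch of building that text, assuming a fixed-size register buffer; format_qname is a hypothetical helper, not a function from mlio.c or libxml2:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of mlio.c): writes "prefix:local" into buf,
   or just "local" when there is no prefix, truncating safely.
   Returns the number of characters actually stored in buf. */
size_t format_qname (char *buf, size_t size, const char *prefix,
                     const char *local)
{
	int n;

	assert (buf != NULL && size > 0 && local != NULL);
	if (prefix != NULL && prefix[0] != '\0') {
		n = snprintf (buf, size, "%s:%s", prefix, local);
	} else {
		n = snprintf (buf, size, "%s", local);
	}
	if (n < 0) {            /* encoding error: store an empty text */
		buf[0] = '\0';
		return 0;
	}
	/* snprintf returns the untruncated length; clamp to what was stored */
	return (size_t)n < size ? (size_t)n : size - 1;
}
```

For example, prefix "pfx2" and local name "localElName2" yield "pfx2:localElName2", matching the IBM line for the second exception.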
Indeed, but I do not really have time right now to implement the mapping between libxml2 and IBM exception codes, and I cannot imagine a meaningful code that uses the XML-TEXT of an EXCEPTION event without first checking the XML-CODE...
I would say that the support for XML PARSE without exception codes is useful enough to merge this PR first and then take care of those EXCEPTION events another time.
The behavior I have implemented simply lets the COBOL developer choose between ignoring all recoverable errors or failing on the first one.
That said, I think I made a mistake here by trying to pass the libxml2 error code to COBOL while it is not fully stable, and this will be fixed by my next commit (I should simply tell whether the error is recoverable or not).
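A rough sketch of the "recoverable or not" distinction, assuming it is derived from the level field of libxml2's xmlError, where XML_ERR_ERROR is documented as a recoverable error and XML_ERR_FATAL as a fatal one. The enum is a local mirror of libxml2's xmlErrorLevel values so the sketch builds without libxml2; is_recoverable is a hypothetical name:

```c
#include <assert.h>

/* Local mirror of libxml2's xmlErrorLevel (XML_ERR_NONE = 0,
   XML_ERR_WARNING = 1, XML_ERR_ERROR = 2, XML_ERR_FATAL = 3). */
enum err_level { ERR_NONE = 0, ERR_WARNING = 1, ERR_ERROR = 2, ERR_FATAL = 3 };

/* Hypothetical classification for the EXCEPTION event: returns 1 when
   parsing may continue (COBOL code may reset XML-CODE to 0 and go on),
   0 when the error is fatal and parsing must stop. */
int is_recoverable (enum err_level level)
{
	assert (level <= ERR_FATAL);
	return level != ERR_FATAL;
}
```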
This is not about matching exception codes but about outputting the part that resulted in an exception in the appropriate register (as done by IBM, MF ... and, if I remember correctly, also libxml2).
Note that we explicitly noted in NEWS that the exception codes are not identical to other implementations (I think MF and IBM differ as well).
Yes but the definition of "the part that resulted in an exception" is very unclear unless you also know which exception is returned.
For me it is currently out of scope to do any kind of exception specific work for EXCEPTION event aside from distinguishing recoverable and non-recoverable.
Besides, IBM documentation says (https://www.ibm.com/docs/en/cobol-zos/6.3.0?topic=registers-xml-event) that for EXCEPTION events, "XML-TEXT or XML-NTEXT contains the document fragment up to the point of the error or anomaly that caused the exception.", but in practice this contradicts the output you mentioned, where only the name of the element or attribute is placed in XML-TEXT.
… libxml2 error codes in COBOL
I consider that my "final" review. There are some things open, but I think we're nearly done to finally get this upstream!
But I'd like to have a review of @chuck-haatvedt as the original author of the code (and the rewrite from my initial event/data handling) before, if possible.
/* IBM doc states that we should store 1 in XML-INFORMATION on events
   ATTRIBUTE-CHARACTERS and CONTENT-CHARACTERS if the value in XML-TEXT
   is complete. It seems to be always the case with libxml2. */
Is this also true for the push parser (where the COBOL program passes in data, commonly from a line sequential file), when the attribute is split across multiple lines?
Do we have a testcase for that?
We have a test case with a push parser (currently badly named "XML PARSE complex XML": I will change that).
The issue is that IBM can split the content of ATTRIBUTE-CHARACTERS and CONTENT-CHARACTERS between several events, and in that case it reports that the *-CHARACTERS event is incomplete by writing 2 in XML-INFORMATION.
In libxml2, as far as I know, we never get incomplete events and emulating those seems out of scope for now, as it requires digging into the internal structure of the parser state (and I don't think this structure is supposed to be stable across versions).
Therefore, we always send only one *-CHARACTERS event, even though IBM states it can send more.
In practice, for most COBOL code, and especially code following the IBM example I took for the unit test, this practice of combining incomplete events should not alter the behavior, since the only meaningful thing to do with partial *-CHARACTERS events is to concatenate them.
Actually, we can even argue that this behavior should be kept even if we support IBM split one day because it allows for simpler COBOL code.
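The argument above can be illustrated with a hypothetical C accumulator that does what such COBOL code does: append each *-CHARACTERS piece while XML-INFORMATION is 2, and treat the value as finished when it is 1. Under that assumption, one combined event and several split events give the same final text. The struct and function names are made up for this sketch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical accumulator for *-CHARACTERS pieces:
   info == 2 means "more text follows", info == 1 means "value complete". */
struct text_acc {
	char   buf[256];
	size_t len;
	int    complete;   /* set once a piece with info == 1 arrived */
};

void acc_add (struct text_acc *acc, const char *text, size_t len, int info)
{
	size_t room;

	assert (acc != NULL && (info == 1 || info == 2));
	if (acc->complete) {          /* previous value finished: start anew */
		acc->len = 0;
		acc->complete = 0;
	}
	room = sizeof acc->buf - 1 - acc->len;
	if (len > room) {             /* sketch only: silently truncate */
		len = room;
	}
	memcpy (acc->buf + acc->len, text, len);
	acc->len += len;
	acc->buf[acc->len] = '\0';
	acc->complete = (info == 1);
}
```

Feeding "Cheese, lettuce, " with information 2 and then "tomato" with information 1 produces the same buffer as a single event carrying the whole text with information 1.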
Is https://deepwiki.com/search/i-currently-get-contentcharact_a2ed1cf8-d2e9-445f-8896-7c8bf724ac6b?mode=deep wrong or does our code work around that?
It seems partially wrong: the calling function xmlParseTryOrFinish does not call xmlParseCharDataInternal at pushed chunk boundary, but at internal buffer size boundary instead.
This is an issue here though... At internal buffer boundary we should put 2 in the XML-INFORMATION register.
thanks for adding a test going over the boundary and checking the adjusted code ❤️
I just thought about another potential quirk of XML-INFORMATION: for the XML file
<test>Try <![CDATA[some]]> wierd things</test>
What is the content of XML-INFORMATION in the different CONTENT-CHARACTERS events?
I don't have an IBM compiler at hand and it is not stated in the documentation whether CDATA text is considered to be a continuation of normal text or not.
With your test data and
display xml-event xml-code '|' xml-text '|' xml-information
'|' xml-namespace-prefix '|' xml-namespace '|'
the result on IBM with XMLSS is:
START-OF-DOCUMENT 000000000||000000000|||
START-OF-ELEMENT 000000000|test|000000000|||
CONTENT-CHARACTERS 000000000||000000001|||
EXCEPTION 000798761|<test>Try <!|000000000|||
and with compat
START-OF-DOCUMENT 000000000|<test>Try <![CDATA[some]]> wierd things</test> |000000000|||
START-OF-ELEMENT 000000000|test|000000000|||
CONTENT-CHARACTERS 000000000|Try |000000000|||
EXCEPTION 000000136|<test>Try <!|000000000|||
<![CDATA[some]]><test>Try valid things</test><![CDATA[more]]> leads to
START-OF-DOCUMENT 000000000||000000000|||
EXCEPTION 000798761|<!|000000000|||
compat:
START-OF-DOCUMENT 000000000|<![CDATA[some]]><test>Try valid things</test><![CDATA[more]]> |000000000|||
EXCEPTION 000000002|<![|000000000|||
EXCEPTION 000000001|<![C|000000000|||
EXCEPTION 000000001|<![CD|000000000|||
EXCEPTION 000000001|<![CDA|000000000|||
EXCEPTION 000000001|<![CDAT|000000000|||
EXCEPTION 000000001|<![CDATA|000000000|||
EXCEPTION 000000001|<![CDATA[|000000000|||
EXCEPTION 000000001|<![CDATA[s|000000000|||
EXCEPTION 000000001|<![CDATA[so|000000000|||
EXCEPTION 000000001|<![CDATA[som|000000000|||
EXCEPTION 000000001|<![CDATA[some|000000000|||
EXCEPTION 000000001|<![CDATA[some]|000000000|||
EXCEPTION 000000001|<![CDATA[some]]|000000000|||
EXCEPTION 000000001|<![CDATA[some]]>|000000000|||
EXCEPTION 000000002|<![CDATA[some]]><test>Try valid things</test><![|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![C|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CD|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDA|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDAT|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[m|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[mo|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[mor|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[more|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[more]|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[more]]|000000000|||
EXCEPTION 000000001|<![CDATA[some]]><test>Try valid things</test><![CDATA[more]]>|000000000|||
:-)
I'm just confused why parsing
1 xml-document-data.
2 pic x(39) value '<?xml version="1.0" encoding="US-ASCII"'.
2 pic x(19) value ' standalone="yes"?>'.
2 pic x(39) value '<!--This document is just an example-->'.
2 pic x(10) value '<sandwich>'.
2 pic x(33) value '<bread type="baker''s best"/>'.
2 pic x(36) value '<?spread We''ll use real mayonnaise?>'.
2 pic x(29) value '<meat>Ham + turkey</meat>'.
2 pic x(34) value '<filling>Cheese, lettuce, tomato, '.
2 pic x(32) value 'and that''s all, Folks!</filling>'.
2 pic x(25) value '<![CDATA[We should add a '.
2 pic x(20) value '<relish> element!]]>'.
2 pic x(28) value '<listprice>$4.99</listprice>'.
2 pic x(25) value '<discount>0.10</discount>'.
2 pic x(31) value '</sandwich>'.
with XMLSS does not result in START-OF-CDATA and so on, but also raises an exception:
START-OF-DOCUMENT 000000000||000000000|||
VERSION-INFORMATION 000000000|1.0|000000000|||
ENCODING-DECLARATION 000000000|US-ASCII|000000000|||
STANDALONE-DECLARATION 000000000|yes|000000000|||
COMMENT 000000000|This document is just an example|000000000|||
START-OF-ELEMENT 000000000|sandwich|000000000|||
START-OF-ELEMENT 000000000|bread|000000000|||
ATTRIBUTE-NAME 000000000|type|000000000|||
ATTRIBUTE-CHARACTERS 000000000|baker's best|000000001|||
END-OF-ELEMENT 000000000|bread|000000000|||
CONTENT-CHARACTERS 000000000| |000000001|||
PROCESSING-INSTRUCTION-TARGET 000000000|spread|000000000|||
PROCESSING-INSTRUCTION-DATA 000000000|We'll use real mayonnaise|000000000|||
START-OF-ELEMENT 000000000|meat|000000000|||
CONTENT-CHARACTERS 000000000|Ham + turkey|000000001|||
END-OF-ELEMENT 000000000|meat|000000000|||
CONTENT-CHARACTERS 000000000| |000000001|||
START-OF-ELEMENT 000000000|filling|000000000|||
CONTENT-CHARACTERS 000000000|Cheese, lettuce, tomato, and that's all, Folks!|000000001|||
END-OF-ELEMENT 000000000|filling|000000000|||
EXCEPTION 000798761|<?xml version="1.0" encoding="US-ASCII" standalone="yes"?><!--This document is just an example--><sandwich><bread type="baker's best"/> <?spread We'll use real mayonnaise?><meat>Ham + turkey</meat> <filling>Cheese, lettuce, tomato, and that's all, Folks!</filling><!|000000000|||
No matter whether I save the file with UTF-8 encoding and also mention that in the XML's encoding declaration or not...
The error code 000798761 corresponds to XRSN_MARKUP_INVALID: An incorrect character is found within markup.
It seems that the XML parser you used for tests is unable to recognize CDATA sections (it always stops after <! as if it was expecting a comment <!-- and nothing else)...
Therefore I will not get any information on the expected behaviour from that :(
By the way, I think
<![CDATA[some]]><test>Try valid things</test><![CDATA[more]]>
is supposed to be invalid XML, unlike what its text suggests (you cannot have content outside the root XML element, and CDATA is treated as content)...
On the contrary my "weird" example is unusual but supposedly valid.
I am a bit confused by the changes to the version I supplied to Simon, as the code appeared to be working fine before the changes. As for the testsuite, I have attached the sample program I used for testing. I ran it on both GnuCOBOL and MF COBOL.
xmlsmpl-3.txt is the test program; rename it to xmlsmpl-3.cbl. This is a much better test program as it exercises more of the complex XML elements.
set infile=sample_test_complex_split.xml: this is the input XML document as a line sequential file.
xmlsmpl3-mfcobol.txt is the output from the MF COBOL test.
xmlsmpl3-gnucobol.txt
Can you change that from file-based to memory-based, please? That way I can easily run it on IBM (files would also work, but I'd need to create a dataset, add the data, handle JCL, ...; in-memory is just much easier).
…data check later in the parsing.
Note that, compared to IBM, we may merge short contiguous CONTENT-CHARACTERS events across END-OF-INPUT boundaries. This is due to libxml2 internal details. Also improve some tests to check predefined entities and long content.
Without forking libxml2, it seems impossible to generate exactly the same stream of events as IBM in push parser mode. That said, I think I found a reasonable compromise between not depending too much on internal libxml2 details and not breaking COBOL code that expects the IBM behavior: the rule is that we allow ourselves to postpone characters that IBM delivers at chunk boundaries, but we try to guarantee that we do not generate more events than IBM, since COBOL code might rely on the fact that some content is never split. Moreover, my last commit should handle XML-INFORMATION correctly, notifying whether or not more characters might follow.
@chuck-haatvedt: Can you tell me what your test is checking that is not already covered by my additions in run_ml.at?
@GitMensch: With that done, I think I have taken into account all your comments. Do you have any final remarks?
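The postponing rule described above could be sketched as follows, under the assumption that characters reported by the parser are only buffered, that nothing is flushed at END-OF-INPUT, and that the buffer is emitted as a single CONTENT-CHARACTERS event when the next markup (or the end of the document) arrives. XML-INFORMATION is 2 only when a flush is forced while the text may still continue. The struct and function names are hypothetical and do not come from mlio.c:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pending-characters buffer (sketch, fixed size). */
struct pending {
	char   text[64];
	size_t len;
};

/* Characters reported by the parser are only buffered, never emitted;
   in particular nothing is emitted at END-OF-INPUT. */
void pending_add (struct pending *p, const char *s, size_t n)
{
	assert (p->len + n <= sizeof p->text - 1); /* sketch: no overflow path */
	memcpy (p->text + p->len, s, n);
	p->len += n;
	p->text[p->len] = '\0';
}

/* Emits the buffered text as one CONTENT-CHARACTERS event into out.
   more_may_follow: nonzero when the flush is forced while the text may
   still continue (reported as XML-INFORMATION = 2), zero when the next
   markup or the end of the document was reached (XML-INFORMATION = 1).
   Returns the XML-INFORMATION value, or 0 when nothing was pending. */
int pending_flush (struct pending *p, int more_may_follow,
                   char *out, size_t outsize)
{
	if (p->len == 0) {
		return 0;
	}
	assert (outsize > p->len);
	memcpy (out, p->text, p->len + 1);   /* includes the terminator */
	p->len = 0;
	return more_may_follow ? 2 : 1;
}
```

In this example, "Ham + " delivered in one chunk and "turkey" in the next produce a single event "Ham + turkey" with XML-INFORMATION 1, so the split input does not lead to more events than IBM would emit.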
This simple patch to mlio.c will add the XML-TEXT line to the EXCEPTION event. Note that this is a simple case and should be modified to check all 3 of the str1..3 variables in the err structure.
Here is the output for XMLup with the above change; I can upload these in a text file tomorrow if that would be easier.
Here is the same when executed on IBM z/OS Enterprise COBOL. Note that a couple of the EXCEPTION events are in a different order. Also note that the XML-CODE is displayed a bit differently on IBM; perhaps they use an implied PIC +++++++++9 instead of the floating "-" character, which would only print the negative sign. Obviously the XML-CODE values are different as well.
Personally, I think it would be better to use the code value from the err structure, as it would give all COBOL programmers better access to the cause of the error within application code.
Chuck Haatvedt
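A sketch of "check all 3 of the str1..3 variables", assuming the EXCEPTION XML-TEXT should take the first non-empty extra string of the libxml2 error and fall back to the message. The struct mirrors a subset of the fields of libxml2's xmlError (code, message, str1, str2, str3) so that it compiles without libxml2; which of the strings is meaningful depends on the error code, so this order is an assumption, and exception_text is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of a subset of libxml2's xmlError fields. */
struct xml_error_view {
	int         code;
	const char *message;
	const char *str1, *str2, *str3;
};

/* Hypothetical choice of the EXCEPTION XML-TEXT: the first non-empty
   of str1..str3, else the message, else an empty string. */
const char *exception_text (const struct xml_error_view *err)
{
	const char *cand[3];
	int i;

	assert (err != NULL);
	cand[0] = err->str1;
	cand[1] = err->str2;
	cand[2] = err->str3;
	for (i = 0; i < 3; i++) {
		if (cand[i] != NULL && cand[i][0] != '\0') {
			return cand[i];
		}
	}
	return err->message != NULL ? err->message : "";
}
```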
Here are the files...
Also revert the undeclared namespace test to the IBM original one since this is fully implemented now.
Since you seem to really want the correct errors for undeclared namespaces, I have spent a couple of hours implementing it correctly. I am strongly opposed to exposing libxml2 error codes in COBOL: libxml2 does not guarantee that it will always generate the same errors on the same input across versions, so we would end up having to emulate older libxml2 versions in order to preserve COBOL code that relies on them.
Just a note: tested the exception/namespace one with MF:
vc7 -> SIGSEGV
vc9 COMPAT -> no exception, namespace as part of the element name
vc9 XMLSS -> exception text is always "a fragment of alphanumeric text" (everything parsed until the exception), and on unclear places, ... and the error code does not make sense to me (compared to their docs https://docs.rocketsoftware.com/bundle/visualcobolvs_ug_100/page/rpb1743378344996.html)
I agree that this will be best.
I disagree: we can have an XML-TEXT that explicitly starts with "unhandled internal xml error %d: error text". That way we can clearly tell people that this will go away (and that they ideally should provide us with a reproducer, as we can then cross-check with IBM), while the contained internal libxml2 error number is still helpful for developers, as they can look it up in the libxml2 header (or DeepWiki) to check what the actual error is. Having no detailed information whatsoever (the runtime warning may be suppressed, or may not be easily relatable to the current place because it is written to a different output file), for example in a nightly batch job that already took 2 hours, is definitely bad.
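A minimal sketch of the proposed XML-TEXT for unmapped errors, assuming the libxml2 error number and message are passed in. libxml2 messages typically end with a newline, which is stripped here; format_unhandled is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "unhandled internal xml error <code>: <message>" into buf,
   dropping trailing line breaks from the message.
   Returns the number of characters stored (possibly truncated). */
size_t format_unhandled (char *buf, size_t size, int code, const char *msg)
{
	size_t mlen;
	int n;

	assert (buf != NULL && size > 0);
	if (msg == NULL) {
		msg = "";
	}
	mlen = strlen (msg);
	while (mlen > 0 && (msg[mlen - 1] == '\n' || msg[mlen - 1] == '\r')) {
		mlen--;
	}
	n = snprintf (buf, size, "unhandled internal xml error %d: %.*s",
	              code, (int)mlen, msg);
	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	return (size_t)n < size ? (size_t)n : size - 1;
}
```

For example, code 76 with the message "some error\n" gives "unhandled internal xml error 76: some error".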
EXCEPTION +000264192|pfx4:localAtName4|||
EXCEPTION +000264193|pfx3:localElName3|||
START-OF-ELEMENT +000000000|localElName3|pfx3||
ATTRIBUTE-NAME +000000000|localAtName4|pfx4||
just an observation, not a request to change it:
the order of exceptions is different to IBM where we get the namespace error at the place that uses it, not the place where it is parsed. As noted: that's just a difference; it still would be good to have a note about that (for now possibly in the NEWS entry, later moved to gnucobol.texi).
Sure; a note like that should also state that this PR, unlike IBM's parser, does not always immediately send incomplete events before END-OF-INPUT and can buffer them.
Here is a link to the IBM reference manual, XML System Services User's Guide and Reference: https://www.ibm.com/docs/en/SSLTBW_3.2.0/pdf/gxla100_v3r2.pdf
Chuck Haatvedt
Change xmlup.cbl as follows (note the change in the first tag...):
Identification division.
Program-id. XMLup.
Data division.
Working-storage section.
1 d.
2 pic x(40) value '<pfxz:root xmlns:pfx1="http://whatever">'.
2 pic x(19) value '<pfx1:localElName1>'.
2 pic x(20) value '<pfx2:localElName2/>'.
2 pic x(40) value '<pfx3:localElName3 pfx4:localAtName4="">'.
2 pic x(02) value 'c1'.
2 pic x(41) value '<pfx5:localElName5 pfx6:localAtName6=""/>'.
2 pic x(24) value 'c2</pfx3:localElName3>c3'.
2 pic x(32) value '</pfx1:localElName1></pfx0:root>'.
Procedure division.
main.
xml parse d processing procedure h
goback.
h.
display xml-event xml-code '|' xml-text '|'
xml-namespace-prefix '|'
xml-namespace '|'
* In the original IBM example they check specifically the two exceptions
* codes for undeclared namespaces: 264192 and 264193
* We do not yet support these IBM code
* -> ignore all recoverable errors for now
* if xml-event = 'EXCEPTION' and xml-code = 264192 or 264193
move 0 to xml-code
* end-if
.
End program XMLup.
When compiled and run with this GnuCOBOL PR, it generates the following output; however, that same code compiled on IBM Enterprise COBOL generates the following output. Note that the XML-CODE differs from GnuCOBOL on the last EXCEPTION event. I think that this demonstrates the difficulty of attempting to map all of the libxml2 err->code values to IBM equivalent values.
Also note that 798773 === x'000C3035', which is a value found in XML System Services User's Guide and Reference, Appendix B. So we would need to create a cross-reference mapping of the libxml2 err->code values to those in Appendix B. I think that using err->code and err->msg would be much more useful for programmers to diagnose any XML errors.
Chuck Haatvedt
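To make the "798773 === x'000C3035'" observation concrete: per IBM's description of XMLSS exceptions, XML-CODE carries the XML System Services return code in its high-order half-word and the reason code in its low-order half-word. A small C sketch of splitting and building such values; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* High-order half-word: XMLSS return code (e.g. x'000C' = 12). */
uint16_t xml_code_return (uint32_t code)
{
	return (uint16_t)(code >> 16);
}

/* Low-order half-word: XMLSS reason code (e.g. x'3035'). */
uint16_t xml_code_reason (uint32_t code)
{
	return (uint16_t)(code & 0xFFFFu);
}

/* Builds an XML-CODE value from a return code and a reason code. */
uint32_t xml_code_make (uint16_t rc, uint16_t reason)
{
	return ((uint32_t)rc << 16) | reason;
}
```

With these helpers, 798773 splits into return code 12 (x'000C') and reason x'3035', 264193 into 4 and x'0801', and the x'000C0000' default suggested in this thread is xml_code_make (12, 0), that is 786432.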
As long as it is very clear that we do not guarantee any form of stability on those error messages, I agree we could indeed put the libxml2 error text inside XML-TEXT for unhandled errors. I already noticed subtle changes across libxml2 versions for some of the reported errors. For instance version 2.12.7+dfsg+really2.9.14-2.1+deb13u2 you can find on current Debian stable can be very weird for syntax errors on the root element tag. |
I don't think we will ever map all of them, just the few useful ones that COBOL programmers may rely on.
This test suggests that maybe we should use XRC_NOT_WELL_FORMED instead of XRC_FATAL for parsing errors. So probably x'000C0000' would be slightly better for our default unmapped unrecoverable parsing error.
Two quick notes: the snprintf should use the _MAX define, which is one byte less than its matching _BUFF. If you rebase, then the warnings and errors in CI should be fixed.
I don't understand either of the two comments:
CI: correct (it works here in the PR, just not in your own branch).
snprintf(err_text_buf, COB_MINI_BUFF, ...
err_text_buf[COB_MINI_MAX] = 0;
is a construct you'll see in many places where snprintf is used. So... maybe use it here as well :-)
Note: I want to do a final comparison vs. IBM (may need to wait until Saturday) and maybe a first performance check with some bigger XML (both "in general" and vs. MF), then merge upstream (not later than next week, currently).
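The snprintf construct quoted above can be sketched like this, assuming (as the earlier note says) that the _MAX define is one byte less than its matching _BUFF. MINI_BUFF and MINI_MAX are local stand-ins for COB_MINI_BUFF and COB_MINI_MAX with an arbitrary small size chosen for the example, and format_code is hypothetical. Passing the _MAX value as the size and then writing the terminator at index _MAX keeps the buffer NUL-terminated even with snprintf implementations that do not terminate on truncation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for COB_MINI_BUFF / COB_MINI_MAX (assumed MAX == BUFF - 1). */
#define MINI_BUFF 16
#define MINI_MAX  (MINI_BUFF - 1)

/* buf must provide MINI_BUFF bytes. Returns the resulting string length. */
size_t format_code (char *buf, int code)
{
	assert (buf != NULL);
	snprintf (buf, MINI_MAX, "xml error %d", code);
	buf[MINI_MAX] = 0;   /* explicit terminator in the last byte */
	return strlen (buf);
}
```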
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## gitside-gnucobol-3.x #263 +/- ##
=======================================================
Coverage ? 67.82%
=======================================================
Files ? 34
Lines ? 61565
Branches ? 16043
=======================================================
Hits ? 41756
Misses ? 13851
Partials ? 5958
XML-INFORMATION and XML-*NAMESPACE* are not available with MF.
Just FYI: I've run some production XML (156 MB) through a program that just DISPLAYs the event/data. MF is much slower (by a factor of 5-6). More than 20% of the CPU time came from DISPLAY (putc); if this is taken out of the values, then 3.2% is spent in Possibly more checks next week.
Here is a link to the Wikipedia dumps website, where you can download some huge XML files for testing: https://dumps.wikimedia.org/enwiki/latest/
This is what I've downloaded for performance testing:
05/07/2026 01:08 PM 114,446,966 enwiki-latest-pages-articles-multistream16.xml-p20460153p20570392
05/07/2026 01:40 PM 454,514,835 enwiki-latest-pages-articles-multistream18.xml-p26716198p27121850
05/07/2026 01:40 PM 297,012,315 enwiki-latest-pages-articles-multistream22.xml-p44496246p44788941
3 File(s) 865,974,116 bytes
There are larger files available on this site if you want to test with files of more than 1 GB.
My performance test results using COB_OPEN_FILE, COB_READ_FILE, COB_CLOSE_FILE, reading 128 KB chunks of data. All displays of XML data/events are removed; the code just counts bytes read and XML events returned. Note that I had to fix a bug in fileio.c: cob_sys_read_file does not return the number of bytes fetched into the buffer.
XML document size ====> 24 MB
F:\AA-minGW32-static\XML>xmlfast
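Because the fileio.c bug was that the read did not report how many bytes were fetched, here is a plain C sketch of the same chunked loop using stdio instead of the COBOL file routines (a swapped-in technique, not the code used for the measurements): fread returns the byte count, and only that many bytes would be handed to the parser. The 128 KB size follows the test above; the function name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE (128 * 1024)   /* 128 KB chunks, as in the test above */

/* Reads fp in CHUNK_SIZE pieces, stores the number of chunks in *chunks
   and returns the total number of bytes read. The last chunk is usually
   short, so the byte count reported by each read is what must be passed
   on, never CHUNK_SIZE itself. */
long long count_bytes_in_chunks (FILE *fp, long *chunks)
{
	static char buf[CHUNK_SIZE];
	long long total = 0;
	size_t got;

	assert (fp != NULL && chunks != NULL);
	*chunks = 0;
	while ((got = fread (buf, 1, sizeof buf, fp)) > 0) {
		/* a push parser would be fed buf[0 .. got - 1] here */
		total += (long long)got;
		(*chunks)++;
	}
	return total;
}
```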
@chuck-haatvedt Have you tested with a different chunk size in mlio.c (the open question was whether we should make that configurable)?
Simon, from my analysis it appears that this version of xml_parse just parses the chunk passed from the cobol program. Let me know if your inspection of the code is different. I just did a build using the xml_parse code from this ddeclerck:xml_parse code base. |
Parsing XML documents as raw data does require the XML to be well-formed: in my testing, "raw" non-well-formed XML data failed with an EXCEPTION event. So this should be mentioned in the programmer's guide, so that users are aware of this requirement when passing "raw" data to XML PARSE.
Can you please write a short entry that you'd like to see about that in the documentation? I'm not really sure I understand what you are referring to. Also: shouldn't "non-valid data" always return an error (or do you only mean bad line breaks that are not "visible" when passing LSQ chunks, but that break the parsing when passing big non-LSQ chunks that include them)?
Note: initial commit from Chuck, fixes to come