Trim next element after DocType#603
Conversation
Mingun
left a comment
There was a problem hiding this comment.
Could you add a comment and update changelog. I would like also to have some more formal tests for that situation (can be added in de/mod.rs). Preferably to test each possible situation that you can imagine. I have a feeling, that the same problem can be happened not only for DocType + Comment events.
| fn trim<'a>(&mut self, event: Event<'a>) -> Option<PayloadEvent<'a>> { | ||
| let (event, trim_next_event) = match event { | ||
| Event::DocType(e) => (PayloadEvent::DocType(e), false), | ||
| Event::DocType(e) => (PayloadEvent::DocType(e), true), |
There was a problem hiding this comment.
Could you add a comment describing, why we need trim here because this is not obvious (otherwise it would be done initially)
There was a problem hiding this comment.
Immediately following a Doctype event, the Serde parser expects to read an initial structure from the XML file - it expects a 'Start' tag event. Any comments following the doctype in the stream is parsed to Text/CData events, which leads the parse to fail with an 'ExpectedStart' panic. By skipping (trimming) any Text/CData events the next event the serde deserializer sees will be a Start event, which is always required to start building up a new Struct from the xml.
The Doctype should trim events here just like the Start/End events. Only the CData/Text events should not trim (otherwise any CData/Text following them will be ignored)
See <https://www.w3.org/TR/xml11/#NT-prolog> failures (2): xml_prolog::comments xml_prolog::pi
Otherwise consequent `Text` events (which is possible if their delimited by Comment or PI events, which is skipped) will be merged but not trimmed. That will lead to returning a `Text` event when try to call `deserialize_struct` or `deserialize_map` which will trigger `DeError::ExpectedStart` error. The incorrect trim behavior was introduced in #581, when DocType event began to be processed
Mingun
left a comment
There was a problem hiding this comment.
Thanks for your work! After investigating situation I realized, that actually comment is not needed, because trimming Text events after DocType event just follow common rule. Returning false was an oversight in #583, which introduced this regression.
After investigating I realized that we should add tests for deserialization that uses deserialize_struct or deserialize_map methods (HashMap uses the last) and an XML contains comments or processing instructions in XML prolog
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## master #603 +/- ##
==========================================
+ Coverage 64.38% 64.44% +0.05%
==========================================
Files 36 36
Lines 16914 16916 +2
==========================================
+ Hits 10890 10901 +11
+ Misses 6024 6015 -9
Flags with carried forward coverage won't be shown. Click here to find out more.
|
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.28.2` -> `0.29.0` | --- ### Release Notes <details> <summary>tafia/quick-xml</summary> ### [`v0.29.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#​0290----2023-06-13) [Compare Source](tafia/quick-xml@v0.28.2...v0.29.0) ##### New Features - [#​601]: Add `serde_helper` module to the crate root with some useful utility functions and document using of enum's unit variants as a text content of element. - [#​606]: Implement indentation for `AsyncWrite` trait implementations. ##### Bug Fixes - [#​603]: Fix a regression from [#​581] that an XML comment or a processing instruction between a <!DOCTYPE> and the root element in the file brokes deserialization of structs by returning `DeError::ExpectedStart` - [#​608]: Return a new error `Error::EmptyDocType` on empty doctype instead of crashing because of a debug assertion. ##### Misc Changes - [#​594]: Add a helper macro to help deserialize internally tagged enums with Serde, which doesn't work out-of-box due to serde limitations. [#​581]: tafia/quick-xml#581 [#​594]: tafia/quick-xml#594 [#​601]: tafia/quick-xml#601 [#​603]: tafia/quick-xml#603 [#​606]: tafia/quick-xml#606 [#​608]: tafia/quick-xml#608 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xMTguMCIsInVwZGF0ZWRJblZlciI6IjM1LjExOC4wIiwidGFyZ2V0QnJhbmNoIjoiZGV2ZWxvcCJ9--> Co-authored-by: cabr2-bot <cabr2.help@gmail.com> Co-authored-by: crapStone <crapstone01@gmail.com> Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1940 Reviewed-by: crapStone <crapstone01@gmail.com> Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org> Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Allow comment to exist between DocType and root element
With the new EntityResolver/Doctype changes, putting an xml comment between a doctype and the root element causes the parse to fail. Trimming the next element fixes this.