Last minute geek

last minute tech news from around the net

Thursday, Sep 20th


CodeSOD: To Read or Parse


When JSON started to displace XML as the default data format for the web, my initial reaction was, "Oh, thank goodness." Time passed, and people reinvented schemas for JSON and RPC APIs in JSON and wrote tools which turn JSON schemas into UIs and built databases which store BSON, which is JSON with extra steps, and… it makes you wonder what it was all for.

Then people like Mark send in some code with a subject, "WHY??!??!". It's code which handles some XML, in C#.

Now, a useful fact: C# has a rich set of APIs for handling XML, and like most XML APIs, they implement two approaches.

The simplest and most obvious is the DOM-style approach, where you load an entire XML document into memory and construct a DOM out of it. It's easy to manipulate, but for large XML documents can strain the available memory.
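To make that concrete, here's a minimal sketch of the DOM-style approach using LINQ to XML. The `item` elements and their `qty` attribute are invented for illustration; nothing in Mark's submission tells us what the real documents look like.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DomExample
{
    // Hypothetical document shape: <order><item qty="2"/>...</order>
    public static int SumQuantities(string xml)
    {
        // The entire document is parsed into an in-memory tree up front.
        XDocument doc = XDocument.Parse(xml);

        // With the tree loaded, we can query it in any order we like.
        return doc.Descendants("item")
                  .Sum(e => (int?)e.Attribute("qty") ?? 0);
    }
}
```

Convenient, but every node of the document is resident in memory at once, which is exactly the cost described above.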

The other is the "reader" approach, where you treat the document as a stream, and read through the document, one element at a time. This is a bit trickier for developers, but scales better to large XML files.
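The same job done reader-style might look something like the sketch below. Again, the `item` element and `qty` attribute are assumptions made up for the example, not anything from the submitted code.

```csharp
using System.IO;
using System.Xml;

public static class ReaderExample
{
    // Hypothetical document shape: <order><item qty="2"/>...</order>
    public static int SumQuantities(TextReader input)
    {
        int total = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read() advances one node at a time; only the current node is held.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    string qty = reader.GetAttribute("qty");
                    if (qty != null)
                    {
                        total += XmlConvert.ToInt32(qty);
                    }
                }
            }
        }
        return total;
    }
}
```

More bookkeeping, and it only moves forward, but memory use stays roughly flat no matter how large the input gets.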

So let's say that you're reading a multi-gigabyte XML file. You'd want to quit your job, obviously. But assuming you didn't, you'd want to use the "reader" approach, yes? There's just one problem: the reader approach requires you to go through the document element-by-element, and you can't skip around easily.

    public void ReadXml(XmlReader reader)
    {
        string xml = reader.ReadOuterXml();
        XElement element = XElement.Parse(xml);
        …
    }

Someone decided to give us the "best of both worlds". They load the multi-gigabyte file using a reader, but instead of going elementwise through the document, they use ReadOuterXml to pull the entire document in as a string. Once they have the multi-gigabyte string in memory, they then feed it into the XElement.Parse method, which turns the multi-gigabyte string into a multi-gigabyte DOM structure.

You'll be shocked to learn that this code was tested with small testing files, not multi-gigabyte files, worked fine in those conditions, and thus ended up in production.
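For contrast, here's a rough sketch of what an actual middle ground can look like: walk the document with the reader, and only materialize one small subtree at a time with `XNode.ReadFrom`. The `StreamElements` helper and the element name passed to it are hypothetical, and this assumes the interesting data lives in many small, repeated elements rather than one enormous one.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class HybridExample
{
    // Yields each element called `name` as its own small XElement,
    // without ever holding the whole document in memory.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes this element and leaves the reader on
                    // the node that follows it, so we must not call Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded `XElement` can be queried DOM-style and then discarded, so memory use tracks the size of one element instead of the size of the file.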
