
Saturday, Dec 15th


CodeSOD: CDADA



If there’s one big problem with XML, it’s arguably that XML is overspecified. That’s not all bad: it means that every behavior, every option, every approach is documented, schematized, and defined. That might result in something like SOAP, which creates huge, bloated payloads, involves multiple layers of wrapping tags, integrates with discovery schemas, and has additional federation and built-in security mechanisms, each of which is itself defined in XML. And let’s not even start on XSLT and XQuery.

But it also means that if you have a common task, like embedding arbitrary content in a safe fashion, there’s a well-specified and well-documented way to do it. If you did want to embed arbitrary content in a safe fashion, you could use a CDATA section: <![CDATA[Here is some arbitrary content]]>. It’s not a pretty way of doing it, but it means you don’t have to escape anything but ]]>, which is only a problem in certain esoteric programming languages with rude names.
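As a rough illustration of that point, here is a minimal Python sketch of wrapping arbitrary text in a CDATA section. The helper name `to_cdata` is hypothetical, not part of any library. The one sequence that cannot appear inside a section, ]]>, is handled with the common trick of splitting it across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper for illustration).

    "]]>" is the only sequence that cannot appear inside a CDATA section,
    so each occurrence is split across two adjacent sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Markup characters and ampersands need no escaping inside CDATA.
payload = "<PATH>a & b ]]> c</PATH>"
document = "<root>" + to_cdata(payload) + "</root>"

# A standard parser hands the payload back unchanged.
assert ET.fromstring(document).text == payload
```

The parser joins the adjacent sections back together, so the consumer never sees the split.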

So, there’s an ugly, but perfectly well specified and simple to use method of safely escaping content to store in XML. You know why we’re here. Carl W was going through some of the many, many gigs of XML data files his organization uses, and found:

&amp;lt%3bPATH&amp;gt%3bSOME_VALUE_HERE&amp;lt%3b/PATH&amp;gt%3b

The specific sequence of mangling operations that was performed isn’t documented anywhere, but you can figure it out. To decode this, you first have to convert the character entities back into actual characters, which really just means the ampersands.

Now you have: &lt%3bPATH&gt%3bSOME_VALUE_HERE&lt%3b/PATH&gt%3b.

This is obviously URL encoded. So we can reverse that, yielding &lt;PATH&gt;SOME_VALUE_HERE&lt;/PATH&gt;.

Now, we can decode the character entities here, which leaves us with <PATH>SOME_VALUE_HERE</PATH>.
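The three decoding passes above can be sketched in Python using only the standard library. The function name `decode_mangled` is made up for illustration, and the starting string is an assumption: it is reconstructed by re-escaping the ampersands in the intermediate form shown above, not copied from Carl's files.

```python
from urllib.parse import unquote
from xml.sax.saxutils import unescape
import xml.etree.ElementTree as ET


def decode_mangled(value: str) -> str:
    """Undo the three layers of escaping (hypothetical helper)."""
    step1 = unescape(value)   # entity pass: &amp; becomes &
    step2 = unquote(step1)    # URL-decoding pass: %3b becomes ;
    step3 = unescape(step2)   # entity pass again: &lt; and &gt; become < and >
    return step3


# Assumed starting point, reconstructed from the intermediate steps.
mangled = "&amp;lt%3bPATH&amp;gt%3bSOME_VALUE_HERE&amp;lt%3b/PATH&amp;gt%3b"

decoded = decode_mangled(mangled)
print(decoded)                      # <PATH>SOME_VALUE_HERE</PATH>
print(ET.fromstring(decoded).text)  # SOME_VALUE_HERE
```

Only after all three passes is the value well-formed XML that a parser will accept.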


XML documents nest quite neatly, so why even do this escaping rigamarole? If you don't want it as XML, why not use CDATA? Why URL encode any of this? Carl had neither the time nor the documentation to figure it out. He simply changed SOME_VALUE_HERE to NEW_VALUE_HERE, and moved on to the next problem.
