Monday, September 17, 2007

JSON beats XML, some comments on GeoJSON

I've been setting up some new data services, and after experimenting with formatting my data in JSON (JavaScript Object Notation), I'm hooked. Programming mostly in C++ has made XML the easiest data format to use, but I'm very unhappy with the hurdles one must clear to get from XML to the internal representation of the data. I've been doing a lot of JavaScript and Python programming lately and am just blown away by how easy it is to create, maintain and share data through the JSON format.

For XML, on the server end one would take an internal data representation in Python and create custom classes to format each data item with the appropriate XML tag, convert it to a string and output the XML tree. Not too hard if it is a relatively shallow dataset without too much data nesting. On the client end, you would write classes that decode the tree structure of the tags with full knowledge of each tag's data type (string, integer, floating point, etc.) and create a new data structure to mirror the server-side data.
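To make the XML round trip concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The record, its field names, the "place" tag and the FIELD_TYPES table are all made up for illustration; the point is only that the decoder needs a type table the encoder never wrote down.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are invented for this example.
record = {"name": "Mount Rainier", "elevation_m": 4392, "lat": 46.8523, "lon": -121.7603}


def to_xml(rec):
    """Server side: wrap each value in its own tag and serialize the tree."""
    root = ET.Element("place")
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = str(value)  # every value becomes text inside a tag
    return ET.tostring(root, encoding="unicode")


# Client side: the decoder must know each tag's data type in advance.
FIELD_TYPES = {"name": str, "elevation_m": int, "lat": float, "lon": float}


def from_xml(text):
    """Client side: walk the tags and convert each text node back to its type."""
    root = ET.fromstring(text)
    return {child.tag: FIELD_TYPES[child.tag](child.text) for child in root}


xml_text = to_xml(record)
decoded = from_xml(xml_text)
```

Every new field means touching both the encoder and the type table, and nested data would need more code still.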

For JSON, you take your data structure, usually a dictionary that maps names to values, dump it directly to JSON with a single procedure call and save it to a file. On the client end, you load the data directly into a similar data structure with a single call. No special decoding classes are needed, and there are no XML tag data types (string, integer, float) to know in advance.
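The JSON round trip described above can be sketched with Python's standard json module. The record is a made-up example; json.dumps and json.loads work on strings, and json.dump and json.load do the same thing against an open file.

```python
import json

# Hypothetical record; the field names are invented for this example.
record = {
    "name": "Mount Rainier",
    "elevation_m": 4392,
    "location": {"lat": 46.8523, "lon": -121.7603},
    "tags": ["volcano", "national park"],
}

# Server side: one call turns the whole nested structure into text.
text = json.dumps(record)

# Client side: one call rebuilds an equivalent structure, types included.
restored = json.loads(text)
```

Numbers come back as numbers and nested dictionaries and lists come back nested, with no type table on the receiving end.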

I've been looking at the GeoJSON specification and it looks pretty nice, but with a few caveats. Say you have a LineString feature type: the coordinates are a list of lists, each of which contains 2 coordinate elements (longitude and latitude). This is very inefficient in both space and processing time, which matters a great deal to me with large datasets. I suggest using a single flat list of coordinates, as is done in other formats such as GML and KML. It would also be nice if there were some way to specify feature style information such as color, line width, line style and placemark icons.
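To illustrate the two layouts, here is a small sketch that converts between GeoJSON's nested LineString coordinates and the flat list suggested above. The helper names flatten_coords and unflatten_coords are hypothetical, the sketch assumes strictly 2D positions, and the flat form is my proposed variant rather than valid GeoJSON.

```python
def flatten_coords(coords):
    """Turn [[lon, lat], [lon, lat], ...] into [lon, lat, lon, lat, ...]."""
    flat = []
    for lon, lat in coords:  # assumes exactly two values per position
        flat.extend((lon, lat))
    return flat


def unflatten_coords(flat):
    """Rebuild the nested GeoJSON layout from a flat list of 2D coordinates."""
    if len(flat) % 2 != 0:
        raise ValueError("flat coordinate list must have an even length")
    return [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]


# Standard GeoJSON geometry: one inner list per position.
line = {"type": "LineString", "coordinates": [[1, 2], [3, 4], [5, 6], [7, 8]]}

# Proposed flat variant (not valid GeoJSON): a single list of numbers.
flat_line = {"type": "LineString", "coordinates": flatten_coords(line["coordinates"])}
```

A reader of the flat form has to know the dimension (2 here) from somewhere else, which is the trade-off for dropping the inner lists.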

I'll definitely be using my modified GeoJSON format when I return to C++ programming for EarthBrowser v3 soon. Sorry XML, please don't feel bad. It's not you, it's me. I just feel like we need a little space.


Sean said...

A JSON array of numbers is only fractionally longer (in bytes) than the equivalent GML posList. After you compress them, the difference is going to be very slight. With XML you have to put the coordinates in a text node because there is no capacity for arrays. JSON has arrays, and it's natural and appropriate to use them.

matt_giger said...

Yes, I agree that a coordinate list should be an array of numbers, but not an array of arrays of numbers: [1,2,3,4,5,6,7,8] rather than [[1,2],[3,4],[5,6],[7,8]]. Sure, each coordinate pair only adds 2 extra characters to the data stream, and those compress well. However, once the data is loaded there is a memory overhead for each inner list, and there is also a processing overhead to extract, say, 10,000 lists of 2 numbers each versus 1 list of 20,000 numbers. I'd guess it is significant too.
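A rough way to check this guess in Python is sketched below. It assumes CPython, where sys.getsizeof reports the size of a list object itself (not the numbers it holds), and it uses synthetic coordinates; other JSON libraries and languages will give different figures.

```python
import json
import sys
import zlib

n = 10_000  # number of coordinate pairs (synthetic data)
nested = [[float(i), float(i) + 0.5] for i in range(n)]
flat = [value for pair in nested for value in pair]

# Serialized size: compact separators so only the brackets differ.
nested_json = json.dumps(nested, separators=(",", ":"))
flat_json = json.dumps(flat, separators=(",", ":"))
extra_chars = len(nested_json) - len(flat_json)

# Compressed size of each form, for comparison.
nested_zip = len(zlib.compress(nested_json.encode("utf-8")))
flat_zip = len(zlib.compress(flat_json.encode("utf-8")))

# In-memory container overhead (CPython only): list objects, not the floats.
nested_overhead = sys.getsizeof(nested) + sum(sys.getsizeof(pair) for pair in nested)
flat_overhead = sys.getsizeof(flat)

print("extra characters in nested JSON:", extra_chars)
print("compressed bytes, nested vs flat:", nested_zip, flat_zip)
print("container bytes, nested vs flat:", nested_overhead, flat_overhead)
```

The character difference is exactly 2 per pair; the container figures show where the per-list memory cost comes from.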
