Showing posts with label GeoJSON. Show all posts
Showing posts with label GeoJSON. Show all posts

Thursday, September 20, 2007

Simplified GeoJSON proposal

I've come up an extremely simplified GeoJSON example which can do everything that the current GeoJSON specification can do but with only 4 core elements. Every element is a feature and can contain a list of points, lines and polygons or other features. Everything else in the GeoJSON spec is pretty much taken from some OGC standard or another. It's important not to get caught up in the past we are at the beginning of a potential standard, as Sean said in his post: "there's only one first chance to get a standard right." The only thing that I think might be questionably useful about this would be the nesting of features.

{
"points":[
[x0,y0], [x1,y1], ..., [xn,yn]
],
"lines":[
[x0,y0, x1,y1, ..., xn,yn],
...
],
"polygons":[
[
[x0,y0, x1,y1, ..., xn,yn],
...
],
[
[x0,y0, x1,y1, ..., xn,yn],
...
],
...
],
"features":[
{
"points":[
[x0,y0], [x1,y1], ..., [xn,yn]
],
"lines":[
...
]
},
...
]
}


Also an optional "crs" coordinate reference system can contain an EPSG code, ESRI WKT (Well Known Text) or a PROJ.4 projection string or all 3. If none is specified, the default is decimal degrees in the WGS84 datum. The coordinate reference system of the parent cascades down the to all children until a child specifies one.

"crs":{
"epsg":"4326",
"wkt":"COMPD_CS["OSGB36 / British National Grid...",
"proj4":"+proj=utm +zone=15 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
}



Another useful item would be an optional "bounds" element that specifies the bounding envelope in the element's crs.
"bounds":[min_lon, min_lat, max_lon, max_lat]


Minimalistic and precise.

GeoJSON redux

Sean Gilles has a post responding to my call for a simplified coordinate representation in GeoJSON. His argument is that clarity of representation is more important that implementation overhead which I agree with to some extent.

However, it seems to me that the reason that JSON is so much better than XML for many purposes is exactly that it *does* take into account implementation overhead, thereby making it easier to exchange internal data structures without the overhead of XML.

For any sane implementation of JSON, the following must be extracted as a list of lists:

[ [x1, y1, z1], ..., [xn, yn, zn] ]

So a list of 10,000 coordinates will give you 10,000 lists of 3 floating points each. Or 10,000 list structures and 30,000 floats. Each list takes time and memory to create and address and extract the elements out of. Whereas:

[ x1, y1, z1, ..., xn, yn, zn ]

gives you the same 30,000 floats but only one list. A big win for any standardized JSON reading algorithm. Creating a custom, context sensitive JSON parser to ignore the structure is more than a little implementation detail.

Strangely, after Sean argues against removing context information to improve clarity, he then requests just that. He suggests that the "type" element that describes what kind of GeoJSON element you are looking at be removed since it should be obvious from the type of request that you made to receive the GeoJSON content. What if one were to receive a set of files with various different sets of data, some single features, some feature sets and perhaps some just geometry elements? If you remove the type field from the geometry elements, how would you know what kind of geometry you have? You couldn't tell a Point from a Polygon without the type field.

All that said, I do have another simplification for GeoJSON that is unrelated to the previous issues. How about we do away with the 'Point', 'LineString' and 'Polygon' geometry types. No really. They are just special cases of 'MultiPoint', 'MultiLineString' and 'MultiPolygon' with one element. I find myself writing code to take the special case single element entities and put them into the more general multi-element entities. That is code I would much rather not write since it introduces complexity and potential bugs. The only real difference in the two is the lack of a "Multi" and an extra set of an enclosing brackets.

Monday, September 17, 2007

JSON beats XML, some comments on GeoJSON

I've been setting up some new data services and after experimenting with formatting my data in JSON (Javascript Object Notation) and I'm hooked. Programming mostly in C++ has made XML the easiest data format to use but I'm very unhappy with the hurdles that one must go through to go from XML to the internal representation of the data. I've been doing a lot of Javascript and Python programming lately and am just blown away by how easy it is to create, maintain and share data through the JSON format.

For XML, on the server end one would take an internal data representation in Python and create custom classes to format each data item with the appropriate XML tag, convert it to a string and output the XML tree. Not too hard if it is a relatively shallow dataset without too much data nesting. On the client end, you would write classes that would decode the tree structure of the tags with full knowledge of each tag's data type (string, integer, floating point, etc...) and create a new data structure to mirror the server side data.

For JSON, you take your data structure, usually a dictionary structure with a name associated with a value, dump it directly to JSON with a single procedure call and save it to a file. On the client end, you load the data directly into a similar data structure with a single call. No special decoding classes needed, no XML tag data types (string, integer, float) to have foreknowledge of.

I've been looking at the GeoJSON specification and it looks pretty nice and but with a few caveats. Say you have a LineString feature type, the coordinates are a list of lists which contain 2 coordinate elements (longitude and latitude). This is extremely space and processing time inefficient, which is very important for me with large datasets. I suggest using the standard single list of coordinates like is done in all other formats like GML and KML. Also it would be nice if there were some sort of feature style information specified like color, line width, line style and placemark icons.

I'll definitely be using my modified GeoJSON format when I return to C++ programing for EarthBrowser v3 soon. Sorry XML, please don't feel bad. It's not you, it's me. I just feel like we need a little space.