To fuse or not to fuse data?

Do you want to have everything in one pile? Or are you claustrophobic and don’t like to squeeze in? Today’s article will be about that topic. But because we are interested in data and working with it, I will simplify the topic to a simple sentence: to fuse or not to fuse data? That’s what this is about. Let’s do it.

VTS Geospatial is a powerful full-stack system for streaming and rendering 3D Geospatial content. It is designed to work with big data and aims to bring a smooth user experience when visualizing 3D geometric reconstructions of reality with street level detail and on planetary scale. It is a first class, production-deployment proven engine for building virtual globes. The core part of the system, the actual engine that powers it, is called VTS Backend and consists of streaming servers and a set of tools for data manipulation and processing.

Since 2012, Melown Technologies has specialized in mass-scale computer vision-based modeling of urban and natural landscapes from aerial imagery. As a major part of this process, we have developed an advanced 3D visualization technology which allows interactive web-based rendering of these landscapes, from street-level detail to planetary scale. The streaming and rendering stack, codenamed VTS, has since been extended to include a native C++ client library and massively powerful server-side data fusion functionality. VTS is not a mere JS library and not a mere geospatial data server; rather, it is a fully integrated rendering and visualization stack providing everything needed to travel the long road from server-side data storage to the client browser or desktop.

The previous blog post introduced the high degree of freedom VTS offers in spatial reference definitions and the resulting benefits, as well as the wide range of supported geospatial data formats that VTS users can employ in their applications. VTS Backend can also be considered a high-performance, state-of-the-art data fusion system that offers some unique features. What VTS Geospatial has to offer when it comes to data fusion is the main subject of this article.

Data fusion

But first, let’s look at what we actually mean by data fusion. Data fusion is the process of integrating multiple data sources to produce a more complex, consistent, accurate and content-rich dataset.
The general expectation is that the fused output dataset will be considerably more informative than a simple sum of the original individual datasets. It makes it possible to see the overall context, reveals connections between individual data points and allows the data to be visualized in a way that makes obvious what would otherwise remain hidden.
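To make the "more than a simple sum" idea concrete, here is a minimal toy sketch in Python. The two input datasets, their values and the `fuse` helper are all hypothetical and serve only as illustration; this is not how VTS performs fusion. Each dataset alone says nothing about flooding, but joined per grid cell they reveal which cells end up under water.

```python
# Hypothetical inputs: two datasets keyed by the same grid cell (col, row).
terrain_height_m = {(0, 0): 212.0, (0, 1): 198.5, (1, 0): 201.0, (1, 1): 195.0}
water_level_m = {(0, 0): 199.0, (0, 1): 199.0, (1, 0): 199.0, (1, 1): 199.0}


def fuse(terrain, water):
    """Join both datasets per cell and derive a value neither holds alone."""
    fused = {}
    for cell in terrain.keys() & water.keys():  # cells present in both inputs
        depth = water[cell] - terrain[cell]
        fused[cell] = {
            "terrain_m": terrain[cell],
            "water_m": water[cell],
            "flood_depth_m": max(depth, 0.0),  # dry cells get depth 0
        }
    return fused


fused = fuse(terrain_height_m, water_level_m)
# Cells where the water level lies above the terrain.
flooded = sorted(cell for cell, rec in fused.items() if rec["flood_depth_m"] > 0)
print(flooded)
```

The derived `flood_depth_m` field is the "connection between individual data points" that only exists once the inputs are aligned and combined.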

Main benefits of data fusion

more complex
consistent
accurate
content-rich dataset

In the context of geospatial applications, the primary objectives of data fusion are precise spatial or spatiotemporal alignment of input datasets, mutual information enrichment and, due to the highly visual nature of geospatial science, visualization itself. In pursuit of these objectives, a plethora of differences between the input datasets need to be resolved. At a low level, differences in geospatial data might be classified as follows:

Data type (vector, raster, text…)
Dimensionality (2D vs. 3D vs. 4D…)
Data format (actual format expressed in file extension, such as .tiff, .jpg, .shp, .json, .vef, .slpk…)
Spatial resolution (pixel size in raster images, sampling distance in point clouds…)
Temporal resolution (timestamp factor in historical, periodically updated or real-time data)
Spatial reference (coordinate system to which data geometries are related)

Because of the number and often dissimilar nature of individual data properties, proper alignment and interconnection of data is usually not a trivial task. VTS Geospatial was designed to give its users the tools to cope with these tasks with as little struggle as possible. As mentioned in the previous post, VTS Geospatial substantially leverages the power of the GDAL/OGR and PROJ open-source libraries, which help address many of the fusion challenges related to data formats and spatial references.
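The difference classes listed above can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the `DatasetInfo` class, its field names, the `differences` helper and the two sample datasets are hypothetical and are not part of VTS, GDAL/OGR or PROJ. The sketch merely reports which properties of two inputs disagree and would therefore need to be reconciled before fusion.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical descriptor mirroring the difference classes in the list."""
    data_type: str               # "vector", "raster", "text", ...
    dimensionality: int          # 2, 3, 4, ...
    data_format: str             # file extension, e.g. ".tiff"
    spatial_resolution_m: float  # pixel size or sampling distance in metres
    temporal_resolution: str     # e.g. "static", "periodic", "real-time"
    spatial_reference: str       # coordinate system identifier


def differences(a, b):
    """Return the names of the properties in which two datasets differ."""
    return [
        f.name
        for f in fields(DatasetInfo)
        if getattr(a, f.name) != getattr(b, f.name)
    ]


# Two made-up raster inputs: an orthophoto and an elevation model.
ortho = DatasetInfo("raster", 2, ".tiff", 0.2, "static", "EPSG:32633")
dem = DatasetInfo("raster", 2, ".tiff", 10.0, "static", "EPSG:4326")

to_resolve = differences(ortho, dem)
print(to_resolve)
```

In this made-up case the two rasters share type, dimensionality and format, so only the resolution and the spatial reference would have to be aligned.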

Two “runners” with different approaches
No-fusion & total fusion

In the very early stages of VTS Geospatial design, different architectural patterns of data fusion were considered. In general, from the server-side standpoint, there are two extremes to assess. The first is what we call the no-fusion approach, where all data are streamed to the client and all fusion takes place there. At the other end is the fuse-them-all approach, where everything is fused together on the server and the client doesn’t need to deal with anything but rendering the result.

Pluses & minuses

Both options of course have their advantages and disadvantages. The no-fusion approach gives clients the freedom to deal with the data independently, which elevates the capabilities of data visualization and interpretation on the front end.
But it is also very resource intensive, causes redundant rendering and bandwidth use, and would often require a robust client-side application in order to achieve a nice-looking result. In contrast, the fuse-them-all arrangement imposes the lowest possible load on the client, which is suitable for web applications, but it also implies a higher computational load on the backend infrastructure and usually leads to static data with little to no possibility for clients to interact with it.
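The two extremes and their trade-off can be sketched with a toy Python model. Everything here is an assumption made for illustration: the layer contents, the tile ids and the rule that a higher-priority layer simply overwrites a lower one are hypothetical and do not describe the actual VTS fusion algorithm. The point is only where the fusion step runs and how many tiles travel over the wire.

```python
# Hypothetical layers ordered from lowest to highest priority;
# each maps a tile id to a payload label.
LAYERS = [
    {"t0": "global-dem", "t1": "global-dem", "t2": "global-dem"},
    {"t1": "city-mesh", "t2": "city-mesh"},
    {"t2": "building-scan"},
]


def fuse(layers):
    """Per tile, keep the payload of the highest-priority layer covering it."""
    fused = {}
    for layer in layers:  # later (higher-priority) layers overwrite earlier ones
        fused.update(layer)
    return fused


def no_fusion(layers):
    """Server streams every layer untouched; fuse() runs on the client."""
    tiles_streamed = sum(len(layer) for layer in layers)
    return fuse(layers), tiles_streamed


def fuse_them_all(layers):
    """Server runs fuse() once; the client receives only the final tiles."""
    fused = fuse(layers)
    return fused, len(fused)


client_view, client_tiles = no_fusion(LAYERS)
server_view, server_tiles = fuse_them_all(LAYERS)
print(client_tiles, server_tiles)
```

Both paths end with the same picture, but in the no-fusion case the client receives the individual layers (and could style or toggle them), at the cost of downloading tiles that end up hidden. In the fuse-them-all case the client gets fewer tiles and no say in how they were combined.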

Which way are we going?

VTS Geospatial started with the fuse-them-all approach, but the limitations brought about by that decision soon became too restrictive. Maintenance and updates of large fused datasets became tedious. We also needed to offer users more dynamic content and let them interact with it, rather than provide just a ‘dumb display’. A more balanced, middle-ground solution was adopted to leverage the benefits and cut down on the drawbacks of both architectural patterns.