Guest article by Steven Offen

I recently attended training for a new federated data platform solution called fraXses. The promise was that the solution could consolidate data from any data source, without any ETL development, simply by using a configuration GUI. Solutions like this have been talked about for many years, but this is the first I have seen that actually delivers on the promise.

By the end of the 2-day training session, all of us (and some of us were not overly technical) were able to:

  1. Connect to SQL Server and MemSQL databases, plus JSON and CSV files. (15 min)
  2. Run the automated data discovery module over the sources and publish the data into federated data objects ready for consumption by downstream applications or BI tools. (30 min)
  3. Build some basic BI reports on the consolidated data sets that could be run in real time, consuming data from the sources at the click of the refresh button. (15 min)

Yes, these were training data sources which weren't complex; however, the speed and ease with which we were able to accomplish this was truly impressive.

Had this been done using a traditional data warehouse (DW) approach, it would have involved building:

  • Source system extracts or ETL jobs to extract the data from the source systems.
  • More ETL jobs (20+) to transform the source data into a conformed structure.
  • Design and build of the data warehouse tables to house the consolidated data.
  • BI layer design to enable publishing of the data through the BI tools.

And even if you manage this, the outcome is a batch process which needs to be run, usually nightly.

To put this into perspective, a recent end-to-end implementation of the fraXses solution took eight weeks.

This included standing up the platform and integrating six data sources, including Postgres, SQL Server, Oracle, Access and CSV sources. In total, 30,000 tables were integrated, all with no ETL development work.

By comparison, I would estimate that a solution of this size would take at least a year to implement using traditional data warehouse methods.

What the fraXses team have managed to build is a configurable federated data solution based on leading technologies such as Kafka (streaming), MemSQL (in-memory and columnar DB) and Spark as the processing engine. The solution is exceptionally fast: fast enough to read from a source in real time and publish to a consumer, such as a web application or BI tool, with sub-second response times. Gone are the days of an overnight batch run, unless of course a business requires one.
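The core idea of federation, querying sources in place at request time rather than staging copies through ETL jobs, can be sketched in plain Python. This is a simplified illustration of the concept only, not the fraXses implementation; the SQLite database and CSV feed stand in for real enterprise sources, and the table and column names are invented for the example:

```python
import csv
import io
import sqlite3

# Stand-in "source 1": an in-memory SQL database of customers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Acme"), (2, "Globex")])

# Stand-in "source 2": a CSV feed of orders.
csv_feed = io.StringIO("customer_id,amount\n1,250\n2,99\n1,75\n")

def federated_totals(database, feed):
    """Join both sources at query time; nothing is extracted or staged."""
    orders = {}
    for row in csv.DictReader(feed):
        cid = int(row["customer_id"])
        orders[cid] = orders.get(cid, 0) + int(row["amount"])
    # Combine the live database rows with the live feed totals.
    return {name: orders.get(cid, 0)
            for cid, name in database.execute("SELECT id, name FROM customers")}

totals = federated_totals(db, csv_feed)
print(totals)  # {'Acme': 325, 'Globex': 99}
```

Because the query runs against the sources directly, a refresh simply re-executes it against whatever the sources currently hold, which is what makes the real-time behaviour described above possible without a nightly batch load.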

In the image below you can see what is possible with the solution. The timeline across the top includes all the email or voice conversations about a topic the user can select. Clicking the email or phone icon automatically displays the email or plays the voice conversation from the various data sources. The voice conversations are also transcribed to text, so you can fast-forward the voice playback to the point where the searched word is mentioned. The data sources used below are the publicly available emails and voice conversations from Enron. All of this is available in real time, at the click of a button, something I couldn't even have imagined a few years back.

Nice to read a concise explanation of this federated Data Virtualisation solution. Federation University was the 1st uni in Australia to embrace this innovative solution (bragging rights to follow hehee!) with a simple "click … refresh", & got Big Data at our fingertips. We leapfrogged over the traditional Data Warehouse investment, saving a fortune, and are now more responsive, adaptive and innovative, ready to rapidly expand upon this platform to offer DEEP insights into business, our valued students & research. Whoever said universities are too traditional, FedUni is "breaking the mould". So watch this space! The Data Warehouse is dead!

Andrew Tully, Executive Director Information Technology & Business Services (CIO) at Federation University Australia