Organisations demand accurate, timely, high-quality data on which to base their
decisions. Building an effective, online data hub to facilitate access to this data
means ensuring solution scalability and reliability. It also means building for data
trustworthiness.
This paper addresses the value, use cases and challenges associated with building
an enterprise data hub – whether on the public cloud or on-premises – based on
Apache Spark.
Why Apache Spark for Your Data Hub
Apache Spark is an open-source software development framework and runtime
that helps users develop parallel, distributed data processing and machine
learning applications that run at scale. Spark combines in-memory, distributed
data processing with the ability to spill intermediate datasets to disk
when required.
In this whitepaper, we cover:
- The value and promise of a data hub
- Common data hub use cases
- Challenges to adoption
- An introduction to Apache Spark
- How Spark helps solve data challenges
- Planning a Spark implementation
- Building a cloud data hub
- Building an on-premises data hub
Further reading: