Introducing Data Science Stack: set up an ML environment with 3 commands on Ubuntu
Canonical
on 17 September 2024
Tags: AI/ML , Data science , MLOps
Canonical, the publisher of Ubuntu, today announced the general availability of Data Science Stack (DSS), an out-of-the-box solution for data science that enables ML environments on your AI workstation. It is fully open source, free to use and native to Ubuntu. It is also accessible on other Linux distributions, on Windows using Windows Subsystem for Linux (WSL), and on macOS with Multipass. DSS is a command line interface-based tool that bundles Jupyter Notebooks, MLflow and frameworks like PyTorch and TensorFlow on top of an orchestration layer. Canonical provides security maintenance for all of the packages included in the solution, ensuring timely vulnerability patching and protection of both the software and created artefacts.
Your ML environment in just three commands
AI adoption is widespread, but so are the challenges in successfully implementing it. Consider the following statistics from Deloitte:
- 40% of organisations are adopting AI
- 25% of AI practitioners are blocked by package dependencies
- 24% of AI practitioners struggle to access compute resources
Against this backdrop, business leaders are under pressure to quickly get AI capability up to speed and demonstrate return on investment from AI projects. Shortening the time required to set up ML environments is crucial to accelerating project delivery and the initial exploration phase of AI within organisations. That’s why we created the Data Science Stack (DSS).
Data Science Stack (DSS) can be set up with just three commands, enabling quick initial exploration on AI workstations. Practitioners only need to set up a container orchestration layer, install the DSS CLI and initialise Data Science Stack to access the environment. This can be done in 10-30 minutes, depending on the practitioner’s experience level.
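As a sketch, the three steps map roughly to the following commands (channel names and flags are illustrative and may differ from the current DSS documentation):

```shell
# 1. Container orchestration layer: a local MicroK8s cluster
sudo snap install microk8s --classic

# 2. The DSS command line interface
sudo snap install data-science-stack

# 3. Initialise Data Science Stack against the cluster
dss initialize --kubeconfig="$(sudo microk8s config)"
```

Once initialised, the DSS CLI is used to create and manage notebook environments on top of the cluster.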
Canonical’s Silicon Alliance ecosystem manager, Chris Schnabel, elaborated, “This takes away the burden of managing any of the package dependencies or setting up the compute resources, thanks to the simple commands that AI practitioners can run. By default, DSS includes access to Jupyter Notebook for model development, MLflow for experiment tracking and model registry, and ML frameworks such as PyTorch and TensorFlow. However, users can customise Data Science Stack and add new libraries depending on their use case.”
For practitioners, DSS is a helpful way to get familiar with the tools used in large-scale ML environments. DSS also provides migration paths, helping them grow their AI initiatives as projects mature.
Optimised to work on any hardware type
We created Data Science Stack to work on any hardware type, in order to optimise the user experience and enable users to get the best performance from their hardware of choice. DSS uses optimised ML frameworks from different vendors, like PyTorch and TensorFlow, to give users a choice of the most popular distributions and achieve the highest performance levels possible. In the case of Intel, hardware optimisations are driven upstream to these community projects. However, to get earlier access to performance enhancements and capabilities such as Intel GPU support before they land upstream, AI practitioners can also use ITEX and IPEX, Intel’s extensions for TensorFlow and PyTorch respectively. ITEX and IPEX add hardware-specific optimisations that take advantage of the Advanced Vector Extensions (AVX), Vector Neural Network Instructions (VNNI) and Advanced Matrix Extensions (AMX). By integrating these extensions, in addition to GPU acceleration, DSS benefits from faster execution of the operations prevalent in AI use cases, reducing model training time and accelerating the experimentation phase of ML projects.
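These instruction-set extensions are advertised as CPU flags, so it is easy to check whether a given workstation exposes them. The small helper below is hypothetical (not part of DSS) and reads `/proc/cpuinfo` on Linux:

```python
def cpu_extensions(flags_line: str) -> dict:
    """Map a /proc/cpuinfo 'flags' line to the extensions ITEX/IPEX can exploit."""
    flags = set(flags_line.split())
    return {
        "AVX": "avx" in flags,
        "VNNI": "avx512_vnni" in flags or "avx_vnni" in flags,
        "AMX": "amx_tile" in flags,
    }

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            line = next(l for l in f if l.startswith("flags"))
        print(cpu_extensions(line.split(":", 1)[1]))
    except (FileNotFoundError, StopIteration):
        print("CPU flags not available on this platform")
```

If AMX or VNNI shows as available, the Intel-optimised frameworks can dispatch matrix and quantised-inference kernels to those units instead of generic code paths.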
“Canonical’s Data Science Stack provides an essential foundation for AI practitioners to quickly advance their machine learning and data science capabilities,” said Arun Gupta, Vice President and General Manager for Open Ecosystem at Intel. “By aligning with upstream PyTorch and TensorFlow, we ensure that developers are working with the most progressive tools available. Our collaboration through the OPEA project amplifies this impact, streamlining AI development and making innovation more accessible for everyone.”
AI workstations are a strategic product for many computer manufacturers. Solutions like Data Science Stack enable them to offer a seamless experience on any device, helping them diversify their GPU offerings without affecting the user experience.
Get security and support from a single vendor
McKinsey reports that 51% of organisations using AI consider cybersecurity to be the highest risk they need to mitigate, followed by regulatory compliance at 36%. This affects all layers and scales of ML development, from AI workstations to data centres or edge devices.
When data scientists set up their ML environments, they use containers and open source tools from different sources, without necessarily considering the security risks. Within an enterprise, security can also quickly turn into a burden for sysadmins, who need to deploy and maintain a variety of tools – which are often different due to lack of standards. DSS provides a consistent architecture for ML environments that can be rolled out at scale on many machines.
Ubuntu is the most adopted Linux distribution (source: Stack Overflow Developer Survey), with a high number of AI/ML practitioners using it for their projects. As business leaders allocate budget for ML projects and professionals start their initial exploration, they will begin deploying solutions on workstations as well.
Organisations can purchase security maintenance and support through Ubuntu Pro. Enterprises benefit from enterprise support for their ML environments, so that they can resolve issues in a timely manner, in line with Canonical’s SLAs.
To learn more about Data Science Stack, watch our webinar. We’ll provide a rundown of the features you can benefit from and demonstrate the three-command setup.