What is Cloud?
What exactly is ‘Cloud’? And what is ‘Cloud Computing’? Put simply, ‘Cloud’ means ‘the internet’, and ‘Cloud Computing’ is the delivery of computing services over the internet. Cloud Computing lets users rent servers, storage, databases and computing power from cloud providers on a pay-as-you-go basis. Some providers also supply software and analytics on the cloud.
According to Microsoft Azure, cloud services can be classified into four main categories: Infrastructure as a service (IaaS), Platform as a service (PaaS), Serverless Computing and Software as a service (SaaS). IaaS refers to the renting of IT infrastructure, including servers, virtual machines, storage, networks and operating systems. PaaS provides an environment for developing and managing web or mobile software applications. In Serverless Computing, the provider manages and maintains the infrastructure needed to run an application, so developers can concentrate on building it. SaaS involves supplying software applications on demand over the internet.
Who are the major players?
The main players in the Cloud Computing market include Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, VMware Cloud, Oracle Cloud and Alibaba Cloud. According to Canalys, a global technology market analyst firm, the 2019 worldwide market shares of the leading players are estimated at 32.3% for AWS, 16.9% for Microsoft Azure and 5.8% for Google Cloud Platform. AWS still leads the race, building on the head start it gained by launching its cloud services in 2006, years ahead of its main competitors. In terms of sales growth, however, Google Cloud Platform grew 88% year on year from 2018 to 2019, quickly expanding its footprint in the cloud computing industry. The landscape will be particularly interesting in the coming years, especially with Alibaba Cloud, the Chinese provider, following close behind with a 4.9% market share. This article, however, focuses only on the top three players: Amazon Web Services, Microsoft Azure and Google Cloud Platform.
Cloud Computing for Data Scientists
Cloud Computing is becoming increasingly vital, and not just for software developers. It is particularly relevant to big data analytics: because it makes expanding computing power and deploying data solutions much easier, it is a natural fit for data scientists who work with large datasets.
Each of the three major cloud providers has a set of powerful tools for data scientists.
For AWS, widely known tools include Redshift, EC2, EMR, S3, Data Pipeline and Database Migration Service. Representative customers of these services include Standard Chartered Bank and S&P Global Ratings in the Financial Services sector, Skyscanner in the Travel & Hospitality sector, Nielsen in the Marketing and Advertising sector, Royal Dutch Shell in the Energy sector and The Guardian in the Media sector.
Microsoft Azure, on the other hand, provides Azure SQL, DocumentDB, Azure Table and Azure Blob for data storage, HDInsight as a Hortonworks distribution of Hadoop (including Hive, MapReduce, Spark, etc.) and Azure ML for easy implementation of machine learning algorithms. A plus of using Azure is that all of the tools mentioned above can be integrated with Microsoft Excel and Power BI, making results easier to visualize and more accessible to people with varying levels of technical skill. Some of the main customers of its data-related services in the UK are Concentra, NEL and Presence Orb.
For Google Cloud Platform, widely used services include Google BigQuery for data collection and exploration, the Vision, Speech, Translate and Natural Language APIs for data extraction and transformation, Cloud Dataprep, Cloud Dataflow and Apache Beam for data cleansing, Data Studio for visualization, and TensorFlow for machine learning. Customers in Europe include HSBC in the Financial Services sector, Sky U.K. and ITV in the Media sector, Philips in the Manufacturing sector, and AB InBev, Burger King, Ferrero and Morrisons in the Retail and Consumer Goods sector.
Skillset needed on the Cloud
According to a study of job listings on SimplyHired, Indeed, Monster and LinkedIn in December 2019, demand for cloud platform skills is rising: AWS appeared in around 20% of listings containing the keyword ‘Data Scientist’, while Azure appeared in around 10%.
A qualified data scientist therefore needs to hone their cloud computing skills. Specifically, this means learning to perform each task in the data pipeline on the cloud, from data acquisition, cleansing, transformation and mining to model training and testing, using the toolkits provided by the major cloud platforms, especially AWS and Azure. Ali Hussnain, Associate Consultant at Fyte, sat down with Paul van Loon, Head of Analytics at Forecast, to discuss what this means for those working in or seeking roles in Data Science.
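To make the pipeline stages named above more concrete (acquisition, cleansing, transformation, model training and testing), here is a minimal sketch in plain Python. It is an illustration only: the function names and the tiny in-memory dataset are hypothetical, no cloud SDK is used, and on a real platform each stage would typically be backed by a managed service such as those listed earlier.

```python
from statistics import mean


def acquire():
    # Hypothetical stand-in for reading raw records from cloud storage.
    # The values are made up for illustration.
    return [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": 4.0},
        {"x": None, "y": 5.0},  # a record with a missing value
        {"x": 3.0, "y": 6.0},
        {"x": 4.0, "y": 8.0},
    ]


def cleanse(rows):
    # Data cleansing: drop records with missing fields.
    return [r for r in rows if r["x"] is not None and r["y"] is not None]


def transform(rows):
    # Data transformation: reshape records into (feature, target) pairs.
    return [(r["x"], r["y"]) for r in rows]


def train(pairs):
    # Model training: fit y = slope * x + intercept by ordinary least squares.
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def evaluate(model, pairs):
    # Model testing: mean absolute error on held-out pairs.
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in pairs)


def run_pipeline():
    pairs = transform(cleanse(acquire()))
    train_set, test_set = pairs[:-1], pairs[-1:]  # hold out the last pair
    model = train(train_set)
    return model, evaluate(model, test_set)


if __name__ == "__main__":
    model, error = run_pipeline()
    print("slope, intercept:", model)
    print("test error:", error)
```

Keeping each stage as its own function mirrors how the work tends to be split across services on a cloud platform, so that one stage can be swapped for a managed equivalent without touching the others.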
Cloud Computing and Architecture for Data Scientists https://www.datacamp.com/community/blog/data-science-cloud
Amazon Web Services (AWS) – Cloud Computing Services https://aws.amazon.com/
See the amazing things people are doing with Azure https://azure.microsoft.com/en-gb/case-studies/?country=UnitedKingdom
Google Cloud customers https://cloud.google.com/customers#/