Tried and tested way to break into Data Science
Career Starting Point
A bachelor’s or master’s degree in any quantitative subject could be a good starting point for a data science career. The most relevant subjects include Computer Science, Mathematics, Physics and Statistics. However, this is not to say your career path is doomed if you have studied something else.
The Data Science Life Cycle
For one thing, the concept of Data Science is relatively new: the definition of ‘Data Scientist’ (coined in 2008 and popularised in 2012) continues to evolve. Data Science now covers a whole spectrum of tasks, some of which you may already have carried out on your way to earning your degree. According to the School of Information at Berkeley, the Data Science Life Cycle has five stages:
1. Capture: In this stage, the data scientist acquires the data via one or more means: by actively creating new data, obtaining existing data produced by organisations or receiving data created by devices.
2. Maintain: This stage is mainly about data integration and pre-processing, without deriving commercial value from the data just yet. The idea is to prepare your data for synthesis and use in the next stage. Maintenance tasks include data warehousing, data cleansing and the extract-transform-load (ETL) processes where data staging occurs.
3. Process: This stage is all about data mining, namely, discovering patterns in the dataset. A variety of methods are adopted: be it simple summarization or more advanced methods like clustering and classification.
4. Analyse: In this stage, insights are extracted from the dataset using both quantitative and qualitative analysis. Tasks range from Exploratory Data Analysis (EDA) and confirmatory analysis to regression and predictive modelling. Text analytics and Natural Language Processing are also becoming increasingly popular for analysing customer comments or legal documents.
5. Communicate: The last stage is about communicating your results to stakeholders. This includes data reporting and visualization, and usually some business intelligence and commercial impact analysis to inform decision makers.
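The five stages above can be sketched end to end on a toy example. The snippet below is only an illustration using Python's standard library: the records, field names (region, ad_spend, sales) and helper functions are all hypothetical, and a real project would typically rely on dedicated data tooling rather than hand-written loops.

```python
from statistics import mean

# 1. Capture: hypothetical raw records (made-up values, for illustration only).
raw = [
    {"region": "north", "ad_spend": "10", "sales": "25"},
    {"region": "north", "ad_spend": "20", "sales": "45"},
    {"region": "south", "ad_spend": "30", "sales": "65"},
    {"region": "south", "ad_spend": "", "sales": "50"},  # missing ad_spend
    {"region": "south", "ad_spend": "40", "sales": "85"},
]


# 2. Maintain: drop incomplete rows and convert text fields to numbers.
def clean(records):
    cleaned = []
    for r in records:
        if not r["ad_spend"] or not r["sales"]:
            continue
        cleaned.append({
            "region": r["region"],
            "ad_spend": float(r["ad_spend"]),
            "sales": float(r["sales"]),
        })
    return cleaned


# 3. Process: simple summarisation, here the mean sales per region.
def summarise(records):
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["sales"])
    return {region: mean(values) for region, values in by_region.items()}


# 4. Analyse: ordinary least-squares fit of sales on ad_spend.
def fit_line(records):
    xs = [r["ad_spend"] for r in records]
    ys = [r["sales"] for r in records]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


# 5. Communicate: turn the numbers into a short plain-text report.
def report(summary, slope, intercept):
    lines = [f"Mean sales in {region}: {value:.1f}"
             for region, value in sorted(summary.items())]
    lines.append(f"Fitted line: sales = {slope:.2f} * ad_spend + {intercept:.2f}")
    return "\n".join(lines)


records = clean(raw)
summary = summarise(records)
slope, intercept = fit_line(records)
print(report(summary, slope, intercept))
```

Each function maps to one stage, so any of them could be swapped for a more realistic implementation (a database query for capture, a charting library for communication) without touching the others.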
Now that you have a basic understanding of the tasks a Data Scientist performs on a day-to-day basis, it is clear that a STEM subject could help you kick-start your career, as long as it includes in-depth knowledge of maths and statistics, solid programming, and thorough practice in data collection, data cleaning, data analysis and data reporting. Hands-on experience working with different datasets would certainly help. There is also an abundance of books, handouts and online courses to learn from along the way.
Core Competencies as a Data Scientist
What kind of data scientist is the market looking for? Before we turn to this question, we need to look at the diverse types of careers related to data. Besides Data Scientists, there are also Business Analysts, Data Developers and Data Engineers. The skillset and knowledge required vary for each job family.
A conference paper by De Mauro et al. [1] analysed a large number of job posts published online and shed some light on the skills required to thrive in the data industry. For example, Business Analysts lean more towards the commercial side of the picture: they are often equipped with effective communication skills and the financial acumen to turn insights from the data into business impact. On your career path as a data scientist, however, you will probably need more than that: the focus of your skillset is on analytical methods and the ability to transform data into actual insights.
As a data scientist, you need to be proficient in using data warehouses and adept at querying or extracting data from databases, whether in the cloud or on your local machine. It is also your responsibility to leverage the data at hand: to identify patterns, extract information despite noise, and design and implement models for descriptive or predictive purposes with the business context in mind. A solid understanding of statistics is therefore a must, and you will also need some programming skills to implement your model in R, Python or other languages. Furthermore, as a data scientist you would seek to continually improve your metrics and statistical models and integrate research and learning as part of that process, so a ‘scientist mindset’ would be helpful as you accumulate expertise through trial and error.
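As a small illustration of querying a database and then describing the result statistically, here is a minimal sketch using Python's built-in sqlite3 and statistics modules. The orders table, its columns and its values are invented for the example; a real data warehouse would use a different engine and schema.

```python
import sqlite3
from statistics import mean, stdev

# Build a throwaway in-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("alice", 30.0), ("bob", 10.0),
     ("bob", 50.0), ("carol", 40.0)],
)
conn.commit()

# Query: total spend per customer, aggregated by the database itself.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
totals = dict(rows)

# Extract the raw amounts for descriptive statistics in Python.
amounts = [amount for (amount,) in conn.execute("SELECT amount FROM orders")]
conn.close()

print("Total spend per customer:", totals)
print(f"Mean order: {mean(amounts):.2f}, sample std dev: {stdev(amounts):.2f}")
```

Letting the database do the aggregation and keeping only the summary statistics in Python is one common division of labour, though the right split depends on data volume and tooling.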