Sunday, April 19, 2015

What is a Data Scientist?

Data Scientist has been called the Sexiest Job of the 21st Century... but many people (including some companies trying to hire data scientists) don't really understand what this job means.  The term is used to describe a wide variety of roles; a data scientist at one company doesn't necessarily do the same thing as a data scientist at another company.

Below I break down some of the different 'types' of data scientist jobs and the skills each one needs.  Please note that this list is not exhaustive, and sometimes a single data science position is expected to cover several of the roles below:

Data Analyst
  • Derive business insight from data. 
  • Work across all teams within an organization. 
  • Answer questions using analysis of data.
  • Design and perform experiments and tests. 
  • Create forecasts and models.  
  • Prioritize which questions and analyses are actionable and valuable.
  • Help teams/executives make data-driven decisions.
  • Communicate results across the company to technical and non-technical people.
Required Skills: SQL, statistics, programming, data management, data analysis, data modeling, data visualization, experimental design, decision making, prioritization, project management, product development, communication.

Data Architect
  • Design systems to get raw data into an easily analyzable form.  
  • Act as a bridge between engineers and analysts. 
  • Organize data into useful database tables for analysis.
  • Optimize data sets for efficient analysis.
  • Create ETL (extract, transform, load) systems for your data sets.
Required Skills: SQL, computer programming, backend software engineering, database design, database management, data optimization, data modeling.

Data Engineer
  • Work with analysts to build internal tools for analyzing, visualizing, and sharing data. 
  • Design and maintain A/B testing systems.
  • Work with engineers to ensure that the right data is being collected.
  • Create systems that allow analysts' work to scale.
  • Work with data architects / operations to ensure the data is organized for optimal analysis.
Required Skills: SQL, computer programming, full-stack software engineering, data visualization, database management, communication.

Domain Experts
People who have advanced specializations like:
Required Skills: Advanced degree in computer science, math, statistics, finance, or economics. Programming, statistics, data management, data analysis, data modeling, data visualization, experimental design, communication.

Someone who does all of the above.
(basically impossible to find, thus the name)

Some FAQs about Data Science Jobs

Q: Do I need an advanced degree to be a data scientist?
A: No
In general, the tech industry cares very little about degrees or pedigree.  I have been at companies where the CTO didn't graduate from high school, and I have been at companies where many people have PhDs.  I would be very surprised if a company refused to consider an otherwise qualified candidate for a data role just because they didn't have a certain degree.

That being said, many of the skills required to be a data scientist overlap with the skills required to be a scientific researcher.  People who have worked as researchers (either because they did an advanced degree, or because they worked in a lab) tend to be good candidates for data science jobs.

However, you can get this experience in a lot of other ways.  For instance, many people start out as (junior) analysts, pick up many of the skills needed on the job, and become data scientists after 3-5 years of industry experience.  Some people start out as software developers, work on more and more data-oriented projects, and get into data science through engineering.  Other people start out as financial, marketing, or business analysts and become data scientists through that path.

If you want to be a data scientist, I would look at the skills required for the above roles and then find ways to develop those skills either through schooling, your current job, or self-directed projects.

Q: What are the most important skills needed to be a data scientist?
A: SQL, Data Analysis, Programming
Of course, this depends on exactly what is expected for a particular role.  Some data science roles are more analyst-oriented, some are more engineering-oriented, and some are more specialized.  You can get a sense of this from the job description, but you should also discuss it during the interview process.

However, in general I would say the most important skills are the following:
1) SQL
2) Data Analysis
3) Statistical Programming

So if you were going to learn one thing, I would say learn SQL.  Then do a project that involves analyzing a data set and deriving results.  Then learn a programming language like Python or R.
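As a rough illustration of what that first small project might look like, here is a minimal sketch that loads a toy data set into an in-memory SQLite database and answers one question with a SQL aggregate, driven from Python. The `users` table, its columns, and the numbers are all made up for the example, and SQLite and Python are just one convenient choice, not a recommendation from this post.

```python
import sqlite3

# Hypothetical toy data: (user_id, plan, monthly_spend).
rows = [
    (1, "free", 0.0),
    (2, "pro", 20.0),
    (3, "pro", 30.0),
    (4, "free", 0.0),
    (5, "team", 100.0),
]


def spend_by_plan(rows):
    """Return [(plan, user_count, avg_spend)] sorted by plan name."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE users (user_id INTEGER, plan TEXT, monthly_spend REAL)"
        )
        conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
        # The analysis itself is a single GROUP BY query.
        return conn.execute(
            "SELECT plan, COUNT(*), AVG(monthly_spend) "
            "FROM users GROUP BY plan ORDER BY plan"
        ).fetchall()
    finally:
        conn.close()


summary = spend_by_plan(rows)
for plan, n_users, avg_spend in summary:
    print(f"{plan}: {n_users} users, average spend {avg_spend:.2f}")
```

The same pattern (load data, ask a question in SQL, summarize the answer) carries over to real data sets; only the table and the question change.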

Q:  How do I develop these skills?
A: Online courses, hackathons, meet-ups, volunteer organizations.
There are a lot of great online resources which can help you develop these skills.