Must-Know Statistical Concepts to Become a Data Scientist
By Aditi Bhat
Big Data! Artificial Intelligence! Data Analytics! Sassy Nerd! Digital Wizardry! Predictions! Sea Of Probabilities!
All of this comes to mind when we think about data science.
So, who is responsible for cleaning data and presenting it in an interesting and understandable manner? A data scientist, of course!
“Data Scientist = Statistician + Programmer + Coach + Storyteller + Artist”
What do data scientists do?
The need of the hour is to understand data. That means converting unclean data into a compact, comprehensible form that can be analyzed further with the help of data visualization tools. Clean data is the foundation on which estimations, predictions and, ultimately, strategies are built. Getting to clean data requires statistics, which is the foundation of any data science program.
Here we take baby steps into statistics as a way of walking into the world of data science. The basic statistical concepts one must know to become a data scientist are:
1. Descriptive Analysis:
Descriptive analysis summarizes the main characteristics of a given data set, often with the help of visual methods such as charts. It is usually the first step of exploratory data analysis. The summary can describe the entire data set (the population) or just a part of it (a sample).
So there are two ways of describing data:
- Measures of central tendency – Mean, median, mode.
- Measures of variability – Range, variance, standard deviation; used to analyze how spread out the values in a data set are.
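To make the two kinds of measures concrete, here is a minimal sketch using Python's built-in `statistics` module. The `visits` list is made-up illustration data, not something from a real study.

```python
import statistics

# Hypothetical sample: daily website visits over nine days (made-up numbers).
visits = [12, 15, 15, 18, 20, 22, 15, 30, 24]

# Measures of central tendency
mean = statistics.mean(visits)      # arithmetic average
median = statistics.median(visits)  # middle value of the sorted data
mode = statistics.mode(visits)      # most frequent value

# Measures of variability
value_range = max(visits) - min(visits)          # distance between the extremes
sample_variance = statistics.variance(visits)    # sample variance (divides by n - 1)
sample_std = statistics.stdev(visits)            # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={sample_variance:.2f}, std={sample_std:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample. If the list were the whole population, `statistics.pvariance` and `statistics.pstdev` would be the matching choices.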
2. Statistical Inference:
Statistical inference takes the data described or visualized during descriptive analysis and uses it to draw conclusions about a larger population from a sample of it.
For example, to predict how many voters will support a candidate, we need a clearly defined population of interest (say, all registered voters), a clearly defined parameter chosen by the data scientist (the proportion who support the candidate), and an estimate of that parameter computed from a sample.
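The voter example can be sketched as a small estimate with a confidence interval. This is only an illustration under stated assumptions: the poll numbers are invented, the function name `proportion_confidence_interval` is hypothetical, and the interval uses the simple normal approximation, which is reasonable for large random samples but is not the only method.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Estimate a population proportion from a sample.

    Returns (estimate, lower, upper) using the normal-approximation
    interval; z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n                         # sample proportion
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_error
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical poll: 540 of 1000 randomly sampled voters back the candidate.
estimate, low, high = proportion_confidence_interval(540, 1000)
print(f"estimated support: {estimate:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

The sample of 1000 voters stands in for the whole population, and the interval expresses how uncertain the estimate is.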
3. Bayesian Statistics:
Bayesian statistics is an approach that applies probability, read as a degree of belief, to statistical problems. It provides tools to update existing beliefs in light of new data. As a result, this approach can account for uncertainty more directly, give results that are more intuitive to interpret, and make assumptions more explicit.
To understand this thoroughly you’d have to study:
- Conditional Probability
- Bayes' Theorem
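As a rough sketch of how conditional probability and Bayes' theorem fit together, the snippet below updates a belief after seeing evidence. The scenario (a screening test) and all of its numbers are assumptions chosen for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a yes/no hypothesis H and evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    returns             P(H | E) = P(E | H) * P(H) / P(E)
    """
    # Total probability of seeing the evidence, with or without H.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Assumed numbers: 1% of people have the condition, the test catches 95%
# of true cases, and it wrongly flags 5% of healthy people.
after_one_test = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

# Updating beliefs: yesterday's posterior becomes today's prior.
after_two_tests = posterior(prior=after_one_test, likelihood=0.95, false_positive_rate=0.05)

print(f"after one positive test:  {after_one_test:.3f}")
print(f"after two positive tests: {after_two_tests:.3f}")
```

With these assumed numbers, one positive result still leaves the probability well below one half, because the condition is rare to begin with. The second call treats the two tests as independent, which is a simplification.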
4. Experimental Design:
Experimental design is the practice of planning experiments so that they reveal cause-and-effect relationships between the variables in a study.
A good everyday example is cooking. Think of the finished dish as the end result.
A process is designed to reach that goal, and parameters such as salt, sugar and other flavorings can be varied deliberately to end up with a different result each time. Comparing those results is how we come to understand the various causes and effects.
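One way to make the cooking example systematic is a full factorial design: try every combination of the chosen parameter levels, in random order so that outside influences do not line up with any one parameter. The sketch below assumes made-up factors and levels, and the helper names `full_factorial` and `randomized_run_order` are hypothetical.

```python
import itertools
import random

# Hypothetical factors and the levels we want to try for each one.
factors = {
    "salt_tsp": [0.5, 1.0],
    "sugar_tsp": [0.0, 1.0],
    "spice": ["mild", "hot"],
}

def full_factorial(factors):
    """Return one run (a dict of settings) per combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]

def randomized_run_order(runs, seed=0):
    """Shuffle a copy of the runs; the seed keeps the schedule reproducible."""
    shuffled = list(runs)
    random.Random(seed).shuffle(shuffled)
    return shuffled

runs = full_factorial(factors)          # 2 x 2 x 2 = 8 combinations
schedule = randomized_run_order(runs)

for number, settings in enumerate(schedule, start=1):
    print(number, settings)
```

Each dish cooked from the schedule would then be scored, and differences in the scores can be traced back to the settings that changed.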
Tell us about the statistical concepts you use to solve complex problems in the comments section below.