An Introduction to Statistics and Data Science and Differences between Them
By Maroje Portada on February 16, 2022
There are many long and complicated definitions of statistics. Those are less interesting for anyone not well versed in this field. Here are several simple definitions instead:
- Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. (https://www.merriam-webster.com/dictionary/statistics)
- Statistics is a collection of quantitative data. (https://www.merriam-webster.com/dictionary/statistics)
- Statistics is a set of mathematical methods and tools that enable us to answer important questions about data. (https://www.freecodecamp.org/news/statistics-for-data-science/)
Within statistics there are two branches: descriptive and inferential statistics.
Descriptive statistics provides methods to organize, summarize, and present raw data in a more convenient and informative form, usually called information. That information can then be interpreted and shared. Descriptive statistics offers many graphical and numerical techniques for describing data, such as charts, averages, and measures of spread.
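To make this more concrete, here is a minimal sketch of numerical descriptive statistics using Python's standard `statistics` module. The temperature values and the `summary` dictionary are made up purely for illustration. They are assumptions, not data from any real source.

```python
import statistics

# Hypothetical raw data: daily temperatures in degrees Celsius (made up for illustration).
temperatures = [12.5, 14.0, 13.2, 15.8, 11.9, 14.0, 16.3]

# Summarize the raw numbers into a handful of descriptive measures.
summary = {
    "count": len(temperatures),
    "mean": statistics.mean(temperatures),
    "median": statistics.median(temperatures),
    "min": min(temperatures),
    "max": max(temperatures),
    "stdev": statistics.stdev(temperatures),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Seven raw numbers become a short summary that is easier to interpret and share, which is the whole point of descriptive statistics.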
Inferential statistics offers methods to examine a small sample of data and make estimates or draw conclusions about the larger set of data it came from, called the population. Those estimates and conclusions can be correct, but because they rest on a sample, they always carry some uncertainty.
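The idea can be sketched in a few lines of Python. The example below assumes a simulated population of heights (the numbers are invented for illustration), draws a small sample from it, and estimates the population mean together with an approximate 95% confidence interval. The interval uses the normal approximation, which is only one of several possible approaches.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated heights in cm (made up for illustration).
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we rarely see the whole population, only a small sample.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval using the normal approximation (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"estimate from sample: {sample_mean:.2f} cm")
print(f"95% interval: {ci_low:.2f} to {ci_high:.2f} cm")
print(f"population mean: {statistics.mean(population):.2f} cm")
```

The sample estimate will usually land close to the population mean, but not exactly on it, and the interval will occasionally miss it altogether. That is the uncertainty inferential statistics tries to quantify.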
What Can We Use Statistics for?
Statistics appears in our lives far more often than we are aware: in weather prediction, election polls, estimates of economic growth, stock prices, demographics, sports statistics, user behavior on social networks, trending topics, sales on social networks, and much more.
Wherever there is data, there is potential for the use of statistics. Many complex problems in all parts of our lives can be addressed with statistics. Importantly, statistics helps us make better-grounded decisions with less risk and uncertainty. While intuition is useful, we should always use as much information as we can to make better decisions.
What Is Data Science?
After covering the topic of statistics, it is time to say something about data science as well. One simple definition of data science considers it a multidisciplinary field that combines some technical skills with soft skills to extract information from structured and unstructured data.
The principal purpose of data science is to find patterns in data. The field is still expanding, and its evolution depends heavily on the development of technology, especially computer science and programming languages.
Similarities and Differences between Data Scientists and Statisticians
The work of data scientists and statisticians is closely related, to the point that the two roles are often treated as synonyms. That is mostly not accurate: there are also many differences that distinguish the two.
What are the similarities between data scientists and statisticians?
Both roles:
- need some degree of understanding of mathematics;
- investigate problems;
- analyse data;
- analyse trends;
- create forecasts;
- use visualisations;
- often report their findings to non-technical users.
What are the differences?
- Data scientists use computer science, algorithms or machine learning more than statisticians.
- Data scientists are more involved in creation and use of data systems, while statisticians focus more on the equations and mathematical models that they use for their analysis.
- Data scientists more often use big data, while statisticians typically use smaller data sets.
- Data scientists compare many methods to create the best machine learning model, while statisticians more often improve a single model until it fits their data set.
- Statisticians focus more on quantifying uncertainty and making inferences.
Final Thoughts
Statistics and data science have a lot in common: the use of mathematics, the investigation of problems, and data analysis are just a few examples. There are also differences, such as the level of information technology used, the usual size of data sets, and the approach to building models.
Most certainly data science and statistics will continue to coexist and to some extent influence one another.
The goal of this post was to bring those areas closer to people who don’t know much about them, in the simplest possible way. Would you like to share your experience with statistics and data science? Your thoughts and comments are more than welcome.