Everything You Wanted to Know About Data Science

Nomar

2021-07-31

Let's talk about mismatched expectations, grades and interviews, and about the tasks data scientists actually solve.

What is Data Science?
Who is a Data Scientist?
Computer Science
Tech

Today I want to talk about Data Science: what it looks like in the eyes of candidates, employers and experts.

What is Data Science?

Perhaps the most concise definition that I have been able to find on the Internet: "Data science is the discipline that makes data useful."

I think that if you take the intersection of the different definitions of Data Science, only one word will be left: data. All of this suggests that the breadth of Data Science is enormous. You'll agree that this is not good for anyone, neither for you nor for the business. This breadth tells you nothing about what you will actually be doing. After all, you can do whatever you want with data. You can build complex reports or juggle tables using SQL. You can predict taxi demand with a constant or build complex mathematical models of dynamic pricing. And you can also set up streaming data processing for high-load services that operate in real time.
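To make the "predict taxi demand with a constant" end of that spectrum concrete, here is a minimal sketch. The hourly ride counts are made up for illustration, and the function names are hypothetical; the only point is that the whole "model" is a single number, the mean of past demand.

```python
from statistics import mean


def fit_constant_baseline(history):
    """Return the mean of past demand; this one number is the entire model."""
    return mean(history)


def mean_absolute_error(actual, predicted_constant):
    """Average absolute gap between observed demand and the constant forecast."""
    return mean(abs(a - predicted_constant) for a in actual)


# Hypothetical hourly ride counts, invented for this example only.
past_demand = [120, 135, 128, 140, 122]
next_demand = [130, 138, 125]

baseline = fit_constant_baseline(past_demand)
error = mean_absolute_error(next_demand, baseline)
print(f"constant prediction: {baseline}, MAE: {error:.2f}")
```

A baseline like this is also a useful yardstick: a dynamic pricing model that cannot beat the constant on held-out data is probably not worth deploying.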

And what does the word "science" have to do with it? Of course, Data Science has a very serious mathematical apparatus under the hood: optimization theory, linear algebra, mathematical statistics and other areas of mathematics. But only a few people are engaged in real academic work. Business needs problems solved, not scientific work. Only the giants can afford a staff of employees who do nothing but research and write scientific papers, invent new machine learning algorithms and methods, and improve existing ones.

Unfortunately, many experts in this field, speaking at various events, associate Data Science primarily with building models using machine learning algorithms. They rarely talk about what is, in my opinion, most important: where the need for a particular task came from, how it was formulated in "mathematical language", how it was put into production, and how to run an honest experiment to correctly assess the business effect.
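As an illustration of what an "honest experiment" can look like in practice, here is a minimal sketch of a two-sided permutation test for an A/B experiment. This is one common technique, not necessarily what any particular company uses; the per-user revenue numbers, the function name and its parameters are all assumptions made up for the example.

```python
import random


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the observed difference (treatment minus control) and the share
    of random relabelings whose difference is at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign users to the two groups
        t, c = pooled[:n_treat], pooled[n_treat:]
        diff = sum(t) / len(t) - sum(c) / len(c)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical revenue per user in each group, invented for illustration.
control_revenue = [10.0, 12.5, 9.0, 11.0, 10.5, 12.0]
treatment_revenue = [11.5, 13.0, 12.0, 12.5, 11.0, 14.0]

effect, p_value = permutation_test(control_revenue, treatment_revenue)
print(f"observed uplift: {effect:.2f}, p-value: {p_value:.4f}")
```

The idea is that if the change had no effect, group labels would be interchangeable, so the observed uplift should not stand out among the shuffled ones. A small p-value suggests the uplift is unlikely to be noise; whether it justifies a rollout is still a business decision.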

Who is a Data Scientist?

Now that we have realized that we don't understand anything, it's worth talking about the people who do Data Science: data scientists.

Some believe that this position is about building neural networks in a Jupyter Notebook. Others expect such specialists to come in and deliver every task on a turnkey basis. And still others just want to have such fashionable people on staff. Such differing understanding of the position, or a complete lack of it, can harm both you as a candidate and the company when hiring.

Computer Science

Computer Science is a family of closely related disciplines, but for some reason no one looks for a job as a Computer Scientist. People look for jobs as developers, testers, DevOps engineers, architects. Developers in turn split into frontend and backend developers, all the way down to a C++ backend developer. Why is that good? Because the job title alone makes it 90% clear what a C++ backend developer will be doing. This provides a lot of information and reduces entropy. And if you are suddenly looking for a Computer Scientist, what would that be in Russian, a "computer guy"? That sounds like something from the nineties or the 2000s: "Our printer is broken, call a computer technician."

Tech

A problem emerges from all this. If you go to 10 interviews for a Data Scientist position, not even necessarily at different companies, you will find that each interview expects something completely different from you, and in the end you will get completely different tasks. In one place you will be handed 200 Excel files as part of an "AI transformation". Elsewhere, you will be asked to bring up a multi-petabyte cluster. In your third interview, you will be told that you are expected to visualize metrics in Tableau. In the fourth, you will be asked to build a real-time recommendation system that works under a load of several thousand requests per second. The fifth interview will be about computer vision problems, and in the sixth you will have to write complex SQL scripts. In the seventh company you will be made to read articles, build beautiful Jupyter notebooks and produce some kind of forecasts. And somewhere else, you will have to package these calculations into a Docker container and use Kubernetes to deploy your service to many machines.