Python and SQL serve as the foundational toolkit for data science. Business and domain understanding is the key to any data science project or initiative.
Why?
If you want to become a data scientist for financial reasons, there are many other jobs and roles that can draw higher compensation packages.
But if you want to get into this field because you think you can talk to data and read between the numbers (like reading between the lines), it is definitely worth getting into.
However, it is worth really understanding a few basics before you take that decision, and this is exactly where we are here to help and guide you.
Skills to acquire
There are four skills that any data scientist should acquire, and believe me, these skills can be acquired by anyone.
1. Problem-solving attitude: A data scientist has to approach every task with a problem-solving attitude.
2. Communication: A data scientist should possess excellent communication skills, because they have to present recommendations (derived from technical steps) to a non-technical audience. Their articulation has to be clear enough that any layperson can see the same perspective the data scientist holds. They should also be a storyteller, which makes the final presentation interesting.
3. Programming: This skill lets a data scientist accomplish a task efficiently and within the stipulated time. Good programming skills ensure that the solution or model is space- and time-efficient, meaning it should not consume huge amounts of storage or compute to produce the final results. (A model is a pattern expressed as a mathematical equation and is the outcome of the machine learning process.) Programming includes SQL as well, because SQL is required to work with data spread across multiple tables or even across multiple databases.
4. Mathematics / Statistics: Mathematics and statistics are the building blocks of any data science project. Probability, differential calculus, and hypothesis testing are a few of the areas that help a data scientist debug a model and fine-tune its hyper-parameters effectively.
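To make the SQL point under the programming skill concrete, here is a minimal sketch of combining data spread across two tables, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for illustration; they do not come from any real dataset.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales  (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO sales  VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);
""")


def sales_by_city(connection):
    """Join the two tables and total the sales amount per city."""
    query = """
        SELECT st.city, SUM(sa.amount) AS total
        FROM sales AS sa
        JOIN stores AS st ON st.store_id = sa.store_id
        GROUP BY st.city
        ORDER BY st.city
    """
    return connection.execute(query).fetchall()


totals = sales_by_city(conn)
print(totals)
```

The join is what lets a single query answer a business question (sales per city) that neither table can answer on its own.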
1. Defining the problem in terms of data: We all know that well begun is half done. Hence, it is very important that we translate our business problem into a data science problem. This means knowing what decision we need to take, for example whether we need to predict sales, whether a product should be added or not, whether to open the store on Sunday, whether to acquire a startup, and so on.
2. Data Collection: Once we have defined the data science problem, we start collecting data. Data collection is a foundational building block of any data science problem. Data can come in structured formats (databases, available datasets, internet history) or unstructured formats (videos, blogs, etc.).
3. Cleansing: Cleaning or scrubbing the data refers to the process of making sure the data has no NULL values, does not have too many outliers, has had irrelevant columns removed, and so on.
4. Exploratory Data Analysis: Visualize the data with the help of charts to identify patterns, outliers, or key insights, on the basis of which further actions are taken in the next step (Feature Engineering).
5. Feature Engineering: This step is done to arrive at the right set of features, with the help of the following: a) adding more records to the dataset, b) adding more features, c) group operations (for example max, min, pivot), d) normalization / scaling of data, e) logarithmic or exponential transformation, f) re-engineering the features, for example dimensionality reduction or identifying collinearity, g) one-hot encoding, and a few more.
6. Algorithm Selection: Multiple algorithms can be applied to a single business problem, so we can test several of them. For example, in the case of classification, we can use logistic regression, decision trees, or Naive Bayes, depending on which algorithm gives better accuracy.
7. Modelling: This includes training the model, which means finding the right set of weights (associated with columns/features/data) to create a generalized equation (see our page on Machine Learning).
8. Business Insights: Finally, create a set of recommendations for the business problem in a pack of MS PowerPoint slides. This may include the charts from EDA (exploratory data analysis) and the output of the machine learning model. However, the final deliverable of a data science project is the insights that aid business decisions.
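A few of the steps above (cleansing, feature engineering, and modelling) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the records, the column meanings (advertising spend, store type, sales), and all function names are hypothetical, and a real project would more likely rely on a data analysis library than on hand-written helpers.

```python
from statistics import mean

# Hypothetical records: (advertising spend, store type, sales).
# None marks a missing value. All numbers are made up for illustration.
raw = [
    (1.0, "mall", 3.1),
    (2.0, "street", 4.9),
    (None, "mall", 4.0),
    (3.0, "mall", 7.2),
    (4.0, "street", 9.0),
]


def cleanse(rows):
    """Cleansing: drop every row that contains a missing (None) value."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_scale(values):
    """Feature engineering: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Feature engineering: one-hot encode a categorical column."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def fit_line(xs, ys):
    """Modelling: least-squares weights for the equation y = w * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    w = covariance / variance
    b = y_bar - w * x_bar
    return w, b


clean = cleanse(raw)
scaled_spend = min_max_scale([row[0] for row in clean])
encoded = one_hot([row[1] for row in clean])  # shown for illustration, not used in the fit
w, b = fit_line(scaled_spend, [row[2] for row in clean])
print(f"sales = {w:.2f} * scaled_spend + {b:.2f}")
```

Here "training" reduces to a closed-form calculation because there is a single numeric feature. With many features, the weights are usually found by an iterative optimizer instead.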
1. Data Analyst: This role is responsible for data cleaning, visualization, and transformation so that analysis can be performed. In an ideal world, a data analyst should be able to do the analysis and come up with some recommendations; in practice, however, data analysts usually only do the cleaning, visualization, and transformation.
2. Data Engineer: This role is responsible for creating the environment and platform on which models and analyses can be run. Data engineers are usually responsible for setting up the big data environment or infrastructure so that data scientists can run their algorithms.
3. Database Administrator: This is a commonly known role, responsible for managing databases, which hold structured data.
4. Machine Learning Engineer: This role is responsible for creating predictive models for tasks such as classification, regression, and clustering, using algorithms such as decision trees, support vector machines, K-Means, etc. The role also touches on statistical inference, which helps when improving the accuracy of a model.
5. Data Architect: This role is geared towards developing the design and architecture of the enterprise data lake and data warehouse.
6. Business Analyst: This is a techno-business role in which the person captures business requirements and environmental dimensions. They also translate the business problem into a scope and, to some extent, give the business problem a data science shape.
7. Data Scientist: Undoubtedly, this is one of the most in-demand jobs/roles as of now. This role is responsible for giving final recommendations or crafting a model that gives recommendations. A data scientist performs data analysis, runs statistical tests, creates models, and finally presents the results to the business as well.
8. Data Consultant: This role is a data scientist with extraordinary communication skills who can convert prospects into deals.