

Machine Learning

When computers derive patterns from historical data and express them as mathematical equations, this is referred to as Machine Learning.

OVERVIEW

Basics # 1

Why Machine Learning
A huge amount of data is generated every minute, for eg, from retail payments, GPS, photos, blogs, videos, e-commerce, investments, insurance, healthcare, accounting, logistics, utilities and much more. Because there is so much data, there is an opportunity to become predictive across all these areas. Being predictive means being ready for the future by taking the right decisions in the present.

Definition
Machine Learning draws on computer science, statistics and mathematics to either make predictions or cluster data. The most widely used definition is that machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed.

Categories of Machine Learning
Machine learning works on data, and specifically on numeric data. This means that all text has to be converted into numeric form before a machine learning algorithm is applied. This is discussed in the later section on Process.
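
As an illustration of converting text into numeric data, here is a minimal sketch of one-hot encoding, which is one common approach and not necessarily the one used in the Process section. The function name `one_hot_encode` and the sample values are assumptions made for this example.

```python
def one_hot_encode(values):
    """Map each distinct text value to a 0/1 vector (one position per category)."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        encoded.append(row)
    return categories, encoded


# Hypothetical text feature: the type of area a weather reading comes from.
areas = ["coastal", "mountain", "tropical", "coastal"]
categories, encoded = one_hot_encode(areas)
print(categories)  # ['coastal', 'mountain', 'tropical']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each text value becomes a row of numbers, so an algorithm that only understands numeric input can use the column.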

1. Supervised Learning: This category of algorithms is used when we have a dependent variable (output) assigned to each set of independent variables (features or columns in the data-set). The problem is to predict the dependent variable (output) given a new set of independent variables (features).

2. Unsupervised Learning: This category of algorithms is used when the data-set has to be clustered or segregated into categories and no output is given. For eg, suppose we have to group the students of a school based on their characteristics (address, height, weight, age, marks obtained last year, drawing skill, sports medals, music skill, wears spectacles or not, etc.). Based on the data in these features, an unsupervised model can group the students into 3 or 4 clusters, which we might then interpret as a) studious, b) athlete, c) artist.

3. Reinforcement Learning: Reinforcement learning is a subset of machine learning in which the model explores the possible paths/options for reaching a goal and learns to choose the path/option that earns the most rewards (positive points) with the fewest penalties (negative points).
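
To make the difference between the first two categories concrete, the sketch below contrasts a supervised prediction (nearest neighbour, which needs known outputs) with an unsupervised grouping (k-means, which uses no outputs). Both algorithms are chosen here only for illustration, and the student data (marks, sports medals) and all function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_nearest_neighbour(features, labels, new_point):
    """Supervised: reuse the known output of the closest labelled example."""
    best = min(range(len(features)),
               key=lambda i: squared_distance(features[i], new_point))
    return labels[best]


def k_means(points, k, iterations=10):
    """Unsupervised: group points into k clusters without any labels."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


# Hypothetical students described by (marks last year, sports medals).
students = [(90, 0), (85, 1), (88, 0), (50, 6), (55, 7), (45, 5)]
known_labels = ["studious"] * 3 + ["athlete"] * 3

# Supervised: outputs are known, so we can predict one for a new student.
prediction = predict_nearest_neighbour(students, known_labels, (52, 6))
print(prediction)

# Unsupervised: no outputs at all, only cluster numbers come back.
clusters = k_means(students, k=2)
print(clusters)
```

Note that the unsupervised result is only a cluster number per student; deciding that a cluster means "studious" or "athlete" is an interpretation made afterwards.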


Basics # 2 (Glossary)

1. Independent Variable: These are the variables/fields (often referred to as features) which, when combined, determine the output of the data. For eg, rain at any place can be forecast with the help of multiple fields/variables such as the geographical type of the area (tropical, coastal, mountain, etc.), month of the year, previous day's status, humidity level, etc. These fields/variables are called independent variables or features. There are usually multiple independent variables in any data-set.


2. Dependent Variable: The output of the data is the dependent variable. In the above example, the dependent variable is the rain forecast (Yes or No) or how much rain we are expecting (measured in mm). There is usually 1 dependent variable in a data-set; however, there can be multiple dependent variables as well.


3. Data-set: The combination of independent variables and dependent variables is called a data-set. For a machine learning problem, it is usually in tabular form, where every row is one data entry and every column is one feature (independent variable) or the output. For eg, in the above rain forecasting example, 300 days of data would give 300 rows and 5 columns (4 features and 1 output).


4. Training Data: The complete data-set is divided into 3 parts, and training data is usually the biggest of the three. It is called training data because the machine learning algorithm works on this set to create its model (technically, an equation).


5. Validation Data: This is the second chunk of the complete data-set, used to validate the accuracy or correctness of the model created. The model or equation (created during training) is run on this validation set, and the hyper-parameters are adjusted based on the results to improve accuracy further.


6. Test Data: This is the final chunk of the data-set, on which the finished model is run to measure its accuracy score.

7. Fitting the data (or training): Whenever anyone says that data is being fit or a model is being trained, it means that the machine learning algorithm is creating a model, i.e. a generalized equation that fits the data. For eg, the equation of a circle in 2-dimensional space is (x - h)^2 + (y - k)^2 = r^2, where r is the radius and the circle is centered at (h, k). This is a generalized equation: every point (x, y) that satisfies it lies on the circle. Similarly, after a model is created, whenever we put in new values of the independent variables, the machine learning model gives the value of the dependent variable.


8. Loss: Loss is the difference between the predicted value and the actual value for a single training record. It gives an estimate of how far the predicted value is from the actual value.


9. Cost Function: The cost function is the average of the losses across all training samples.


10. Optimization: It is the process of minimizing the cost by adjusting the weights or parameters. It is typically achieved by taking the partial derivative of the cost function with respect to each weight and adjusting the weights in the direction that reduces the cost.


11. Parameters: Parameters are the weights associated with each independent variable. These weights are changed with every iteration of optimization.
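
The glossary terms above can be tied together in one small sketch. It assumes a single independent variable, a linear model y = w*x + b, squared error as the loss, and plain gradient descent as the optimizer. These choices, the 60/20/20 split, the candidate learning rates and every name in the code are assumptions made for illustration only.

```python
import random

# Hypothetical data-set: one feature x and an output y that follows y = 2x + 1.
dataset = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(20)]

# Shuffle, then split into training (60%), validation (20%) and test (20%) data.
random.seed(0)
random.shuffle(dataset)
train, validation, test = dataset[:12], dataset[12:16], dataset[16:]


def predict(w, b, x):
    return w * x + b


def loss(w, b, x, y):
    """Squared difference between predicted and actual value for one record."""
    return (predict(w, b, x) - y) ** 2


def cost(w, b, rows):
    """Average of the losses over a set of records."""
    return sum(loss(w, b, x, y) for x, y in rows) / len(rows)


def fit(rows, learning_rate, iterations=5000):
    """Gradient descent: step w and b against the partial derivatives of the cost."""
    w, b = 0.0, 0.0  # parameters (weights) start at zero
    n = len(rows)
    for _ in range(iterations):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in rows) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in rows) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# The learning rate is a hyper-parameter: choose it using the validation data.
best_rate, best_cost = None, float("inf")
for rate in [0.001, 0.01, 0.1]:
    w_try, b_try = fit(train, rate)
    validation_cost = cost(w_try, b_try, validation)
    if validation_cost < best_cost:
        best_rate, best_cost = rate, validation_cost

# Fit the final model and report its cost on the untouched test data.
w, b = fit(train, best_rate)
print(f"w={w:.3f}, b={b:.3f}, test cost={cost(w, b, test):.6f}")
```

The training data is the only set the optimizer sees, the validation data is used only to pick the learning rate, and the test data is used once at the end to score the model.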
