Artificial Intelligence is about working with images, audio, video and text. Cloud providers like AWS, Azure, GCP (Google) and IBM offer powerful, ready-to-use APIs that can accomplish in hours what would otherwise take weeks.
Why Artificial Intelligence
We know that an enormous amount of data is generated every minute, for example from retail payments, GPS, photos, blogs, videos, e-commerce, investments, insurance, healthcare, accounting, logistics, utilities and much more. This data takes the form of images, videos, audio and text, flowing through social media like Facebook, LinkedIn, Twitter, Quora, Stack Overflow, Instagram, YouTube and many others. These fundamental data elements can be combined with complex algorithms to create intelligent solutions. For example, images of damaged cars can be used to build a solution where a customer simply uploads a photo of their damaged car and the insurance claim gets approved automatically. Another example is to collect feedback and comments from Twitter in real time and create a pipeline that alerts specific organizations about their reputation. This can be so powerful that an organization can make timely decisions, with a huge positive impact on customer satisfaction.
Definition
There are many technical definitions available online; however, in my view, Artificial Intelligence is a technical solution where we replicate two human faculties, namely sight (images/text) and speech (voice/text). If we map these faculties to the technology world, we can associate sight with Computer Vision and speech with Natural Language Processing.
According to Jeremy Achin, CEO of DataRobot, an AI is a computer system that is able to perform tasks that ordinarily require human intelligence. Fundamentally, these artificial intelligence systems are powered by deep learning, a subset of machine learning.
AI is an interdisciplinary science with multiple approaches, but advancements in machine learning and deep learning are creating a paradigm shift in virtually every sector of technology.
Technology / Algorithms
Artificial Intelligence builds on the concept of deep learning, which in turn is a subset of machine learning.
Deep Learning is called deep because it forms multiple layers of learning (we will explain layers later in this section). For example, input goes to layer A, the output of layer A is fed to layer B, and so on until the final output is generated. In deep learning, we define the number of layers (for example 32, 50 or 64) before the output is generated. Deep learning is achieved with the help of complex algorithms like Convolutional Neural Networks, Recurrent Neural Networks, Long Short-Term Memory units, Multi-Layer Perceptrons, etc. Neural networks are the category of algorithms that enable deep learning and are fundamental to AI.
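To make this layer-by-layer flow concrete, here is a minimal sketch in Python/NumPy of a forward pass through two stacked layers. The layer sizes, the random weights and the sigmoid activation are illustrative assumptions, not a specific published architecture.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 3 inputs -> layer A (4 neurons) -> layer B (2 neurons)
rng = np.random.default_rng(0)
W_a, b_a = rng.normal(size=(4, 3)) * 0.01, np.zeros(4)
W_b, b_b = rng.normal(size=(2, 4)) * 0.01, np.zeros(2)

x = np.array([0.5, -1.2, 3.0])       # one input sample
out_a = sigmoid(W_a @ x + b_a)       # output of layer A ...
out_b = sigmoid(W_b @ out_a + b_b)   # ... becomes the input of layer B
print(out_b)                         # final output of this tiny network
```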
Relevance of Neural Networks
The term “neural network” is derived from the human brain, where approximately 86 billion neurons are connected to each other, and sets of neurons are responsible for sending signals from the organs to the brain and vice versa.
Just as neurons in the brain carry information back and forth, neurons in a neural network carry data back and forth between layers, ultimately producing a result at the output layer. This is why these algorithms are categorized as “neural networks”.
1. Training Data: Whether it is machine learning or deep learning, data inevitably forms the input. This means that for any AI system, training data is critically important. For example, if we want to read MRI scans and assess whether a new scan shows a disease, we will have to gather thousands of MRI scans and feed them into the neural network so that it can build a model. This model will then be able to predict whether a newly fed MRI scan shows the disease or not. The accuracy of this model will depend on the variety and number of scans in the training data (there are of course other factors as well, but training data is the primary one).
2. Neural Network: As explained above, neural networks are a category of algorithms that enable deep learning. Neural networks are usually sequential, which means that the output of one layer is the input of the next layer. Non-sequential neural networks do exist, but we may add those details later. Continuing with neural networks, the training data is fed to the input layer of the network.
3. Layers: As mentioned above, layers form the structure of a neural network through which data travels. The algorithm starts from the input layer, moves to the next layer, and so on until it reaches the final output layer. The process does not end there: the process of back-propagation then begins (explained below).
4. Hidden Layers: All layers between output and input layers are called hidden layers.
5. Neurons: A business problem involves an image, a collection of sentences or an audio clip that has to be segmented or classified. These data types (image, audio or text) are converted into numeric values (for example pixel intensities for an image, or numeric encodings of words) which are assigned to the input neurons; a small encoding sketch appears after this list. To begin with, every input neuron holds a value from a single input (image, audio, video or a text word/sentence). Every layer has numerous neurons, and at each layer their values pass through mathematical functions (explained as activation functions later in this section) to finally form the output.
6. Features: It is important to note that in a machine learning algorithm, features (columns of data) are pre-defined; in deep learning, however, features are created on the fly from the patterns the network observes in the training data. Continuing from above, data (in the form of neuron values) travels across layers, and features are formed at each layer. When I say features are formed, I mean that prominent areas in an image or important recurring words in a sentence get highlighted: they start getting higher numeric values compared to parts of the image or words in the sentence that are less relevant.
7. Weights: Weights are a set of values (usually initialized close to zero, for example 0.01) that evolve with every iteration of optimization (explained below). These values get updated during backward propagation (it makes sense to read forward propagation first and then backward propagation to understand this).
8. Forward Propagation: While moving from one layer to the next, we compute with weights and biases. Imagine that we have 4 layers in total (1 input, 2 hidden layers and 1 output). If there are 3 neurons in the input layer (layer 1) and 4 neurons in the first hidden layer (layer 2), then there will be 12 (3 × 4) weights between layer 1 and layer 2, plus 4 bias terms (one per neuron in layer 2). If there are 6 neurons in layer 3, there will be 24 (4 × 6) weights hitting layer 3, plus 6 bias terms. Assuming 2 neurons in the final output layer (layer 4), there will be 12 (6 × 2) weights hitting layer 4, plus 2 bias terms. Read the next item for how these weights get updated; a parameter-count sketch for this exact network appears after this list.
9. Activation Function: It is important to note that the weights and the inputs from the previous layer go into a dot product, and then the bias is added. The result is fed into an activation function, which outputs a number. This mathematical operation is performed at each neuron in each layer. Examples of activation functions are sigmoid, tanh, ReLU, Maxout, ELU and a few more; a short sketch of some of these appears after this list.
10. Backward Propagation (including Loss and Cost Function): At the final layer, the loss is calculated: it measures the difference between the predicted value and the actual value for a single training record, giving an estimate of how far the prediction is from the truth. The cost function is the average of the losses across all training samples. With this value, the reverse pass of the same iteration (after forward propagation) starts, called backward propagation. We take the partial derivative of the cost function with respect to each weight connecting a neuron in layer 3 to a neuron in the final layer. This partial derivative tells us how the cost function changes with a small change in that weight. Similarly, we calculate the partial derivatives for all weights as we move backward. Once we have the partial derivatives at layer 3, we update the weights of layer 3 by subtracting each partial derivative (multiplied by a constant, the learning rate) from the original weight. We repeat this process from layer 3 to layer 2, and when we reach layer 1, one iteration is complete. We may run 100 or 500 such iterations (depending on the complexity of the problem) until the weights stop changing, which means the cost has reached a minimum (ideally the global minimum).
11. Optimization: Optimization means reducing the cost function so that our predicted values come closer to the actual values. This means the model has learned a mathematical mapping that can also be applied to new data. Concretely, optimization takes the partial derivatives of the cost function with respect to all weights and updates the weights in the direction that reduces the cost, a procedure known as gradient descent (the minimal training loop at the end of this section shows one such update).
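As referenced in item 5, here is a small sketch of how raw data can become input-neuron values: a tiny grayscale image is flattened into a normalized numeric vector. The 4×4 image and the 0–255 pixel range are illustrative assumptions.

```python
import numpy as np

# Illustrative 4x4 grayscale image with pixel intensities in 0..255
image = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  8,  72, 136, 200],
], dtype=np.float64)

# Flatten to a vector: one value per input neuron (16 neurons here),
# scaled to 0..1 so all inputs live on a comparable range
input_neurons = image.flatten() / 255.0
print(input_neurons.shape)  # (16,)
```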
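As a check on the arithmetic in item 8, here is a short sketch that counts weights and biases for the illustrative 3-4-6-2 network (the layer sizes come from that example; the rest is plain arithmetic).

```python
# Layer sizes for the example network in item 8: input, hidden, hidden, output
layer_sizes = [3, 4, 6, 2]

total_weights, total_biases = 0, 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection between the two layers
    biases = n_out           # one bias per receiving neuron
    print(f"{n_in} -> {n_out}: {weights} weights, {biases} biases")
    total_weights += weights
    total_biases += biases

print(f"total: {total_weights} weights, {total_biases} biases")
# 3 -> 4: 12 weights, 4 biases
# 4 -> 6: 24 weights, 6 biases
# 6 -> 2: 12 weights, 2 biases
```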
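To make item 9 concrete, here is a minimal sketch of the per-neuron computation (dot product plus bias) followed by a few common activation functions. The weights, bias and input values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def tanh(z):    return np.tanh(z)
def relu(z):    return np.maximum(0.0, z)

w = np.array([0.01, -0.02, 0.03])   # weights into one neuron (illustrative)
x = np.array([1.0, 2.0, 3.0])       # outputs of the previous layer
b = 0.05                            # bias for this neuron

z = np.dot(w, x) + b                # dot product of weights and inputs, plus bias
print(sigmoid(z), tanh(z), relu(z)) # the neuron's output under each activation
```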
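Finally, to tie items 10 and 11 together, here is a minimal, self-contained training loop in NumPy: a single sigmoid neuron fit to a handful of points, with mean-squared-error cost and plain gradient descent. The data, learning rate and iteration count are illustrative assumptions, and the gradients are derived by hand for this tiny case.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative training data: 4 samples, 2 features each, binary targets
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2) * 0.01        # weights initialized near zero (item 7)
b = 0.0
lr = 1.0                             # the learning-rate constant from item 10

for i in range(500):                 # iterations, as in item 10
    # forward propagation
    y_hat = sigmoid(X @ w + b)
    cost = np.mean((y_hat - y) ** 2)      # cost = average loss over all samples

    # backward propagation: partial derivatives of the cost w.r.t. w and b
    dz = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    dw = X.T @ dz / len(y)
    db = np.mean(dz)

    # gradient-descent update (item 11): step against the gradient
    w -= lr * dw
    b -= lr * db

    if i % 100 == 0:
        print(f"iteration {i}: cost = {cost:.4f}")   # cost shrinks over iterations
```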