According to a recent study, machine learning algorithms are expected to replace 25% of jobs across the world in the next 10 years. With the rapid growth of big data and the availability of programming tools like Python and R, machine learning is gaining mainstream presence among data scientists. Machine learning applications are highly automated and self-modifying: they continue to improve over time with minimal human intervention as they learn from more data. For instance, Netflix's recommendation algorithm learns more about a viewer's likes and dislikes based on the shows that viewer watches. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed to solve them. For beginners who are struggling to understand the basics of machine learning, here is a brief discussion of the top machine learning algorithms used by data scientists.
A machine learning algorithm can be related to any other algorithm in computer science: it is a procedure that runs on data and is used to build a production-ready machine learning model. If you think of machine learning as a train accomplishing a task, then machine learning algorithms are the engines driving that accomplishment. Which type of machine learning algorithm works best depends on the business problem you are solving, the nature of the dataset, and the resources at hand.
Machine Learning algorithms are classified as –
Supervised machine learning algorithms make predictions on a given set of samples by searching for patterns within the value labels assigned to data points. Some popular supervised learning algorithms include SVM for classification problems, Linear Regression for regression problems, and Random Forest for both regression and classification problems. Supervised learning applies when the dataset is annotated with the output classes of interest. For example, in sentiment analysis the output classes are happy, sad, angry, etc.
In unsupervised learning, there are no labels associated with the data points and the output classes are undefined. These machine learning algorithms organize the data into groups of clusters to describe its structure and make complex data look simple and organized for analysis. The best example of such learning is clustering, which groups similar objects or data points together into segregated clusters. Clustering also helps in finding biases in the dataset, i.e. inherent dependencies that link the occurrence of values in some way.
Unsupervised learning is relatively harder, and sometimes the clusters obtained are difficult to understand because of the lack of labels or classes.
Reinforcement Learning steers through learning a real-world problem using rewards and punishments as reinforcements. Ideally, there is a job or activity that needs to be learned or mastered: the model is rewarded when it completes the job and punished when it fails. The hard part of Reinforcement Learning is figuring out what kinds of rewards and punishments would suit the model.
These algorithms choose an action, based on each data point and later learn how good the decision was. Over time, the algorithm changes its strategy to learn better and achieve the best reward.
Recommended Reading:
Common Machine Learning Algorithms Infographic
It would be difficult, and practically impossible, to classify a web page, a document, an email, or any other lengthy text note manually. This is where the Naïve Bayes Classifier machine learning algorithm comes to the rescue. A classifier is a function that assigns an element of a population to one of the available categories. Spam filtering and weather forecasting are popular applications of the Naïve Bayes algorithm; the spam filter here is a classifier that assigns the label "Spam" or "Not Spam" to every email.
The Naïve Bayes Classifier is among the most popular learning methods built on the well-known Bayes theorem of probability, used to construct machine learning models particularly for disease prediction and document classification. It classifies content by computing probabilities with the Bayes theorem. The basic assumption of the Naïve Bayes algorithm is that all the features are independent of each other. It is a very simple algorithm that is easy to implement, and it is particularly useful for large datasets, including text datasets.
Bayes theorem gives a way to calculate the posterior probability P(A|B) from P(A), P(B), and P(B|A).
The formula is given by: P(A|B) = P(B|A) * P(A) / P(B)
Where P(A|B) is the posterior probability of A given B, P(A) is the prior probability of A, P(B|A) is the likelihood, i.e. the probability of B given A, and P(B) is the prior probability of B.
Learn More About Classification Algorithms
Data Science Libraries in Python to implement Naïve Bayes – Sci-Kit Learn
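As a minimal sketch of the spam-filter example with scikit-learn's MultinomialNB (the toy messages and labels below are invented for illustration, not a real dataset):

```python
# Toy spam filter: bag-of-words counts + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win cash prize now", "limited offer win money",
            "meeting at noon tomorrow", "please review the report"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

vectorizer = CountVectorizer()             # turn text into word-count vectors
X = vectorizer.fit_transform(messages)

# MultinomialNB applies Bayes' theorem with the naive independence assumption.
model = MultinomialNB().fit(X, labels)
prediction = model.predict(vectorizer.transform(["win a cash offer"]))[0]
```

Because every word in the new message appears only in the spam examples, the classifier labels it "Spam".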
K-means is a popularly used unsupervised machine learning algorithm for cluster analysis. K-Means is a non-deterministic and iterative method. The algorithm operates on a given data set through a pre-defined number of clusters, k. The output of the K Means algorithm is k clusters with input data partitioned among the clusters. For instance, let’s consider K-Means Clustering for Wikipedia Search results. The search term “Jaguar” on Wikipedia will return all pages containing the word Jaguar which can refer to Jaguar as a Car, Jaguar as Mac OS version, and Jaguar as an Animal. K Means clustering algorithm can be applied to group the web pages that talk about similar concepts. So, the algorithm will group all web pages that talk about Jaguar as an Animal into one cluster, Jaguar as a Car into another cluster, and so on.
Any new incoming data point is classified according to its proximity to the existing clusters. Data points inside a cluster exhibit similar characteristics, while different clusters have different properties. A basic example of clustering would be grouping similar kinds of customers into one segment for a marketing campaign. It is also a useful algorithm for document clustering.
The steps followed in the k means algorithm are as follows -
i) Choose k initial centroids (for example, k randomly selected data points)
ii) Assign each data point to the cluster whose centroid is closest, using the squared distance between the centroid and the data point
iii) Recompute each centroid as the average of all the data points assigned to that cluster, and repeat steps ii) and iii) until the assignments stop changing
We can find the optimal number of clusters k with the elbow method: plot the sum of squared distances against k and pick the point where the decrease flattens out.
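The clustering loop described above can be sketched with scikit-learn, which runs the assign-and-recompute iteration internally; the 2-D points below are synthetic, with two obvious groups, so k = 2:

```python
# k-means on two well-separated synthetic groups of points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group near (1, 1)
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])  # group near (8, 8)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_      # cluster index assigned to each point
inertia = kmeans.inertia_    # sum of squared distances to the centroids
```

Plotting `inertia` for k = 1, 2, 3, ... is exactly the elbow-method curve used to choose k.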
The K-Means clustering algorithm is used by search engines like Yahoo and Google to cluster web pages by similarity and identify the 'relevance rate' of search results. This helps search engines reduce computational time for users.
Data Science Libraries in Python to implement K-Means Clustering – SciPy, Sci-Kit Learn, Python Wrapper
Data Science Libraries in R to implement K-Means Clustering – stats
Support Vector Machine is a supervised machine learning algorithm for classification and regression problems in which the dataset teaches the SVM about the classes so that it can classify new data. It works by finding a line (hyperplane) that separates the training dataset into classes. As there are many such linear hyperplanes, the SVM algorithm tries to maximize the distance between the classes involved; this is referred to as margin maximization. If the line that maximizes the distance between the classes is identified, the probability of generalizing well to unseen data is increased.
SVM is commonly used for stock market forecasting by various financial institutions. For instance, it can be used to compare the relative performance of stocks against other stocks in the same sector. This relative comparison helps manage investment decisions based on the classifications made by the SVM learning algorithm.
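A hedged sketch of margin maximization with scikit-learn's linear SVM; the two classes below are synthetic, linearly separable points:

```python
# Linear SVM: finds the maximum-margin hyperplane between two classes.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1],    # class 0, near the origin
              [5, 5], [6, 5], [5, 6]])   # class 1, far away
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)     # margin maximization happens here
pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
```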
Data Science Libraries in Python to implement Support Vector Machine – SciKit Learn, PyML, SVMstruct Python, LIBSVM
Free access to solved code Python and R examples can be found here (these are ready-to-use for your Data Science and ML projects)
The Apriori algorithm is an unsupervised machine learning algorithm that generates association rules from a given dataset. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. Most of the association rules generated are in the IF-THEN format. For example, IF people buy an iPad THEN they also buy an iPad case to protect it. For the algorithm to derive such conclusions, it first counts the number of people who bought an iPad case while purchasing an iPad. A ratio is then derived, e.g. out of 100 people who purchased an iPad, 85 also purchased an iPad case.
Data Science Libraries in Python to implement Apriori Machine Learning Algorithm – There is a python implementation for Apriori in PyPi
Data Science Libraries in R to implement Apriori Machine Learning Algorithm – arules
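A minimal, self-contained sketch of the arithmetic behind such rules, i.e. support and confidence counting (the full Apriori candidate-generation step is omitted, and the `ipad`/`ipad_case` transactions are invented for illustration):

```python
# Support and confidence for the rule "IF ipad THEN ipad_case".
transactions = [
    {"ipad", "ipad_case"}, {"ipad", "ipad_case"}, {"ipad"},
    {"ipad", "ipad_case", "pen"}, {"pen"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# confidence(A -> B) = support(A and B) / support(A)
conf = support({"ipad", "ipad_case"}, transactions) / support({"ipad"}, transactions)
```

Here 3 of the 4 iPad buyers also bought a case, so the rule's confidence is 0.75.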
The Linear Regression algorithm shows the relationship between two variables and how a change in one impacts the other. It models the impact on the dependent variable of changing the independent variables. The independent variables are referred to as explanatory variables, as they explain the factors that impact the dependent variable; the dependent variable is often referred to as the factor of interest or the response. Linear regression is used for estimating real, continuous values. The most common examples of linear regression are housing price prediction, sales prediction, weather prediction, employee salary estimation, etc. The basic goal of linear regression is to fit the best line through the data. The equation for linear regression is y = a*x + b, where y is the dependent variable, x is the set of independent variables, a is the slope, and b is the intercept.
The best example from everyday life would be the way a child solves a simple problem like ordering the children in a class by height without asking their heights. The child can solve this problem by visually comparing the children's heights and arranging them accordingly; the 'weights', i.e. the heights and builds of the children, have been learnt gradually. This is how you can perceive linear regression in a real-life scenario. Looking back at the equation, a and b are the coefficients that are learned by the regression model by minimizing the sum of squared errors in the model values.
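The coefficient-fitting step can be sketched with NumPy; the data below is synthetic and generated exactly from a = 2 and b = 1, so a least-squares fit should recover those values:

```python
# Fit y = a*x + b by minimizing the sum of squared errors.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0              # noiseless line: slope 2, intercept 1

a, b = np.polyfit(x, y, deg=1)  # least-squares estimates of slope and intercept
```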
The graph below shows the relation between the number of umbrellas sold and the rainfall in a particular region -
Linear Regression finds great use in business for sales forecasting based on trends. If a company observes a steady increase in sales every month, a linear regression analysis of the monthly sales data helps the company forecast sales in the upcoming months.
Linear Regression also helps assess risk in the insurance and financial domains. A health insurance company can run a linear regression analysis of the number of claims per customer against age; such an analysis may reveal that older customers tend to make more insurance claims. Results like these play a vital role in important business decisions made to account for risk.
Data Science Libraries in Python to implement Linear Regression – statsmodel and SciKit
Data Science Libraries in R to implement Linear Regression – stats
The name of this algorithm can be a little confusing: the Logistic Regression machine learning algorithm is for classification tasks, not regression problems. The name 'Regression' implies that a linear model is fit in the feature space. The algorithm applies a logistic function to a linear combination of features to predict the outcome of a categorical dependent variable based on predictor variables.
The odds or probabilities that describe the outcome of a single trial are modeled as a function of explanatory variables. Logistic regression algorithms help estimate the probability of falling into a specific level of the categorical dependent variable based on the given predictor variables.
Suppose you want to predict whether there will be snowfall tomorrow in New York. The outcome of the prediction is not a continuous number, since there will either be snowfall or no snowfall, so linear regression cannot be applied. Here the outcome variable is one of several categories, and logistic regression helps.
Let us consider a simple example in which a cake manufacturer wants to find out whether baking a cake at 160°C, 180°C, or 200°C will produce a 'hard' or 'soft' variety of cake (assuming the bakery sells both varieties under different names and prices). Logistic regression is a perfect fit in this scenario, instead of other statistical techniques. Suppose the manufacturer produces two batches: the first batch contains 20 cakes (of which 7 were hard and 13 were soft) and the second batch contains 80 cakes (of which 41 were hard and 39 were soft). A linear regression model would give equal importance to both batches regardless of the number of cakes in each; a logistic regression model accounts for this and gives the second batch of cakes more weight than the first.
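A hedged sketch of the cake example with scikit-learn's LogisticRegression; the temperatures and hard/soft labels below are invented for illustration:

```python
# Predict 'hard' vs 'soft' cake texture from baking temperature.
import numpy as np
from sklearn.linear_model import LogisticRegression

temps = np.array([[160], [160], [180], [180], [200], [200]])
texture = np.array(["soft", "soft", "soft", "hard", "hard", "hard"])

# The logistic function maps the linear score to a probability of each class.
model = LogisticRegression(max_iter=1000).fit(temps, texture)
pred = model.predict([[165], [195]])      # predicted class labels
probs = model.predict_proba([[195]])      # probability for each class
```

Lower temperatures come out "soft" and higher ones "hard", with `predict_proba` giving the modeled odds rather than a raw regression value.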
The Data Science library in Python to implement the Logistic Regression Machine Learning Algorithm is Sci-Kit Learn.
The Data Science library in R to implement the Logistic Regression Machine Learning Algorithm is the stats package (glm() function).
You are making a weekend plan to visit the best restaurant in town, as your parents are visiting, but you are hesitant about which restaurant to choose. Whenever you want to visit a restaurant you ask your friend Tyrion whether he thinks you will like a particular place. To answer, Tyrion first has to find out the kind of restaurants you like, so you give him a list of restaurants you have visited and tell him whether you liked each one (a labelled training dataset). When you ask Tyrion whether you will like a particular restaurant R, he asks you various questions like "Is R a rooftop restaurant?", "Does restaurant R serve Italian cuisine?", "Does R have live music?", "Is restaurant R open till midnight?" and so on. Tyrion asks the most informative questions to maximize the information gain and gives you a YES or NO answer based on your responses to the questionnaire. Here Tyrion is a decision tree for your restaurant preferences.
A decision tree is a graphical representation that makes use of branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, the internal node represents a test on the attribute, each branch of the tree represents the outcome of the test and the leaf node represents a particular class label i.e. the decision made after computing all of the attributes. The classification rules are represented through the path from root to the leaf node.
Classification Trees - These are considered the default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.
Regression Trees - When the response or target variable is continuous or numerical, regression trees are used. These are generally applied to prediction problems rather than classification.
Decision trees can also be classified into two types, based on the type of target variable- Continuous Variable Decision Trees and Binary Variable Decision Trees. It is the target variable that helps decide what kind of decision tree would be required for a particular problem.
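The restaurant questionnaire can be sketched as a classification tree with scikit-learn; the yes/no features (rooftop, Italian cuisine, live music) and the liked/disliked labels below are invented for illustration:

```python
# A tiny decision tree over hypothetical restaurant visits.
from sklearn.tree import DecisionTreeClassifier

#           rooftop, italian, live_music  (1 = yes, 0 = no)
visits =  [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
liked  =  [1,         1,         0,         0,         1,         0]  # 1 = liked

tree = DecisionTreeClassifier(random_state=0).fit(visits, liked)
# A new rooftop, non-Italian, quiet restaurant:
prediction = tree.predict([[1, 0, 0]])[0]
```

In this toy data "rooftop" perfectly separates liked from disliked visits, so the tree's root test uses that attribute and predicts the new place will be liked.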
The Data Science libraries in Python language to implement Decision Tree Machine Learning Algorithm are – SciPy and Sci-Kit Learn.
The Data Science libraries in R language to implement Decision Tree Machine Learning Algorithm is caret.
Let's continue with the same example used for decision trees to explain how the Random Forest machine learning algorithm works. Tyrion is a decision tree for your restaurant preferences. However, Tyrion, being human, does not always generalize your restaurant preferences accurately. To get a more accurate restaurant recommendation, you ask several friends and decide to visit restaurant R if most of them say you will like it. Instead of just asking Tyrion, you ask Jon Snow, Sandor, Bronn, and Bran, who each vote on whether you will like restaurant R or not. This implies that you have built an ensemble classifier of decision trees - also known as a forest.
You don't want all your friends to give you the same answer, so you provide each of them with slightly varying data. You are also not entirely sure of your own restaurant preferences. You told Tyrion that you like open rooftop restaurants, but maybe you only liked that restaurant because it was summer when you visited; you may not be a fan during the chilly winters. Thus, not all your friends should use the data point that you like open rooftop restaurants when making their recommendations.
By providing your friends with slightly different data on your restaurant preferences, you make them ask you different questions at different times. By slightly altering the preferences each friend sees, you inject randomness into each model, rather than every friend learning from exactly the same data as a single decision tree would. Your group of friends now forms a random forest of your restaurant preferences.
Random Forest is the go-to machine learning algorithm that uses a bagging approach to create a bunch of decision trees from random subsets of the data. A model is trained several times on random samples of the dataset to achieve good prediction performance. In this ensemble learning method, the outputs of all the decision trees in the random forest are combined to make the final prediction, either by polling the results of each decision tree or simply by going with the prediction that appears most often among the trees.
For instance, in the above example - if 5 friends decide that you will like restaurant R but only 2 friends decide that you will not like the restaurant then the final prediction is that, you will like restaurant R as majority always wins.
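The friends-voting idea can be sketched with scikit-learn's RandomForestClassifier: seven "friends" (trees), each trained on a bootstrap sample, vote on a new point. The two-class data below is synthetic:

```python
# Seven bootstrap-trained trees vote; the forest takes the majority.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # class 0, centered at (0, 0)
               rng.normal(5, 1, (50, 2))])     # class 1, centered at (5, 5)
y = np.array([0] * 50 + [1] * 50)

forest = RandomForestClassifier(n_estimators=7, random_state=0).fit(X, y)

new_point = [[5.0, 5.0]]
votes = [tree.predict(new_point)[0] for tree in forest.estimators_]  # each tree's opinion
majority = forest.predict(new_point)[0]                              # majority wins
```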
Data Science libraries in Python language to implement Random Forest Machine Learning Algorithm is Sci-Kit Learn.
Data Science libraries in R language to implement Random Forest Machine Learning Algorithm is randomForest.
The human brain is a highly complex, non-linear, parallel computer whose structural constituents, the neurons, are interconnected with each other in an intricate manner. Take the simple example of face recognition: whenever we meet a person we know, we can instantly recall his name, where he works, or his relationship with us. We may know thousands of people, yet the human brain recognizes a face immediately. Now suppose a computer is asked to perform this task instead. It is not an easy computation for the machine, because it does not know the person: you have to teach the computer using images of different people. If you know 10,000 people, you have to feed all 10,000 photographs into the computer. Then, whenever you meet someone, you capture an image and feed it to the computer, which matches the photograph against all 10,000 photographs in the database and finally returns the one that best resembles the person. This could take several hours or more, and the complexity of the task grows with the number of images in the database, whereas a human brain recognizes the face instantly.
Can a computer recognize faces this quickly? Is the computational capability of humans different from that of computers? The switching speed of a silicon IC is of the order of nanoseconds (10^-9 s), whereas a human neuron operates on the order of milliseconds (10^-3 s), roughly a million times slower. The puzzling question, then, is how the human brain can still process such tasks faster than a computer. There are roughly 10 billion neurons with approximately 60 trillion interconnections inside the human brain, and the answer is that this network of neurons is massively parallel.
The question now is whether the massively parallel nature of the human brain can be mimicked in computer software. It is not easy, as we cannot realistically put together so many processing units and run them in a massively parallel fashion; all that can be done, within limits, is to interconnect a network of processors. Instead of considering the structure of the human brain in its totality, only a very small part of it can be mimicked to perform a very specific task. We can make neurons, but they will differ from the biological neurons of the human brain. This is achieved with Artificial Neural Networks: by 'artificial' we inherently mean something different from biological neurons. ANNs are, in effect, simulated brains that can be programmed the way we want. By defining rules that mimic the behavior of the human brain, data scientists can solve real-world problems that could never have been considered before.
An ANN is a computational network, modeled after the brain, consisting of neurons interconnected with each other. This structure is used to make predictions for both regression and classification problems. The ANN consists of several layers: the input layer, one or more hidden layers, and the output layer. The hidden layers are where the mathematics of the neural network takes place: weighted sums and biases are computed there, followed by activation functions, which are responsible for delivering the output in a structured and bounded manner. ANNs are mainly used for solving non-linear problems such as handwriting recognition and the traveling salesman problem. They involve complex mathematical calculations and are highly compute-intensive in nature.
Imagine you are walking on a walkway and you see a pillar (assume you have never seen a pillar before). You walk into the pillar and hit it. The next time you see a pillar you stay a few meters away and walk along its side, but this time your shoulder hits the pillar and you are hurt again. The next time you see the pillar you make sure not to hit it, but on your new path you walk into a letter-box (assuming you have never seen a letter-box before), and the whole process repeats. This is how an artificial neural network works: it is given several examples and tries to produce the correct answer. Whenever it is wrong, an error is calculated, and the values at the synapses (the weighted connections between neurons in the network) are adjusted by propagating the error backward, i.e. backpropagation takes place. Thus, an ANN requires lots of examples to learn from, which can run into millions or billions for real-world applications.
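The learn-from-error loop can be sketched in its smallest possible form: a single artificial neuron learning the OR function by repeatedly making a forward pass, measuring its error, and pushing that error back into its weights. Real ANNs stack many such neurons in layers; this is a toy reduction, not a full backpropagation implementation:

```python
# One sigmoid neuron trained by gradient descent on the OR truth table.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])      # OR: true unless both inputs are 0

w = np.zeros(2)                          # synapse weights
b = 0.0                                  # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):                    # many repeated examples
    out = sigmoid(X @ w + b)             # forward pass
    error = out - y                      # how wrong were we?
    w -= 0.5 * X.T @ error               # push the error back into the weights
    b -= 0.5 * error.sum()               # ... and into the bias

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
```

After enough corrections the neuron reproduces the OR table, just as the walker eventually learns to avoid the pillar.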
Recommended Reading: Types of Neural Networks
It is very difficult to reverse-engineer artificial neural networks. If your ANN learns that the image of a dog is actually a cat, it is very difficult to determine why; all that can be done is to continuously tweak the network or train it further.
Artificial Neural Networks are among the hottest machine learning algorithms in use today, solving problems from classification to pattern recognition. They are extensively used in research and in application areas like –
KNN is one of the simplest classification algorithms; it can also be used for the prediction of continuous values, i.e. regression. K Nearest Neighbors uses distance-based measures to find the closest points, and the final prediction is chosen on the basis of the k nearest neighbors. The common distance measures are Euclidean, Manhattan, Minkowski, and Hamming distance; the first three are for continuous variables, while Hamming distance is used for categorical variables. Choosing the value of k is the most important task in this algorithm. KNN is often referred to as a lazy learner, because it defers all computation until prediction time.
Image Credit: medium.com
As shown in the diagram above, the distances from the new point to the existing points of each class are calculated, and the new point is assigned to the class whose points lie at the smallest distance.
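A hedged sketch of KNN with scikit-learn, using the default Euclidean (Minkowski, p = 2) distance on synthetic 2-D points from two classes:

```python
# k = 3 nearest neighbours decide the label of a new point.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1],      # class "A", near (1, 1)
     [6, 6], [6, 7], [7, 6]]      # class "B", near (6, 6)
y = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)   # distance metric defaults to Euclidean
knn.fit(X, y)
label = knn.predict([[2, 2]])[0]            # its 3 nearest neighbours are all "A"
```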
A Gradient Boosting Classifier uses the boosting methodology: decision trees are built in sequence, with each new tree giving more weight to the examples the previous trees handled poorly, so that each tree's predictions improve on the previous versions. The final prediction is a weighted combination of all the trees. Boosting is typically used when we have a large amount of data and need highly accurate predictions.
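A sketch of this sequential tree-building with scikit-learn's GradientBoostingClassifier; the two-class data below is synthetic and well separated, so the boosted ensemble should fit it almost perfectly:

```python
# Shallow trees fitted in sequence, each correcting its predecessors.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (40, 2)),      # class 0 around (0, 0)
               rng.normal(4, 1, (40, 2))])     # class 1 around (4, 4)
y = np.array([0] * 40 + [1] * 40)

gbc = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
gbc.fit(X, y)
accuracy = gbc.score(X, y)    # training accuracy on the well-separated data
```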
XGBoost is an advanced implementation of gradient boosting. It differs from plain gradient boosting in its calculations, as it applies regularization internally; XGBoost is therefore referred to as a regularized boosting technique.
CatBoost is an open-source gradient boosting library, developed by Yandex, used for training models on large amounts of data. It supports the direct usage of categorical variables, gives very high performance in comparison to other boosting algorithms, and is very easy to implement and run. It provides out-of-the-box support for descriptive data formats, does not require much tuning, and delivers good performance with fewer training iterations.
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. As the name suggests, its training speed is very fast, and it can be used for training on large datasets.
Linear Discriminant Analysis, or LDA, is a machine learning algorithm that takes an indirect approach to solving a classification problem. To predict the probability P_n(X) that a given feature vector X belongs to a given class Y_n, it assumes a density function f_n(X) for the features belonging to that class. It then uses this density function to predict the probability P_n(X) via Bayes' theorem:

P_n(X) = π_n f_n(X) / Σ_k π_k f_k(X)
where π_n is the overall or prior probability that a randomly picked observation belongs to the n-th class.
Suppose the dataset has only one feature variable. LDA then assumes a Gaussian distribution function for f_n(X) with a class-specific mean μ_n and a variance σ² that is shared by all N classes. To assign a class to an observation x from the test dataset, it evaluates the discriminant function

δ_n(x) = x * μ_n/σ² − μ_n²/(2σ²) + log(π_n)
The LDA classifier then predicts, for the test observation, the class for which the value of the discriminant function is largest. The algorithm is called 'linear' discriminant analysis because the discriminant functions are all linear functions of x.
Note: In the case of more than one feature variable, LDA assumes a multivariate Gaussian density with a class-specific mean vector and a covariance matrix common to all classes, and the discriminant function is scaled accordingly.
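A hedged sketch of the one-feature case with scikit-learn's LinearDiscriminantAnalysis; the numbers below are synthetic:

```python
# LDA on a tiny one-feature, two-class dataset.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])  # a single feature
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis().fit(X, y)
pred = lda.predict([[1.7], [8.2]])        # class with the largest discriminant
posteriors = lda.predict_proba([[8.2]])   # P(class | x) from the density model
```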
Advantages of Linear Discriminant Analysis Machine Learning Algorithm
1. It works well for machine learning problems where the classes to be assigned are well-separated.
2. If the number of feature variables in the given dataset is small and fits the normal distribution well, the LDA provides stable results.
3. It is easy to understand and simple to use.
Disadvantages of Linear Discriminant Analysis Machine Learning Algorithm
1. It requires the feature variables to follow the Gaussian distribution and thus has limited applications.
2. It does not perform very well on datasets having a small number of target variables.
Applications of Linear Discriminant Analysis Machine Learning Algorithm
Classifying the Iris Flowers: The famous Iris Dataset contains four features (sepal length, petal length, sepal width, petal width) of three types of Iris flowers. You can use LDA to classify these flowers based on the given four features.
This machine learning algorithm is similar to the LDA algorithm discussed above. Like LDA, the QDA algorithm assumes that the feature variables belonging to a particular class obey a Gaussian distribution, and it utilizes Bayes' theorem to turn the estimated class densities into class probabilities.
However, in contrast to LDA, QDA presumes that each class in the target variable has its own covariance matrix. If one carries this assumption through the evaluation of the discriminant function, quadratic functions of x appear in the formula; hence the word 'linear' is replaced by 'quadratic'.
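A sketch of QDA with scikit-learn on synthetic data where the two classes deliberately have different covariances (one tight cluster, one spread out), which is exactly the situation QDA's per-class covariance assumption is meant for:

```python
# QDA: each class gets its own covariance estimate.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)),   # tight class 0 around (0, 0)
               rng.normal(3, 1.5, (30, 2))])  # spread-out class 1 around (3, 3)
y = np.array([0] * 30 + [1] * 30)

qda = QuadraticDiscriminantAnalysis().fit(X, y)
pred = qda.predict([[0.0, 0.0], [3.0, 3.0]])  # one point near each class center
```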
Advantages of Quadratic Discriminant Analysis Machine Learning Algorithm
1. It performs well for machine learning problems where the size of the training set is large.
2. QDA is advisable for machine learning problems where the feature variables in the given dataset clearly do not share a common covariance matrix across the N classes.
3. It helps in deducing the quadratic decision boundary.
4. It serves as a good compromise between the KNN, LDA, and logistic regression machine learning algorithms.
5. It gives better results when there is non-linearity in the feature variables.
Disadvantages of Quadratic Discriminant Analysis Machine Learning Algorithm
1. The results are greatly affected if the feature variables do not obey the gaussian distribution function.
2. Since it estimates a separate covariance matrix for every class, it has many more parameters than LDA and needs a larger training set to perform well.
Applications of Quadratic Discriminant Analysis Machine Learning Algorithm
Classification of Wine: One can use the QDA machine learning algorithm to classify wine with Python's sklearn library. Check out our free recipe: How to classify wine using sklearn LDA and QDA model? to know more.
Perform QDA on Iris Dataset: You can use the Iris Dataset to understand not only the LDA machine learning algorithm but the QDA machine learning algorithm as well. To explore how to do that in detail, check How does Quadratic Discriminant Analysis work?.
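To illustrate the comparison with LDA described above, here is a short sketch (assuming scikit-learn is installed) that cross-validates both classifiers on the Iris dataset:

```python
# Sketch: comparing LDA and QDA on the Iris dataset with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# QDA fits one covariance matrix per class, so its decision boundary is quadratic.
lda_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
qda_scores = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5)
print(f"LDA mean CV accuracy: {lda_scores.mean():.3f}")
print(f"QDA mean CV accuracy: {qda_scores.mean():.3f}")
```

On a dataset this simple both models score similarly; QDA's extra flexibility matters more when the class covariances genuinely differ.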
Principal components are new variables, constructed as combinations of the original feature variables, that capture almost all of the significant information contained in a large dataset through a smaller number of variables.
And, principal component analysis (PCA) is the method by which these principal components are computed and then used to better understand the data. It is an unsupervised machine learning algorithm and thus doesn’t require the input data to have target values. Here is a full PCA (Principal Component Analysis) Machine Learning Tutorial that you can go through if you want to learn how to implement PCA to solve machine learning problems.
Advantages of Principal Component Analysis Machine Learning Algorithm
1. It can be used for visualizing the dataset and can thus be implemented while performing Exploratory Data Analysis.
2. It is an excellent unsupervised learning method when one is working with large datasets as it removes correlated feature variables.
3. It assists in saving on computation power.
4. It reduces the chances of overfitting a dataset.
Disadvantages of Principal Component Analysis Machine Learning Algorithm
1. PCA requires its users to normalize their feature variables before implementing it to solve data science problems.
2. As only a subset of the principal components is retained to understand the dataset, some of the information in the original data is inevitably lost.
Applications of Principal Component Analysis Machine Learning Algorithm
Below, we have listed two easy applications of PCA for you to practice.
Perform PCA on Digits Dataset: Python’s sklearn library has an inbuilt dataset ‘digits’ that you can use to understand the implementation of the PCA machine learning algorithm. Check out our free recipe: How to reduce dimensionality using PCA in Python? to know more.
Apply PCA on Breast Cancer Dataset: Python’s sklearn library has another dataset that contains data related to breast cancer. Check out our free recipe: How to extract features using PCA in Python? to learn how to implement PCA on the breast cancer dataset.
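The digits recipe above can be sketched roughly as follows (standardizing first, since PCA is sensitive to feature scale; the 90% variance threshold is an illustrative choice):

```python
# Sketch: PCA on sklearn's digits dataset, with standardization first.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 pixel features
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps just enough components to explain that
# fraction of the total variance.
pca = PCA(n_components=0.9)
X_reduced = pca.fit_transform(X_scaled)
print(f"{X.shape[1]} features reduced to {pca.n_components_} principal components")
```

This is where the computational savings mentioned above come from: downstream models train on far fewer columns while most of the variance is preserved.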
GAMs (Generalized Additive Models) are a polished and relatively more flexible version of the multiple linear regression machine learning model. That is because they support non-linear functions of each of the feature variables while still preserving additivity.
For regression problems, GAMs use a formula like the one given below for predicting the target variable y_i given the feature variables x_i:

y_i = β_0 + f_1(x_i1) + f_2(x_i2) + f_3(x_i3) + … + f_p(x_ip) + ε_i

where ε_i represents the error term. GAMs are additive because a separate function is evaluated for each feature variable and the results are then added together. For classification problems, GAMs extend logistic regression to evaluate the probability p(x) that a given observation x is an instance of a class or not. The formula is given by

log(p(x) / (1 - p(x))) = β_0 + f_1(x_1) + f_2(x_2) + f_3(x_3) + … + f_p(x_p)
Advantages of Generalized Additive Models Machine Learning Algorithm
1. GAMs can be used for both classification and regression problems.
2. They allow modeling non-linear relationships easily, without requiring their users to manually try out different transformations on each variable individually.
3. Non-linear predictions made using GAMs are relatively accurate.
4. Individual transformations on each feature variable lead to drawing insightful conclusions about each variable in the dataset.
5. They are a practical compromise between linear and fully nonparametric models.
Disadvantages of Generalized Additive Models Machine Learning Algorithm
1. The model is restricted to be additive and does not support complex interactions among feature variables.
Applications of Generalized Additive Models Machine Learning Algorithm
You can use the pyGAM library in Python to explore GAMs.
This algorithm is an extension of the linear regression machine learning model: instead of assuming a linear relationship between the feature variable (x_{i}) and the target variable (y_{i}), it uses a polynomial expression to describe the relationship. The polynomial expression used is given by

y_{i} = β_{0} + β_{1}x_{i} + β_{2}x_{i}^2 + … + β_{d}x_{i}^d + ε_{i}

where ε_{i} represents the error term and d is the degree of the polynomial. By choosing a larger value for d, this algorithm supports estimating non-linear relationships between the feature variables and the target variable.
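A minimal sketch of this formula with scikit-learn (the data here is synthetic, generated only for illustration): PolynomialFeatures builds the terms x, x^2, …, x^d and LinearRegression estimates the β coefficients.

```python
# Minimal polynomial regression sketch with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(100, 1))
# Synthetic cubic relationship with a little noise.
y = 1 - 3 * x[:, 0] + 2 * x[:, 0] ** 3 + rng.normal(0, 0.2, size=100)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
r2 = model.score(x, y)
print(f"Degree-3 fit R^2: {r2:.3f}")
```

Raising `degree` much beyond what the data needs illustrates the overfitting risk discussed below.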
Advantages of Polynomial Regression Machine Learning Algorithm
1. It offers a simple method to fit non-linear data.
2. It is easy to implement and is not computationally expensive.
3. It can fit a varied range of curvatures.
4. It makes the pattern in the dataset more interpretable.
Disadvantages of Polynomial Regression Machine Learning Algorithm
1. Using higher values for the degree of the polynomial produces overly flexible predictions and leads to overfitting.
2. It is highly sensitive to outliers.
3. It is difficult to predict what degree of the polynomial should be chosen for fitting a given dataset.
Applications of Polynomial Regression Machine Learning Algorithm
Use Polynomial Regression on the Boston Dataset: Python’s sklearn library has the Boston Housing dataset, which has 13 feature variables and 1 target variable. One can use polynomial regression on the 13 variables to predict the median price of houses in Boston. If you are curious about how to realize this in Python, check How and when to use polynomial regression?
Recommended Reading