
Introduction to Machine Learning

The term “machine learning” sounds like a robot learning something. In practice, machine learning means feeding a large amount of training data into a machine. The machine learns the patterns in the data and, as a result, builds a model. A machine learning model can then classify, cluster, or predict test data based on what it learned from the training data.

There are three kinds of machine learning: supervised learning, unsupervised learning, and reinforcement learning. This article discusses supervised and unsupervised learning only. Supervised learning classifies or predicts test data using labeled training data: it learns how the variables of the training dataset relate to their labels, then applies that relationship to new data. Supervised learning can do classification and regression. If the label is categorical, the task is called classification; if the label is a continuous number, it is called regression.
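To make the classification/regression distinction concrete, here is a minimal sketch using k-nearest neighbors (one of the methods listed at the end of this article) on made-up student data. The data, variable names, and function names are hypothetical. The same neighbors serve both tasks: a categorical label (pass/fail) gives classification by majority vote, while a continuous label (exam score) gives regression by averaging.

```python
from collections import Counter
from math import dist
from statistics import mean

# Made-up training data: (hours studied, hours slept) for six students.
features = [(8, 7), (7, 8), (6, 6), (2, 5), (1, 6), (3, 4)]
grades = ["pass", "pass", "pass", "fail", "fail", "fail"]  # categorical label
scores = [85, 80, 72, 45, 40, 50]                          # continuous label

def nearest(query, k=3):
    """Indices of the k training rows closest to query (Euclidean distance)."""
    order = sorted(range(len(features)), key=lambda i: dist(features[i], query))
    return order[:k]

def classify(query, k=3):
    # Classification: majority vote over the neighbors' categorical labels.
    votes = Counter(grades[i] for i in nearest(query, k))
    return votes.most_common(1)[0][0]

def regress(query, k=3):
    # Regression: average of the neighbors' continuous labels.
    return mean(scores[i] for i in nearest(query, k))

print(classify((7, 7)), regress((7, 7)))  # pass 79
print(classify((2, 4)), regress((2, 4)))  # fail 45
```

Only the type of label changes between the two functions; the "learning" (finding similar training examples) is shared.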

Illustration of machine learning

Now, let’s imagine that machine learning is a kid and we want to teach the kid how to identify animals. First, we show the kid ten different pictures of monkeys and tell him that those are monkeys. The kid learns to recognize monkeys from their similarities, such as brown color, two arms, living in forests, and so on. Next, we show the kid ten pictures of birds and tell him that those are birds. Again, the kid learns what a bird is from ten different birds that share similar characteristics: birds have wings, no arms, and colorful feathers. Now we can test whether the kid can identify an eleventh monkey and an eleventh bird.
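The kid’s lesson can be mimicked, very loosely, by a nearest-centroid classifier: during training, average the features of each labeled group; during testing, name a new picture after the closest average. This is a minimal sketch, and the three features (number of wings, number of arms, brown color as 1/0) are hypothetical hand-made numbers, not values extracted from real pictures.

```python
from math import dist
from statistics import fmean

# Hypothetical hand-made features per picture: (wings, arms, brown 1/0).
training = {
    "monkey": [(0, 2, 1)] * 8 + [(0, 2, 0)] * 2,  # ten labeled monkey pictures
    "bird":   [(2, 0, 0)] * 7 + [(2, 0, 1)] * 3,  # ten labeled bird pictures
}

def learn_prototypes(data):
    """Training: average the feature vectors of each labeled group."""
    return {label: tuple(fmean(col) for col in zip(*rows))
            for label, rows in data.items()}

def identify(prototypes, picture):
    """Testing: return the label whose average features are closest to picture."""
    return min(prototypes, key=lambda label: dist(prototypes[label], picture))

prototypes = learn_prototypes(training)
print(identify(prototypes, (0, 2, 1)))  # eleventh monkey -> monkey
print(identify(prototypes, (2, 0, 1)))  # eleventh bird -> bird
```

Note that the brown bird is still recognized as a bird, because wings and arms outweigh color in the averaged features.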

Examples of supervised learning include:

  • predicting population growth from predictors such as current population size, female population size, and population age (regression),
  • predicting economic growth from economic parameters such as income, population size, and living expenses (regression),
  • classifying land cover into vegetation, soil, water body, and agriculture according to spectral reflectance (classification), and
  • classifying customers as satisfied, neutral, or dissatisfied according to their survey responses (classification).

Illustration of supervised learning
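For the regression examples in the list above, the simplest model is a straight line fitted by least squares, that is, linear regression (also listed at the end of this article). This minimal sketch is simplified to a single predictor (a year index) rather than the several predictors mentioned, and the population figures are made up; `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up training data: year index -> population (thousands).
years = [1, 2, 3, 4]
population = [102, 104, 105, 108]

slope, intercept = fit_line(years, population)
print(slope, intercept)       # about 1.9 and 100.0
print(intercept + slope * 5)  # forecast for year 5, about 109.5
```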

Unsupervised learning finds similar patterns in the variables in order to group an unlabeled dataset, often a large one, into clusters. This simplifies the analysis of the data. Unsupervised learning can also perform dimensionality reduction, which simplifies a dataset with many dimensions (variables) by finding which dimensions or variables are highly correlated with one another.
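The correlation idea behind dimensionality reduction can be sketched as a simple screen: compute the Pearson correlation for every pair of variables and flag pairs that are nearly redundant, so one of each pair could be dropped. This is a minimal sketch on made-up numbers with a hypothetical 0.9 threshold; methods such as PCA (listed at the end of this article) go further and build new combined variables instead of only flagging pairs.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up measurements: one list per variable (column).
data = {
    "calcium":     [1, 2, 3, 4, 5],
    "hardness":    [2, 4, 6, 8, 10],
    "temperature": [5, 3, 4, 1, 2],
}

def redundant_pairs(columns, threshold=0.9):
    """Variable pairs whose absolute correlation exceeds threshold."""
    return [(a, b, round(pearson(columns[a], columns[b]), 2))
            for a, b in combinations(columns, 2)
            if abs(pearson(columns[a], columns[b])) > threshold]

print(redundant_pairs(data))  # [('calcium', 'hardness', 1.0)]
```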

Now, let’s imagine machine learning as a kid learning again. Give the kid 20 pictures of fruits, but do not tell the kid the fruit names. This is what makes unsupervised learning different from supervised learning. Let the kid learn by himself to group the fruits according to their similarities.

Fruits to cluster

The kid will first separate the green, non-round fruits from the other fruits. Next, the remaining fruits will be separated again according to their color.
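One common way to let a computer do this kind of grouping is k-means (listed at the end of this article). The sketch below is a minimal version, assuming two hypothetical numeric features per fruit (hue scaled from 0 to 1, and roundness from 0 to 1) and hand-picked starting centroids so the result is reproducible; practical implementations usually pick starting points randomly or with k-means++. The kid’s two-step split is closer to divisive hierarchical clustering, whereas k-means finds all the groups at once.

```python
from math import dist
from statistics import fmean

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: returns (cluster index per point, final centroids)."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(map(fmean, zip(*members))) if members else centroids[c])
        if new == centroids:  # nothing moved, so the clustering is stable
            break
        centroids = new
    return labels, centroids

# Hypothetical features per fruit: (hue scaled 0..1, roundness 0..1).
fruits = [(0.33, 0.20), (0.35, 0.25), (0.34, 0.30),  # green, elongated
          (0.00, 0.90), (0.02, 0.95), (0.01, 0.85),  # red, round
          (0.15, 0.90), (0.16, 0.85), (0.14, 0.95)]  # yellow, round
labels, _ = kmeans(fruits, centroids=[fruits[0], fruits[3], fruits[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The output contains cluster numbers, not names such as “banana”; as in the kid example, naming the groups is left to whoever inspects them.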

Examples of unsupervised learning include:

  • customer segmentation according to customer behavior,
  • clustering a water quality dataset according to its parameters (ion content), and
  • grouping stocks according to their price movements over time.

Unlike supervised learning, unsupervised learning is applied to datasets whose labels or cluster names are not yet known. Whereas supervised learning classifies customers into three predefined classes, “satisfied”, “neutral”, and “dissatisfied”, unsupervised learning divides customers into a number of groups, where the number of groups is not decided in advance and the groups do not have labels yet.
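Because the number of groups is not fixed in advance, a method such as hierarchical clustering (listed at the end of this article) can let it emerge from a distance cut-off. This is a minimal single-linkage sketch; the customer features (visits per month, average spend), their values, and the `max_gap` threshold are all hypothetical.

```python
from itertools import combinations
from math import dist

def agglomerate(points, max_gap):
    """Single-linkage agglomerative (hierarchical) clustering with a cut-off.

    Every point starts in its own cluster; the two closest clusters are merged
    repeatedly until the closest remaining pair is farther apart than max_gap.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Single linkage: cluster distance = distance of their closest members.
        gap, i, j = min(
            (min(dist(a, b) for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if gap > max_gap:
            break  # the remaining groups are too far apart to merge
        clusters[i].extend(clusters.pop(j))  # i < j, so popping j keeps i valid
    return clusters

# Made-up customers: (visits per month, average spend).
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 1)]
print(agglomerate(customers, max_gap=2))
# [[(1, 2), (1, 1), (2, 1)], [(8, 9), (9, 8)], [(9, 1)]]
print(len(agglomerate(customers, max_gap=8)))  # 1
```

No cluster count was given: a smaller `max_gap` yields more, tighter groups, and a larger one yields fewer. The groups are also unnamed; deciding that one of them means "frequent big spenders" is up to the analyst.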

Illustration of unsupervised learning

Examples of machine learning methods are shown below. We will discuss these methods in other articles.

Method | Type | Task
K-Nearest Neighbors (kNN) | Supervised learning | Classification and regression
Decision Tree / Classification and Regression Tree (CART) | Supervised learning | Classification and regression
Random Forest | Supervised learning | Classification and regression
Support Vector Machine (SVM) | Supervised learning | Classification and regression
Gradient Boosting | Supervised learning | Classification and regression
Naïve Bayes | Supervised learning | Classification
Linear Regression | Supervised learning | Regression
Logistic Regression | Supervised learning | Classification
K-means | Unsupervised learning | Clustering
Hierarchical Clustering | Unsupervised learning | Clustering
Principal Component Analysis (PCA) | Unsupervised learning | Dimensionality reduction
t-SNE | Unsupervised learning | Dimensionality reduction
Non-Negative Matrix Factorization (NMF) | Unsupervised learning | Dimensionality reduction
Exploratory Factor Analysis (EFA) | Unsupervised learning | Dimensionality reduction

And many others.
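Despite its name, logistic regression is normally used for classification: it models the probability of a class and turns that probability into a label with a cut-off. This minimal sketch fits one feature with plain batch gradient descent on the log loss; the pass/fail data, learning rate, epoch count, 0.5 cut-off, and the names `train_logistic` and `predict` are hypothetical choices for illustration. The feature is centered on its mean so that plain gradient descent behaves well.

```python
from math import exp
from statistics import mean

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w*(x - c) + b) by batch gradient descent."""
    c = mean(xs)  # center the feature on its mean
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the log loss with respect to z is (p - y).
        errors = [sigmoid(w * (x - c) + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * (x - c) for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b, c

def predict(model, x, cutoff=0.5):
    """Turn the predicted probability into a class label (1 or 0)."""
    w, b, c = model
    return 1 if sigmoid(w * (x - c) + b) >= cutoff else 0

# Made-up data: hours studied -> passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(hours, passed)
print([predict(model, x) for x in (2, 7)])  # [0, 1]
```

The output is a class label (pass or fail), not a continuous number, which is why logistic regression belongs with the classification methods.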
Summary
