Data Science, Geographic Information System (GIS), Machine Learning, Remote Sensing

My Work Samples Portfolio

This page stores work samples that I use as cheat sheets. The notebooks in this collection are therefore kept simple and can be adapted to the use case at hand.

Overall, the work samples cover my skill sets, including:

  • delivering meaningful data-driven insights to support business goals,
  • automating data processing (with Python),
  • data analysis (tabular, time series, text/NLP, and image),
  • descriptive and inferential statistical analysis,
  • GIS or spatial data analysis,
  • data visualization and dashboard development,
  • Machine Learning modeling (regression, classification, clustering, dimensionality reduction, time series forecasting, recommender engine),
  • Deep Learning or Artificial Intelligence (regression and classification with MLP, image classification with CNN, time series forecasting with LSTM, text classification with LSTM),
  • web application development,
  • developing APIs,
  • Large Language Models (LLM),
  • Diffusion (image generation), etc.

Highlights:

Task Group | Tasks | Description | Notebook/Repo
Large Language Model (LLM) | Retrieval-Augmented Generation (RAG) | Develop RAG to enhance LLMs with custom documents. Streamlit chatbot as the UI | article, repository
Deep Learning | Image classification with CNN, Multi-label classification | Developed image classification model using CNN to recognize buildings, forest, glacier, mountain, sea, and street images. | CNN
Deep Learning | Time series forecasting with vanilla LSTM, stacked LSTM, bidirectional LSTM, CNN LSTM, and Conv LSTM | Forecast carbon monoxide emission using LSTM and did time series analysis. | Time series
Deep Learning | Text Classification with Dense, LSTM, Bi-LSTM, GRU, CNN, CNN + GRU | Developed text classification model to distinguish tweets into 4 emotions: joy, sadness, anger, and fear. | Text classification
Supervised Learning | Supervised Learning for Remote Sensing | Predicted the spatial distribution of land cover using Remote Sensing/satellite data. Published the result on a web app. | ML + Remote sensing, web app
NLP | NLP and Sentiment Analysis | Performed NLP analysis and text regression for sentiment analysis. | article, part 1, part 2
Table 1 Favorite notebooks

Others:

Task Group | Tasks | Description | Notebook/Repo
Supervised Learning | Regression | Predicted house prices with various regression algorithms. | Regression
Supervised Learning | Binary classification | Predicted Titanic passenger survival using various classification algorithms. | Binary Classification
Supervised Learning | Binary classification (with probability) | Predicted high traffic probability, evaluated with AUC, accuracy, and F1-score. | Binary Classification
Supervised Learning | Multi-class classification | Predicted household poverty as a multi-class classification problem. | Multi-class Classification
Supervised Learning | Imbalanced classification | Predicted whether an employee was a best performer as an imbalanced classification task. | Imbalanced
Supervised Learning | Bayesian Optimization: bayes_opt or fmin | Compared the libraries bayes_opt and fmin for Bayesian optimization of hyperparameters. | Bayesian Optimization
Supervised Learning | Supervised Learning for Remote Sensing | Predicted the spatial distribution of land cover using Remote Sensing/satellite data. Published the result on a web app. | ML + Remote sensing, web app
AutoML | AutoML for Regression | Predicted house prices with various AutoML regression algorithms. | Part 1, Part 2
AutoML | AutoML for Classification | Predicted household poverty classes using AutoML classification algorithms. | Part 1, Part 2
Unsupervised Learning | Clustering | Segmented customers using k-means and hierarchical clustering. | k-means, hierarchical clustering
Unsupervised Learning | Geo-spatial clustering and point pattern analysis | Spatial pattern analysis (point/polygon pattern analysis, Spatially Constrained Hierarchical Clustering, etc.) of e-commerce customers in Brazil. | Geo-spatial clustering
Unsupervised Learning | Dimensionality reduction: PCA with SageMaker (upcoming) | Performed PCA on an environmental variables dataset. | PCA
Unsupervised Learning | Anomaly detection: Random Cut Forest with SageMaker | Performed anomaly detection on a daily climate dataset and deployed the model using SageMaker. | Random Cut Forest
Time series forecasting | Time series forecasting with SARIMAX | Forecast the cash of ATMs over time. | “not yet published”
Deep Learning | Image classification with CNN, Multi-label classification | Developed image classification model using CNN to recognize buildings, forest, glacier, mountain, sea, and street images. | CNN
Deep Learning | Time series forecasting with LSTM | Forecast carbon monoxide emission using LSTM and did time series analysis. | Time series
Deep Learning | Text Classification with Dense, LSTM, Bi-LSTM, GRU, CNN, CNN + GRU | Developed text classification model to distinguish tweets into 4 emotions: joy, sadness, anger, and fear. | Text classification
NLP | NLP and Sentiment Analysis | Performed NLP analysis and text regression for sentiment analysis. | article, part 1, part 2
Inferential Statistics | Inferential Statistics, hypothesis testing, etc. | | “not yet published”
Dashboard | Shiny Dashboard | Visualized daily COVID cases in a dashboard. | Shiny Dashboard
Dashboard | Tableau Dashboard | Visualized spatiotemporal analysis of house prices. | “not yet published” (upcoming)
Web application | Streamlit | Streamlit as the chatbot interface for a RAG or LLM application. | https://github.com/rendy-k/LLM-RAG
API | FastAPI | | “not yet published”
SageMaker | SageMaker: classification | Developed and deployed a loan default probability classifier using AWS SageMaker. | classification
SageMaker | SageMaker: invoke model | Developed the API to invoke the deployed Machine Learning model. | invoke model
SageMaker | Multi-model deployment with SageMaker | Deployed multiple models on an AWS instance. | multi-model deployment
SageMaker | Recommender system | Built and deployed a recommender system to recommend anime titles using the Factorization Machines algorithm of AWS. | recommender system
SageMaker | Time series forecasting | Built and deployed DeepAR to forecast the time series of New Delhi daily weather. | Deep AR
Large Language Model (LLM) | | Develop RAG to enhance LLMs with custom documents. Streamlit chatbot as the UI | article, repository
Table 2 Notebook collection

In the “Notebook/Repo” column, each link points to where the notebook or repository is stored. Some entries have no URL and are marked “not yet published” instead. Those notebooks are kept on a local computer for professional work and have not yet been modified and published.

Data Science, Machine Learning

Machine Learning Notebooks Collection

This post aims to share my Machine Learning notebooks. There are three types of Machine Learning for predicting structured tabular data: (1) supervised learning, (2) unsupervised learning, and (3) reinforcement learning. The objective of supervised learning is to build a model from a training dataset that can predict an unseen test dataset. Supervised learning can solve regression tasks (for continuous output) and classification tasks (for categorical output). Unsupervised learning aims to learn the patterns in a dataset and simplify the information through clustering and dimensionality reduction. Cluster analysis groups observations into clusters according to the similarity of their features. Dimensionality reduction reduces the number of dimensions, or features, of a dataset. I previously wrote a post on basic Machine Learning here.
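As a rough illustration of cluster analysis, here is a minimal k-means sketch in plain Python. The choice of k-means, the `kmeans` helper, the sample points, and the starting centroids are all assumptions made up for this example; they are not taken from the notebooks.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature observations forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Observations with similar feature values end up sharing a label, which is the grouping by similarity described above. In practice a library implementation would normally be used instead of a hand-written loop.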

Continue reading “Machine Learning Notebooks Collection”
Data Science, Machine Learning

Linear Regression (Supervised Machine Learning)

Linear regression is a method in Machine Learning. The same term is also used in Statistics. To read about the basics of Machine Learning, please find my article here. Linear regression finds the relationship between one or more continuous predictor variables and the dependent variable to be predicted. Simple linear regression has only one predictor, or independent variable, to predict the dependent variable. The idea is to plot the variables and draw a fit line whose distance to the data points is as small as possible. The distance from the fit line to each data point represents the prediction error.
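A minimal sketch of fitting such a line is shown here, assuming the usual ordinary least squares criterion (minimizing the sum of squared vertical distances). The `fit_line` helper and the five data pairs are hypothetical and are made up for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical predictor/response pairs, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)

# Residuals are the prediction errors: observed value minus fitted value.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("sum of squared errors:", round(sum(r * r for r in residuals), 4))
```

Each residual is the vertical distance between a data point and the fit line, and the fitted slope and intercept are the values that make the sum of their squares as small as possible.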

Below are the data for 20 apples with their mass (grams) and volume (cm³). We want to create a model, or formula, that estimates the volume of an apple from its mass using linear regression.

Continue reading “Linear Regression (Supervised Machine Learning)”
Data Science, Machine Learning

Introduction to Machine Learning

The term “machine learning” sounds like a machine with a robot-like appearance learning something. In practice, machine learning is mostly about the user feeding a large amount of training data into the machine. The machine learns the patterns in the data and, as a result, creates a model. A machine learning model can then classify, cluster, or predict test data according to the training data.

There are three kinds of machine learning: supervised learning, unsupervised learning, and reinforcement learning. This article discusses supervised and unsupervised learning only. Supervised learning learns from the labels of a training dataset in order to classify or predict new data according to its variables. It can do classification and regression. If the label is categorical, the task is called classification. If the label is a continuous number, it is called regression.
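The difference between the two tasks can be sketched with a small k-nearest-neighbours example in plain Python. The choice of k-nearest neighbours, the helper functions, and the training data are assumptions made up for illustration; any supervised algorithm would show the same distinction.

```python
from collections import Counter


def nearest_labels(train, query, k=3):
    """Return the labels of the k training rows closest to `query` (one feature)."""
    ranked = sorted(train, key=lambda row: abs(row[0] - query))
    return [label for _, label in ranked[:k]]


def knn_classify(train, query, k=3):
    # Categorical label: majority vote among the neighbours.
    return Counter(nearest_labels(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    # Continuous label: average of the neighbours' values.
    neighbours = nearest_labels(train, query, k)
    return sum(neighbours) / len(neighbours)


# Hypothetical labeled training data as (feature, label) pairs.
size_to_category = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
                    (6.0, "large"), (6.5, "large"), (7.0, "large")]
size_to_price = [(1.0, 10.0), (1.5, 14.0), (2.0, 18.0),
                 (6.0, 50.0), (6.5, 54.0), (7.0, 58.0)]

print(knn_classify(size_to_category, 1.8))  # predicts a category
print(knn_regress(size_to_price, 6.4))      # predicts a number
```

Both functions learn from labeled training rows in the same way. Only the type of label, and therefore how the neighbours' labels are combined, separates classification from regression.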

Continue reading “Introduction to Machine Learning”