Data Science on Everardo Shain

US Children Adoption Statistical Inference

Thu, 23 Nov 2023 08:06:25 +0600

Overview

This project examined child adoption patterns in the United States using a dataset from the Centers for Disease Control and Prevention. We tested three hypotheses in R using RStudio. My team and I cleaned the data to remove missing values and isolated our variables of interest, then visualized them with bar plots and pie charts. For all three hypotheses we identified the independent two-group Mann-Whitney U test as the best choice and performed it. We then reinforced the analysis with parametric bootstrapping and a power calculation, in which all three hypotheses achieved power greater than 80%.
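
The test and the power estimate can be sketched roughly as follows. This is an illustrative Python sketch, not the R code from the project: the function names, the normal approximation for the p-value, and the choice of a normal model for the parametric bootstrap are all assumptions made for the example.

```python
import random
from statistics import NormalDist, mean, stdev


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Simplification: no tie correction and no continuity correction.
    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


def parametric_bootstrap_power(x, y, alpha=0.05, n_sim=500, seed=0):
    """Estimate power by simulating from a model fitted to each group.

    Assumption: each group is modeled as Normal(mean, sd) of the observed data.
    Power is the share of simulated datasets in which the test rejects at alpha.
    """
    rng = random.Random(seed)
    mx, sx = mean(x), stdev(x)
    my, sy = mean(y), stdev(y)
    rejections = 0
    for _ in range(n_sim):
        sim_x = [rng.gauss(mx, sx) for _ in x]
        sim_y = [rng.gauss(my, sy) for _ in y]
        _, p = mann_whitney_u(sim_x, sim_y)
        rejections += p < alpha
    return rejections / n_sim
```

Calling `parametric_bootstrap_power(group_a, group_b)` on two lists of numeric observations returns a value between 0 and 1, which can be compared against the conventional 0.8 threshold.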

IMDB Sentiment Analysis

Fri, 03 Jun 2022 08:06:25 +0600

Overview

This was a group project in which we developed a sentiment analysis pipeline for IMDB movie reviews using deep learning–based natural language processing techniques. A Convolutional Neural Network (CNN) was trained to classify reviews as positive or negative, while also generating fixed-length semantic embeddings. The project included dataset preprocessing, model training with pretrained word embeddings, and an inference pipeline capable of exporting 300-dimensional feature vectors from raw text inputs.
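
One common way a text CNN turns reviews of any length into a fixed-length vector is convolution over token embeddings followed by max-over-time pooling. The sketch below shows only that forward pass in plain Python. It assumes this layout for illustration; the project's actual architecture, framework, and weights are not described here, and the toy random embeddings, filter count, and function names are hypothetical.

```python
import random

EMBED_DIM = 8     # toy size; pretrained word embeddings are usually far larger
NUM_FILTERS = 4   # the project exported 300-dimensional vectors; 4 keeps this readable
WIDTH = 3         # each filter looks at a window of 3 consecutive tokens

_rng = random.Random(0)


def _random_vector(n):
    return [_rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical stand-ins for pretrained embeddings and trained filter weights.
_vocab = {}
_filters = [[_random_vector(EMBED_DIM) for _ in range(WIDTH)]
            for _ in range(NUM_FILTERS)]


def embed(tokens):
    """Look up a toy embedding per token, creating one on first sight."""
    vectors = []
    for tok in tokens:
        if tok not in _vocab:
            _vocab[tok] = _random_vector(EMBED_DIM)
        vectors.append(_vocab[tok])
    return vectors


def conv_max_pool(embedded, filters, width):
    """Apply each filter to every window, then ReLU and max over all windows."""
    features = []
    for filt in filters:
        best = 0.0  # starting at 0 makes the max behave like ReLU + max pooling
        for start in range(len(embedded) - width + 1):
            activation = sum(w * x
                             for pos in range(width)
                             for w, x in zip(filt[pos], embedded[start + pos]))
            best = max(best, activation)
        features.append(best)
    return features


def text_features(text):
    """Map raw text to a vector whose length is always NUM_FILTERS."""
    embedded = embed(text.lower().split())
    while len(embedded) < WIDTH:  # zero-pad very short inputs
        embedded.append([0.0] * EMBED_DIM)
    return conv_max_pool(embedded, _filters, WIDTH)
```

Because pooling keeps one value per filter, `text_features("great movie")` and `text_features` of a full-length review both return vectors of the same length, which is what makes the output usable as a fixed-size feature vector.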

IoT Sensor Data Classification

Fri, 29 Oct 2021 08:06:25 +0600

Overview

This was an IoT project in which I implemented two machine learning algorithms, K-Nearest Neighbors (KNN) and Decision Tree, in Python to classify data collected from an MMA7361 accelerometer and a DHT11 temperature/humidity sensor connected to a NodeMCU ESP32 microcontroller. The sensors were programmed and tested using the Arduino IDE, with data transmitted both over a wired serial connection and wirelessly via MQTT, using Mosquitto and OpenSSL for secure transfer. I organized the workflow in a Jupyter Notebook, where I processed the collected data, labeled the different scenarios (dark room, sunny room, bathroom, sensor movement types), and split the data into training and testing sets. Both algorithms achieved accuracies above 97%, including perfect classification with the Decision Tree on the accelerometer dataset.
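
The split-then-classify workflow can be sketched with a minimal KNN in plain Python. This is an illustration only, not the project's notebook code: the helper names are hypothetical, and `make_readings` generates synthetic temperature/humidity pairs as a stand-in for real DHT11 data.

```python
import math
import random
from collections import Counter


def make_readings(center, label, n, rng, noise=1.0):
    """Generate n synthetic (temperature, humidity) readings around `center`.

    Hypothetical stand-in for logged sensor data; values are not real measurements.
    """
    return [((rng.gauss(center[0], noise), rng.gauss(center[1], noise)), label)
            for _ in range(n)]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=3):
    """Return the majority label among the k training rows closest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, feats, k) == label for feats, label in test)
    return hits / len(test)
```

A typical use is to build labeled rows per scenario, call `train_test_split`, and then report `accuracy(train, test)`. The sketch uses raw Euclidean distance; with real readings on different scales, normalizing the features first would likely matter.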