Data Scientist
github / kaggle / linkedin / researchgate
I am a Data Scientist at Resideo with an MSc in Engineering from Tecnológico de Monterrey, one of the most prestigious schools in Mexico. Throughout my Data Science journey, I've developed multiple projects in education, warehouse operations, and business research. If you would like to contact me, reach out here or through my LinkedIn profile.
August 2020 - June 2022
Lean Manufacturing Program Teaching Assistant (August 2021 - December 2021): Assisted the School of Engineering and Sciences in the Lean Manufacturing Program, where I helped the faculty grade 31 students and developed basic data science activities in Jupyter Notebook. In addition, I recorded video tutorials on designing dashboards in the ArcGIS Online platform. Finally, I helped write a literature review on the topics of "Optimization Models for Logistics Network Design" and "Logistics Network Design using geographic information systems (GIS)."
August 2016 - June 2020
Developed a series of assignments focused on implementing continual improvement, statistics, optimization, and logistics projects.
Received the Academic Talent Scholarship from Tecnológico de Monterrey, a 60% scholarship for undergraduate studies at this institution.
Recognized as having one of the best grade averages of my class for the August - December 2019 semester.
Throughout my experience, I've developed various Data Science, Optimization, and Statistics projects, which have helped me attain the following technical skills in Machine Learning.
In addition, I've developed the following coding skills.
In this section, I present some of my projects.
February 2022 - January 2023
Applied seven different machine learning classification models (Logistic Regression, Naïve Bayes, Decision Tree, Support Vector
Machine, Gradient Boosting, XGBoost, and Random Forest) to predict New Delhi students' online class rating (i.e.,
"Excellent," "Poor") using their demographic and behavioral information during COVID-19. Results show
that the Naïve Bayes (NB) model was the best for online class rating prediction, with a ROC-AUC of 83.738% and an F1
score of 74.648%.
Moreover, based on the factors that the ML algorithms identify as relevant, the work proposes an action plan to increase students' online class satisfaction and
facilitate their learning experience. For example, this study recommends incorporating outdoor and indoor activities
that stimulate the students' creativity as strategies for dealing with stress. Additionally, this work discusses
the importance of using technological tools (e.g., tablets) so that students can take their online classes comfortably.
Furthermore, it is necessary to moderate the use of social media platforms to limit the harm to mental
health that they can cause for students. Finally, it is also essential to strengthen these types of activities for students
aged 18 to 22, as they appear to be the most negatively affected by the online class transition.
Research Publication: V. V. Pineda-Romero, C. E. Orozco-Mora and H. G. Ceballos, "Factors to improve online education: A study on the impact of COVID-19 on Delhi students," 2023 Future of Educational Innovation-Workshop Series Data in Action, Monterrey, Mexico, 2023, pp. 1-8, doi: 10.1109/IEEECONF56852.2023.10104773.
Tools: Python with Jupyter Notebooks, NumPy, Matplotlib, Seaborn, Statsmodels, Pandas, Scikit‑learn, XGBoost, LaTeX, SHAP.
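For reference, the two metrics reported for this project (F1 score and ROC-AUC) can be computed as in the minimal sketch below. It uses only the Python standard library and made-up labels and scores, not the study's data; in a real pipeline Scikit-learn's own metric functions would normally be used. The function names `f1_score_binary` and `roc_auc_binary` are illustrative.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # with no true positives, F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc_binary(y_true, scores, positive=1):
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = favourable rating, 0 = unfavourable rating.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]       # 0.5 decision threshold

f1 = f1_score_binary(y_true, y_pred)
auc = roc_auc_binary(y_true, scores)
print(f"F1 = {f1:.3f}, ROC-AUC = {auc:.3f}")  # F1 = 0.750, ROC-AUC = 0.875
```

F1 depends on the chosen decision threshold, while ROC-AUC is computed from the ranking of the scores, which is why the two are usually reported together.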
September 2022 - December 2023
This project started from the environmental data collected by 15 sensors located in a classroom.
Five measure temperature, five measure humidity, and the remaining five measure pressure.
The sensors are spread over the five tables in the room, so each table has one temperature,
one humidity, and one pressure sensor. This project aims to use these sensors' readings to predict the classroom's
occupancy level (i.e., Empty, Low, Medium, High).
Throughout this project, we applied several preprocessing steps, including
a feature selection analysis to identify which sensors were most beneficial for the predictions. We then
applied an SVM model to estimate the classroom's four occupancy levels. Results showed that
the methods used to predict occupancy levels achieved an accuracy of at least 93.54%.
Tools: Python with Jupyter Notebooks, Pandas, NumPy, Scikit‑learn, Seaborn, Matplotlib, Imbalanced‑learn, SciPy.
February 2022 - June 2022
This paper was inspired by a retail company with manual order picking, where Unit of Measurement and Load constraints
affect the order-picking process. For this work, we used clustering data-mining techniques to develop
three class-based storage strategies that consider the unit of measurement, weight, and demand of products.
We tested these strategies with the information provided by a Mexican retail company that stores over 200
different items. When comparing the current storage strategy with the ones proposed in this work, we found
that our proposal reduced the order-picking traveling distance by up to 11.33%.
Furthermore, these storage configurations were assessed and compared when implemented with the company's
current S-shape routing heuristic and with a proposed Constrained S-shape routing heuristic. The results
show that the constrained approach reduced the company's current order-picking process time by up to 45.13%.
Tools: Microsoft SQL Server, Python with Jupyter Notebooks, NumPy, Matplotlib, Seaborn, SciPy, Statsmodels, Pandas, Scikit‑learn,
LaTeX.
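To illustrate the kind of routing being compared, the sketch below computes the travel distance of a basic S-shape tour. It is a minimal example under stated assumptions, not the company's layout or the Constrained S-shape heuristic from the paper: a single block of parallel aisles, a depot at the front of aisle 0, picks given as hypothetical `(aisle, depth)` pairs, and no Unit of Measurement or Load constraints.

```python
def s_shape_distance(picks, aisle_length, aisle_gap):
    """Travel distance of a basic S-shape tour that starts and ends at the depot.

    picks: list of (aisle_index, depth), depth measured from the front cross-aisle
           and assumed to be at most aisle_length.
    aisle_length: length of every aisle (front to back).
    aisle_gap: distance between two adjacent aisles along the cross-aisle.
    """
    if not picks:
        return 0.0
    # Deepest pick in each aisle that contains at least one pick.
    deepest = {}
    for aisle, depth in picks:
        deepest[aisle] = max(depth, deepest.get(aisle, 0.0))
    aisles = sorted(deepest)

    distance = 0.0
    current_aisle = 0   # assumption: depot sits at the front of aisle 0
    at_front = True
    for i, aisle in enumerate(aisles):
        distance += abs(aisle - current_aisle) * aisle_gap  # walk along the cross-aisle
        current_aisle = aisle
        is_last = i == len(aisles) - 1
        if is_last and at_front:
            # Last aisle entered from the front: go to the deepest pick and come back.
            distance += 2 * deepest[aisle]
        else:
            # Traverse the whole aisle and come out on the opposite side.
            distance += aisle_length
            at_front = not at_front
    distance += current_aisle * aisle_gap  # return along the front cross-aisle to the depot
    return distance


# Hypothetical order: three picks in aisles 0, 2, and 3 (aisle 1 is skipped).
print(s_shape_distance([(0, 3), (2, 7), (3, 4)], aisle_length=10, aisle_gap=2))  # 40.0
```

Because the tour only depends on which aisles contain picks, storage strategies that concentrate high-demand items in fewer aisles shorten it, which is the effect the class-based strategies aim for.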
September 2020 - June 2022
This project was inspired by a Mexican retail company that stores perishable products such as yogurts and other
desserts. These products are sensitive items; thus, pickers must stack them adequately to protect their
physical quality. Hence, this warehouse considers precedence constraints in its order-picking process
(i.e., Unit of Measurement and Load constraints). This work compares the picking sequence currently
used by the company (i.e., S-shape heuristic) against the sequences generated by the optimal solution and
a Genetic Algorithm, both of which account for precedence constraints. We compared these in terms of traveling distance,
the feasibility of results, and order-picking process time. In addition, we used clustering techniques to
develop three class-based storage assignment configurations to increase the order-picking process efficiency
further. Results show that implementing the Genetic Algorithm and the third configuration of the class-based
storage policy increases order-picking process efficiency by 34.5%.
Tools: Microsoft SQL Server, Python with Jupyter Notebooks, Gurobi Python, Pandas, Scikit‑learn, NumPy, Matplotlib, LaTeX.
February 2022 - March 2022
Analyzed trend, seasonality, and stationarity in the
Google Mobility Report for Mexico during COVID-19. Using the information from this analysis, we
developed an autoregressive model to predict workplace mobility for a week of November
2020, with an RMSE of 6.32 on the training set and 13.44 on the test set.
In other words, the test-set predictions typically differ from the actual values by about 13 units.
Tools: Python with Jupyter Notebooks, Pandas, NumPy, Matplotlib, SciPy, Scikit‑learn, Seaborn, Statsmodels.
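As a rough illustration of how train and test RMSE are obtained for an autoregressive model, the sketch below fits a first-order model by least squares and scores its one-step-ahead forecasts. The order of the model used in the project is not stated here, so AR(1) is an assumption, the series is synthetic, and the helper names are hypothetical; Statsmodels would normally handle the fitting.

```python
import math


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def one_step_forecasts(series, start, c, phi):
    """Forecast series[t] from the observed series[t-1] for every t >= start (start >= 1)."""
    return [c + phi * series[t - 1] for t in range(start, len(series))]


def rmse(actual, predicted):
    """Root mean squared error; large misses weigh more than in a plain average error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic series generated by y[t] = 2 + 0.5 * y[t-1] (hypothetical, noise-free).
series = [10.0]
for _ in range(9):
    series.append(2 + 0.5 * series[-1])

split = 7  # first 7 points for training, the remaining 3 for testing
c, phi = fit_ar1(series[:split])
train_rmse = rmse(series[1:split], one_step_forecasts(series[:split], 1, c, phi))
test_rmse = rmse(series[split:], one_step_forecasts(series, split, c, phi))
print(f"c = {c:.3f}, phi = {phi:.3f}, train RMSE = {train_rmse:.3g}, test RMSE = {test_rmse:.3g}")
```

On real data the test RMSE is usually higher than the training RMSE, as in the 6.32 versus 13.44 reported above, because the coefficients are tuned to the training period.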
February 2021 - June 2021
Mediation and moderation analyses have been widely implemented in the social sciences.
Both techniques use regression analysis for modeling; consequently, they are subject to the assumptions
of linearity, normality, constant variance, and
independence of errors. When these assumptions are violated, models produce distorted
estimates of the magnitude of effects and of the causal relationships. To remedy the resulting inconsistencies,
a common practice is to include variables that exert specific effects given their
nature. In this project, we analyze a case study based on the work of Mukuka et al. (2021)
and explore the results of adding moderators to their model. We
compare the robustness and assumption fulfillment of the original model with those of our
proposed model. Results show that the proposed model increases the coefficient of determination
by approximately 43.96% and decreases the Root Mean Square Error (RMSE)
and Mean Absolute Error (MAE) by 7.9 and 6.12 units, respectively. In addition, it meets all regression
analysis assumptions.
Tools: R with RStudio, tidyr, dplyr, ggplot2, orcutt, lmtest, olsrr, forecast, caret.
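For readers unfamiliar with moderation, the textbook form of a regression with a single moderator is shown below. The symbols are generic (outcome Y, predictor X, moderator M) and are an assumption for illustration; they are not the variables or the exact specification used in Mukuka et al. (2021) or in this project.

```latex
% Moderation: the effect of X on Y is allowed to depend on M through the product term.
\[
  Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 \,(X \cdot M) + \varepsilon
\]
% Conditional (simple) effect of X on Y at a given value of M:
\[
  \frac{\partial Y}{\partial X} = \beta_1 + \beta_3 M
\]
```

When the coefficient on the product term is zero, the effect of X is the same at every level of M and the model reduces to an ordinary additive regression.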
Download resume.