Valeria Pineda

Data Scientist

GitHub / Kaggle / LinkedIn / ResearchGate

About me

I am a Data Scientist at Resideo with an MSc in Engineering from Tecnológico de Monterrey, one of the most prestigious schools in Mexico. Throughout my Data Science journey, I've developed multiple projects in education, warehouse operations, and business research. If you would like to contact me, reach out here or through my LinkedIn profile.

Education

Master of Science in Engineering

August 2020 - June 2022

Activities

Developed a series of projects focused on Data Science, Computer Science, and Optimization.
Participated in the IX National Congress of the Mexican Society of Operations Research (CSMIO) with a project focused on generating storage assignment proposals through clustering techniques at warehouses with a Precedence-Constrained Order Picking Process.
Developed a thesis project called "Routing and Storage Assignment for the Precedence-Constrained Order Picking Process," inspired by the order-picking process of a Mexican retail company. In this project, a Genetic Algorithm was used to generate more efficient picking sequences, and three class-based storage configurations were developed (using Machine Learning clustering techniques) to reduce traveling distance. After applying both approaches to a set of instances obtained from the company's Warehouse Management System, results showed that order-picking process efficiency increased by 34.5%.

Research Publication

V. V. Pineda-Romero, C. E. Orozco-Mora and H. G. Ceballos, "Factors to improve online education: A study on the impact of COVID-19 on Delhi students," 2023 Future of Educational Innovation-Workshop Series Data in Action, Monterrey, Mexico, 2023, pp. 1-8, doi: 10.1109/IEEECONF56852.2023.10104773.

Scholarships

Obtained a full scholarship from Tecnológico de Monterrey to pursue the Master of Science in Engineering.
Received the "Consejo Nacional de Ciencia y Tecnología (CONACYT) Scholarship," financial support from Mexico's national science and technology council.

Internships

Lean Manufacturing Program Teaching Assistant (August 2021 - December 2021): Assisted the School of Engineering and Sciences in the Lean Manufacturing Program, where I helped the faculty grade 31 students and developed basic data science activities in Jupyter Notebook. In addition, I recorded video explanations on designing dashboards in the ArcGIS Online platform. Finally, I helped write a literature review on the topics of "Optimization Models for Logistics Network Design" and "Logistics Network Design using geographic information systems (GIS)."

B.S. Industrial and Systems Engineering

August 2016 - June 2020

Activities

Developed a series of assignments focused on implementing continual improvement, statistics, optimization, and logistics projects.

Scholarships

Obtained the Academic Talent Scholarship from Tecnológico de Monterrey, a 60% scholarship toward my undergraduate studies at this institution.

Internships

Floor Walker at BMW Group (January 2019 - June 2019): Reduced delays in the supply-to-line process by 52% through time series analysis and monitoring of warehouse operations. Supported Supply Chain departments focused on material flow, supply to line, warehouse management, and IT by performing exploratory data analysis of the company's operational datasets in SAP STARD ERP.
Physical Logistics Intern at BMW Group (February 2020 - June 2020): Supervised three service suppliers and implemented cost-saving initiatives to address long-standing operational problems. In addition, I produced four daily status reports and three weekly reports detailing logistics operations and issues affecting logistics KPIs.

International Experiences

Universidad de Burgos (Burgos, Spain).
University of Waterloo (Waterloo, Ontario, Canada).

Achievements

Recognized for having one of the best grade averages of my generation in the August - December 2019 semester.

Skills

Throughout my experience, I've developed various Data Science, Optimization, and Statistics projects, which have helped me build the following technical skills in Machine Learning and data analysis:

Classification
Clustering
Time Series Analysis
Feature Engineering
Data manipulation
Data visualization
Regression

In addition, I work with the following programming languages and tools:

Python
R
MATLAB
SQL
LaTeX
JavaScript

Projects

In this section, I present some of my projects.

Factors to improve online education, a study on the impact of COVID-19 on Delhi students

February 2022 - January 2023

Applied seven machine learning classification models (Logistic Regression, Naïve Bayes, Decision Tree, Support Vector Machine, Gradient Boosting, XGBoost, and Random Forest) to predict New Delhi students' online class rating (e.g., "Excellent," "Poor") from their demographic and behavioral information during COVID-19. Results show that the Naïve Bayes (NB) model performed best for online class rating prediction, with a ROC-AUC of 83.738% and an F1 score of 74.648%.
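For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1 and ROC-AUC can be computed for a binary rating. It is not the code from the study: the function names, the 0.5 threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not the study's data).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"F1: {f1_score(y_true, y_pred):.3f}")
print(f"ROC-AUC: {roc_auc(y_true, scores):.3f}")
```

F1 depends on the chosen threshold, while ROC-AUC only depends on how the scores rank the examples, which is why the two are usually reported together.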

Moreover, based on the factors that appear to be relevant in the ML algorithms, the work proposes an action plan to increase students' online class satisfaction and facilitate their learning experience. For example, this study recommends incorporating outdoor and indoor activities that stimulate the students' creativity as strategies for dealing with stress. Additionally, this work discusses the importance of using technological tools (e.g., tablets) so that students can take their online classes comfortably. Furthermore, it is necessary to moderate the use of social media platforms to limit their damaging effects on students' mental health. Finally, it is also essential to strengthen these types of activities for students from 18 to 22 years old, as they appear to be the most negatively affected by the online class transition.

Research Publication: V. V. Pineda-Romero, C. E. Orozco-Mora and H. G. Ceballos, "Factors to improve online education: A study on the impact of COVID-19 on Delhi students," 2023 Future of Educational Innovation-Workshop Series Data in Action, Monterrey, Mexico, 2023, pp. 1-8, doi: 10.1109/IEEECONF56852.2023.10104773.

Tools: Python with Jupyter Notebooks, NumPy, Matplotlib, Seaborn, Statsmodels, Pandas, Scikit‑learn, XGBoost, LaTeX, SHAP.

Prediction of occupancy levels in enclosed areas using environmental factors: A classroom scenario

September 2022 - December 2023

The starting point of this project was the environmental data obtained from 15 sensors located in a classroom. Five measure temperature, five measure humidity, and five measure pressure. The sensors are spread over the five tables in the area, so each table holds one temperature, one humidity, and one pressure sensor. This project aims to use the sensors' information to predict the classroom's occupancy level (i.e., Empty, Low, Medium, High).

Throughout this project, we applied several preprocessing steps, including a feature selection analysis to identify which sensors were most beneficial for the predictions. Furthermore, we applied an SVM model to estimate the classroom's four occupancy levels. Results showed that the methods used to predict occupancy levels achieve an accuracy of at least 93.54%.

Tools: Python with Jupyter Notebooks, Pandas, NumPy, Scikit‑learn, Seaborn, Matplotlib, Imbalanced‑learn, SciPy.

A data mining‑driven storage policy to improve order‑picking efficiency

February 2022 - June 2022

This paper is inspired by a retail company with a manual picking process, where Unit of Measurement and Load constraints affect order picking. For this work, we used clustering data-mining techniques to develop three class-based storage strategies that consider the unit of measurement, weight, and demand of products. We tested these strategies with the information provided by a Mexican retail company that stores over 200 different items. When comparing the current storage strategy with the one proposed in this work, we found that our proposal enabled a reduction of up to 11.33% in order-picking traveling distance.

Furthermore, these storage configurations were assessed and compared when implemented with the company's current S-shape routing heuristic and with a proposed Constrained S-shape routing heuristic. The results show that the constrained approach reduced the company's current order-picking process time by up to 45.13%.

Tools: Microsoft SQL Server, Python with Jupyter Notebooks, NumPy, Matplotlib, Seaborn, SciPy, Statsmodels, Pandas, Scikit‑learn, LaTeX.

Routing and Storage Assignment for the Precedence-Constrained Order Picking Process

September 2020 - June 2022

This project was inspired by a Mexican retail company that stores perishable products such as yogurts and other desserts. These products are sensitive items; thus, pickers must stack them adequately to protect their physical quality. Hence, this warehouse considers precedence constraints in its order-picking process (i.e., Unit of Measurement and Load constraints). This work compares the picking sequence currently used by the company (i.e., S-shape heuristic) against the sequences generated by the optimal solution and by a Genetic Algorithm, both of which account for precedence constraints. We compared these in terms of traveling distance, feasibility of results, and order-picking process time. In addition, we used clustering techniques to develop three class-based storage assignment configurations to further increase order-picking process efficiency. Results show that implementing the Genetic Algorithm and the third configuration of the class-based storage policy increases order-picking process efficiency by 34.5%.
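To make the idea of a precedence-constrained picking sequence concrete, here is a small sketch of the two quantities such a comparison relies on: whether a sequence is feasible and how far the picker travels. It is not the thesis implementation. The rule "never pick an item after a strictly lighter one," the Manhattan-distance layout, the depot at (0, 0), and every product name, weight, and coordinate below are hypothetical assumptions chosen for illustration.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    sku: str          # hypothetical product code
    weight_kg: float  # hypothetical unit weight
    x: int            # hypothetical aisle coordinate
    y: int            # hypothetical position along the aisle


def respects_load_precedence(sequence):
    """True if no item is picked after a strictly lighter one (assumed rule)."""
    return all(a.weight_kg >= b.weight_kg for a, b in zip(sequence, sequence[1:]))


def travel_distance(sequence, depot=(0, 0)):
    """Manhattan distance of depot -> picks in order -> depot (simplified layout)."""
    stops = [depot] + [(p.x, p.y) for p in sequence] + [depot]
    return sum(
        abs(x1 - x2) + abs(y1 - y2)
        for (x1, y1), (x2, y2) in zip(stops, stops[1:])
    )


# Hypothetical order, invented for illustration only.
order = [
    Pick("yogurt-4pk", 0.5, 1, 4),
    Pick("milk-crate", 12.0, 3, 2),
    Pick("dessert-cup", 0.2, 2, 5),
]

# One feasible sequence under the assumed rule: heaviest items first.
heavy_first = sorted(order, key=lambda p: p.weight_kg, reverse=True)

for name, seq in [("as listed", order), ("heavy first", heavy_first)]:
    feasible = respects_load_precedence(seq)
    print(f"{name}: feasible={feasible}, distance={travel_distance(seq)}")
```

A search method such as a Genetic Algorithm would evaluate many candidate sequences with functions like these, discarding or penalizing the infeasible ones and keeping those with the shortest distance.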

Tools: Microsoft SQL Server, Python with Jupyter Notebooks, Gurobi Python, Pandas, Scikit‑learn, NumPy, Matplotlib, LaTeX.

Google Mobility Analysis of Mexico during COVID-19

February 2022 - March 2022

Analyzed the trend, seasonality, and stationarity of the Google Mobility Report for Mexico during COVID-19. Using the information from the analysis, we developed an Auto-Regressive model that could predict Workplace mobility for a week of November 2020 with an RMSE of 6.32 on the training set and 13.44 on the test set. This means that the test-set predictions typically differ from the actual values by about 13 units; because RMSE squares the errors before averaging, larger misses weigh more heavily than they would in a plain average.
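As a quick illustration of how to read an RMSE value, the sketch below computes RMSE next to the mean absolute error (MAE) for one week of values. The numbers are made up for the example and are not taken from the mobility data or from the model's output.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square the errors, average, take the root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: plain average of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical week of values, invented for illustration only.
actual = [-30, -28, -35, -40, -33, -20, -10]
predicted = [-27, -32, -35, -28, -30, -24, -10]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
```

In this toy week a single large miss (12 units on the fourth day) pulls the RMSE above the MAE, which is the behavior to keep in mind when interpreting the gap between training and test RMSE.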

Tools: Python with Jupyter Notebooks, Pandas, NumPy, Matplotlib, SciPy, Scikit‑learn, Seaborn, Statsmodels.

Effect of adding omitted variables in Mediation and Moderation Analyses

February 2021 - June 2021

Mediation and moderation analyses have been widely implemented in the social sciences. Both techniques use regression analysis for modeling; consequently, they are subject to the assumptions of linearity, normality, constant variance, and independence of errors. When these assumptions are violated, models generate distorted results on the magnitude of effects and the causal relationships. To remedy the resulting inconsistencies, a common practice is to include variables that exert specific effects given their nature. In this project, we analyze a case study based on the work developed by Mukuka et al. (2021) and explore the results of adding moderators to their model. We compare the robustness and the fulfillment of assumptions of the original model with those of our proposed model. Results show that the proposed model increases the coefficient of determination by approximately 43.96% and decreases Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) by 7.9 and 6.12 units, respectively. In addition, all regression analysis assumptions are met.

Tools: R with RStudio, tidyr, dplyr, ggplot2, orcutt, lmtest, olsrr, forecast, caret.

More Projects

For more projects, visit the following sites.

Resume

Download resume.
