About Me


Data science specialist with 8+ years of experience in fast-paced, high-stress construction environments as part of interdisciplinary project management teams. Pursuing a professional career in data science to build efficient, user-friendly solutions for management systems in industries that are slow to adopt new technologies.

As an unrepentant hobbyist, I use data science to develop tools that improve my existing projects and open opportunities for new ones.

  Education


Web Development

    Udemy, 2023

Data Science

    Lighthouse Labs, 2022

Civil Engineering

    Western University, 2011

  Experience


Engineering Research Lead Volunteer

Healthcare Systems, December 2023 to Present

  • Continuing research from the previous term.
  • Leading development of a new model architecture.

Engineering Research Assistant

Healthcare Systems, September 2023 to December 2023

  • Assisted in developing machine learning models trained on magnetic map data intended to predict mineral concentrations for a given area.
  • Developed tools to analyze model performance that directed future experiments towards incremental improvements.
  • Experimented with transfer learning using models pretrained on the ImageNet and BigEarthNet datasets.
  • Previous attempts at training a neural network on the magnetic map data resulted in \(R^2\) scores below 0; by the end of the term, the model achieved positive scores.
  • Owned and maintained the submodule repository and the documentation of development code.

Freelance Data Analyst

Self-employed, October 2022 to Present

  • Worked with clients to retrieve and manipulate data accumulated over the last decade.
  • Analyzed the data in the context of each client's business goals to uncover actionable insights.
  • Communicated directly with the COO to develop hypotheses and outline the investigation.

Project Coordinator

Four Seasons Site Development, March 2014 to July 2022

  • Implemented and enforced production-specific safety programs and maintained clean safety records year over year, working with clients, consultants, subcontractors, and company health & safety management personnel; challenging safety conditions included 30 m deep excavations and micro-tunneling underneath a provincial highway.
  • Developed skills to meet challenging deadlines and navigate unforeseen complications in the pre-construction and exploratory phases of most projects; client needs were met with minimal loss of time and budget.
  • Created technical documentation for clients to satisfy contract requirements when closing projects.

Cadet Engineer

Stateland, April 2013 to February 2014

  • Assessed bills of materials (BoM) for semi-detached housing to spot deviations from historical data and to implement changes from design revisions, informing future design decisions for subdivision developments and other low-cost residential projects.
  • Provided material take-offs to develop construction schedules and plans for material procurement.
  • Consulted with vendors to closely monitor their progress (re-establishing deadlines as necessary) and to conduct trade buy-ins for masonry works, steel framing, and various architectural elements such as windows and doors.
  • Facilitated meetings between senior engineers and vendors to negotiate costs, determine product specifications, and maintain collaborative relationships between all parties.
  • Created new templates and modified existing ones to automate routine assignments, which doubled work efficiency and improved accuracy and consistency by removing sources of human error.

  Projects


  Flight Delay Prediction

Data Science, Lighthouse Labs Capstone

Dataset Features

  • Date & time of arrival
  • Date & time of departure
  • Airline name & airports of arrival and departure
  • Plane configuration, capacity, and passenger count
  • Flight distance & direction

Performed exploratory data analysis with scipy and statsmodels to inform data processing and augmentation based on distributions, biases, and counts. Used pandas, numpy, and boosting techniques to process and normalize the data, then applied PCA for feature reduction. Interfaced with weather and traffic APIs to engineer new features. Developed an XGBoost model to predict flight delays and tested it against 2020 data.

Resulting \(R^2\) scores were poor. Further progress could be made by incorporating data from services affected by COVID.
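
As an illustration of the normalize-then-PCA step mentioned above, the following is a minimal sketch rather than the capstone code. It uses plain NumPy (an SVD-based PCA) instead of a library implementation, and the feature matrix is synthetic and hypothetical: two nearly redundant columns plus one independent column.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Hypothetical features: the second column is almost a copy of the first.
rng = np.random.default_rng(0)
distance = rng.normal(size=200)
X = np.column_stack([
    distance,
    2 * distance + rng.normal(scale=0.01, size=200),
    rng.normal(size=200),
])

Z, ratio = pca(standardize(X), n_components=2)
print(Z.shape, ratio.sum())
```

Because one column is nearly redundant, two components retain almost all of the variance, which is the situation where feature reduction costs little information.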

  Mineral Exploration

Healthcare Systems R&A

Dataset Features

  • Magnetic map data
  • Distance to volcanic formations
  • Distance to faults
  • Distance to outcroppings
  • Adjacent mineral concentrations from rock samples
  • XY-coordinate information and geodata for POI

Onboarded onto an existing project and tasked with creating a CNN model using magnetic map data as feature dimensions. Developed techniques to work around dataset issues such as sparseness and low observation counts.

Preprocessing involved filtering out outliers (using IQR) and null values. Data was normalized with a logarithmic transformation (applied directly to the target variable or indirectly through a logarithmic loss metric). Images were augmented with flips and rotations via the OpenCV library to boost observation counts and balance the data.
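
The preprocessing steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the project code: it uses NumPy flips and rotations as stand-ins for the OpenCV calls, assumes the common 1.5 × IQR fence, uses `log1p` as one possible logarithmic transform, and runs on small hypothetical arrays.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR]; NaN entries come out False."""
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

def augment(tile):
    """Return the 8 flip/rotation variants of a square 2-D tile."""
    variants = []
    for k in range(4):
        rotated = np.rot90(tile, k)           # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirrored copy of each rotation
    return variants

# Hypothetical target values with one outlier and one null.
targets = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 100.0, np.nan])
keep = iqr_mask(targets)
log_targets = np.log1p(targets[keep])  # log(1 + x) tolerates zero-valued targets

# Hypothetical 3x3 "image" tile.
tile = np.arange(9).reshape(3, 3)
tiles = augment(tile)
print(keep, len(tiles))
```

Each tile yields eight variants, so augmentation of this kind can multiply the observation count of an under-represented group by up to eight.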

TensorFlow allowed for custom metrics and for controlled learning rate decay based on granular batch evaluation scores. To analyze model performance, used pyplot to graph training and evaluation metrics along with batch-level parameter trends, similar to TensorBoard's graphing utilities.

Resulting \(R^2\) scores remained poor, but improved throughout the research term. Initial CNN models using transfer learning scored -0.16. Improvements to model parameters and dataset processing have since raised the score to +0.1.
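
For context on the sign of these scores: \(R^2\) compares a model's squared error against that of always predicting the mean of the targets, so a score below 0 means the model does worse than that constant baseline. A minimal sketch with hypothetical numbers (not project data):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean predictor's squared error
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score(y, y)                            # 1.0
baseline = r2_score(y, np.full_like(y, y.mean()))   # 0.0
worse = r2_score(y, y[::-1])                        # -3.0
print(perfect, baseline, worse)
```

Crossing from a negative to a positive score therefore marks the point at which a model starts to outperform the mean predictor.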

Took initiative to begin development of a two-stage model that classifies zero value observations against non-zero value observations, then calculates a regression accordingly. Currently in development.
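
The two-stage idea can be sketched as below. This is a toy illustration under stated assumptions, not the model in development: a nearest-centroid classifier and ordinary least squares stand in for the actual classification and regression stages, the class name `TwoStageRegressor` is hypothetical, and the data is made up. It also assumes the training set contains both zero and non-zero observations.

```python
import numpy as np

class TwoStageRegressor:
    """Stage 1 separates zero from non-zero targets; stage 2 regresses on non-zero samples."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        nonzero = y != 0
        # Stage 1: nearest-centroid classifier (stand-in for a real classifier).
        self.centroid_zero_ = X[~nonzero].mean(axis=0)
        self.centroid_nonzero_ = X[nonzero].mean(axis=0)
        # Stage 2: least squares with an intercept, fitted on non-zero samples only.
        A = np.column_stack([X[nonzero], np.ones(nonzero.sum())])
        self.coef_, *_ = np.linalg.lstsq(A, y[nonzero], rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        d_zero = np.linalg.norm(X - self.centroid_zero_, axis=1)
        d_nonzero = np.linalg.norm(X - self.centroid_nonzero_, axis=1)
        is_nonzero = d_nonzero < d_zero
        A = np.column_stack([X, np.ones(len(X))])
        # Predict exactly 0 where stage 1 says "zero"; otherwise use the regression.
        return np.where(is_nonzero, A @ self.coef_, 0.0)

# Hypothetical data: targets are 0 for small x and 2 * x otherwise.
X_train = np.array([[0.0], [0.1], [0.2], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 0.0, 0.0, 4.0, 6.0, 8.0])

model = TwoStageRegressor().fit(X_train, y_train)
pred = model.predict(np.array([[0.05], [3.5]]))
print(pred)
```

Separating the stages keeps the many zero-valued observations from dragging the regression toward zero, which is one motivation for this kind of design on sparse targets.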

Managed git versioning through GitHub as repository owner. Responsible for writing and maintaining program documentation with the intention of institutionalizing knowledge gained from research.

  Skills


Web Development

  • Python
  • JavaScript
  • HTML/CSS
  • Node.js
  • Dreamhost

Presentation

  • Adobe Illustrator
  • Whimsical
  • Markdown

Data Science

  • TensorFlow v2
  • TensorBoard
  • KerasTuner
  • XGBoost
  • CNN Models
  • LaTeX

Data Analysis

  • numpy
  • pandas
  • matplotlib
  • pyplot
  • statsmodels

  References


Dr. Ciro Diaz

Data Scientist

Mai Nguyen

Accounting Analyst

Sunil Bansal

Product Owner

Niruja Nagarajah

HR Consultant