Allison Krinsky

Xu Lab @ University of Washington

Summary

I have been working at the Xu Lab, a UW biochemistry lab, since April 2020. I was originally tasked with data analysis and visualization, but my role grew as I took on new responsibilities in web development and machine learning. I learned Python’s Flask framework and built interactive web pages with data mining features that support future analysis.


Data Analysis & Database Management

I parsed and cleaned data from various national molecule databases to retrieve the information our machine learning models need. I then created a SQL database holding the characteristics of over 5 million molecules, and used this database with our models to generate reference collision cross section (CCS) values. The CCSbase database predicts CCS from a molecule's structure (its SMILES string), while the Lipydomics database predicts it from the lipid class, number of carbons, and number of unsaturations.

Technologies: Python, RDKit (cheminformatics package), SQL, Jupyter Notebook
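A minimal sketch of the clean-and-load step, using Python's built-in `sqlite3` module as a stand-in for the lab's SQL setup, which the text does not specify. The `molecules` table, its columns, and the sample records are hypothetical. Deduplication here is an exact string match; a real pipeline would first validate and canonicalize each structure with RDKit (`Chem.MolFromSmiles` returns `None` for input it cannot parse).

```python
import sqlite3

# Hypothetical raw records (name, SMILES, source) as they might arrive from
# several source databases; real inputs would be parsed from downloaded files.
RAW_RECORDS = [
    ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "source_a"),
    ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "source_b"),  # duplicate structure
    ("glycine", "C(C(=O)O)N", "source_a"),
    ("unknown", "", "source_b"),  # missing structure
]


def clean_records(records):
    """Drop rows with no structure and keep the first row seen for each SMILES.

    Exact string match only: canonicalizing with RDKit first would also catch
    different spellings of the same molecule.
    """
    seen = set()
    cleaned = []
    for name, smiles, source in records:
        smiles = smiles.strip()
        if not smiles or smiles in seen:
            continue
        seen.add(smiles)
        cleaned.append((name, smiles, source))
    return cleaned


def build_database(records, path=":memory:"):
    """Create a hypothetical molecules table and bulk-insert the cleaned records."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS molecules (
               id INTEGER PRIMARY KEY,
               name TEXT,
               smiles TEXT UNIQUE NOT NULL,
               source TEXT
           )"""
    )
    # INSERT OR IGNORE skips rows that would violate the UNIQUE constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO molecules (name, smiles, source) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


conn = build_database(clean_records(RAW_RECORDS))
print(conn.execute("SELECT COUNT(*) FROM molecules").fetchone()[0])  # prints 2
```

The `UNIQUE` constraint together with `INSERT OR IGNORE` makes reloading the same source file harmless, which matters when new batches are appended to a multi-million-row table.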

CCS Prediction Database
Lipydomics Prediction Database

Web Dev

User login and authentication

I created an account system to track user inputs. A login with user authentication lets us grant our users access to our proprietary DMCCS model.
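The page does not say how the login system stores credentials. As a hedged sketch of one standard approach, passwords can be kept as salted PBKDF2 hashes using only Python's standard library, so the user table never holds plaintext. The function names, the iteration count, and the `salt$hash` storage format are hypothetical; in a Flask app, Werkzeug's `generate_password_hash` and `check_password_hash` provide the same service.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor for this sketch; raise it as hardware allows.
ITERATIONS = 200_000


def hash_password(password: str) -> str:
    """Return 'salt$hash' (both hex) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


stored = hash_password("correct horse")
print(verify_password("correct horse", stored))  # True
print(verify_password("wrong guess", stored))  # False
```

The random salt means two users with the same password get different stored values, and `hmac.compare_digest` avoids leaking how many bytes matched through timing.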

DMCCS Login Page
Data Collection Page

Data Mining

With an admin login, lab members can see our list of users and the structures they have searched. From there we can view or download users' information and recent searches to find which molecules are predicted most often, so we can prioritize those molecules in our database.
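A minimal sketch of the "most-searched structures" report behind an admin page, again using `sqlite3` as a stand-in. The `searches` table, its columns, and the sample rows are hypothetical; the query groups requests by SMILES and the result is serialized to CSV for download.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per prediction request made by a user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (user_email TEXT, smiles TEXT, searched_at TEXT)")
conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?)",
    [
        ("a@example.edu", "C(C(=O)O)N", "2021-01-04"),
        ("b@example.edu", "C(C(=O)O)N", "2021-01-05"),
        ("b@example.edu", "CCO", "2021-01-05"),
    ],
)


def top_structures(conn, limit=10):
    """Return (smiles, request_count, distinct_users), most-requested first."""
    return conn.execute(
        """SELECT smiles, COUNT(*) AS n, COUNT(DISTINCT user_email) AS users
           FROM searches
           GROUP BY smiles
           ORDER BY n DESC, smiles
           LIMIT ?""",
        (limit,),
    ).fetchall()


def to_csv(rows, header):
    """Serialize rows to CSV text, e.g. for a file-download response."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = top_structures(conn)
print(rows)  # [('C(C(=O)O)N', 2, 2), ('CCO', 1, 1)]
print(to_csv(rows, ["smiles", "requests", "users"]))
```

Counting distinct users alongside raw requests helps separate molecules many people need from one user re-running the same query. In a Flask view, the CSV text could be returned with `Response(text, mimetype="text/csv")`.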

Technologies: Python’s Flask, Django, SQL, HTML/CSS

Machine Learning

The CCSbase model replaces experimentally measured CCS values with predicted ones. Instead of performing an experiment to measure a molecule's CCS, we predict it from the molecule's structure.

My tasks so far have consisted of retraining the model with the new data I added to our database and evaluating how well the models perform relative to one another.
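Comparing an original and a retrained model comes down to scoring each model's predictions against the same held-out reference values. The sketch below uses plain Python; `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score` compute the first two metrics directly. The choice of metrics (including median relative error) and the CCS values in the example are illustrative assumptions, not the lab's results.

```python
from statistics import mean, median


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_relative_error(y_true, y_pred):
    """Median of |pred - true| / true, as a percentage."""
    return 100 * median(abs(p - t) / t for t, p in zip(y_true, y_pred))


def compare_models(y_true, predictions):
    """Score each model; predictions maps model name -> values in y_true order."""
    return {
        name: {
            "MAE": mean_absolute_error(y_true, y_pred),
            "R2": r_squared(y_true, y_pred),
            "MRE_%": median_relative_error(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }


# Made-up reference and predicted CCS values, for illustration only.
y_true = [150.0, 200.0, 250.0]
original = [155.0, 190.0, 260.0]
retrained = [152.0, 198.0, 247.0]

results = compare_models(y_true, {"original": original, "retrained": retrained})
best = min(results, key=lambda name: results[name]["MRE_%"])
print(best)  # retrained
```

Relative error is a natural headline number for CCS because an error of a few square angstroms matters more for a small molecule than for a large one; scoring every model on the same held-out set keeps the comparison fair.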

Technologies: scikit-learn, pandas, NumPy, Matplotlib

DMCCS Prediction Page
DMCCS Prediction Results Page