My work currently focuses on the collection and analysis of large healthcare databases, including clinical, genomic, and operational data. I especially enjoy developing software and designing systems to accelerate this work.
I also specialize in visualizing and communicating insights from complex data to both interdisciplinary groups of stakeholders and non-experts.
I am a proponent of open access and reproducibility in research.
I’m a data scientist, epidemiologist, and software engineer. I have more than a decade of experience building software, designing and implementing studies, and analyzing data for healthcare-related research.
I’m currently an Assistant Professor of Precision Health at Geisinger Research, where I focus on risk prediction, communicating and visualizing complex information, and applications of clinical informatics and bioinformatics. I previously led the data science team at a data-driven healthcare software startup.
My PhD is from the University of Maryland Baltimore’s epidemiology department. (Epidemiology is a generalist field related to public health research; my training included statistics, study design, survey methods, and causal inference.) My dissertation involved UX research and data presentation optimization for public health information.
Please see my CV for more on my background and experience.
My current research includes:
- Predicting germline variant pathogenicity using large genomic and clinical data sources.
- Automated methods for abstracting clinical data from electronic medical records.
Some of my past work includes:
- comet, a software suite for collecting discrete choice experiment data in low-connectivity settings.
- My dissertation, titled “Assessing and Improving Patient Understanding of Publicly Reported Healthcare-Associated Infection-Related Hospital Quality Measures.”
Open Source Software
A tool for fixing the formatting of SAS output so it can be used in presentations, emailed, etc. without funny characters or unnecessary white space.
Documentation & Teaching
Some of my favorite pages include my R and SciPy reference notes, my exhaustive analysis of Mac and iOS notetaking apps, and my list of resources for learning
Thoughts on Reference Management Software
What reference management software should you use? I wrote the first version in 2015 and have updated it constantly since then.
Organizing Data Analysis Projects
Best practices for organizing data analysis code, data, and other related documents.
Data Presentation Tips
Best practices for presenting data, including examples and links to reference materials.
Tools for Epidemiologists
A curated list of online resources and software for epidemiologists.
Pitchfork music reviews + Rdio mashup
An easy way to see what new music is available on Rdio and how good it is (according to Pitchfork).
The Survey Software Review
A systematic, independent analysis of online survey software for researchers.