Location: Boston, MA (on-site)
Commitment: full-time
Team: R&D • Data & QA/QC
We’re looking for a Data Scientist with a strong background in feature extraction, data cleaning, and pipeline optimization to help us advance a next-generation diagnostic platform. Your main role will be to fine-tune our software tools and develop a quality control (QC) pipeline that ensures the reproducibility, consistency, and robustness of every dataset for downstream machine learning and clinical applications.
Key Responsibilities
Feature Extraction & Data Processing
● Develop and refine data pipelines that convert high-dimensional atomic force microscopy (AFM) outputs into clean, reproducible features.
● Develop, extract, and validate quantitative features (e.g., surface texture, topography, statistical descriptors) from imaging datasets.
● Ensure data is reproducible, normalized, and consistent within and between runs, instruments, protocols, operators, and study centers.
● Apply advanced preprocessing methods (baseline correction, denoising, drift correction) to microscopy outputs.
Data Quality & Statistical Analysis
● Build quality control (QC) metrics to flag and correct inconsistent or low-quality data; implement power analysis and success criteria.
● Support validation studies (CI, ICC, Bland–Altman, LoB/LoD, etc.) across datasets to ensure the reliability of extracted features.
Machine Learning Readiness
● Apply data cleaning, normalization, and denoising techniques to improve data quality and prepare ML-ready datasets by integrating multi-channel, multi-condition data streams.
● Conduct exploratory data analysis, hypothesis generation, baseline modeling, ablations, and error analysis.
● Conduct literature scans on surface topography and biophysical properties; run feature stability/importance analyses and comparative model studies (RF/XGBoost/DL).
● Collaborate with ML engineers to align feature extraction workflows with classifier needs (Random Forest, XGBoost, deep learning frameworks).
● Monitor and improve data pipeline efficiency and scalability for large datasets.
Communication & Collaboration
● Partner with ML engineers to define feature schemas, interfaces, and data contracts that meet classifier needs.
● Write clear SOPs; present findings to the cross-functional R&D team, QA/CLIA, clinical operations, and collaborators.
Required Qualifications
● 5+ years in data science or scientific computing (or MS/PhD with equivalent project depth).
● Strong hands-on skills in Python (NumPy, Pandas, SciPy) and statistical data analysis.
● Hands-on feature engineering for imaging or time-series/3D surface data; signal preprocessing experience.
● Experience with feature extraction and image or signal processing.
● Proven work on data quality: normalization, denoising, artifact detection, and QC metrics.
● Clear communicator who can work independently and with cross-functional teams.
● Industry experience in diagnostics.
● Experience leading or mentoring data scientists.
Desired Skills
● Familiarity with multi-modal data integration and large-scale computational workflows.
● Background in biological data analysis or clinical diagnostics.
● Experience with machine learning classifiers (Random Forest, XGBoost, SVM).
● Experience with surface texture analysis and ISO-standard feature characterization.
Why Join Us
You’ll be joining a fast-moving team at the intersection of data science, biotech, and diagnostics. This is a chance to shape how raw imaging data becomes clinically meaningful insight, with direct impact on patient outcomes.