Why We Exist

Most early-stage biotech companies are sitting on valuable data without the infrastructure to use it. Your scientists are spending 60% of their time cleaning data instead of doing science. Your ML models live in notebooks that no one else can run. Your data is scattered across laptops, shared drives, and email attachments.

We started Icicle because we've been on both sides of this problem — in the research lab generating the data, and in industry building the systems that make it useful. We bridge that gap for teams that are too early for a full-time data hire and too serious for a freelancer.

Our Team

We don't need a biology tutorial — we've lived it. The same people you talk to are the ones building your pipeline.

Jumana

Founder & Principal

University of Washington bioengineering alumna and published researcher in gene therapy discovery, behavioral neuroscience, and medical informatics. Jumana has built production ML systems at biotech startups and scaled computational teams from the ground up, bridging wet-lab science and software engineering so early-stage companies can move faster with their data.

Research Credentials

University of Washington alumni. Published researchers in gene therapy discovery, behavioral neuroscience, and medical informatics. We understand your assays, your data types, and your regulatory constraints — because we've worked inside them.

Production Engineering

Scaled computational teams at biotech startups. Built production ML systems and accelerated discovery pipelines. We don't build demos — we build infrastructure that runs unattended in your cloud and holds up under scrutiny.

Full-Stack Capability

End-to-end ML pipelines, cloud infrastructure, data engineering, and interactive dashboards. From a Jupyter notebook to a deployed, documented production system.

Tools & Methods

Bioinformatics

Scanpy, Seurat, DESeq2, Cell Ranger, STAR, FlowCytometryTools, Snakemake

ML / AI

PyTorch, scikit-learn, XGBoost, Hugging Face, MLflow

Infrastructure

AWS, GCP, Docker, Kubernetes, Terraform

Data

PostgreSQL, S3, DVC, Pandas, Polars, Apache Airflow

Visualization

Plotly, Streamlit, Matplotlib, Seaborn

Embedded, Not Arms-Length.

01

Data Readiness Diagnostic

We start with a focused assessment of your data landscape — sources, pipelines, bottlenecks, opportunities. You get a prioritized roadmap, no commitment required.

02

Build & Deploy

We scope a project, embed with your team, and build production-grade systems — ML pipelines, data infrastructure, analysis workflows. Working code in your repository, not a slide deck.

03

Transfer & Grow

We document everything, pair with your engineers, and make your team self-sufficient. Our goal is to make you strong enough not to need us, and to have you call us back because you want to.

What We've Delivered.

01

University Research Collaboration

Built reproducible analysis pipelines for a genomics research group at a major research university. Automated a manual workflow that previously took 2 weeks of postdoc time per experiment, reducing it to under 4 hours. Delivered publication-ready figures and documented code that the lab continues to use independently.

02

Biotech Data Infrastructure

Designed and deployed a cloud-based data infrastructure for an early-stage biotech company, consolidating 18 months of experimental data from scattered spreadsheets and local drives into a structured, version-controlled system. Established naming conventions, automated backups, and access controls that passed investor due diligence review.

03

Convoke — Ongoing Partnership

Embedded data science and engineering support for Convoke on an ongoing retainer basis. Built automated analysis pipelines that replaced manual Excel-based workflows, cutting weekly reporting time from 8 hours to under 45 minutes. Continues as an active client relationship.

University of Washington
University of Pennsylvania
Convoke

Tell Us What You're Building.

Book a Call