Cassandra + Spark for Genomic Big Data (Anupama Joshi & Matt Negulescu, Epinomics)
At Epinomics, we are advancing epigenetic research to drive personalized medicine through epigenomic data analysis. Our goal is to give the community an analysis resource that promotes high-quality, replicable, and interpretable results. We work with academic and commercial users to bring their genomic sequencing data and metadata into our system. From the sequenced genome we extract epigenetic features known as "chromatin accessibility", which indicate the epigenetic changes that are instrumental in differential gene expression and disease development. We have a Spark-based pipeline that retrieves chromatin accessibility data from Cassandra, finds overlapping accessibility, clusters the data, and runs machine-learning algorithms. At each step we store the results in Cassandra, which powers our interactive D3.js-based visualizations. We are building the epigenomic landscape in order to revolutionize the field of personalized medicine.

About the Speakers

Anupama Joshi
Principal Engineer, Epinomics

At Epinomics, I am responsible for overall technical leadership of product development and delivery. I manage the engineering team and the roadmap for Epinomics' tech infrastructure and consumer-facing product. I work on the design and architecture of the analytics infrastructure that processes large amounts of NGS data, and I work with a team of scientists to develop machine-learning algorithms that find actionable insights in genetic data.