The Data Science Facility at the Donald Danforth Plant Science Center is a computing and data analytics hub that develops and deploys technologies in computer science, mathematics, and statistics to accelerate discoveries from data and models in plant science.
The facility supports computing through three modalities: 1) high-performance computing and workflow management on an HTCondor cluster; 2) virtualized applications using machine- and container-level virtualization; and 3) web and database applications and support. Currently, the infrastructure contains more than 1,300 processing cores and 2,800 graphics processing cores, more than 8 terabytes of memory, and a single high-performance, 721-terabyte storage area network. These resources are shared in a managed, multi-user environment and are connected by a 10-gigabit Ethernet network. Virtualizing key services simplifies system management and also allows diverse applications and platforms to be deployed simultaneously.
Services offered by the Data Science Facility include 1) user services: authentication services and user accounts, software installation, patches and upgrades, troubleshooting, advising, Slack (virtual help desk), GitHub (version control), training (system usage, specific software, and workflows), documentation, and outreach; 2) computing: cluster resources, web server hosting, database server hosting, maintenance and upgrades, system monitoring, and virtual machine and container management; and 3) storage: monitoring, performance configuration, and maintenance. The facility also consults on the development of the computational, data analysis, and experimental design components of proposals and helps edit the computational and statistical analysis sections of manuscripts. In addition, the core facility offers analysis services ranging from whole-project consulting to individual analyses.
Facility members also support intellectual development through regular workshops and training events, custom application development for lab or group projects, and community-based sharing of software, ideas, and methods. In addition, the facility fosters interaction between groups at the Center and partner institutions, and it facilitates interoperation between local computing and storage resources and public or private cloud and cyberinfrastructure platforms such as Amazon Web Services, CyVerse, and Open Science Grid.
Director and Principal Investigator
Assistant Professor, Dakota State University
Graduate Student, Cornell University
Graduate Student, Illinois State University
Software Developer, Bayer Crop Science
The Data Science group at the Danforth Center uses and develops computational approaches and infrastructure that leverage large datasets to address biological problems. We emphasize the development of modular, reusable, open-source tools through collaborator- and community-driven efforts. We aim to apply these tools to high-throughput genotyping and phenotyping data to identify the genetic basis of traits in model research plants and in biofuel and food security crops.
Measuring plant physical and physiological features rapidly and non-destructively remains a key bottleneck in plant research and breeding. Imaging, coupled with computer vision algorithms and statistical analysis, has the potential to relieve this phenotyping bottleneck, but it introduces its own computing, interpretation, and data management challenges. Our group develops tools to address these challenges so that the scientific community can use these technologies more broadly. Plant Computer Vision (PlantCV) is our primary platform for developing a plant phenotyping toolbox; through PlantCV, we deploy computer vision, machine learning, and other data science algorithms to extract biologically relevant data from image and sensor datasets.
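To illustrate how an image can be reduced to a quantitative trait, the sketch below segments plant pixels in an RGB image with the excess green (ExG) index and reports projected plant area as a pixel count. ExG is a generic, widely used segmentation technique chosen here for illustration; it is not PlantCV's API or the group's pipeline, and the function names and the 0.1 threshold are hypothetical. Images are assumed to be nested lists of (R, G, B) tuples so the example runs with the Python standard library alone.

```python
from typing import List, Tuple

# Assumption: a pixel is an (R, G, B) tuple with 8-bit channel values (0-255).
Pixel = Tuple[int, int, int]


def excess_green(pixel: Pixel) -> float:
    """Excess green index ExG = 2g - r - b on chromatic (sum-normalized) coordinates."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0  # pure black pixel: no chromatic information, treat as background
    return (2 * g - r - b) / total


def segment_plant(image: List[List[Pixel]], threshold: float = 0.1) -> List[List[bool]]:
    """Binary mask: True where ExG exceeds a (hypothetical) fixed threshold."""
    return [[excess_green(px) > threshold for px in row] for row in image]


def plant_area(mask: List[List[bool]]) -> int:
    """Projected plant area, counted as the number of foreground pixels."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Tiny 2x2 example: two green "leaf" pixels and two gray/beige background pixels.
    image = [
        [(40, 120, 30), (200, 180, 160)],
        [(35, 110, 40), (90, 90, 90)],
    ]
    print(plant_area(segment_plant(image)))  # prints 2
```

A fixed threshold is sensitive to lighting and camera settings, so practical pipelines typically calibrate images or learn the segmentation from data rather than hard-coding a single cutoff.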
Collaboration is a major emphasis of the Data Science group and enables us to apply the tools we develop to a variety of plant systems. Diverse candidate biofuel feedstocks, such as Camelina sativa (an oilseed crop) and Sorghum bicolor (a lignocellulosic feedstock), are major focuses of the group; we use natural variation and high-throughput phenotyping to study the genetic basis of traits that could improve these crops for bio-based fuels. We are also developing tools for model systems (e.g., Arabidopsis thaliana and Setaria viridis), food security crops (e.g., cassava), and other systems for producing plant natural products (e.g., indigo).
Zheng X, Carrington JC, Fahlgren N, Abbasi A, Berry JC. 2018. Antiviral functions of ARGONAUTE proteins during Turnip Crinkle Virus infection revealed by image-based trait analysis in Arabidopsis. bioRxiv:487322. DOI: 10.1101/487322.
Berry JC, Fahlgren N, Pokorny AA, Bart RS, Veley KM. 2018. An automated, high-throughput method for standardizing image color profiles to improve image-based plant phenotyping. PeerJ 6:e5727. DOI: 10.7717/peerj.5727.
Feldman MJ, Ellsworth PZ, Fahlgren N, Gehan MA, Cousins AB, Baxter I. 2018. Components of water use efficiency have unique genetic signatures in the model C4 grass Setaria. Plant Physiology 178:699–715. DOI: 10.1104/pp.18.00146.
Burnette M, Kooper R, Maloney JD, Rohde GS, Terstriep JA, Willis C, Fahlgren N, Mockler T, Newcomb M, Sagan V, Andrade-Sanchez P, Shakoor N, Sidike P, Ward R, LeBauer D. 2018. TERRA-REF Data Processing Infrastructure. In: Proceedings of the Practice and Experience on Advanced Research Computing. ACM, 27. DOI: 10.1145/3219104.3219152.
Li H, Yin Z, Manley P, Burken JG, Shakoor N, Fahlgren N, Mockler T. 2018. Early drought plant stress detection with bi-directional long-term memory networks. Photogrammetric Engineering & Remote Sensing 84:459–468. DOI: 10.14358/PERS.84.7.459.
Tovar JC, Hoyer JS, Lin A, Tielking A, Callen ST, Castillo SE, Miller M, Tessman M, Fahlgren N, Carrington JC, Nusinow DA, Gehan MA. 2018. Raspberry Pi-powered imaging for plant phenotyping. Applications in Plant Sciences 6:e1031. DOI: 10.1002/aps3.1031.
Gehan MA, Fahlgren N, Abbasi A, Berry JC, Callen ST, Chavez L, Doust AN, Feldman MJ, Gilbert KB, Hodge JG, Hoyer JS, Lin A, Liu S, Lizárraga C, Lorence A, Miller M, Platon E, Tessman M, Sax T. 2017. PlantCV v2: Image analysis software for high-throughput plant phenotyping. PeerJ 5:e4088. DOI: 10.7717/peerj.4088.
Fahlgren N, Bart R, Herrera-Estrella L, Rellán-Álvarez R, Chitwood DH, Dinneny JR. 2016. Plant scientists: GM technology is safe. Science 351:824. DOI: 10.1126/science.351.6275.824-a.
Fahlgren N, Hill ST, Carrington JC, Carbonell A. 2016. P-SAMS: a web site for plant artificial microRNA and synthetic trans-acting small interfering RNA design. Bioinformatics 32:157–158. DOI: 10.1093/bioinformatics/btv534.
Abbasi A, Fahlgren N. 2016. Naïve Bayes pixel-level plant segmentation. In: 2016 IEEE Western New York Image and Signal Processing Workshop (WNYISPW). IEEE, 1–4. DOI: 10.1109/WNYIPW.2016.7904790.
Wang H, Beyene G, Zhai J, Feng S, Fahlgren N, Taylor NJ, Bart R, Carrington JC, Jacobsen SE, Ausin I. 2015. CG gene body DNA methylation changes and evolution of duplicated genes in cassava. Proceedings of the National Academy of Sciences of the United States of America 112:13729–13734. DOI: 10.1073/pnas.1519067112.
Fahlgren N, Feldman M, Gehan MA, Wilson MS, Shyu C, Bryant DW, Hill ST, McEntee CJ, Warnasooriya SN, Kumar I, Ficor T, Turnipseed S, Gilbert KB, Brutnell TP, Carrington JC, Mockler TC, Baxter I. 2015. A versatile phenotyping system and analytics platform reveals diverse temporal responses to water availability in Setaria. Molecular Plant 8:1520–1535. DOI: 10.1016/j.molp.2015.06.005.
Carbonell A, Fahlgren N, Mitchell S, Cox KL Jr, Reilly KC, Mockler TC, Carrington JC. 2015. Highly specific gene silencing in a monocot species by artificial microRNAs derived from chimeric miRNA precursors. The Plant Journal 82:1061–1075. DOI: 10.1111/tpj.12835.
Fahlgren N, Gehan MA, Baxter I. 2015. Lights, camera, action: high-throughput plant phenotyping is ready for a close-up. Current Opinion in Plant Biology 24:93–99. DOI: 10.1016/j.pbi.2015.02.006.
Garcia-Ruiz H, Carbonell A, Hoyer JS, Fahlgren N, Gilbert KB, Takeda A, Giampetruzzi A, Garcia Ruiz MT, McGinn MG, Lowery N, Martinez Baladejo MT, Carrington JC. 2015. Roles and programming of Arabidopsis ARGONAUTE proteins during Turnip Mosaic Virus infection. PLoS Pathogens 11:e1004755. DOI: 10.1371/journal.ppat.1004755.
Gilbert KB, Fahlgren N, Kasschau KD, Chapman EJ, Carrington JC, Carbonell A. 2014. Preparation of multiplexed small RNA libraries from plants. Bio-protocol 4:e1275. DOI: 10.21769/BioProtoc.1275.
Carbonell A, Takeda A, Fahlgren N, Johnson SC, Cuperus JT, Carrington JC. 2014. New generation of artificial microRNA and synthetic trans-acting small interfering RNA vectors for efficient gene silencing in Arabidopsis. Plant Physiology 165:15–29. DOI: 10.1104/pp.113.234989.
Jeong D-H, Schmidt SA, Rymarquis LA, Park S, Ganssmann M, German MA, Accerbi M, Zhai J, Fahlgren N, Fox SE, Garvin DF, Mockler TC, Carrington JC, Meyers BC, Green PJ. 2013. Parallel analysis of RNA ends enhances global investigation of microRNAs and target RNAs of Brachypodium distachyon. Genome Biology 14:R145. DOI: 10.1186/gb-2013-14-12-r145.
Fahlgren N, Bollmann SR, Kasschau KD, Cuperus JT, Press CM, Sullivan CM, Chapman EJ, Hoyer JS, Gilbert KB, Grünwald NJ, Carrington JC. 2013. Phytophthora have distinct endogenous small RNA populations that include short interfering and microRNAs. PLoS ONE 8:e77181. DOI: 10.1371/journal.pone.0077181.
Carbonell A, Fahlgren N, Garcia-Ruiz H, Gilbert KB, Montgomery TA, Nguyen T, Cuperus JT, Carrington JC. 2012. Functional analysis of three Arabidopsis ARGONAUTES using slicer-defective mutants. The Plant Cell 24:3613–3629. DOI: 10.1105/tpc.112.099945.
Zhang C, Montgomery TA, Fischer SEJ, Garcia SMDA, Riedel CG, Fahlgren N, Sullivan CM, Carrington JC, Ruvkun G. 2012. The Caenorhabditis elegans RDE-10/RDE-11 complex regulates RNAi by promoting secondary siRNA amplification. Current Biology 22:881–890. DOI: 10.1016/j.cub.2012.04.011.
Fischer SEJ, Montgomery TA, Zhang C, Fahlgren N, Breen PC, Hwang A, Sullivan CM, Carrington JC, Ruvkun G. 2011. The ERI-6/7 helicase acts at the first stage of an siRNA amplification pathway that targets recent gene duplications. PLoS Genetics 7:e1002369. DOI: 10.1371/journal.pgen.1002369.
Hu TT, Pattyn P, Bakker EG, Cao J, Cheng J-F, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo Y-L. 2011. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nature Genetics 43:476–481. DOI: 10.1038/ng.807.
Cuperus JT, Fahlgren N, Carrington JC. 2011. Evolution and functional diversification of MIRNA genes. The Plant Cell 23:431–442. DOI: 10.1105/tpc.110.082784.
Zhang C, Montgomery TA, Gabel HW, Fischer SEJ, Phillips CM, Fahlgren N, Sullivan CM, Carrington JC, Ruvkun G. 2011. mut-16 and other mutator class genes modulate 22G and 26G siRNA pathways in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America 108:1201–1208. DOI: 10.1073/pnas.1018695108.
Cuperus JT, Carbonell A, Fahlgren N, Garcia-Ruiz H, Burke RT, Takeda A, Sullivan CM, Gilbert SD, Montgomery TA, Carrington JC. 2010. Unique functionality of 22-nt miRNAs in triggering RDR6-dependent siRNA biogenesis from target transcripts in Arabidopsis. Nature Structural & Molecular Biology 17:997–1003. DOI: 10.1038/nsmb.1866.
Fahlgren N, Jogdeo S, Kasschau KD, Sullivan CM, Chapman EJ, Laubinger S, Smith LM, Dasenko M, Givan SA, Weigel D, Carrington JC. 2010. MicroRNA gene evolution in Arabidopsis lyrata and Arabidopsis thaliana. The Plant Cell 22:1074–1089. DOI: 10.1105/tpc.110.073999.
International Brachypodium Initiative. 2010. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463:763–768. DOI: 10.1038/nature08747.
Garcia-Ruiz H, Takeda A, Chapman EJ, Sullivan CM, Fahlgren N, Brempelis KJ, Carrington JC. 2010. Arabidopsis RNA-dependent RNA polymerases and Dicer-Like proteins in antiviral defense and small interfering RNA biogenesis during Turnip Mosaic Virus infection. The Plant Cell 22:481–496. DOI: 10.1105/tpc.109.073056.
Cuperus JT, Montgomery TA, Fahlgren N, Burke RT, Townsend T, Sullivan CM, Carrington JC. 2010. Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing. Proceedings of the National Academy of Sciences of the United States of America 107:466–471. DOI: 10.1073/pnas.0913203107.
Fahlgren N, Carrington JC. 2010. miRNA Target Prediction in Plants. In: Meyers BC, Green PJ eds. Plant MicroRNAs. Methods in Molecular Biology. Totowa, NJ: Humana Press, 51–57. DOI: 10.1007/978-1-60327-005-2_4.
Klevebring D, Street NR, Fahlgren N, Kasschau KD, Carrington JC, Lundeberg J, Jansson S. 2009. Genome-wide profiling of Populus small RNAs. BMC Genomics 10:620. DOI: 10.1186/1471-2164-10-620.
Gu W, Shirayama M, Conte D Jr, Vasale J, Batista PJ, Claycomb JM, Moresco JJ, Youngman EM, Keys J, Stoltz MJ, Chen C-CG, Chaves DA, Duan S, Kasschau KD, Fahlgren N, Yates JR 3rd, Mitani S, Carrington JC, Mello CC. 2009. Distinct argonaute-mediated 22G-RNA pathways direct genome surveillance in the C. elegans germline. Molecular Cell 36:231–244. DOI: 10.1016/j.molcel.2009.09.020.
Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE, Cano LM, Grabherr M, Kodira CD, Raffaele S, Torto-Alalibo T, Bozkurt TO, Ah-Fong AMV, Alvarado L, Anderson VL, Armstrong MR, Avrova A, Baxter L, Beynon J, Boevink PC, Bollmann SR, Bos JIB, Bulone V, Cai G, Cakir C, Carrington JC, Chawner M, Conti L, Costanzo S, Ewan R, Fahlgren N, Fischbach MA, Fugelstad J, Gilroy EM, Gnerre S, Green PJ, Grenville-Briggs LJ, Griffith J, Grünwald NJ, Horn K, Horner NR, Hu C-H, Huitema E, Jeong D-H, Jones AME, Jones JDG, Jones RW, Karlsson EK, Kunjeti SG, Lamour K, Liu Z, Ma L, Maclean D, Chibucos MC, McDonald H, McWalters J, Meijer HJG, Morgan W, Morris PF, Munro CA, O’Neill K, Ospina-Giraldo M, Pinzón A, Pritchard L, Ramsahoye B, Ren Q, Restrepo S, Roy S, Sadanandom A, Savidor A, Schornack S, Schwartz DC, Schumann UD, Schwessinger B, Seyer L, Sharpe T, Silvar C, Song J, Studholme DJ, Sykes S, Thines M, van de Vondervoort PJI, Phuntumart V, Wawra S, Weide R, Win J, Young C, Zhou S, Fry W, Meyers BC, van West P, Ristaino J, Govers F, Birch PRJ, Whisson SC, Judelson HS, Nusbaum C. 2009. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461:393–398. DOI: 10.1038/nature08358.
Fahlgren N, Sullivan CM, Kasschau KD, Chapman EJ, Cumbie JS, Montgomery TA, Gilbert SD, Dasenko M, Backman TWH, Givan SA, Carrington JC. 2009. Computational and analytical framework for small RNA profiling by high-throughput sequencing. RNA 15:992–1002. DOI: 10.1261/rna.1473809.
Montgomery TA, Yoo SJ, Fahlgren N, Gilbert SD, Howell MD, Sullivan CM, Alexander A, Nguyen G, Allen E, Ahn JH, Carrington JC. 2008. AGO1-miR173 complex initiates phased siRNA formation in plants. Proceedings of the National Academy of Sciences of the United States of America 105:20055–20062. DOI: 10.1073/pnas.0810241105.
Batista PJ, Ruby JG, Claycomb JM, Chiang R, Fahlgren N, Kasschau KD, Chaves DA, Gu W, Vasale JJ, Duan S, Conte D Jr, Luo S, Schroth GP, Carrington JC, Bartel DP, Mello CC. 2008. PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C. elegans. Molecular Cell 31:67–78. DOI: 10.1016/j.molcel.2008.06.002.
Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE, Alexander AL, Chapman EJ, Fahlgren N, Allen E, Carrington JC. 2008. Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation. Cell 133:128–141. DOI: 10.1016/j.cell.2008.02.033.
Backman TWH, Sullivan CM, Cumbie JS, Miller ZA, Chapman EJ, Fahlgren N, Givan SA, Carrington JC, Kasschau KD. 2008. Update of ASRP: the Arabidopsis Small RNA Project database. Nucleic Acids Research 36:D982–D985. DOI: 10.1093/nar/gkm997.
Liu P-P, Montgomery TA, Fahlgren N, Kasschau KD, Nonogaki H, Carrington JC. 2007. Repression of AUXIN RESPONSE FACTOR10 by microRNA160 is critical for seed germination and post-germination stages. The Plant Journal 52:133–146. DOI: 10.1111/j.1365-313X.2007.03218.x.
Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC. 2007. Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biology 5:e57. DOI: 10.1371/journal.pbio.0050057.
Howell MD, Fahlgren N, Chapman EJ, Cumbie JS, Sullivan CM, Givan SA, Kasschau KD, Carrington JC. 2007. Genome-wide analysis of the RNA-DEPENDENT RNA POLYMERASE6/DICER-LIKE4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. The Plant Cell 19:926–942. DOI: 10.1105/tpc.107.050062.
Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC. 2007. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219. DOI: 10.1371/journal.pone.0000219.
Fahlgren N, Montgomery TA, Howell MD, Allen E, Dvorak SK, Alexander AL, Carrington JC. 2006. Regulation of AUXIN RESPONSE FACTOR3 by TAS3 ta-siRNA affects developmental timing and patterning in Arabidopsis. Current Biology 16:939–944. DOI: 10.1016/j.cub.2006.03.065.
Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC. 2005. Expression of Arabidopsis MIRNA genes. Plant Physiology 138:2145–2154. DOI: 10.1104/pp.105.062943.
Please contact us if you have any questions about the Data Science Facility or our research.