The Data Science Facility is one of six core facilities at the Donald Danforth Plant Science Center. We analyze large datasets to address biological problems using computing, mathematics, and statistics.

Data Science Facility

The Data Science Facility at the Donald Danforth Plant Science Center is a computing and data analytics hub that develops and deploys technologies in computer science, mathematics, and statistics to accelerate discoveries from data and models in plant science.

Physically, the facility supports computing through three modalities: 1) high-performance computing and workflow management on an HTCondor cluster; 2) virtualized applications using machine- and container-level virtualization; and 3) web/database applications and support. Currently, the infrastructure contains over 1,300 processing cores and 2,800 graphics processing cores, more than 8 terabytes of memory, and a single, high-performance 721-terabyte storage area network. These resources are shared in a managed, multi-user environment and communicate via a 10-gigabit Ethernet network. Management of the system is simplified through virtualization of key services, which also allows diverse applications and platforms to be deployed simultaneously.

Services offered by the Data Science Facility include 1) user services: authentication services/user accounts, software installation, patches and upgrades, troubleshooting, advising, Slack (virtual help desk), GitHub (version control), training (system usage, specific software, workflows), documentation, and outreach; 2) computing: cluster resources, web server hosting, database server hosting, maintenance/upgrades, system monitoring, and virtual machine and container management; 3) storage: monitoring, performance configuration, and maintenance. Additionally, the facility consults on the development of computational, data analysis, and experimental design components of proposals and assists with editing of computational and statistical analysis sections of manuscripts. The core facility also offers analysis services, ranging from whole project consulting to individual analyses.

Intellectual development is offered by members of the facility through regular workshops and training events, custom application development for lab or group projects, and community-based sharing of software, ideas, and methods. In addition, the facility enhances interaction between groups at the center and partner institutions and facilitates interoperation between local computing and storage resources and public/private cloud/cyberinfrastructures such as Amazon Web Services, CyVerse, and Open Science Grid.

Team Members

Noah Fahlgren
Director and Principal Investigator

Josh Rothhaupt
Manager

Haley Schuhl
Computational Scientist

Wenxiao Zhan
Computational Scientist

Seth Polydore
Postdoctoral Associate

Previous Team Members

Arash Abbasi
Assistant Professor, Dakota State University

Michael Miller
Graduate Student, Cornell University

Alex Pokorny
Graduate Student, Illinois State University

Monica Tessman
Software Developer, Bayer Crop Science

Research

The Data Science group at the Danforth Center uses and develops computational approaches and infrastructure that leverage large datasets to address biological problems. We emphasize the development of modular, reusable, and open-source tools through collaborator- and community-driven efforts. Our aim is to apply these tools to high-throughput genotyping and phenotyping data to identify the genetic basis of traits in research model plants and biofuel and food security crops.

Rapid, non-destructive measurement of plant physical and physiological features is a key bottleneck in plant research and breeding. Imaging, coupled with computer vision algorithms and statistical analysis, has the potential to address this phenotyping bottleneck, but these technologies introduce their own computing, interpretation, and data management challenges. Our group develops tools to address those challenges so that the technologies can be used more broadly by the scientific community. Plant Computer Vision (PlantCV) is our primary platform for developing a plant phenotyping toolbox. Through PlantCV we are deploying computer vision, machine learning, and other data science algorithms to extract biologically relevant data from image and sensor datasets.

A major emphasis of the Data Science group is collaboration, which enables us to apply the tools we develop to a variety of plant systems. Diverse candidate biofuel feedstocks such as Camelina sativa (oilseed) and Sorghum bicolor (lignocellulosic feedstock) are major focuses in the group, where we use natural variation and high-throughput phenotyping to study the genetic basis of traits that could improve these crops for bio-based fuels. We are also developing tools for model systems (e.g., Arabidopsis thaliana and Setaria viridis), food security crops (e.g., cassava), and other systems for producing plant natural products (e.g., indigo).

Ongoing Projects

PlantCV

Project homepage: https://plantcv.danforthcenter.org
Project summary: PlantCV is an image analysis software package for plant phenotyping. Our long-term goals for the PlantCV project are to: 1) Provide a common interface for a collection of image analysis techniques that are integrated from a variety of source packages and algorithms; 2) Utilize a modular architecture that enables flexibility in the design of analysis workflows and rapid assimilation and integration of new methods; 3) Develop openly in real time in the cloud using an open-source framework to rapidly disseminate new methods and to encourage open contribution by stakeholders; and 4) Provide a simplified interface for users to utilize the underlying tools and build custom analysis workflows without significant experience with programming.
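
The modular workflow idea in goal 2 can be illustrated with a minimal sketch in plain Python. This is not the PlantCV API: the function names (excess_green, threshold, plant_area, run_workflow), the use of an excess-green index, and the cutoff of 50 are all hypothetical choices made for illustration. The sketch only shows how small, interchangeable steps might be chained to turn an image into a trait measurement.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels
Gray = List[List[int]]         # one integer value per pixel
Mask = List[List[bool]]        # True where a pixel is classified as plant


def excess_green(img: Image) -> Gray:
    """Compute the excess-green index 2G - R - B for every pixel."""
    return [[2 * g - r - b for (r, g, b) in row] for row in img]


def threshold(gray: Gray, cutoff: int) -> Mask:
    """Mark pixels whose value is strictly above the cutoff."""
    return [[value > cutoff for value in row] for row in gray]


def plant_area(mask: Mask) -> int:
    """Count the pixels classified as plant."""
    return sum(sum(row) for row in mask)


def run_workflow(img: Image, steps: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply each step in order, feeding its output to the next step."""
    data: Any = img
    for step in steps:
        data = step(data)
    return data


# Toy 3x4 image: brown "soil" background with a 2x2 green "leaf" patch.
soil = (120, 100, 80)
leaf = (40, 160, 50)
image = [
    [soil, soil, soil, soil],
    [soil, leaf, leaf, soil],
    [soil, leaf, leaf, soil],
]

# Hypothetical cutoff of 50, chosen only for this toy image.
steps = [excess_green, lambda gray: threshold(gray, cutoff=50), plant_area]
area = run_workflow(image, steps)
print(area)  # 4
```

Because each step is an ordinary function, a step can be swapped out (for example, a different color index or a learned classifier in place of the threshold) without changing the rest of the workflow.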

Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus

Funding agency: National Science Foundation Office of Advanced Cyberinfrastructure (OAC)
Program: Cyberinfrastructure for Sustained Scientific Innovation (CSSI) - Data and Software
Grant number: 1835543
Project homepage: https://clowder.ncsa.illinois.edu
Project summary: Preserving, sharing, navigating, and reusing large and diverse collections of data is now essential to scientific discoveries in areas such as phenomics, materials science, geoscience, and urban science. These data navigation needs are also important when addressing the growing number of research areas where data and tools must span multiple domains. To support these needs effectively, new methods are required that simplify and reduce the amount of effort needed by researchers to find and utilize data, support community accepted data practices, and bring together the breadth of standards, tools, and resources utilized by a community. Clowder, an active curation based data management system, addresses these needs and challenges by distributing much of the data curation overhead throughout the lifecycle of the data, augmenting this with social curation and automated analysis tools, and providing extensible community-dependent means of viewing and navigating data. As an open source framework, built to be extensible at every level, Clowder is capable of interacting with and utilizing a variety of community tools while also supporting different data governance and ownership requirements.

The project enhances Clowder's core systems for the benefit of a larger group of users. It increases the level of interoperability with community resources, hardens the core software, and distributes core software development, while continuing to expand usage. Governance mechanisms and a business model are established to make Clowder sustainable, creating an appropriate governance structure to ensure that the software continues to be available, supportable, and usable. The effort engages a number of stakeholders, taking data from diverse but converging scientific domains already using the Clowder framework, to address broad interoperability and cross domain data sharing. The overall effort will transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger organized community, with a sustainable software resource supporting convergent research data needs.

STTR Phase II: Novel Analysis Tools for Production of Higher Indican Yielding Plants for Bio-based Indigo

Funding agency: National Science Foundation Industrial Innovation and Partnerships (IIP)
Program: Small Business Technology Transfer Program Phase II (STTR)
Grant number: 1831949
Project summary: This Small Business Technology Transfer (STTR) Phase II project will develop a new tool for the analysis of in situ indican precursor in indigo plants, which, when combined with genomic analysis and genetic linkage mapping in selectively bred indigo crops, will lead to high indican yielding breeding parental lines and ultimately to competitively priced natural indigo dye. Additionally, characterization of genetic markers will accelerate further crop improvement by understanding and harnessing the genes crucial to indican synthesis and other aspects of significance to overall indigo yield. These advancements will benefit customers (denim mills) by leading to a more reliable, lower cost plant-derived indigo supply. The success of this multiphase STTR project will enable a cost-competitive, cleaner, and more sustainable denim dyeing process, while greatly expanding the market for domestically produced natural indigo. Commercialization of a more consistent and higher yielding US-grown indigo plant that produces high purity indigo powder can replace the current standard of synthetic, imported indigo powder. While in demand by the marketplace today, plant-derived indigo is currently only used in premium denim products due to the high cost per pound resulting from low yields per plant per acre. The direct result of this research will be to open new market segments and expand existing market penetration for US-grown and manufactured biobased indigo for the textile industry, an addressable market of $1.86B. The methods and technology developed through this project have a direct path to the commercial marketplace and the industry is ready to support biobased textile dyes such as plant-derived indigo.

During this project, reference genome resources will be constructed for indigo feedstock crops, F1 mapping populations will be constructed for P. tinctoria, I. tinctoria, and I. suffruticosa varieties, and the design of the handheld rapid assay device will be validated through laboratory analysis. The reference genome for I. suffruticosa will be built using Pacific Biosciences SMRT sequencing and assembly. In addition to this reference genome, whole-genome resequencing will be performed on available I. suffruticosa and I. tinctoria varieties. Nucleotide variation will be identified, and variant effects predicted, among the Indigofera-based indigo varieties. This variation will be used to develop markers to evaluate intervarietal crosses. Controlled greenhouse crosses of three species will be made to create F1 mapping populations for use in constructing a genetic map and linkage mapping, in combination with the resources developed. Intraspecies varieties exhibiting distinct phenotypic traits of commercial interest or notable dye yield differences will be selected for crossing to generate the F1 populations. Results from laboratory-based fluorometry equipment will be evaluated for efficacy and then compared against extractive indigo dye analysis from the leaf biomass. A final prototype will be constructed based on these findings and validated through use in the laboratory and in the field.

Optimizing Tradeoffs Implicit During Bioenergy Crop Improvement: Understanding the Effect of Altered Cell Wall and Sugar Content on Sorghum-associated Pathogenic Bacteria

Funding agency: US Department of Energy Office of Biological & Environmental Research (BER)
Program: Plant Feedstocks Genomics for Bioenergy
Grant number: DE-SC0018072
Project summary: High-biomass-yielding crops may harbor modifications to cell walls, which are a major barrier to pathogen entry, and to the tissue distribution of sugars, which are the pathogen’s food source; hence they are likely to present previously unseen challenges for disease resistance. Xanthomonas is a known pathogen of sorghum (Sorghum bicolor (L.) Moench), though the incidence and impact of the disease have historically been low. We are working to establish the sorghum-Xanthomonas pathosystem as a model for deducing how latent microbial pathogens might exploit key biofuel crop traits. Our approach will be to quantitatively model the disease triangle that describes sorghum, pathogenic bacteria, and the environment. Field and laboratory experiments will be combined to determine bacterial susceptibility of genetically diverse sorghum genotypes that differ in cell wall and sugar composition. Standard plant pathology techniques combined with powerful phenomics approaches will provide a holistic view of this pathosystem within variable environments. Further, transcriptomics will be employed to elucidate mechanisms used by bacterial pathogens to induce sorghum susceptibility. Microbial pathogens are known to manipulate the sugar and cell wall characteristics of their hosts. Consequently, these characteristics will be analyzed during pathogen invasion. This research will reveal the mechanisms underlying tolerance to pathogens that must be maintained during biofuel trait optimization. The proposed research will yield a detailed understanding of the impact of bioenergy-relevant traits on pathogen susceptibility. This is a necessary first step towards the development of novel routes for disease control that can be deployed in parallel with targeted alterations to sugar and cell wall composition during bioenergy crop improvement and breeding efforts.

Genomics and Phenomics to Identify Yield and Drought Tolerance Alleles for Improvement of Camelina as a Biofuel Crop

Funding agency: US Department of Agriculture National Institute of Food and Agriculture
Program: Plant Feedstocks Genomics for Bioenergy/Agriculture and Food Research Initiative (AFRI)
Grant number: 2016-67009-25639
Project summary: Plant oils represent an outstanding potential source of energy-dense hydrocarbons that can be used for fuels and industrial raw materials, but a major challenge is to produce these oils in non-food oilseed crops that have high yields and can grow under marginal and varied climatic conditions. In recent years, Camelina sativa has received considerable attention as a potential non-food biofuels crop, but significant challenges remain to develop stable, high-yielding, geographically adapted germplasm suitable for biofuels production. We will utilize advanced high-throughput phenotyping and genomics-based approaches to discover useful genes/alleles controlling seed yield and oil content and quality in camelina under water-limited conditions, and will identify high-yielding cultivars suitable for production in different geographical regions. The project includes three primary objectives: 1) Develop and apply automated, non-destructive high-throughput phenotyping (HTP) protocols to evaluate the phenotypic diversity and stress tolerance of a camelina panel consisting of 250 accessions, grown under well-watered and water-limited conditions. 2) Discover alleles/genes controlling morphological, physiological, seed, and oil yield properties using genome-wide association studies (GWAS). 3) Identify, test, and validate useful germplasm, including transgenic lines producing drop-in ready jet fuel components, under diverse environments and marginal production areas. Taken together, this project will significantly advance the utilization of non-food oilseed crops for biofuel production and provide guidance and insight for future studies of phenomics-based crop improvement.

The Missouri Transect: Climate, Plants, and Community

Funding agency: National Science Foundation Office of Integrative Activities
Program: Established Program to Stimulate Competitive Research (EPSCoR) Research Infrastructure Improvement Program Track-1 (RII Track-1)
Grant number: IIA-1355406
Project homepage: https://missouriepscor.org
Project summary: The Missouri Transect is a five-year effort to build infrastructure, knowledge, and collaborations in research and education across Missouri. The research and education activities are focused on understanding, modeling, and predicting 1) short- and long-term trends in temperature and water availability; 2) the impact of these trends on the productivity of native flora and agriculture crops; and 3) how different stakeholder communities are likely to respond to a changing climate. We have assembled interdisciplinary teams who will focus on specific areas of research and education and interact collaboratively to build the research platform across the state.

Mission: The Missouri Transect will enhance our state’s capacity to model and respond to the effects of climate change on plants and community on a local scale.

Vision: Missouri EPSCoR will enhance the state’s infrastructure for science and technology, stimulating Missouri’s economy and leading to job creation.

Members of the Climate, Plant, and Community Teams bring their diverse research strengths and approaches to bear on three overlapping research questions: How do short- and long-term climate-scale trends affect crops and the natural flora? What genes are important to plant responses to these environmental variations? How resilient are Missouri communities to variations in the climate and its impact on plants?

The teams assembled have a range of expertise, experience, and capabilities in research, education, diversity, outreach, and commercialization. Team members are drawn from participating institutions and are leaders in their respective fields.

Expected outcomes from this project include a strong climate monitoring infrastructure with the ability to predict daily, seasonal, annual, and future variability; drought-tolerant crops; a regional resilience framework to help Missouri communities respond to climate change; a modernized Missouri cyberinfrastructure capable of handling “Big Data”; and a more technologically advanced workforce.

Publications

Zheng X, Carrington JC, Fahlgren N, Abbasi A, Berry JC. 2018. Antiviral functions of ARGONAUTE proteins during Turnip Crinkle Virus infection revealed by image-based trait analysis in Arabidopsis. bioRxiv:487322. DOI: 10.1101/487322.

Berry JC, Fahlgren N, Pokorny AA, Bart RS, Veley KM. 2018. An automated, high-throughput method for standardizing image color profiles to improve image-based plant phenotyping. PeerJ 6:e5727. DOI: 10.7717/peerj.5727.

Feldman MJ, Ellsworth PZ, Fahlgren N, Gehan MA, Cousins AB, Baxter I. 2018. Components of water use efficiency have unique genetic signatures in the model C4 grass Setaria. Plant Physiology 178:699–715. DOI: 10.1104/pp.18.00146.

Burnette M, Kooper R, Maloney JD, Rohde GS, Terstriep JA, Willis C, Fahlgren N, Mockler T, Newcomb M, Sagan V, Andrade-Sanchez P, Shakoor N, Sidike P, Ward R, LeBauer D. 2018. TERRA-REF Data Processing Infrastructure. In: Proceedings of the Practice and Experience on Advanced Research Computing. ACM, 27. DOI: 10.1145/3219104.3219152.

Li H, Yin Z, Manley P, Burken JG, Shakoor N, Fahlgren N, Mockler T. 2018. Early drought plant stress detection with bi-directional long-term memory networks. Photogrammetric Engineering & Remote Sensing 84:459–468. DOI: 10.14358/PERS.84.7.459.

Tovar JC, Hoyer JS, Lin A, Tielking A, Callen ST, Elizabeth Castillo S, Miller M, Tessman M, Fahlgren N, Carrington JC, Nusinow DA, Gehan MA. 2018. Raspberry Pi-powered imaging for plant phenotyping. Applications in Plant Sciences 6:e1031. DOI: 10.1002/aps3.1031.

Gehan MA, Fahlgren N, Abbasi A, Berry JC, Callen ST, Chavez L, Doust AN, Feldman MJ, Gilbert KB, Hodge JG, Hoyer JS, Lin A, Liu S, Lizárraga C, Lorence A, Miller M, Platon E, Tessman M, Sax T. 2017. PlantCV v2: Image analysis software for high-throughput plant phenotyping. PeerJ 5:e4088. DOI: 10.7717/peerj.4088.

Fahlgren N, Bart R, Herrera-Estrella L, Rellán-Álvarez R, Chitwood DH, Dinneny JR. 2016. Plant scientists: GM technology is safe. Science 351:824. DOI: 10.1126/science.351.6275.824-a.

Fahlgren N, Hill ST, Carrington JC, Carbonell A. 2016. P-SAMS: a web site for plant artificial microRNA and synthetic trans-acting small interfering RNA design. Bioinformatics 32:157–158. DOI: 10.1093/bioinformatics/btv534.

Abbasi A, Fahlgren N. 2016. Naïve Bayes pixel-level plant segmentation. In: 2016 IEEE Western New York Image and Signal Processing Workshop (WNYISPW). IEEE, 1–4. DOI: 10.1109/WNYIPW.2016.7904790.

Wang H, Beyene G, Zhai J, Feng S, Fahlgren N, Taylor NJ, Bart R, Carrington JC, Jacobsen SE, Ausin I. 2015. CG gene body DNA methylation changes and evolution of duplicated genes in cassava. Proceedings of the National Academy of Sciences of the United States of America 112:13729–13734. DOI: 10.1073/pnas.1519067112.

Fahlgren N, Feldman M, Gehan MA, Wilson MS, Shyu C, Bryant DW, Hill ST, McEntee CJ, Warnasooriya SN, Kumar I, Ficor T, Turnipseed S, Gilbert KB, Brutnell TP, Carrington JC, Mockler TC, Baxter I. 2015. A versatile phenotyping system and analytics platform reveals diverse temporal responses to water availability in Setaria. Molecular Plant 8:1520–1535. DOI: 10.1016/j.molp.2015.06.005.

Carbonell A, Fahlgren N, Mitchell S, Cox KL Jr, Reilly KC, Mockler TC, Carrington JC. 2015. Highly specific gene silencing in a monocot species by artificial microRNAs derived from chimeric miRNA precursors. The Plant Journal: for cell and molecular biology 82:1061–1075. DOI: 10.1111/tpj.12835.

Fahlgren N, Gehan MA, Baxter I. 2015. Lights, camera, action: high-throughput plant phenotyping is ready for a close-up. Current Opinion in Plant Biology 24:93–99. DOI: 10.1016/j.pbi.2015.02.006.

Garcia-Ruiz H, Carbonell A, Hoyer JS, Fahlgren N, Gilbert KB, Takeda A, Giampetruzzi A, Garcia Ruiz MT, McGinn MG, Lowery N, Martinez Baladejo MT, Carrington JC. 2015. Roles and programming of Arabidopsis ARGONAUTE proteins during Turnip Mosaic Virus infection. PLoS Pathogens 11:e1004755. DOI: 10.1371/journal.ppat.1004755.

Gilbert KB, Fahlgren N, Kasschau KD, Chapman EJ, Carrington JC, Carbonell A. 2014. Preparation of multiplexed small RNA libraries from plants. Bio-protocol 4:e1275. DOI: 10.21769/BioProtoc.1275.

Carbonell A, Takeda A, Fahlgren N, Johnson SC, Cuperus JT, Carrington JC. 2014. New generation of artificial microRNA and synthetic trans-acting small interfering RNA vectors for efficient gene silencing in Arabidopsis. Plant Physiology 165:15–29. DOI: 10.1104/pp.113.234989.

Jeong D-H, Schmidt SA, Rymarquis LA, Park S, Ganssmann M, German MA, Accerbi M, Zhai J, Fahlgren N, Fox SE, Garvin DF, Mockler TC, Carrington JC, Meyers BC, Green PJ. 2013. Parallel analysis of RNA ends enhances global investigation of microRNAs and target RNAs of Brachypodium distachyon. Genome Biology 14:R145. DOI: 10.1186/gb-2013-14-12-r145.

Fahlgren N, Bollmann SR, Kasschau KD, Cuperus JT, Press CM, Sullivan CM, Chapman EJ, Hoyer JS, Gilbert KB, Grünwald NJ, Carrington JC. 2013. Phytophthora have distinct endogenous small RNA populations that include short interfering and microRNAs. PLoS ONE 8:e77181. DOI: 10.1371/journal.pone.0077181.

Carbonell A, Fahlgren N, Garcia-Ruiz H, Gilbert KB, Montgomery TA, Nguyen T, Cuperus JT, Carrington JC. 2012. Functional analysis of three Arabidopsis ARGONAUTES using slicer-defective mutants. The Plant Cell 24:3613–3629. DOI: 10.1105/tpc.112.099945.

Zhang C, Montgomery TA, Fischer SEJ, Garcia SMDA, Riedel CG, Fahlgren N, Sullivan CM, Carrington JC, Ruvkun G. 2012. The Caenorhabditis elegans RDE-10/RDE-11 complex regulates RNAi by promoting secondary siRNA amplification. Current Biology: CB 22:881–890. DOI: 10.1016/j.cub.2012.04.011.

Fischer SEJ, Montgomery TA, Zhang C, Fahlgren N, Breen PC, Hwang A, Sullivan CM, Carrington JC, Ruvkun G. 2011. The ERI-6/7 helicase acts at the first stage of an siRNA amplification pathway that targets recent gene duplications. PLoS Genetics 7:e1002369. DOI: 10.1371/journal.pgen.1002369.

Hu TT, Pattyn P, Bakker EG, Cao J, Cheng J-F, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo Y-L. 2011. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nature Genetics 43:476–481. DOI: 10.1038/ng.807.

Cuperus JT, Fahlgren N, Carrington JC. 2011. Evolution and functional diversification of MIRNA genes. The Plant Cell 23:431–442. DOI: 10.1105/tpc.110.082784.

Zhang C, Montgomery TA, Gabel HW, Fischer SEJ, Phillips CM, Fahlgren N, Sullivan CM, Carrington JC, Ruvkun G. 2011. mut-16 and other mutator class genes modulate 22G and 26G siRNA pathways in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America 108:1201–1208. DOI: 10.1073/pnas.1018695108.

Cuperus JT, Carbonell A, Fahlgren N, Garcia-Ruiz H, Burke RT, Takeda A, Sullivan CM, Gilbert SD, Montgomery TA, Carrington JC. 2010. Unique functionality of 22-nt miRNAs in triggering RDR6-dependent siRNA biogenesis from target transcripts in Arabidopsis. Nature Structural & Molecular Biology 17:997–1003. DOI: 10.1038/nsmb.1866.

Fahlgren N, Jogdeo S, Kasschau KD, Sullivan CM, Chapman EJ, Laubinger S, Smith LM, Dasenko M, Givan SA, Weigel D, Carrington JC. 2010. MicroRNA gene evolution in Arabidopsis lyrata and Arabidopsis thaliana. The Plant Cell 22:1074–1089. DOI: 10.1105/tpc.110.073999.

International Brachypodium Initiative. 2010. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463:763–768. DOI: 10.1038/nature08747.

Garcia-Ruiz H, Takeda A, Chapman EJ, Sullivan CM, Fahlgren N, Brempelis KJ, Carrington JC. 2010. Arabidopsis RNA-dependent RNA polymerases and Dicer-Like proteins in antiviral defense and small interfering RNA biogenesis during Turnip Mosaic Virus infection. The Plant Cell 22:481–496. DOI: 10.1105/tpc.109.073056.

Cuperus JT, Montgomery TA, Fahlgren N, Burke RT, Townsend T, Sullivan CM, Carrington JC. 2010. Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing. Proceedings of the National Academy of Sciences of the United States of America 107:466–471. DOI: 10.1073/pnas.0913203107.

Fahlgren N, Carrington JC. 2010. miRNA Target Prediction in Plants. In: Meyers BC, Green PJ eds. Plant MicroRNAs. Methods in Molecular Biology. Totowa, NJ: Humana Press, 51–57. DOI: 10.1007/978-1-60327-005-2_4.

Klevebring D, Street NR, Fahlgren N, Kasschau KD, Carrington JC, Lundeberg J, Jansson S. 2009. Genome-wide profiling of populus small RNAs. BMC Genomics 10:620. DOI: 10.1186/1471-2164-10-620.

Gu W, Shirayama M, Conte D Jr, Vasale J, Batista PJ, Claycomb JM, Moresco JJ, Youngman EM, Keys J, Stoltz MJ, Chen C-CG, Chaves DA, Duan S, Kasschau KD, Fahlgren N, Yates JR 3rd, Mitani S, Carrington JC, Mello CC. 2009. Distinct argonaute-mediated 22G-RNA pathways direct genome surveillance in the C. elegans germline. Molecular Cell 36:231–244. DOI: 10.1016/j.molcel.2009.09.020.

Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE, Cano LM, Grabherr M, Kodira CD, Raffaele S, Torto-Alalibo T, Bozkurt TO, Ah-Fong AMV, Alvarado L, Anderson VL, Armstrong MR, Avrova A, Baxter L, Beynon J, Boevink PC, Bollmann SR, Bos JIB, Bulone V, Cai G, Cakir C, Carrington JC, Chawner M, Conti L, Costanzo S, Ewan R, Fahlgren N, Fischbach MA, Fugelstad J, Gilroy EM, Gnerre S, Green PJ, Grenville-Briggs LJ, Griffith J, Grünwald NJ, Horn K, Horner NR, Hu C-H, Huitema E, Jeong D-H, Jones AME, Jones JDG, Jones RW, Karlsson EK, Kunjeti SG, Lamour K, Liu Z, Ma L, Maclean D, Chibucos MC, McDonald H, McWalters J, Meijer HJG, Morgan W, Morris PF, Munro CA, O’Neill K, Ospina-Giraldo M, Pinzón A, Pritchard L, Ramsahoye B, Ren Q, Restrepo S, Roy S, Sadanandom A, Savidor A, Schornack S, Schwartz DC, Schumann UD, Schwessinger B, Seyer L, Sharpe T, Silvar C, Song J, Studholme DJ, Sykes S, Thines M, van de Vondervoort PJI, Phuntumart V, Wawra S, Weide R, Win J, Young C, Zhou S, Fry W, Meyers BC, van West P, Ristaino J, Govers F, Birch PRJ, Whisson SC, Judelson HS, Nusbaum C. 2009. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461:393–398. DOI: 10.1038/nature08358.

Fahlgren N, Sullivan CM, Kasschau KD, Chapman EJ, Cumbie JS, Montgomery TA, Gilbert SD, Dasenko M, Backman TWH, Givan SA, Carrington JC. 2009. Computational and analytical framework for small RNA profiling by high-throughput sequencing. RNA 15:992–1002. DOI: 10.1261/rna.1473809.

Montgomery TA, Yoo SJ, Fahlgren N, Gilbert SD, Howell MD, Sullivan CM, Alexander A, Nguyen G, Allen E, Ahn JH, Carrington JC. 2008. AGO1-miR173 complex initiates phased siRNA formation in plants. Proceedings of the National Academy of Sciences of the United States of America 105:20055–20062. DOI: 10.1073/pnas.0810241105.

Batista PJ, Ruby JG, Claycomb JM, Chiang R, Fahlgren N, Kasschau KD, Chaves DA, Gu W, Vasale JJ, Duan S, Conte D Jr, Luo S, Schroth GP, Carrington JC, Bartel DP, Mello CC. 2008. PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C. elegans. Molecular Cell 31:67–78. DOI: 10.1016/j.molcel.2008.06.002.

Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE, Alexander AL, Chapman EJ, Fahlgren N, Allen E, Carrington JC. 2008. Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation. Cell 133:128–141. DOI: 10.1016/j.cell.2008.02.033.

Backman TWH, Sullivan CM, Cumbie JS, Miller ZA, Chapman EJ, Fahlgren N, Givan SA, Carrington JC, Kasschau KD. 2008. Update of ASRP: the Arabidopsis Small RNA Project database. Nucleic Acids Research 36:D982–5. DOI: 10.1093/nar/gkm997.

Liu P-P, Montgomery TA, Fahlgren N, Kasschau KD, Nonogaki H, Carrington JC. 2007. Repression of AUXIN RESPONSE FACTOR10 by microRNA160 is critical for seed germination and post-germination stages. The Plant Journal: for cell and molecular biology 52:133–146. DOI: 10.1111/j.1365-313X.2007.03218.x.

Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC. 2007. Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biology 5:e57. DOI: 10.1371/journal.pbio.0050057.

Howell MD, Fahlgren N, Chapman EJ, Cumbie JS, Sullivan CM, Givan SA, Kasschau KD, Carrington JC. 2007. Genome-wide analysis of the RNA-DEPENDENT RNA POLYMERASE6/DICER-LIKE4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. The Plant Cell 19:926–942. DOI: 10.1105/tpc.107.050062.

Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC. 2007. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219. DOI: 10.1371/journal.pone.0000219.

Fahlgren N, Montgomery TA, Howell MD, Allen E, Dvorak SK, Alexander AL, Carrington JC. 2006. Regulation of AUXIN RESPONSE FACTOR3 by TAS3 ta-siRNA affects developmental timing and patterning in Arabidopsis. Current Biology: CB 16:939–944. DOI: 10.1016/j.cub.2006.03.065.

Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC. 2005. Expression of Arabidopsis MIRNA genes. Plant Physiology 138:2145–2154. DOI: 10.1104/pp.105.062943.

Contact

Please contact us if you have any questions about the Data Science Facility or our research.

nfahlgren@danforthcenter.org