Processing Fungal Community Datasets Generated From Next-Generation Sequencing

In response to queries I recently received about a ‘best practices’ guide to analyzing fungal NGS-based datasets, I have compiled a variety of thoughts on this site. As many of you know, I came into the NGS world late, so I don’t feel like I have a full grasp of all the issues.

That said, I have found from my own experience with fungal NGS data that multiple steps in the current informatics pipelines require some form of manual control to maximize signal-to-noise ratios. There are many good research groups thinking about this, and I encourage the folks reading this to reach out to them (in no particular order: Amend and Nguyen labs at Hawaii, Baldrian lab in Czech Republic, Peay lab at Stanford, Lindahl lab in Sweden, Kauserud lab in Norway, Lindner and Lankau labs at UW Madison, Tedersoo and Opik labs in Estonia, Taylor lab at UNM, Shawn Brown at OSU, Zewei Song at UMN, Ylva Lekberg at MPG Ranch, and Robert Edgar as an independent researcher, to name a few). If you have not already read the Nguyen et al. (2015) New Phytologist paper, please do so before proceeding. It contains much of my core thinking about the topic.

The only thing that I really think should be a “rule” in fungal (and any other organismal group) amplicon-based NGS work is the inclusion of both positive and negative controls. There are so many sources of error and contamination with these methods that without both types of controls it is hard to feel confident in the downstream analyses. Along these lines, Michelle Jusino, Dan Lindner, and others at UW Madison have built a synthetic mock community that looks super promising, and they would like it to be used by other researchers. Contact them directly for details. I know others are building mocks with better phylogenetic coverage than the one we have used in my lab, and the bacterial folks have a “mockrobiota” project that might be worth checking out.