Additional OTU Filtering

We have recently been following the combined recommendations of Oliver et al. (2015, Fungal Ecology) and Lindahl et al. (2013, New Phytologist) with regard to filtering rare sequences from our OTU tables. Specifically, Oliver et al. recommend dropping any OTU with fewer than ~10 total sequences. Lindahl et al. recommend setting to zero any value of fewer than 5 sequences in a given sample. These are related but different recommendations: the former simply cuts the long OTU tail at a given place, whereas the latter tidies the OTU table, mainly to address tag switching (Carlsen et al. 2012, Fungal Ecology). I think that combining both seems like a reasonable way to proceed.
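As a rough illustration of how the two filters could be combined, here is a minimal Python sketch. It is not the script we actually use. The function name `filter_otu_table`, the dictionary layout (OTU ID mapped to a list of per-sample read counts), and the example numbers are all made up for illustration. The order of operations (per-sample zeroing first, then the total-abundance cut) is also an assumption, since neither paper prescribes how to combine the two steps.

```python
def filter_otu_table(table, min_cell=5, min_total=10):
    """Apply a per-sample and a per-OTU rare-sequence filter.

    table: dict mapping OTU ID -> list of read counts, one per sample
           (hypothetical layout chosen for this sketch).
    min_cell: counts below this in a single sample are set to zero.
    min_total: OTUs whose remaining total is below this are dropped.
    """
    filtered = {}
    for otu, counts in table.items():
        # Per-sample step: zero any count below min_cell.
        cleaned = [c if c >= min_cell else 0 for c in counts]
        # Per-OTU step: keep the OTU only if its remaining total
        # reaches min_total (assumed order: after the zeroing step).
        if sum(cleaned) >= min_total:
            filtered[otu] = cleaned
    return filtered


# Invented example table: four OTUs across three samples.
example = {
    "OTU_1": [120, 3, 40],  # abundant; the 3 reads in sample 2 get zeroed
    "OTU_2": [4, 4, 4],     # 12 reads in total, but never 5 in any one sample
    "OTU_3": [6, 0, 2],     # only 6 reads survive the per-sample step
    "OTU_4": [0, 11, 1],    # 11 reads survive in sample 2
}

result = filter_otu_table(example)
for otu, counts in result.items():
    print(otu, counts)
```

In this made-up table, OTU_2 would pass a total-abundance cut applied on its own (12 reads) but is removed once the per-sample step runs first, which is one way to see that the two recommendations are not interchangeable.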

We have recently explored additional ways to determine the magic number at which to cut the sequence tail (knowing that singleton OTUs are a problem, see the Tedersoo and Edgar papers, but that additional low-abundance OTUs likely are as well, see Nguyen et al. 2015). In the recent Nguyen et al. (2016) Molecular Ecology paper, we wrote an R script that allowed us to decide somewhat objectively where that cut-off should be (I say somewhat because you have to designate a level at which you feel the number of sequence reads is real, and then you work progressively from there to see how much further into the tail you want to include). Others have used OTU sequence percentage cut-offs (e.g. keeping only OTUs containing at least 0.0001% of the total sequences), which seem okay to me if they are done in a way that is not 100% arbitrary. Note that using a percentage cut-off does not do the same thing as removing reads on a per-sample basis, as recommended by Lindahl et al. (2013).
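To make the last point concrete, here is a small Python sketch of a percentage cut-off. It is an illustration only and is not the R script from Nguyen et al. (2016). The function name `percent_cutoff`, the table layout, and the read counts are invented, and the example uses a 0.01% threshold rather than 0.0001% so that the effect is visible in a tiny table.

```python
def percent_cutoff(table, min_percent=0.0001):
    """Keep OTUs holding at least min_percent of all reads in the table.

    table: dict mapping OTU ID -> list of read counts, one per sample
           (hypothetical layout chosen for this sketch).
    min_percent: threshold as a percentage of the grand total of reads.
    """
    grand_total = sum(sum(counts) for counts in table.values())
    threshold = grand_total * min_percent / 100.0
    # Whole OTUs are kept or dropped; individual cells are never altered.
    return {otu: counts for otu, counts in table.items()
            if sum(counts) >= threshold}


# Invented example: 1,000,000 reads in total across two samples.
example = {
    "OTU_A": [600000, 399000],
    "OTU_B": [900, 3],   # passes overall, despite only 3 reads in sample 2
    "OTU_C": [50, 47],   # 97 reads in total, below 0.01% of 1,000,000
}

kept = percent_cutoff(example, min_percent=0.01)
for otu, counts in kept.items():
    print(otu, counts)
```

Here OTU_C is dropped, but OTU_B is retained with its 3 reads in the second sample untouched. A per-sample filter of the kind Lindahl et al. (2013) recommend would have set that cell to zero, so the two approaches address different problems.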

