Taxonomic Assignments

With Dylan and Kabir’s ITS1 primers, we have found that the forward primer sits 26 bp into the 18S gene, so the first 26 bp of each read should be trimmed so that the sequence starts at the front of ITS1. You can do this in the early sequence filtering steps. Zewei Song (the writer of the FAST pipeline, check GitHub) has also found a 14 bp tail that should be trimmed from the 3’ end of the sequences if the sequences are paired.
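As a rough illustration of the trimming described above, here is a minimal Python sketch. It is not taken from the FAST pipeline or any other tool; the function name `trim_its1_read` and its parameters are hypothetical, and it assumes the reads are already oriented so that the 18S fragment is at the start. The same call can be applied to a FASTQ quality string so that bases and qualities stay aligned.

```python
def trim_its1_read(seq, head=26, tail=0):
    """Drop `head` characters from the start and `tail` from the end of `seq`.

    head=26 removes the 18S fragment ahead of ITS1 (value from the text above).
    tail=14 would remove the 3' tail reported for paired sequences; the default
    of 0 leaves the 3' end untouched. Both names are illustrative assumptions.
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    end = len(seq) - tail
    if end <= head:
        # The read is too short to survive trimming; return an empty string
        # so the caller can decide whether to discard it.
        return ""
    return seq[head:end]


# Toy read: 26 bp of "18S", 4 bp of "ITS1", 14 bp of tail (made-up sequence).
example_read = "A" * 26 + "CCGT" + "G" * 14
trimmed_paired = trim_its1_read(example_read, head=26, tail=14)
trimmed_single = trim_its1_read(example_read, head=26)
```

Reads that come back empty are shorter than the combined trim length and would normally be dropped in the same filtering step.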

One important step that I have yet to see cleanly inserted in any NGS analysis pipeline is dealing with the case where you get a high % similarity match to a fungus in the database, but only over a relatively short part of the sequence. To address this, we calculate the length/qlength ratio for each OTU (see Nguyen et al. 2015 for further discussion of this issue). Here is where a mock community can really help (to figure out where to make cut-offs), but in general we have found that sequences with a ratio below 0.75 are much less likely to be fungal (we have often used 0.85 as a cut-off as well). The same concept can be achieved with BLAST e-values, but we have been calculating the ratio manually outside our informatics pipelines, then eliminating all OTUs that don’t meet the criterion, and jumping back into the remaining pipeline analyses with the OTUs that do.
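The ratio filter could be scripted along the following lines. This is only a sketch of the idea, not the procedure we actually ran (that was done by hand). It assumes that "length" is the alignment length of the top database hit and "qlength" is the full length of the OTU query sequence; the function name `filter_otus_by_coverage` and the OTU identifiers are made up for the example.

```python
def filter_otus_by_coverage(hits, cutoff=0.75):
    """Keep OTUs whose length/qlength ratio is at least `cutoff`.

    `hits` maps an OTU id to a (length, qlength) tuple, where length is
    assumed to be the alignment length of the best hit and qlength the
    length of the OTU sequence. Returns {otu_id: ratio} for retained OTUs.
    """
    kept = {}
    for otu_id, (length, qlength) in hits.items():
        if qlength <= 0:
            raise ValueError(f"qlength must be positive for {otu_id}")
        ratio = length / qlength
        if ratio >= cutoff:
            kept[otu_id] = ratio
    return kept


# Hypothetical top-hit table: OTU id -> (alignment length, query length).
example_hits = {
    "OTU_1": (200, 250),  # ratio 0.80
    "OTU_2": (90, 240),   # ratio 0.375, a short partial match
    "OTU_3": (230, 250),  # ratio 0.92
}
kept_075 = filter_otus_by_coverage(example_hits, cutoff=0.75)
kept_085 = filter_otus_by_coverage(example_hits, cutoff=0.85)
```

Running the filter at both 0.75 and 0.85 and comparing the results against a mock community is one way to see how sensitive the retained OTU list is to the cut-off.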

