What is it and when to use it

The mock_composition file is a CSV file with the following columns:

  • sample: name of the mock sample
  • action: one of
    • keep: an ASV expected in the mock that should be kept in the data set
    • tolerate: an ASV that may be present in the mock but need not be kept in the data set (e.g. a poorly amplified organism)
  • asv: sequence of the ASV
  • taxon: optional; name of the organism
  • asv_id: optional; if asv and asv_id conflict, asv_id is ignored
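
Before running MakeKnownOccurrences, it can help to sanity-check the file. The sketch below is a hypothetical helper, not part of vtamR: it only checks that the required columns (sample, action, asv) are present and that action contains nothing but keep or tolerate. The toy rows use made-up, shortened sequences and placeholder taxon names.

```r
# Hypothetical helper, not part of vtamR: minimal checks on a
# mock_composition data frame (e.g. one read with read.csv()).
check_mock_composition <- function(mock) {
  required <- c("sample", "action", "asv")   # taxon and asv_id are optional
  missing_cols <- setdiff(required, names(mock))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # 'action' may only take the two documented values
  bad_action <- setdiff(unique(mock$action), c("keep", "tolerate"))
  if (length(bad_action) > 0) {
    stop("Unexpected action value(s): ", paste(bad_action, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy example: made-up, shortened sequences and placeholder taxa
mock <- data.frame(
  sample = c("tpos1", "tpos1"),
  action = c("keep", "tolerate"),
  asv    = c("ACTTTATTTTATTTTTGG", "CCTTTATCTTGTATTTGG"),
  taxon  = c("taxon_A", "taxon_B")
)
check_mock_composition(mock)
```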

This file is required by the MakeKnownOccurrences function, which

  • identifies known False Positives (FP) in control samples (mock and negative controls),
  • identifies missing occurrences (FN = False Negatives) in mock samples,
  • calculates performance metrics (precision and sensitivity) based on control samples, and
  • produces a known_occurrences file or data frame listing the expected occurrences (TP = True Positives) in mock samples and the FP in all control samples.
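
Precision and sensitivity follow the usual definitions: precision = TP / (TP + FP) and sensitivity = TP / (TP + FN). The sketch below only encodes those two formulas. It is not vtamR's internal code, and the way MakeKnownOccurrences counts TP, FP and FN (for example for tolerate ASVs) may involve details not shown here. The counts in the example are illustrative, not taken from the Tutorial.

```r
# Usual definitions, assumed here for illustration
performance_metrics <- function(tp, fp, fn) {
  c(precision   = tp / (tp + fp),   # share of detected occurrences that are true positives
    sensitivity = tp / (tp + fn))   # share of expected occurrences that were detected
}

# Illustrative counts: 6 expected occurrences found, 2 false positives, none missing
performance_metrics(tp = 6, fp = 2, fn = 0)
```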

The known_occurrences file is necessary for running the Optimize functions (OptimizePCRerror, OptimizeLFNsampleReplicate, OptimizeLFNreadCountLFNvariant) to find the best parameter values for the LFN filters (LFNsampleReplicate, LFNvariant, LFNreadCount) and FilterPCRerror.

The mock_composition file is also useful, though not essential, for the WriteASVtable function if you wish to add a column to the output that makes it easy to find the expected occurrences in each mock sample.

Make mock_composition

Let’s see how to identify the expected mock ASVs from your data.

The idea is to

  • Prefilter your data set
  • Assign ASVs to taxa
  • Examine the ASVs in the mock samples and their read counts, and pick the correct sequences.

I suggest that you start by filtering/denoising your data set with at least some of the filtering functions used in the Tutorial (up to FilterRenkonen). This eliminates most of the erroneous ASVs, making it easier to identify the expected ASVs in your mock samples.

Set parameters and access the demo files

We will use some of the files created in the first part of the Tutorial (up to FilterRenkonen).

  • The demo files are included in the vtamR package, hence the use of system.file(). When using your own data, just enter your file names.
  • read_count_file is the output of FilterRenkonen in the Tutorial.
  • blast_db, taxonomy, blast_path and num_threads are set up as in the Tutorial.

library(vtamR)
library(dplyr)

# Demo files shipped with vtamR; replace them with your own file names
read_count_file <- system.file("extdata/demo/7_FilterRenkonen.csv", package = "vtamR")
taxonomy <- system.file("extdata/db_test/taxonomy_reduced.tsv", package = "vtamR")
# BLAST database: directory plus database name
blast_db <- system.file("extdata/db_test", package = "vtamR")
blast_db <- file.path(blast_db, "COInr_reduced")
# Adapt to the location of blastn on your system
blast_path <- "~/miniconda3/envs/vtam/bin/blastn"
num_threads <- 8
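
A wrong path is an easy mistake to make here, so it can be worth checking the paths above before running TaxAssign. missing_paths() is a hypothetical helper, not a vtamR function. blast_db is left out because it is presumably a database name (directory plus prefix) rather than a single file, so file.exists() is not a meaningful test for it.

```r
# Hypothetical helper: names of the entries whose path does not exist.
# system.file() returns "" for a missing package file, which also counts as missing.
missing_paths <- function(paths) {
  found <- file.exists(path.expand(paths))   # path.expand() resolves "~"
  names(paths)[!found]
}

# With the variables defined above:
# missing_paths(c(read_count_file = read_count_file,
#                 taxonomy        = taxonomy,
#                 blast_path      = blast_path))
# character(0) means every path was found.
```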

Assign taxa to ASVs

TaxAssign assigns all ASVs in the input csv file or data frame (read_count_file) to taxa.

See more details of taxonomic assignment here.

asv_tax <- TaxAssign(asv=read_count_file, 
                     taxonomy=taxonomy, 
                     blast_db=blast_db, 
                     blast_path=blast_path, 
                     num_threads=num_threads
                     )
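
To see how many ASVs received an assignment at a given rank, you can tabulate one rank column. This assumes asv_tax has one column per rank (phylum, class, ..., species), as the ASV table shown below suggests; summarise_assignment() is a hypothetical helper.

```r
# Hypothetical helper: number of ASVs per taxon at one rank,
# with missing assignments counted as "unassigned".
summarise_assignment <- function(asv_tax, rank = "phylum") {
  taxa <- as.character(asv_tax[[rank]])
  taxa[is.na(taxa)] <- "unassigned"
  sort(table(taxa), decreasing = TRUE)
}

# e.g. summarise_assignment(asv_tax, rank = "phylum")
```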

Pool replicates by sample

See details of PoolReplicates here.

read_count_samples_df <- PoolReplicates(read_count_file)

Make an ASV table with taxonomic assignments

Make a wide-format data frame of ASVs and read counts, adding for each ASV the total number of reads, the number of samples in which it is present (add_sums_by_asv=TRUE), and its taxonomic assignment. This format is easier for humans to read than read_count_df.

See details of WriteASVtable here.

sortedinfo <- system.file("extdata/demo/sortedinfo.csv", package = "vtamR")
tmp_asv_table <- WriteASVtable(read_count_samples_df, 
                               sortedinfo=sortedinfo, 
                               add_sums_by_asv=TRUE, 
                               asv_tax=asv_tax)

If there are many samples, it can be easier to keep only the mock samples and the relevant columns. In this example, tpos1 is the name of one of the mock samples; repeat this step separately for each mock.

asv_tpos1 <- tmp_asv_table %>%
  select(tpos1, Total_number_of_reads, Number_of_samples, asv_id, 
         phylum, class, order, family, genus, species, asv
         ) %>%
  filter(tpos1 > 0) %>%
  arrange(desc(tpos1))
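
To repeat this selection for every mock, the same steps can be wrapped in a function. The sketch below is a hypothetical helper written in base R instead of dplyr; any mock sample names other than tpos1 are placeholders for your own.

```r
# Hypothetical helper: ASVs present in one mock sample, with a few
# informative columns, sorted by decreasing read count in that mock.
asv_in_mock <- function(asv_table, mock,
                        cols = c("Total_number_of_reads", "Number_of_samples",
                                 "asv_id", "phylum", "class", "order",
                                 "family", "genus", "species", "asv")) {
  present <- which(asv_table[[mock]] > 0)              # which() drops NA counts
  cols <- c(mock, intersect(cols, names(asv_table)))   # skip absent columns
  out <- asv_table[present, cols, drop = FALSE]
  out[order(out[[mock]], decreasing = TRUE), , drop = FALSE]
}

# One table per mock sample; add your other mock names to the vector
# mocks <- c("tpos1")
# asv_by_mock <- lapply(setNames(mocks, mocks), asv_in_mock, asv_table = tmp_asv_table)
```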

Let’s look at the ASVs present in tpos1.

knitr::kable(asv_tpos1, format = "markdown")

| tpos1 | Total_number_of_reads | Number_of_samples | asv_id | phylum | class | order | family | genus | species | asv |
|---:|---:|---:|---:|:---|:---|:---|:---|:---|:---|:---|
| 16348 | 16362 | 2 | 1227 | Arthropoda | Insecta | Trichoptera | Hydropsychidae | Hydropsyche | Hydropsyche pellucidula | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATTGTTACATCTCATGCATTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 4322 | 4328 | 2 | 2009 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 1616 | 1619 | 2 | 429 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
| 446 | 446 | 1 | 2271 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis fuscatus | TTTATATTTCATTTTTGGTGCATGATCAGGTATGGTGGGTACTTCCCTTAGTTTATTAATTCGAGCAGAACTTGGTAATCCTGGTTCTTTGATTGGCGATGATCAGATTTATAACGTTATTGTCACTGCCCATGCTTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 341 | 341 | 1 | 1688 | Arthropoda | Insecta | Diptera | Chironomidae | Synorthocladius | Synorthocladius semivirens | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 297 | 297 | 1 | 120 | Arthropoda | Insecta | Ephemeroptera | Caenidae | Caenis | Caenis pusilla | ACTATATTTTATTTTTGGGGCTTGATCCGGAATGCTGGGCACCTCTCTAAGCCTTCTAATTCGTGCCGAGCTGGGGCACCCGGGTTCTTTAATTGGCGACGATCAAATTTACAATGTAATCGTCACAGCCCATGCTTTTATTATGATTTTTTTCATGGTTATGCCTATTATAATC |
| 220 | 221 | 2 | 708 | Chordata | Actinopteri | Cypriniformes | Leuciscidae | Phoxinus | Phoxinus phoxinus | CCTTTATCTTGTATTTGGTGCCTGGGCCGGAATGGTAGGGACCGCCCTAAGCCTTCTTATTCGGGCCGAACTAAGCCAGCCTGGCTCGCTATTAGGTGATAGCCAAATTTATAATGTTATTGTTACCGCCCACGCCTTCGTAATAATTTTCTTTATAGTCATGCCAATTCTCATT |
| 215 | 215 | 1 | 27 | NA | NA | NA | NA | NA | NA | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTACAGCTCACGCATTCCTAATGATCTTTTTCATGGTTATGCCAGGACTTGTT |
| 157 | 157 | 1 | 600 | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATAACCAACTTTATAACAGTATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
| 136 | 137 | 2 | 221 | Arthropoda | NA | NA | NA | NA | NA | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGTCTTGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
| 88 | 88 | 1 | 172 | NA | NA | NA | NA | NA | NA | ACTCTATTTAATATTTGCTGCATTTTCAGGGGTTATAGGAACAATATTTTCTATAATTATAAGAATGGAACTTGCTTATCCAGGTGATCAAATATTGAATGGTAATCACCAACTTTATAATGTTATTGTAACTGCTCATGCATTTGTAATGATTTTTTTTATGGTTATGCCTGCCTTGATT |
| 85 | 85 | 1 | 213 | Arthropoda | NA | NA | NA | NA | NA | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGACTAGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
| 49 | 49 | 1 | 560 | NA | NA | NA | NA | NA | NA | ATTATACCTTATCTTTTCTCTTTTCTCAGGTTTACTTGGAACAGCATTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGACAATCAGTTATACAATAGTATTATTACAGCACACGCAATATTAATGATTTTCTTTATGGTTATGCCAGCAATGATT |
| 44 | 44 | 1 | 6216 | NA | NA | NA | NA | NA | NA | TCTTTATATTATTTACGGCTTTCTCATAGGATTGGTTGGTACATTTTTTTCCGCTGTCATTCGTATTCAACTCATGTACCCTGGTTCGTTGTTTTTGGGTGGTAATTACCATTATTATAATACTGTAATTACAGCGCACGCACTTGTGATAATTTTTTTTATGGTCATACCAGTGTTGATT |
| 36 | 2652 | 3 | 2380 | NA | NA | NA | NA | NA | NA | TCTTTACTTAATTTTTGGTGCTATTTCTGGTGTAGCTGGAACTGCTTTATCACTTTACATCAGATTTACATTATCTCAACCAAACTCGAGTTTTTTAGAATATAACCACCATTTATATAATGTAATTGTTACAGGACATGCACTTATAATGGTTTTTTTTGTAGTAATGCCTATTTTAATT |
| 24 | 34 | 2 | 251 | NA | NA | NA | NA | NA | NA | ACTTTATTTTATTTTCGGAGCGTGGTCGGGGATGGTAGGCACATCTCTGAGTCTTTTAATTCGAGCCGAATTGGGTAATCCTGGTTCACTAATTGGGGATGACCAGATTTACAACGTTATTGTAACAGCCCATGCTTTTATTATGATTTTTTTTATAGTAATGCCAATTATGATT |
| 22 | 22 | 1 | 7451 | NA | NA | NA | NA | NA | NA | AATGTATCTAATATTTGCAATATTTGCAGGCATTGTTGGTGGACTAATGTCAGTGATACTCAGGCTAGAACTCGCACAACCTGGTAACCAGTTTTTAGGCGGCGATCATCAATTTTATAATGTTATGCTCACTGCTCACGCACTTGTCATGGTATTTTTTATGATTATGCCTGGGCTTTTC |
| 20 | 20 | 1 | 2204 | Arthropoda | NA | NA | NA | NA | NA | TCTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTCGGAACTTTACTTTCATTATTTATTCGTCTGGAATTAGCATACCCAGGAAATCAATTTTTTTTAGGAAACCACCAACTTTATAATGTAGTTGTAACAGCACATGCTTTTATTATGATTTTTTTTATGGTTATGCCAATTTTAATC |
| 16 | 16 | 1 | 8690 | NA | NA | NA | NA | NA | NA | TTTATATTTAATATTTGGGCTAATAGCGGGTGTAATAGGAACGTTATTTTCGATATTAATTAGATTAGAATTAGCCTATCCAGGGAATCAATATTTTTTGGGAGATCATCAATTTTATAATGTTGTTGTTACATCACATGCGTTTATTATGATTTTTTTTATGGTAATGCCGGCATTTGTT |
| 15 | 15 | 1 | 416 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCAATCATAATT |
| 15 | 23 | 2 | 1717 | NA | NA | NA | NA | NA | NA | CTTGTACATGATTTTCGGAACGTTGGCCGGAGTGGTCGGAACGACGTTGTCGGTATGGATGCGAATGGAATTGGCGGCACCGGGAGTGCAAGCATTGTCGGGAAACCATCAGTTGTATAACGTGATGGTGACGGCACATGCCTTCATCATGATTTTCTTCTTCGTGATGCCCTTTTTGATT |
| 14 | 14 | 1 | 1641 | NA | NA | NA | NA | NA | NA | CTTATATCTATTATTTGCAGGCGTTTCGGGTATTGCCGGCACTGTTTTATCTTTATATATACGAGCTACACTAGCAACTCCTGCTTCCAATTTTTTAAGCAAAAATCATCACTTGTATAACGTAATAGTGACAGGCCATGCGTTTTTAATGATTTTTTTTTTAGTAATGCCTGCTCTTATA |
| 12 | 12 | 1 | 629 | NA | NA | NA | NA | NA | NA | ATTGTACTTGATTTATGGGGGATTTGCTGGTTTAATTGGAACGATGTTCTCTGTTCTAATAAGAATGGAACTATCATCACCCGGTAATACTATACTAGCTGGTAACTATCAATACTATAATGTTATAGTAACTGCGCATGCTTTCATTATGATCTTCTTTTTTGTTATGCCTGCTATGATG |
| 12 | 12 | 1 | 6042 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGATCATGGCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 11 | 16 | 2 | 2302 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium lineatum | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGATCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
| 10 | 26 | 2 | 649 | NA | NA | NA | NA | NA | NA | CCTATACCTAGTATTTGCAGTATTTGCAGGTATAATTGGTACAGCATTTTCAGTACTAATTCGTATGGAACTTGCAGCACCAGGAGTACAATATCTTAACGGAGATCACCAACTTTATAATGTAGTTATTACTGCACATGCGCTAATTATGATTTTCTTTATGGTTATGCCTGCTCTCGTG |
| 10 | 10 | 1 | 7478 | NA | NA | NA | NA | NA | NA | ACTATACTTAATCTTTGCATTATTTTCTGGATTATTAGGTACAGCGTTTTCTGTTCTTATAAGATTAGAATTAAGTGGGCCAGGTGTTCAATATATAGCGGACAATCAACTATACAACAGTGTTATTACAGCACACGCTATCTTAATGATATTCTTTATGGTTATGCCTGCAATGATA |
| 9 | 9 | 1 | 4790 | NA | NA | NA | NA | NA | NA | ATTATACCTTATATTTTCTCTGTTTTCGGGTTTACTTGGAACCGCTTTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGATAACCAATTATACAATAGTATAATTACAGCACACGCGATACTTATGATTTTCTTTATGGTTATGCCAGCAATGATT |
| 8 | 8 | 1 | 1759 | Streptophyta | Magnoliopsida | NA | NA | NA | NA | TCTATATTTCATCTTCGGTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGATCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTTTTTATGGTTATGCCGGCGATGATA |
| 8 | 8 | 1 | 2197 | NA | NA | NA | NA | NA | NA | TCTTTACCTGATCTTCGCCGTATTCTCAGGAATGATTGGTACAGCATTCAGTGTAATTATTCGAATGGAACTTGCTGCGCCCGGTGTGCAATACCTTCACGGTAACCACCAACTATATAACGTAATTATTACAGCCCACGCCTTCCTAATGATCTTTTTCATGGTTATGCCTGGTCTTGTG |
| 7 | 61 | 3 | 2306 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium balcanicum | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGCTCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATGCCAATTATAATT |
| 7 | 7 | 1 | 5765 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | CCTTTATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 6 | 6 | 1 | 7 | NA | NA | NA | NA | NA | NA | AATGTATCTAATCTTTGGAGGTTTTTCTGGTATTATTGGAACAGCTTTATCTATTTTAATCAGAATAGAATTATCGCAACCAGGAAACCAAATTTTAATGGGAAACCATCAATTATATAATGTAATTGTAACTTCTCACGCTTTTATTATGATTTTTTTTATGGTAATGCCAATTTTATTA |
| 6 | 6 | 1 | 392 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 6 | 6 | 1 | 574 | NA | NA | NA | NA | NA | NA | ATTATATTTTCTTTTCGGTACACTATCCGGTGTTATAGGAACAATTTTATCTTTACTTATACGCTTGGAATTAGCATATCCGGGAAATCAATTTTTTTTAGGTAATCATCAATTATACAATGTCGTAGTTACAGCCCATGCATTTTTAATGATTTTTTTTATGGTAATGCCTGTTTTAATT |
| 6 | 6 | 1 | 7483 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | ACTATATTTTATTTTTGGGGCATGGTCAGGAATAGTTGGTACTTCCCTTAGTATCCTAATTCGAGCTGAACTAGGACATGCCGGCTCCCTAATTGGAGACGATCAAATTTATAATGTAATCGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 5 | 466 | 3 | 3848 | NA | NA | NA | NA | NA | NA | TTTATACTTATTATTTGCTGTTTTAGCAGGAGTTGTAGGAACATATTTTTCTGCTTTAATCAGAATAGAGTTAGCATATCCTGGTAATGGAATTTTTAACGGTAATTTTCAACTTTATAATGTTGTAGTAACAGCGCATGCTTTTATTATGATTTTCTTTTTAGTAATGCCAGCAATGATT |
| 5 | 5 | 1 | 4440 | NA | NA | NA | NA | NA | NA | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTATAGCTCACGCATTCCTAATGACCTTTTTCATGGTTATGCCAGGACTTGTT |
| 5 | 5 | 1 | 8667 | NA | NA | NA | NA | NA | NA | TCTTTATATTCTTTTTGGAGCTATTGCAGGAGTATGTGGTACTGCAGTCTCCGTAGCGATTAGATTAGAACTTGCTCAACCAGGTGCAGGTATACTATCGTCTAATCACCAGTTATACAATGTTTTTATTACAGCTCATGCTATTTTAATGATTTTTTTCATGGTCATGCCTATTCTTATA |
| 4 | 4 | 1 | 182 | NA | NA | NA | NA | NA | NA | ACTGTATTTAATATTTGGTGGCTTTTCGGGTATTATAGGTACTATATTCTCTATGATTATAAGATTAGAATTGGCTGCGCCCGGCTCTCAAATATTAGGTGGTAATAGCCAACTTTATAATGTAATTATTACTGCGCATGCTTTTGTTATGATTTTCTTTTTTGTTATGCCTGTTATGATA |
| 4 | 4 | 1 | 1737 | NA | NA | NA | NA | NA | NA | TCTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCTATGATC |
| 4 | 4 | 1 | 5769 | Arthropoda | Insecta | NA | NA | NA | NA | CCTTTATTTTATTTTTGGTGCTTGATCTGGTATAGTTGGTACTTCTTTAAGAATGCTAATTCGAGCAGAATTAGGACGTCCAGGAACATTTATTGGAGATGACCAAGTTTATAATGTTATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 4 | 4 | 1 | 5865 | NA | NA | NA | NA | NA | NA | GTTATATTTAATATTTAGTATAATAGCAGGTTTAGTTGGTACGTGATTTTCAATAATGATAAGAACAGAATTAGCATATCCAGGTTTTCAATATTTTAATGGAGATTTACAACATTATAATGTGATAATTACAGGACATGCGTTCATTATGATATTTTTCATGGTAATGCCAGCATTAATT |
| 3 | 3 | 1 | 534 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | ACTTTATTTTATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 3 | 3 | 1 | 569 | NA | NA | NA | NA | NA | NA | ATTATATCTTATATTTGCAGCCTTCTCTGGTATAATAGGAACTATTTTTTCTATTATTATAAGAATGGAATTAGCATTTCCAGGAGATCAAGTTTTGGGCGGTAATCATCAACTTTATAATGTTATTGTCACTGCACACGCTTTTTTAATGATATTTTTTATGGTTATGCCCGCTCTTATT |
| 3 | 3 | 1 | 613 | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATGACCAACTTTATAACAATATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
| 3 | 3 | 1 | 2199 | NA | NA | NA | NA | NA | NA | TCTTTATCTTATATTTGCATTATTTTCAGGGCTTTTAGGTACAGCTTTTTCTGTTTTAATTAGACTAGAATTATCTGGACCTGGAGTACAATACATAGCAGACAACCAATTATACAACAGTATAATAACTGCGCATGCTATTCTGATGATATTTTTCATGGTAATGCCTGCAATGATA |
| 3 | 3 | 1 | 4796 | NA | NA | NA | NA | NA | NA | ATTATATTTAATATTTGGGGGTATCTCAGGTGTAGCAGGGACTGTATTATCCTTATACATACGAATAACACTATCGCACCCAGAAGGAAATTTTTTAGAACACAATCACCACTTATACAATGTTATTGTAACAGGTCATGCTTTTGTTATGATTTTTTTTATGGTAATGCCTGTTCTTATC |
| 2 | NA | 1 | 10 | NA | NA | NA | NA | NA | NA | ACTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCCATGATT |
| 2 | 2 | 1 | 572 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium pseudequinum | ATTATATTTTATTTTTGGGGCCTGAGCAGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCTGAATTAGGACACCCAGGATCTTTAATTGGTGATGACCAAATTTATAATGTAATTGTTACAGCTCATGCTTTCGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
| 2 | 2 | 1 | 1682 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTGATACCTATTATAATT |
| 2 | 2 | 1 | 1794 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | TCTATATTTCATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
| 2 | 2 | 1 | 2224 | NA | NA | NA | NA | NA | NA | TTTATATATGATTTTTGCAGCCTTTTCAGGAATTGTAGGGACTGTATTTTCAATGTTAATTCGATTTGAATTAGCACATCCAGGACATCAAATTTTATCTGGAAATAACCAATTATACAACGTTATCGTAACGGCACATGCTTTTGTAATGATTTTCTTCATGGTAATGCCTGCATTAATT |
| 2 | 2 | 1 | 5307 | Arthropoda | Insecta | Trichoptera | Hydropsychidae | Hydropsyche | Hydropsyche pellucidula | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 2 | 2 | 1 | 5926 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | TCTATATTTCATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 2 | 2 | 1 | 6328 | Arthropoda | Insecta | Diptera | NA | NA | NA | TTTATATTTTATTTTTGGTATTTGATCAGGTATAGTGGGTACTTCTTTGAGCTTAATAATTCGTACAGAATTAGGTCAGCCAGGTTATTTAATTGGAGATGACCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTCTTTATAGTGATACCTATTATAATT |

In this mock sample, there should be the following 6 species:

  • Caenis pusilla
  • Rheocricotopus
  • Phoxinus phoxinus
  • Hydropsyche pellucidula
  • Synorthocladius semivirens
  • Baetis rhodani
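
Taxonomic assignment alone does not tell you which sequence is the right one: in the table above, Baetis rhodani matches asv_id 2009, 6042 and 534, and Rheocricotopus chalybeatus matches 429, 416 and 1794. One simple heuristic, assumed here rather than prescribed by vtamR, is to take the most abundant ASV of each expected taxon as a candidate and then check it by eye. pick_top_per_taxon() is a hypothetical helper. It matches on species first and falls back to genus (as needed for Rheocricotopus), and it cannot find an expected organism whose ASV was left unassigned.

```r
# Expected taxa of tpos1, as listed above
expected_taxa <- c("Caenis pusilla", "Rheocricotopus", "Phoxinus phoxinus",
                   "Hydropsyche pellucidula", "Synorthocladius semivirens",
                   "Baetis rhodani")

# Hypothetical helper: for each expected taxon, keep the ASV with the
# highest read count in the mock (a candidate to verify, not a final answer).
pick_top_per_taxon <- function(asv_table, mock, expected) {
  species <- as.character(asv_table$species)
  genus   <- as.character(asv_table$genus)
  # Match on species first, then fall back to genus
  matched <- ifelse(species %in% expected, species,
                    ifelse(genus %in% expected, genus, NA))
  hits <- asv_table[!is.na(matched), , drop = FALSE]
  hits$matched_taxon <- matched[!is.na(matched)]
  hits <- hits[order(hits[[mock]], decreasing = TRUE), , drop = FALSE]
  hits[!duplicated(hits$matched_taxon), , drop = FALSE]   # first row = most reads
}

# e.g. candidates <- pick_top_per_taxon(asv_tpos1, "tpos1", expected_taxa)
```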

We can see that, in spite of all the filtering done so far, there are still many unexpected occurrences in this sample. Most of them have low read counts and could be removed by the Low Frequency Noise (LFN) filters.
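
To get a feel for how small these occurrences are, you can express each ASV's read count as a share of all reads in the mock. This is only a rough look and not a substitute for the vtamR LFN filters, whose parameters are meant to be tuned with the Optimize functions; low_share() and its cutoff value are hypothetical.

```r
# Hypothetical helper: ASVs carrying less than `cutoff` of the mock's reads,
# with their share added as a column
low_share <- function(asv_table, mock, cutoff = 0.001) {
  share <- asv_table[[mock]] / sum(asv_table[[mock]], na.rm = TRUE)
  low <- which(share < cutoff)
  out <- asv_table[low, , drop = FALSE]
  out$share <- share[low]
  out
}

# e.g. nrow(low_share(asv_tpos1, "tpos1", cutoff = 0.001))
```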

Select the expected ASV and make mock_composition

You can now pick the correct sequences of the expected ASVs in each mock and make the mock_composition file.
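
Once you have settled on the asv_id values for a mock, the mock_composition rows can be assembled from the ASV table and written to CSV. make_mock_composition() is a hypothetical helper, not a vtamR function: it fills taxon from the species column (which may be NA) and marks every selected ASV as keep unless you list it as tolerate. For several mocks, build one data frame per mock and combine them with rbind() before writing.

```r
# Hypothetical helper: build mock_composition rows for one mock sample
make_mock_composition <- function(asv_table, mock, keep_ids,
                                  tolerate_ids = integer(0)) {
  rows <- asv_table[asv_table$asv_id %in% c(keep_ids, tolerate_ids), ,
                    drop = FALSE]
  data.frame(
    sample = rep(mock, nrow(rows)),
    action = ifelse(rows$asv_id %in% keep_ids, "keep", "tolerate"),
    asv    = rows$asv,
    taxon  = rows$species,
    asv_id = rows$asv_id
  )
}

# keep_ids: the asv_id values you selected for tpos1
# mock_tpos1 <- make_mock_composition(asv_tpos1, "tpos1", keep_ids = c(...))
# write.csv(mock_tpos1, "mock_composition.csv", row.names = FALSE)
```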