The mock_composition file is a CSV file with the following columns:
keep
: Expected ASV in the mock that should be kept in the data set.
tolerate
: ASV that can be present in a mock, but that does not need to be kept in the data set (e.g. a badly amplified organism).
This file is essential in the MakeKnownOccurrences function, that
The known_occurrences file is necessary for running the Optimize functions (OptimizePCRerror, OptimizeLFNsampleReplicate, OptimizeLFNreadCountLFNvariant) to find the best parameter values for the LFN filters (LFNsampleReplicate, LFNvariant, LFNreadCount) and FilterPCRerror.
The mock_composition is also useful, although not essential, for the WriteASVtable function if you wish to add a column to the output that makes it easy to find the expected occurrences in each mock sample.
Let’s see how to identify the expected mock ASVs from your data.
The idea is to
I suggest that you start by filtering/denoising your data set using at least some of the functions covered in the Tutorial. This will eliminate most of the erroneous ASVs, so it will be easier to identify the expected ASVs in your mock samples.
We will use some of the files created in the first part of the Tutorial (up to FilterRenkonen).
These files are included in the vtamR package, hence the use of system.file(). When using your own data, just enter your file names.
read_count_file is the output of FilterRenkonen in the Tutorial.
blast_db, taxonomy, blast_path and num_threads are set up as in the Tutorial.
library(vtamR)
library(dplyr)
read_count_file <- system.file("extdata/demo/7_FilterRenkonen.csv", package = "vtamR")
taxonomy <- system.file("extdata/db_test/taxonomy_reduced.tsv", package = "vtamR")
blast_db <- system.file("extdata/db_test", package = "vtamR")
blast_db <- file.path(blast_db, "COInr_reduced")
blast_path <- "~/miniconda3/envs/vtam/bin/blastn"
num_threads = 8
TaxAssign will assign all ASVs in the input CSV file or data frame (read_count_file).
See more details of taxonomic assignment here.
asv_tax <- TaxAssign(asv=read_count_file,
taxonomy=taxonomy,
blast_db=blast_db,
blast_path=blast_path,
num_threads=num_threads
)
See details of PoolReplicates here.
read_count_samples_df <- PoolReplicates(read_count_file)
Make a data frame with ASVs and read counts in wide format, and add the total number of reads for each ASV, the number of samples in which it is present (add_sums_by_asv=T), and its taxonomic assignment. This format is easier for humans to read than the read_count_df.
See details of WriteASVtable here.
sortedinfo <- system.file("extdata/demo/sortedinfo.csv", package = "vtamR")
tmp_asv_table <- WriteASVtable(read_count_samples_df,
sortedinfo=sortedinfo,
add_sums_by_asv=T,
asv_tax=asv_tax)
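To make the wide format and the two summary columns more concrete, here is a minimal base R sketch on a toy data set. It is not the WriteASVtable implementation: the toy values, the long_df and wide_df names and the use of xtabs() are my own assumptions, chosen only to show what Total_number_of_reads and Number_of_samples mean.

```r
# Toy long-format read counts (hypothetical values, not the Tutorial data)
long_df <- data.frame(
  asv_id     = c(1, 1, 2, 3, 3, 3),
  sample     = c("tpos1", "s1", "tpos1", "tpos1", "s1", "s2"),
  read_count = c(100, 5, 40, 7, 30, 20)
)

# One row per ASV, one column per sample; absent combinations become 0
wide_mat <- xtabs(read_count ~ asv_id + sample, data = long_df)
wide_df <- as.data.frame.matrix(wide_mat)

# Summary columns comparable to those described above
sample_cols <- colnames(wide_df)
wide_df$Total_number_of_reads <- rowSums(wide_df[, sample_cols])
wide_df$Number_of_samples <- rowSums(wide_df[, sample_cols] > 0)
wide_df$asv_id <- as.integer(rownames(wide_df))

print(wide_df)
```

An ASV that is abundant in a mock but has a much larger Total_number_of_reads is mostly found in other samples, which is worth keeping in mind when reading the table.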
If there are many samples, it might be better to select only the mock samples and the pertinent columns. In this example, tpos1 is the name of one of the mock samples. You can do this separately for each mock.
asv_tpos1 <- tmp_asv_table %>%
select(tpos1, Total_number_of_reads, Number_of_samples, asv_id,
phylum, class, order, family, genus, species, asv
) %>%
filter(tpos1 > 0) %>%
arrange(desc(tpos1))
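If you have several mocks, the same selection can be repeated in a loop. The sketch below uses base R and a toy table so that it runs on its own; the sample names tpos2 and s1, the read counts and the per_mock list are hypothetical and only illustrate the idea.

```r
# Toy wide ASV table (hypothetical values); tpos1 and tpos2 stand for mock samples
asv_table <- data.frame(
  asv_id = c(10, 20, 30, 40),
  tpos1  = c(500, 0, 12, 3),
  tpos2  = c(0, 800, 25, 0),
  s1     = c(4, 0, 900, 60),
  genus  = c("Baetis", "Caenis", NA, "Simulium")
)
mock_samples <- c("tpos1", "tpos2")

# For each mock: keep the ASVs present in it, sorted by decreasing read count
per_mock <- lapply(mock_samples, function(mock) {
  present <- asv_table[asv_table[[mock]] > 0, c(mock, "asv_id", "genus")]
  present[order(present[[mock]], decreasing = TRUE), ]
})
names(per_mock) <- mock_samples

print(per_mock$tpos1)
```

Each element of per_mock can then be inspected on its own, in the same way as asv_tpos1.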
Let’s see the ASVs present in tpos1.
knitr::kable(asv_tpos1, format = "markdown")
tpos1 | Total_number_of_reads | Number_of_samples | asv_id | phylum | class | order | family | genus | species | asv |
---|---|---|---|---|---|---|---|---|---|---|
16348 | 16362 | 2 | 1227 | Arthropoda | Insecta | Trichoptera | Hydropsychidae | Hydropsyche | Hydropsyche pellucidula | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATTGTTACATCTCATGCATTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
4322 | 4328 | 2 | 2009 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
1616 | 1619 | 2 | 429 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
446 | 446 | 1 | 2271 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis fuscatus | TTTATATTTCATTTTTGGTGCATGATCAGGTATGGTGGGTACTTCCCTTAGTTTATTAATTCGAGCAGAACTTGGTAATCCTGGTTCTTTGATTGGCGATGATCAGATTTATAACGTTATTGTCACTGCCCATGCTTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
341 | 341 | 1 | 1688 | Arthropoda | Insecta | Diptera | Chironomidae | Synorthocladius | Synorthocladius semivirens | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
297 | 297 | 1 | 120 | Arthropoda | Insecta | Ephemeroptera | Caenidae | Caenis | Caenis pusilla | ACTATATTTTATTTTTGGGGCTTGATCCGGAATGCTGGGCACCTCTCTAAGCCTTCTAATTCGTGCCGAGCTGGGGCACCCGGGTTCTTTAATTGGCGACGATCAAATTTACAATGTAATCGTCACAGCCCATGCTTTTATTATGATTTTTTTCATGGTTATGCCTATTATAATC |
220 | 221 | 2 | 708 | Chordata | Actinopteri | Cypriniformes | Leuciscidae | Phoxinus | Phoxinus phoxinus | CCTTTATCTTGTATTTGGTGCCTGGGCCGGAATGGTAGGGACCGCCCTAAGCCTTCTTATTCGGGCCGAACTAAGCCAGCCTGGCTCGCTATTAGGTGATAGCCAAATTTATAATGTTATTGTTACCGCCCACGCCTTCGTAATAATTTTCTTTATAGTCATGCCAATTCTCATT |
215 | 215 | 1 | 27 | NA | NA | NA | NA | NA | NA | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTACAGCTCACGCATTCCTAATGATCTTTTTCATGGTTATGCCAGGACTTGTT |
157 | 157 | 1 | 600 | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATAACCAACTTTATAACAGTATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
136 | 137 | 2 | 221 | Arthropoda | NA | NA | NA | NA | NA | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGTCTTGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
88 | 88 | 1 | 172 | NA | NA | NA | NA | NA | NA | ACTCTATTTAATATTTGCTGCATTTTCAGGGGTTATAGGAACAATATTTTCTATAATTATAAGAATGGAACTTGCTTATCCAGGTGATCAAATATTGAATGGTAATCACCAACTTTATAATGTTATTGTAACTGCTCATGCATTTGTAATGATTTTTTTTATGGTTATGCCTGCCTTGATT |
85 | 85 | 1 | 213 | Arthropoda | NA | NA | NA | NA | NA | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGACTAGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
49 | 49 | 1 | 560 | NA | NA | NA | NA | NA | NA | ATTATACCTTATCTTTTCTCTTTTCTCAGGTTTACTTGGAACAGCATTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGACAATCAGTTATACAATAGTATTATTACAGCACACGCAATATTAATGATTTTCTTTATGGTTATGCCAGCAATGATT |
44 | 44 | 1 | 6216 | NA | NA | NA | NA | NA | NA | TCTTTATATTATTTACGGCTTTCTCATAGGATTGGTTGGTACATTTTTTTCCGCTGTCATTCGTATTCAACTCATGTACCCTGGTTCGTTGTTTTTGGGTGGTAATTACCATTATTATAATACTGTAATTACAGCGCACGCACTTGTGATAATTTTTTTTATGGTCATACCAGTGTTGATT |
36 | 2652 | 3 | 2380 | NA | NA | NA | NA | NA | NA | TCTTTACTTAATTTTTGGTGCTATTTCTGGTGTAGCTGGAACTGCTTTATCACTTTACATCAGATTTACATTATCTCAACCAAACTCGAGTTTTTTAGAATATAACCACCATTTATATAATGTAATTGTTACAGGACATGCACTTATAATGGTTTTTTTTGTAGTAATGCCTATTTTAATT |
24 | 34 | 2 | 251 | NA | NA | NA | NA | NA | NA | ACTTTATTTTATTTTCGGAGCGTGGTCGGGGATGGTAGGCACATCTCTGAGTCTTTTAATTCGAGCCGAATTGGGTAATCCTGGTTCACTAATTGGGGATGACCAGATTTACAACGTTATTGTAACAGCCCATGCTTTTATTATGATTTTTTTTATAGTAATGCCAATTATGATT |
22 | 22 | 1 | 7451 | NA | NA | NA | NA | NA | NA | AATGTATCTAATATTTGCAATATTTGCAGGCATTGTTGGTGGACTAATGTCAGTGATACTCAGGCTAGAACTCGCACAACCTGGTAACCAGTTTTTAGGCGGCGATCATCAATTTTATAATGTTATGCTCACTGCTCACGCACTTGTCATGGTATTTTTTATGATTATGCCTGGGCTTTTC |
20 | 20 | 1 | 2204 | Arthropoda | NA | NA | NA | NA | NA | TCTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTCGGAACTTTACTTTCATTATTTATTCGTCTGGAATTAGCATACCCAGGAAATCAATTTTTTTTAGGAAACCACCAACTTTATAATGTAGTTGTAACAGCACATGCTTTTATTATGATTTTTTTTATGGTTATGCCAATTTTAATC |
16 | 16 | 1 | 8690 | NA | NA | NA | NA | NA | NA | TTTATATTTAATATTTGGGCTAATAGCGGGTGTAATAGGAACGTTATTTTCGATATTAATTAGATTAGAATTAGCCTATCCAGGGAATCAATATTTTTTGGGAGATCATCAATTTTATAATGTTGTTGTTACATCACATGCGTTTATTATGATTTTTTTTATGGTAATGCCGGCATTTGTT |
15 | 15 | 1 | 416 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCAATCATAATT |
15 | 23 | 2 | 1717 | NA | NA | NA | NA | NA | NA | CTTGTACATGATTTTCGGAACGTTGGCCGGAGTGGTCGGAACGACGTTGTCGGTATGGATGCGAATGGAATTGGCGGCACCGGGAGTGCAAGCATTGTCGGGAAACCATCAGTTGTATAACGTGATGGTGACGGCACATGCCTTCATCATGATTTTCTTCTTCGTGATGCCCTTTTTGATT |
14 | 14 | 1 | 1641 | NA | NA | NA | NA | NA | NA | CTTATATCTATTATTTGCAGGCGTTTCGGGTATTGCCGGCACTGTTTTATCTTTATATATACGAGCTACACTAGCAACTCCTGCTTCCAATTTTTTAAGCAAAAATCATCACTTGTATAACGTAATAGTGACAGGCCATGCGTTTTTAATGATTTTTTTTTTAGTAATGCCTGCTCTTATA |
12 | 12 | 1 | 629 | NA | NA | NA | NA | NA | NA | ATTGTACTTGATTTATGGGGGATTTGCTGGTTTAATTGGAACGATGTTCTCTGTTCTAATAAGAATGGAACTATCATCACCCGGTAATACTATACTAGCTGGTAACTATCAATACTATAATGTTATAGTAACTGCGCATGCTTTCATTATGATCTTCTTTTTTGTTATGCCTGCTATGATG |
12 | 12 | 1 | 6042 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGATCATGGCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
11 | 16 | 2 | 2302 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium lineatum | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGATCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
10 | 26 | 2 | 649 | NA | NA | NA | NA | NA | NA | CCTATACCTAGTATTTGCAGTATTTGCAGGTATAATTGGTACAGCATTTTCAGTACTAATTCGTATGGAACTTGCAGCACCAGGAGTACAATATCTTAACGGAGATCACCAACTTTATAATGTAGTTATTACTGCACATGCGCTAATTATGATTTTCTTTATGGTTATGCCTGCTCTCGTG |
10 | 10 | 1 | 7478 | NA | NA | NA | NA | NA | NA | ACTATACTTAATCTTTGCATTATTTTCTGGATTATTAGGTACAGCGTTTTCTGTTCTTATAAGATTAGAATTAAGTGGGCCAGGTGTTCAATATATAGCGGACAATCAACTATACAACAGTGTTATTACAGCACACGCTATCTTAATGATATTCTTTATGGTTATGCCTGCAATGATA |
9 | 9 | 1 | 4790 | NA | NA | NA | NA | NA | NA | ATTATACCTTATATTTTCTCTGTTTTCGGGTTTACTTGGAACCGCTTTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGATAACCAATTATACAATAGTATAATTACAGCACACGCGATACTTATGATTTTCTTTATGGTTATGCCAGCAATGATT |
8 | 8 | 1 | 1759 | Streptophyta | Magnoliopsida | NA | NA | NA | NA | TCTATATTTCATCTTCGGTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGATCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTTTTTATGGTTATGCCGGCGATGATA |
8 | 8 | 1 | 2197 | NA | NA | NA | NA | NA | NA | TCTTTACCTGATCTTCGCCGTATTCTCAGGAATGATTGGTACAGCATTCAGTGTAATTATTCGAATGGAACTTGCTGCGCCCGGTGTGCAATACCTTCACGGTAACCACCAACTATATAACGTAATTATTACAGCCCACGCCTTCCTAATGATCTTTTTCATGGTTATGCCTGGTCTTGTG |
7 | 61 | 3 | 2306 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium balcanicum | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGCTCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATGCCAATTATAATT |
7 | 7 | 1 | 5765 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | CCTTTATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
6 | 6 | 1 | 7 | NA | NA | NA | NA | NA | NA | AATGTATCTAATCTTTGGAGGTTTTTCTGGTATTATTGGAACAGCTTTATCTATTTTAATCAGAATAGAATTATCGCAACCAGGAAACCAAATTTTAATGGGAAACCATCAATTATATAATGTAATTGTAACTTCTCACGCTTTTATTATGATTTTTTTTATGGTAATGCCAATTTTATTA |
6 | 6 | 1 | 392 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
6 | 6 | 1 | 574 | NA | NA | NA | NA | NA | NA | ATTATATTTTCTTTTCGGTACACTATCCGGTGTTATAGGAACAATTTTATCTTTACTTATACGCTTGGAATTAGCATATCCGGGAAATCAATTTTTTTTAGGTAATCATCAATTATACAATGTCGTAGTTACAGCCCATGCATTTTTAATGATTTTTTTTATGGTAATGCCTGTTTTAATT |
6 | 6 | 1 | 7483 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | ACTATATTTTATTTTTGGGGCATGGTCAGGAATAGTTGGTACTTCCCTTAGTATCCTAATTCGAGCTGAACTAGGACATGCCGGCTCCCTAATTGGAGACGATCAAATTTATAATGTAATCGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
5 | 466 | 3 | 3848 | NA | NA | NA | NA | NA | NA | TTTATACTTATTATTTGCTGTTTTAGCAGGAGTTGTAGGAACATATTTTTCTGCTTTAATCAGAATAGAGTTAGCATATCCTGGTAATGGAATTTTTAACGGTAATTTTCAACTTTATAATGTTGTAGTAACAGCGCATGCTTTTATTATGATTTTCTTTTTAGTAATGCCAGCAATGATT |
5 | 5 | 1 | 4440 | NA | NA | NA | NA | NA | NA | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTATAGCTCACGCATTCCTAATGACCTTTTTCATGGTTATGCCAGGACTTGTT |
5 | 5 | 1 | 8667 | NA | NA | NA | NA | NA | NA | TCTTTATATTCTTTTTGGAGCTATTGCAGGAGTATGTGGTACTGCAGTCTCCGTAGCGATTAGATTAGAACTTGCTCAACCAGGTGCAGGTATACTATCGTCTAATCACCAGTTATACAATGTTTTTATTACAGCTCATGCTATTTTAATGATTTTTTTCATGGTCATGCCTATTCTTATA |
4 | 4 | 1 | 182 | NA | NA | NA | NA | NA | NA | ACTGTATTTAATATTTGGTGGCTTTTCGGGTATTATAGGTACTATATTCTCTATGATTATAAGATTAGAATTGGCTGCGCCCGGCTCTCAAATATTAGGTGGTAATAGCCAACTTTATAATGTAATTATTACTGCGCATGCTTTTGTTATGATTTTCTTTTTTGTTATGCCTGTTATGATA |
4 | 4 | 1 | 1737 | NA | NA | NA | NA | NA | NA | TCTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCTATGATC |
4 | 4 | 1 | 5769 | Arthropoda | Insecta | NA | NA | NA | NA | CCTTTATTTTATTTTTGGTGCTTGATCTGGTATAGTTGGTACTTCTTTAAGAATGCTAATTCGAGCAGAATTAGGACGTCCAGGAACATTTATTGGAGATGACCAAGTTTATAATGTTATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTTTAATT |
4 | 4 | 1 | 5865 | NA | NA | NA | NA | NA | NA | GTTATATTTAATATTTAGTATAATAGCAGGTTTAGTTGGTACGTGATTTTCAATAATGATAAGAACAGAATTAGCATATCCAGGTTTTCAATATTTTAATGGAGATTTACAACATTATAATGTGATAATTACAGGACATGCGTTCATTATGATATTTTTCATGGTAATGCCAGCATTAATT |
3 | 3 | 1 | 534 | Arthropoda | Insecta | Ephemeroptera | Baetidae | Baetis | Baetis rhodani | ACTTTATTTTATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
3 | 3 | 1 | 569 | NA | NA | NA | NA | NA | NA | ATTATATCTTATATTTGCAGCCTTCTCTGGTATAATAGGAACTATTTTTTCTATTATTATAAGAATGGAATTAGCATTTCCAGGAGATCAAGTTTTGGGCGGTAATCATCAACTTTATAATGTTATTGTCACTGCACACGCTTTTTTAATGATATTTTTTATGGTTATGCCCGCTCTTATT |
3 | 3 | 1 | 613 | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATGACCAACTTTATAACAATATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
3 | 3 | 1 | 2199 | NA | NA | NA | NA | NA | NA | TCTTTATCTTATATTTGCATTATTTTCAGGGCTTTTAGGTACAGCTTTTTCTGTTTTAATTAGACTAGAATTATCTGGACCTGGAGTACAATACATAGCAGACAACCAATTATACAACAGTATAATAACTGCGCATGCTATTCTGATGATATTTTTCATGGTAATGCCTGCAATGATA |
3 | 3 | 1 | 4796 | NA | NA | NA | NA | NA | NA | ATTATATTTAATATTTGGGGGTATCTCAGGTGTAGCAGGGACTGTATTATCCTTATACATACGAATAACACTATCGCACCCAGAAGGAAATTTTTTAGAACACAATCACCACTTATACAATGTTATTGTAACAGGTCATGCTTTTGTTATGATTTTTTTTATGGTAATGCCTGTTCTTATC |
2 | NA | 1 | 10 | NA | NA | NA | NA | NA | NA | ACTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCCATGATT |
2 | 2 | 1 | 572 | Arthropoda | Insecta | Diptera | Simuliidae | Simulium | Simulium pseudequinum | ATTATATTTTATTTTTGGGGCCTGAGCAGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCTGAATTAGGACACCCAGGATCTTTAATTGGTGATGACCAAATTTATAATGTAATTGTTACAGCTCATGCTTTCGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
2 | 2 | 1 | 1682 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTGATACCTATTATAATT |
2 | 2 | 1 | 1794 | Arthropoda | Insecta | Diptera | Chironomidae | Rheocricotopus | Rheocricotopus chalybeatus | TCTATATTTCATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
2 | 2 | 1 | 2224 | NA | NA | NA | NA | NA | NA | TTTATATATGATTTTTGCAGCCTTTTCAGGAATTGTAGGGACTGTATTTTCAATGTTAATTCGATTTGAATTAGCACATCCAGGACATCAAATTTTATCTGGAAATAACCAATTATACAACGTTATCGTAACGGCACATGCTTTTGTAATGATTTTCTTCATGGTAATGCCTGCATTAATT |
2 | 2 | 1 | 5307 | Arthropoda | Insecta | Trichoptera | Hydropsychidae | Hydropsyche | Hydropsyche pellucidula | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
2 | 2 | 1 | 5926 | Arthropoda | Insecta | Diptera | Chironomidae | NA | NA | TCTATATTTCATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
2 | 2 | 1 | 6328 | Arthropoda | Insecta | Diptera | NA | NA | NA | TTTATATTTTATTTTTGGTATTTGATCAGGTATAGTGGGTACTTCTTTGAGCTTAATAATTCGTACAGAATTAGGTCAGCCAGGTTATTTAATTGGAGATGACCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTCTTTATAGTGATACCTATTATAATT |
In this mock sample, there should be the following 6 species:
We can see that, in spite of all the filtering we have done so far, there are still a lot of unexpected occurrences in this sample. Most of them have low read counts and could be filtered out by the Low Frequency Noise (LFN) filters.
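To get a feeling for how low these read counts are, you can look at the share of reads each ASV represents. The sketch below takes a few read counts from the tpos1 table and compares them with an arbitrary cutoff. This is only an illustration: the cutoff value is hypothetical, the shares are computed over the listed ASVs rather than the whole sample, and the LFN filters of vtamR use their own definitions and parameters.

```r
# Read counts of a few ASVs in tpos1, copied from the table above
# (names are the asv_id values)
tpos1_reads <- c("1227" = 16348, "2009" = 4322, "429" = 1616,
                 "2271" = 446, "416" = 15, "572" = 2)

# Share of each ASV among the reads listed here (not the whole sample)
read_share <- tpos1_reads / sum(tpos1_reads)

# Hypothetical cutoff, chosen only to illustrate the idea
cutoff <- 0.005
low_frequency <- names(read_share)[read_share < cutoff]

print(round(read_share, 4))
print(low_frequency)
```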
You can now pick the correct sequences of the expected ASVs in each mock and make the mock_composition file.
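As a rough sketch of this last step, the code below builds a small mock_composition table and writes it to a CSV file. Please treat the details as assumptions: the column names sample, action and asv are what I would expect the file to contain, but check them against the package documentation. The toy sequences, the chosen ASV ids and the output path are hypothetical.

```r
# Toy stand-in for asv_tpos1 (hypothetical ids and shortened sequences)
asv_tpos1_toy <- data.frame(
  asv_id = c(1, 2, 3),
  asv    = c("ACGTAC", "TTGACA", "GGCATA")
)

# ASV ids chosen by hand after inspecting the table (hypothetical choice)
keep_ids     <- c(1, 2)
tolerate_ids <- c(3)

chosen <- asv_tpos1_toy[asv_tpos1_toy$asv_id %in% c(keep_ids, tolerate_ids), ]

# Assumed layout: one row per expected or tolerated ASV of a mock sample
mock_composition <- data.frame(
  sample = "tpos1",
  action = ifelse(chosen$asv_id %in% keep_ids, "keep", "tolerate"),
  asv    = chosen$asv
)

# Write it out; the file name and location are arbitrary
out_file <- file.path(tempdir(), "mock_composition.csv")
write.csv(mock_composition, out_file, row.names = FALSE)
```

With your own data, replace the toy table with the table of each mock and add the rows of all mocks to the same file.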