The mock_composition file is essential to describe which ASVs are expected in each mock sample.
It is a CSV file with the following columns:
keep: Expected ASV in the mock, that should be kept in the data set
tolerate: ASV that can be present in a mock, but it is not essential to keep it in the data set (e.g. badly amplified organism)
The known_occurrences file is essential for running the following functions, either to determine the optimal parameter values for the LFN filters or to evaluate the filtering procedure in terms of precision (TP / (TP + FP)) and sensitivity (TP / (TP + FN)).
OptimizePCRerror
OptimizeLFNsampleReplicate
OptimizeLFNreadCountLFNvariant
MakeKnownOccurrences
ASVspecificCutoff
The mock_composition
is also useful, although not essential for the
WriteASVtable function if you wish to add a column in the
output to easily find expected occurrences in each mock sample.
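As a quick illustration of the two metrics defined above, the following sketch computes precision and sensitivity from counts of true positives (TP), false positives (FP) and false negatives (FN). The helper name `filter_metrics` and the counts are hypothetical and are not part of vtamR.

```r
# Hypothetical helper (not a vtamR function): compute precision and
# sensitivity from the numbers of true positives, false positives
# and false negatives.
filter_metrics <- function(TP, FP, FN) {
  c(precision   = TP / (TP + FP),
    sensitivity = TP / (TP + FN))
}

# Made-up counts, for illustration only
metrics <- filter_metrics(TP = 18, FP = 2, FN = 6)
print(metrics)
```

With these made-up counts, 18 of the 20 retained occurrences are expected ones, and 18 of the 24 expected occurrences are retained.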
mock_composition File
I suggest two different methods to construct the
mock_composition file:
MakeMockCompositionLTG function
Manual selection of the expected ASVs from a prefiltered dataset
MakeMockCompositionLTG
This function provides a convenient and fast way to build the
mock_composition file. However, some expected occurrences
may occasionally be missed. If that happens, you can manually select the
expected ASVs from a prefiltered dataset (as described later).
Reference sequences
Collect a reference sequence for each species expected in the mock samples. Each reference should cover at least 70% of the region amplified by the primers. It may be longer or slightly shorter than the ASV and may differ slightly from the exact expected sequence.
The reference sequences must be in FASTA format, and the FASTA headers should include a valid NCBI taxonomic identifier in the following format:
>SequenceName taxID=12345
The taxID must correspond to a valid entry in the NCBI Taxonomy
database.
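Before running the function, it can be worth checking that every FASTA header follows the expected format. The sketch below is a minimal base R check on made-up header lines. It only tests the format of the header and does not verify that the taxID exists in the NCBI Taxonomy database. The regular expression and the variable names are assumptions made for this illustration.

```r
# Example header lines (made up for illustration; the taxIDs are the
# ones shown for these species in the NCBI-based taxonomy)
headers <- c(">Caenis_pusilla_ref taxID=1592914",
             ">Baetis_rhodani_ref taxID=189839",
             ">bad_header_without_taxid")

# A header is considered well formed if it has a sequence name
# followed by taxID=<digits>
pattern <- "^>\\S+\\s+taxID=([0-9]+)\\s*$"
well_formed <- grepl(pattern, headers, perl = TRUE)

# Extract the numeric taxID where the pattern matches, NA otherwise
taxid <- ifelse(well_formed, sub(pattern, "\\1", headers, perl = TRUE), NA)
taxid <- as.integer(taxid)

print(data.frame(header = headers, well_formed = well_formed, taxid = taxid))
```

Headers flagged as not well formed should be corrected in the FASTA file before it is used.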
Taxonomy file
The function requires a taxonomy file. The file distributed with the COInr database is suitable for Eukaryotes, even if your marker is not COI (see TaxAssign reference database).
Read count data
Provide a read_count_df data frame containing read
counts for each mock sample. It is recommended to use a dataset that has
already undergone initial filtering steps to remove artefactual ASVs and
drastically reduce the number of ASVs (e.g., after denoising with SWARM,
LFNglobalReadCount, FilterIndel,
FilterCodonStop, FilterExternalContaminant,
and FilterChimera).
The MakeMockCompositionLTG function produces a
mock_composition template
file, which should be reviewed and, if necessary, edited by the
user.
The main output file is
mock_composition_template_to_check.csv.
This file serves as a template for the final
mock_composition file.
It contains the most abundant sequence for each ltg_name
identified in the taxonomic assignment output, repeated for each mock
sample. If different mock samples have distinct compositions, you should
remove the lines corresponding to taxa that are not expected in a
particular sample.
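Removing the lines of taxa that are not expected in a given sample can be done in a spreadsheet editor or in R. The sketch below uses a toy stand-in for the template. The column names (sample, action, ltg_name, asv), the second sample name and the placeholder sequences are assumptions made for this illustration, so check them against the actual template file.

```r
# Toy stand-in for the template; column names and values are assumptions
template <- data.frame(
  sample   = rep(c("tpos1", "tpos2"), each = 3),
  action   = "keep",
  ltg_name = rep(c("Baetis rhodani", "Caenis pusilla", "Phoxinus phoxinus"),
                 times = 2),
  asv      = rep(c("SEQ_A", "SEQ_B", "SEQ_C"), times = 2),
  stringsAsFactors = FALSE
)

# Hypothetical case: Phoxinus phoxinus is not expected in tpos2
not_expected <- template$sample == "tpos2" &
  template$ltg_name == "Phoxinus phoxinus"

# Keep every line except the unexpected sample/taxon combination
mock_composition <- template[!not_expected, ]
print(mock_composition)
```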
Note: The file does not include sequences for species in the custom database that did not show significant similarity to any ASV. This may occur if:
We will use some of the files created in the first part of the Tutorial (up to FilterChimera).
These files are included in the vtamR package, hence
the use of system.file(). When using your own data, just
enter your file names.
read_count_file is the output of
FilterChimera in the Tutorial.
blast_db and taxonomy are set up as in
the Tutorial.
library(vtamR)
read_count_file <- system.file("extdata/demo/7_FilterChimera.csv", package = "vtamR")
reference_mock_fasta <- system.file("extdata/demo/mock_ncbi.fasta", package = "vtamR")
sampleinfo <- system.file("extdata/demo/sampleinfo.csv", package = "vtamR")
taxonomy <- system.file("extdata/db_test/taxonomy_reduced.tsv", package = "vtamR")
blast_path <- "blastn" # Adapt this if BLAST is not in your PATH
outdir_mock <- "mock_composition"
mock_template <- MakeMockCompositionLTG(read_count = read_count_file,
                                        fas = reference_mock_fasta,
                                        taxonomy = taxonomy,
                                        sampleinfo = sampleinfo,
                                        outdir = outdir_mock,
                                        blast_path = blast_path)
Note: If BLAST is in your PATH (see Installation), you can omit the
blast_path argument.
The idea is to
I suggest that you start by filtering/denoising your data set using at least some of the filtering functions presented in the Tutorial. This will eliminate most of the erroneous ASVs, making it easier to identify the expected ASVs in your mock samples.
We will use some of the files created in the first part of the Tutorial (up to FilterChimera).
These files are included in the vtamR package, hence
the use of system.file(). When using your own data, just
enter your file names.
read_count_file is the output of
FilterChimera in the Tutorial.
blast_db and taxonomy are set up as in
the Tutorial.
library(vtamR)
library(dplyr)
read_count_file <- system.file("extdata/demo/7_FilterChimera.csv", package = "vtamR")
taxonomy <- system.file("extdata/db_test/taxonomy_reduced.tsv", package = "vtamR")
sampleinfo <- system.file("extdata/demo/sampleinfo.csv", package = "vtamR")
blast_db <- system.file("extdata/db_test", package = "vtamR")
blast_db <- file.path(blast_db, "COInr_reduced")
blast_path <- "blastn" # Adapt this if BLAST is not in your PATH
Let’s limit the analyses to the mock samples.
read_count_df <- read.csv(read_count_file)
sampleinfo_df <- read.csv(sampleinfo)
# select mock samples in sampleinfo
mock_samples <- sampleinfo_df %>%
  filter(sample_type == "mock")
# select mock samples from read_count_df
read_count_mock <- read_count_df %>%
  filter(sample %in% mock_samples$sample)
TaxAssignLTG will assign all ASVs in the input CSV file
or data frame (here, read_count_mock).
See more details of taxonomic assignment here.
Note: If BLAST is in your PATH (see Installation), you can omit the
blast_path argument.
asv_tax <- TaxAssignLTG(asv = read_count_mock,
                        taxonomy = taxonomy,
                        blast_db = blast_db,
                        quiet = TRUE,
                        blast_path = blast_path)
Make a data frame with ASVs and read counts in wide format and add
their taxonomic assignment. This format is easier for humans to read
than read_count_df.
See details of WriteASVtable here.
asv_table_mock <- WriteASVtable(read_count_mock,
                                sampleinfo = sampleinfo,
                                asv_tax = asv_tax,
                                pool_replicates = TRUE)
Sort the output by the taxon name and then by decreasing read count.
asv_table_mock <- asv_table_mock %>%
  arrange(ltg_name, desc(tpos1))
Let’s see the ASVs present in tpos1.
knitr::kable(asv_table_mock, format = "markdown")
| asv_id | tpos1 | ltg_taxid | ltg_name | ltg_rank | ltg_rank_index | domain_taxid | domain | kingdom_taxid | kingdom | phylum_taxid | phylum | class_taxid | class | order_taxid | order | family_taxid | family | genus_taxid | genus | species_taxid | species | pid | pcov | phit | taxn | seqn | refres | ltgres | asv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 219 | 132 | 6656 | Arthropoda | phylum | 3.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 97 | 70 | 70 | 1 | 1 | 8 | 8 | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGTCTTGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
| 211 | 83 | 6656 | Arthropoda | phylum | 3.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 97 | 70 | 70 | 1 | 1 | 8 | 8 | ACTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTAGGAACTTTACTTTCATTATTTATTCGACTAGAATTAGCTTATCCAGGAAATCAATTTTTTTTAGGAAATCACCAACTTTATAATGTGGTTGTGACAGCACATGCTTTTATCATGATTTTTTTCATGGTTATGCCGATTTTAATC |
| 2197 | 19 | 6656 | Arthropoda | phylum | 3.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 90 | 70 | 70 | 3 | 3 | 7 | 8 | TCTTTATTTCATTTTCGGAACATTTGCAGGAGTTGTCGGAACTTTACTTTCATTATTTATTCGTCTGGAATTAGCATACCCAGGAAATCAATTTTTTTTAGGAAACCACCAACTTTATAATGTAGTTGTAACAGCACATGCTTTTATTATGATTTTTTTTATGGTTATGCCAATTTTAATC |
| 2264 | 442 | 1077837 | Baetis fuscatus | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30073 | Ephemeroptera | 172515 | Baetidae | 189838 | Baetis | 1077837 | Baetis fuscatus | 100 | 70 | 70 | 1 | 1 | 8 | 8 | TTTATATTTCATTTTTGGTGCATGATCAGGTATGGTGGGTACTTCCCTTAGTTTATTAATTCGAGCAGAACTTGGTAATCCTGGTTCTTTGATTGGCGATGATCAGATTTATAACGTTATTGTCACTGCCCATGCTTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 6 | 4303 | 189839 | Baetis rhodani | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30073 | Ephemeroptera | 172515 | Baetidae | 189838 | Baetis | 189839 | Baetis rhodani | 100 | 70 | 70 | 1 | 1 | 8 | 8 | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 6033 | 12 | 189839 | Baetis rhodani | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30073 | Ephemeroptera | 172515 | Baetidae | 189838 | Baetis | 189839 | Baetis rhodani | 97 | 70 | 70 | 1 | 1 | 8 | 8 | TCTATATTTCATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGATCATGGCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 531 | 3 | 189839 | Baetis rhodani | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30073 | Ephemeroptera | 172515 | Baetidae | 189838 | Baetis | 189839 | Baetis rhodani | 100 | 70 | 70 | 1 | 1 | 8 | 8 | ACTTTATTTTATTTTTGGTGCTTGGGCAGGTATGGTAGGTACCTCATTAAGACTTTTAATTCGAGCCGAGTTGGGTAACCCGGGTTCATTAATTGGGGACGATCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATGATTTTTTTTATAGTGATACCTATTATAATT |
| 1 | 294 | 1592914 | Caenis pusilla | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30073 | Ephemeroptera | 197146 | Caenidae | 197147 | Caenis | 1592914 | Caenis pusilla | 100 | 70 | 70 | 1 | 1 | 8 | 8 | ACTATATTTTATTTTTGGGGCTTGATCCGGAATGCTGGGCACCTCTCTAAGCCTTCTAATTCGTGCCGAGCTGGGGCACCCGGGTTCTTTAATTGGCGACGATCAAATTTACAATGTAATCGTCACAGCCCATGCTTTTATTATGATTTTTTTCATGGTTATGCCTATTATAATC |
| 7474 | 4 | 7149 | Chironomidae | family | 6.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | NA | NA | NA | NA | 85 | 70 | 70 | 4 | 4 | 6 | 7 | ACTATATTTTATTTTTGGGGCATGGTCAGGAATAGTTGGTACTTCCCTTAGTATCCTAATTCGAGCTGAACTAGGACATGCCGGCTCCCTAATTGGAGACGATCAAATTTATAATGTAATCGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 6319 | 2 | 41828 | Chironomoidea | superfamily | 5.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | NA | NA | NA | NA | NA | NA | 85 | 70 | 70 | 4 | 4 | 6 | 7 | TTTATATTTTATTTTTGGTATTTGATCAGGTATAGTGGGTACTTCTTTGAGCTTAATAATTCGTACAGAATTAGGTCAGCCAGGTTATTTAATTGGAGATGACCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTCTTTATAGTGATACCTATTATAATT |
| 5760 | 4 | 33392 | Endopterygota | cohort | 4.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | NA | NA | NA | NA | NA | NA | NA | NA | 85 | 70 | 70 | 4 | 4 | 6 | 7 | CCTTTATTTTATTTTTGGTGCTTGATCTGGTATAGTTGGTACTTCTTTAAGAATGCTAATTCGAGCAGAATTAGGACGTCCAGGAACATTTATTGGAGATGACCAAGTTTATAATGTTATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 4 | 16292 | 869943 | Hydropsyche pellucidula | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30263 | Trichoptera | 41030 | Hydropsychidae | 50443 | Hydropsyche | 869943 | Hydropsyche pellucidula | 100 | 70 | 70 | 1 | 1 | 8 | 8 | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATTGTTACATCTCATGCATTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 5298 | 2 | 869943 | Hydropsyche pellucidula | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 30263 | Trichoptera | 41030 | Hydropsychidae | 50443 | Hydropsyche | 869943 | Hydropsyche pellucidula | 97 | 70 | 70 | 1 | 1 | 8 | 8 | CCTTTATTTTATTTTCGGTATCTGATCAGGTCTCGTAGGATCATCACTTAGATTTATTATTCGAATAGAATTAAGAACTCCTGGTAGATTTATTGGCAACGACCAAATTTATAACGTAATCGTAACTGCTCATGCCTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 5756 | 11 | 43808 | Orthocladiinae | subfamily | 6.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | NA | NA | NA | NA | 90 | 70 | 70 | 3 | 3 | 7 | 8 | CCTTTATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 390 | 6 | 43808 | Orthocladiinae | subfamily | 6.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | NA | NA | NA | NA | 90 | 70 | 70 | 3 | 3 | 7 | 8 | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCAATCATAATT |
| 1677 | 2 | 43808 | Orthocladiinae | subfamily | 6.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | NA | NA | NA | NA | 90 | 70 | 70 | 3 | 3 | 7 | 8 | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTGATACCTATTATAATT |
| 5917 | 2 | 43808 | Orthocladiinae | subfamily | 6.5 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | NA | NA | NA | NA | 90 | 70 | 70 | 3 | 3 | 7 | 8 | TCTATATTTCATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 1753 | 8 | 1437201 | Pentapetalae | clade | 4.5 | 2759 | Eukaryota | 33090 | Viridiplantae | 35493 | Streptophyta | 3398 | Magnoliopsida | NA | NA | NA | NA | NA | NA | NA | NA | 100 | 70 | 70 | 1 | 1 | 8 | 8 | TCTATATTTCATCTTCGGTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGATCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTTTTTATGGTTATGCCGGCGATGATA |
| 3 | 216 | 58324 | Phoxinus phoxinus | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 7711 | Chordata | 186623 | Actinopteri | 7952 | Cypriniformes | 2743726 | Leuciscidae | 42662 | Phoxinus | 58324 | Phoxinus phoxinus | 100 | 70 | 70 | 1 | 1 | 8 | 8 | CCTTTATCTTGTATTTGGTGCCTGGGCCGGAATGGTAGGGACCGCCCTAAGCCTTCTTATTCGGGCCGAACTAAGCCAGCCTGGCTCGCTATTAGGTGATAGCCAAATTTATAATGTTATTGTTACCGCCCACGCCTTCGTAATAATTTTCTTTATAGTCATGCCAATTCTCATT |
| 27 | 212 | 33317 | Protostomia | clade | 2.5 | 2759 | Eukaryota | 33208 | Metazoa | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 80 | 70 | 70 | 4 | 4 | 6 | 7 | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTACAGCTCACGCATTCCTAATGATCTTTTTCATGGTTATGCCAGGACTTGTT |
| 2 | 1608 | 1042866 | Rheocricotopus chalybeatus | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | 611384 | Rheocricotopus | 1042866 | Rheocricotopus chalybeatus | 97 | 70 | 70 | 1 | 1 | 8 | 8 | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
| 414 | 15 | 1042866 | Rheocricotopus chalybeatus | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | 611384 | Rheocricotopus | 1042866 | Rheocricotopus chalybeatus | 97 | 70 | 70 | 1 | 1 | 8 | 8 | ACTTTATTTTATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCAATCATAATT |
| 1788 | 2 | 1042866 | Rheocricotopus chalybeatus | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | 611384 | Rheocricotopus | 1042866 | Rheocricotopus chalybeatus | 97 | 70 | 70 | 1 | 1 | 8 | 8 | TCTATATTTCATTTTTGGTGCTTGATCAGGAATAGTAGGAACTTCTTTAAGAATTCTAATTCGAGCTGAATTAGGTCATGCCGGTTCATTAATTGGAGATGATCAAATTTATAATGTAATTGTAACTGCTCATGCTTTTGTAATAATTTTCTTTATAGTTATACCTATTTTAATT |
| 2299 | 7 | 1216507 | Simulium balcanicum | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7190 | Simuliidae | 7191 | Simulium | 1216507 | Simulium balcanicum | 100 | 70 | 70 | 1 | 1 | 8 | 8 | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGCTCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATGCCAATTATAATT |
| 2295 | 10 | 697243 | Simulium lineatum | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7190 | Simuliidae | 7191 | Simulium | 697243 | Simulium lineatum | 100 | 70 | 70 | 1 | 1 | 8 | 8 | TTTATATTTTATTTTTGGAGCCTGAGCTGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCCGAATTAGGACACCCAGGATCTCTAATTGGAGACGACCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
| 569 | 2 | 1419339 | Simulium pseudequinum | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7190 | Simuliidae | 7191 | Simulium | 1419339 | Simulium pseudequinum | 100 | 70 | 70 | 1 | 1 | 8 | 8 | ATTATATTTTATTTTTGGGGCCTGAGCAGGAATAGTAGGTACTTCCCTTAGTATACTTATTCGAGCTGAATTAGGACACCCAGGATCTTTAATTGGTGATGACCAAATTTATAATGTAATTGTTACAGCTCATGCTTTCGTAATAATTTTTTTTATAGTTATACCAATTATAATT |
| 5 | 338 | 611678 | Synorthocladius semivirens | species | 8.0 | 2759 | Eukaryota | 33208 | Metazoa | 6656 | Arthropoda | 50557 | Insecta | 7147 | Diptera | 7149 | Chironomidae | 611392 | Synorthocladius | 611678 | Synorthocladius semivirens | 97 | 70 | 70 | 1 | 1 | 8 | 8 | CTTATATTTTATTTTTGGTGCTTGATCAGGGATAGTGGGAACTTCTTTAAGAATTCTTATTCGAGCTGAACTTGGTCATGCGGGATCTTTAATCGGAGACGATCAAATTTACAATGTAATTGTTACTGCACACGCCTTTGTAATAATTTTTTTTATAGTTATACCTATTTTAATT |
| 597 | 156 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATAACCAACTTTATAACAGTATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
| 170 | 85 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTCTATTTAATATTTGCTGCATTTTCAGGGGTTATAGGAACAATATTTTCTATAATTATAAGAATGGAACTTGCTTATCCAGGTGATCAAATATTGAATGGTAATCACCAACTTTATAATGTTATTGTAACTGCTCATGCATTTGTAATGATTTTTTTTATGGTTATGCCTGCCTTGATT |
| 557 | 48 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTATACCTTATCTTTTCTCTTTTCTCAGGTTTACTTGGAACAGCATTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGACAATCAGTTATACAATAGTATTATTACAGCACACGCAATATTAATGATTTTCTTTATGGTTATGCCAGCAATGATT |
| 6207 | 44 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTTTATATTATTTACGGCTTTCTCATAGGATTGGTTGGTACATTTTTTTCCGCTGTCATTCGTATTCAACTCATGTACCCTGGTTCGTTGTTTTTGGGTGGTAATTACCATTATTATAATACTGTAATTACAGCGCACGCACTTGTGATAATTTTTTTTATGGTCATACCAGTGTTGATT |
| 2373 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTTTACTTAATTTTTGGTGCTATTTCTGGTGTAGCTGGAACTGCTTTATCACTTTACATCAGATTTACATTATCTCAACCAAACTCGAGTTTTTTAGAATATAACCACCATTTATATAATGTAATTGTTACAGGACATGCACTTATAATGGTTTTTTTTGTAGTAATGCCTATTTTAATT |
| 249 | 24 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTTTATTTTATTTTCGGAGCGTGGTCGGGGATGGTAGGCACATCTCTGAGTCTTTTAATTCGAGCCGAATTGGGTAATCCTGGTTCACTAATTGGGGATGACCAGATTTACAACGTTATTGTAACAGCCCATGCTTTTATTATGATTTTTTTTATAGTAATGCCAATTATGATT |
| 7442 | 22 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | AATGTATCTAATATTTGCAATATTTGCAGGCATTGTTGGTGGACTAATGTCAGTGATACTCAGGCTAGAACTCGCACAACCTGGTAACCAGTTTTTAGGCGGCGATCATCAATTTTATAATGTTATGCTCACTGCTCACGCACTTGTCATGGTATTTTTTATGATTATGCCTGGGCTTTTC |
| 8681 | 16 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TTTATATTTAATATTTGGGCTAATAGCGGGTGTAATAGGAACGTTATTTTCGATATTAATTAGATTAGAATTAGCCTATCCAGGGAATCAATATTTTTTGGGAGATCATCAATTTTATAATGTTGTTGTTACATCACATGCGTTTATTATGATTTTTTTTATGGTAATGCCGGCATTTGTT |
| 1711 | 15 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | CTTGTACATGATTTTCGGAACGTTGGCCGGAGTGGTCGGAACGACGTTGTCGGTATGGATGCGAATGGAATTGGCGGCACCGGGAGTGCAAGCATTGTCGGGAAACCATCAGTTGTATAACGTGATGGTGACGGCACATGCCTTCATCATGATTTTCTTCTTCGTGATGCCCTTTTTGATT |
| 646 | 14 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | CCTATACCTAGTATTTGCAGTATTTGCAGGTATAATTGGTACAGCATTTTCAGTACTAATTCGTATGGAACTTGCAGCACCAGGAGTACAATATCTTAACGGAGATCACCAACTTTATAATGTAGTTATTACTGCACATGCGCTAATTATGATTTTCTTTATGGTTATGCCTGCTCTCGTG |
| 1636 | 14 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | CTTATATCTATTATTTGCAGGCGTTTCGGGTATTGCCGGCACTGTTTTATCTTTATATATACGAGCTACACTAGCAACTCCTGCTTCCAATTTTTTAAGCAAAAATCATCACTTGTATAACGTAATAGTGACAGGCCATGCGTTTTTAATGATTTTTTTTTTAGTAATGCCTGCTCTTATA |
| 626 | 11 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTGTACTTGATTTATGGGGGATTTGCTGGTTTAATTGGAACGATGTTCTCTGTTCTAATAAGAATGGAACTATCATCACCCGGTAATACTATACTAGCTGGTAACTATCAATACTATAATGTTATAGTAACTGCGCATGCTTTCATTATGATCTTCTTTTTTGTTATGCCTGCTATGATG |
| 7469 | 10 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTATACTTAATCTTTGCATTATTTTCTGGATTATTAGGTACAGCGTTTTCTGTTCTTATAAGATTAGAATTAAGTGGGCCAGGTGTTCAATATATAGCGGACAATCAACTATACAACAGTGTTATTACAGCACACGCTATCTTAATGATATTCTTTATGGTTATGCCTGCAATGATA |
| 4781 | 9 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTATACCTTATATTTTCTCTGTTTTCGGGTTTACTTGGAACCGCTTTTTCAGTTTTAATAAGACTTGAATTATCTGGACCTGGTGTTCAGTACATAGCAGATAACCAATTATACAATAGTATAATTACAGCACACGCGATACTTATGATTTTCTTTATGGTTATGCCAGCAATGATT |
| 2190 | 8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTTTACCTGATCTTCGCCGTATTCTCAGGAATGATTGGTACAGCATTCAGTGTAATTATTCGAATGGAACTTGCTGCGCCCGGTGTGCAATACCTTCACGGTAACCACCAACTATATAACGTAATTATTACAGCCCACGCCTTCCTAATGATCTTTTTCATGGTTATGCCTGGTCTTGTG |
| 7 | 6 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | AATGTATCTAATCTTTGGAGGTTTTTCTGGTATTATTGGAACAGCTTTATCTATTTTAATCAGAATAGAATTATCGCAACCAGGAAACCAAATTTTAATGGGAAACCATCAATTATATAATGTAATTGTAACTTCTCACGCTTTTATTATGATTTTTTTTATGGTAATGCCAATTTTATTA |
| 571 | 6 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTATATTTTCTTTTCGGTACACTATCCGGTGTTATAGGAACAATTTTATCTTTACTTATACGCTTGGAATTAGCATATCCGGGAAATCAATTTTTTTTAGGTAATCATCAATTATACAATGTCGTAGTTACAGCCCATGCATTTTTAATGATTTTTTTTATGGTAATGCCTGTTTTAATT |
| 3839 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TTTATACTTATTATTTGCTGTTTTAGCAGGAGTTGTAGGAACATATTTTTCTGCTTTAATCAGAATAGAGTTAGCATATCCTGGTAATGGAATTTTTAACGGTAATTTTCAACTTTATAATGTTGTAGTAACAGCGCATGCTTTTATTATGATTTTCTTTTTAGTAATGCCAGCAATGATT |
| 4431 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTATAGCTCACGCATTCCTAATGACCTTTTTCATGGTTATGCCAGGACTTGTT |
| 8658 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTTTATATTCTTTTTGGAGCTATTGCAGGAGTATGTGGTACTGCAGTCTCCGTAGCGATTAGATTAGAACTTGCTCAACCAGGTGCAGGTATACTATCGTCTAATCACCAGTTATACAATGTTTTTATTACAGCTCATGCTATTTTAATGATTTTTTTCATGGTCATGCCTATTCTTATA |
| 180 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTGTATTTAATATTTGGTGGCTTTTCGGGTATTATAGGTACTATATTCTCTATGATTATAAGATTAGAATTGGCTGCGCCCGGCTCTCAAATATTAGGTGGTAATAGCCAACTTTATAATGTAATTATTACTGCGCATGCTTTTGTTATGATTTTCTTTTTTGTTATGCCTGTTATGATA |
| 1731 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCTATGATC |
| 5856 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | GTTATATTTAATATTTAGTATAATAGCAGGTTTAGTTGGTACGTGATTTTCAATAATGATAAGAACAGAATTAGCATATCCAGGTTTTCAATATTTTAATGGAGATTTACAACATTATAATGTGATAATTACAGGACATGCGTTCATTATGATATTTTTCATGGTAATGCCAGCATTAATT |
| 566 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTATATCTTATATTTGCAGCCTTCTCTGGTATAATAGGAACTATTTTTTCTATTATTATAAGAATGGAATTAGCATTTCCAGGAGATCAAGTTTTGGGCGGTAATCATCAACTTTATAATGTTATTGTCACTGCACACGCTTTTTTAATGATATTTTTTATGGTTATGCCCGCTCTTATT |
| 610 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTGTACCTTATATTTGCCTTATTTTCAGGGCTATTAGGTACTGCTTTTTCTGTTTTAATAAGACTTGAATTATCAGGACCTGGTGTACAATACATAGCTGATGACCAACTTTATAACAATATAATTACTGCACATGCAATACTTATGATTTTCTTCATGGTTATGCCTGCTATGATA |
| 4787 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ATTATATTTAATATTTGGGGGTATCTCAGGTGTAGCAGGGACTGTATTATCCTTATACATACGAATAACACTATCGCACCCAGAAGGAAATTTTTTAGAACACAATCACCACTTATACAATGTTATTGTAACAGGTCATGCTTTTGTTATGATTTTTTTTATGGTAATGCCTGTTCTTATC |
| 10 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ACTATACCTGATGTTTGCCTTATTCGCAGGTTTAGTAGGTACAGCATTTTCTGTACTTATTAGAATGGAATTAAGTGCACCAGGAGTTCAATACATCAGTGATAACCAGTTATATAATAGTATTATAACAGCTCACGCTATTGTTATGATATTCTTTATGGTTATGCCTGCCATGATT |
| 2192 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TCTTTATCTTATATTTGCATTATTTTCAGGGCTTTTAGGTACAGCTTTTTCTGTTTTAATTAGACTAGAATTATCTGGACCTGGAGTACAATACATAGCAGACAACCAATTATACAACAGTATAATAACTGCGCATGCTATTCTGATGATATTTTTCATGGTAATGCCTGCAATGATA |
| 2217 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | TTTATATATGATTTTTGCAGCCTTTTCAGGAATTGTAGGGACTGTATTTTCAATGTTAATTCGATTTGAATTAGCACATCCAGGACATCAAATTTTATCTGGAAATAACCAATTATACAACGTTATCGTAACGGCACATGCTTTTGTAATGATTTTCTTCATGGTAATGCCTGCATTAATT |
In this mock sample, there should be the following 6 species:
We can see that, in spite of all the filtering we have done so far, there are still many unexpected occurrences in this sample. Most of them have low read counts and could be filtered out by the Low Frequency Noise (LFN) filters.
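One way to shortlist candidates is to keep, for each expected taxon, the ASV with the highest read count. The sketch below does this in base R on a few rows copied from the table above (asv_id, read count in tpos1, ltg_name). The `expected_taxa` vector is hypothetical and should be replaced by the species actually present in your mock. The most abundant ASV is only a candidate, so its sequence should still be checked against the expected one.

```r
# A few rows copied from the table above
asv_table_small <- data.frame(
  asv_id   = c(6, 6033, 531, 1, 2295, 569),
  tpos1    = c(4303, 12, 3, 294, 10, 2),
  ltg_name = c("Baetis rhodani", "Baetis rhodani", "Baetis rhodani",
               "Caenis pusilla", "Simulium lineatum", "Simulium pseudequinum"),
  stringsAsFactors = FALSE
)

# Hypothetical list of expected taxa; replace with the species in your mock
expected_taxa <- c("Baetis rhodani", "Caenis pusilla")

# Keep the expected taxa, sort by taxon and decreasing read count,
# then take the first (most abundant) ASV of each taxon
candidates <- asv_table_small[asv_table_small$ltg_name %in% expected_taxa, ]
candidates <- candidates[order(candidates$ltg_name, -candidates$tpos1), ]
top_asv <- candidates[!duplicated(candidates$ltg_name), ]
print(top_asv)
```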
You can now pick the correct sequences of the expected ASVs in each mock and make the mock_composition file.
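Once the expected ASVs are picked, they can be assembled into a CSV file. The sketch below is a minimal example under stated assumptions: the column names (sample, action, asv) are assumed here and should be checked against the vtamR documentation, and the sequences are placeholders to be replaced by the full sequences from the asv column of the table. The action values keep and tolerate are the ones described at the beginning of this section.

```r
# Hypothetical picks for the mock sample tpos1; the sequences are
# placeholders and the column names are assumptions
picked <- data.frame(
  sample = "tpos1",
  action = c("keep", "keep", "tolerate"),
  asv    = c("SEQUENCE_OF_EXPECTED_ASV_1",
             "SEQUENCE_OF_EXPECTED_ASV_2",
             "SEQUENCE_OF_BADLY_AMPLIFIED_ASV"),
  stringsAsFactors = FALSE
)

# Write to a temporary file for this demonstration;
# use your own path in a real analysis
mock_composition_file <- file.path(tempdir(), "mock_composition.csv")
write.csv(picked, mock_composition_file, row.names = FALSE)

# Read the file back to check its content
check <- read.csv(mock_composition_file, stringsAsFactors = FALSE)
print(check)
```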