% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/SynDist_func_20250323.R
\name{SynDist}
\alias{SynDist}
\title{SynDist() function}
\usage{
SynDist(
  seq_file,
  path_out,
  input_fasta = NULL,
  codon_pos = NULL,
  analysis = "dist"
)
}
\arguments{
\item{seq_file}{is a sequence occurrence table as output by the 'dada2'
pipeline, which has samples in rows and nucleotide sequences in columns.
Alternatively, fasta input can be supplied in the format returned by
read.fasta() from the package 'seqinr'; in that case, set input_fasta = TRUE.}

\item{path_out}{is a user defined path to the folder where the output files
will be saved.}

\item{input_fasta}{optional, a logical (TRUE/FALSE) that indicates whether
the input file is a fasta file (TRUE) or a 'dada2'-style sequence table
(NULL/FALSE). The default is NULL, which is treated like FALSE.}

\item{codon_pos}{optional, an integer vector, e.g. c(1, 2, 3), specifying
which codons to include in analyses. If omitted, analyses are made using
all codons. Note: With SynDist(), codon_pos should always be specified as
codons, i.e. numbered nucleotide triplets in open reading frame.}

\item{analysis}{is used to specify the desired kind of analysis. It takes
the values 'dist' for quantification of pairwise synonymous variation
between sequences, or 'codon' for quantification of synonymous substitutions
per nucleotide or codon position. The argument is optional with 'dist' as
default.}
}
\value{
When analysis="dist", the function produces a .csv distance matrix
  with the number of synonymous substitutions in each pairwise sequence
  comparison in the upper right triangle and the synonymous p-distance in
  each pairwise sequence comparison in the lower left triangle. If a sequence
  occurrence table is given as input, the function additionally produces
  two tables with the mean number of synonymous substitutions and the mean
  synonymous p-distance across all pairwise sequence comparisons for each
  sample in the data set. In that case, the sequences are named in the
  output matrix by an index number that corresponds to their column number
  in the input file.

  If analysis="codon", the function produces two .csv summary tables, one with
  the total number of synonymous substitutions per nucleotide position across
  all pairwise sequence comparisons and one with the number of synonymous
  codon variations per codon across all pairwise sequence comparisons. Note
  that in the codon summary table, the synonymous codon variation does not
  quantify the number of nucleotide variations between the synonymous codons,
  since that can be derived from the nucleotide summary table. Each summary
  table also contains a column that specifies the proportion of the observed
  number of synonymous variations (per nucleotide position or codon) out of the
  number of pairwise sequence comparisons. E.g., if three sequences are
  compared and a synonymous substitution is observed for a given codon once
  (i.e., between two of the three sequences), that gives a proportion of
  synonymous observations of one out of three pairwise sequence comparisons for
  that codon.
}
\description{
\code{\link{SynDist}} identifies and quantifies synonymous variation
among aligned protein-coding DNA sequences, that is, nucleotide
substitutions that do not translate to changes in the amino acid
sequences, due to degeneracy of the genetic code.
}
\details{
The SynDist() function takes a fasta file or a 'dada2'-style sequence
occurrence table (with aligned sequences as column names and samples in
rows) as input and identifies synonymous variation by pairwise sequence
comparisons.

SynDist() can do qualitative or quantitative analysis of synonymous
variation. If analysis="codon" is specified, the function identifies
synonymous nucleotide variation and outputs tables with the number of
observations of synonymous nucleotide changes per base and per codon
among all pairwise sequence comparisons in the data set. These tables
also specify, for each base or codon position, the proportions of the
total pairwise comparisons that harbor synonymous substitutions.
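
As a worked illustration of these proportions (the symbols k and N are
introduced here for illustration only): with N sequences there are
N(N-1)/2 pairwise comparisons, so a position at which k comparisons show a
synonymous substitution has the proportion
\deqn{\frac{k}{N(N-1)/2}}{k / (N(N-1)/2)}
For example, N = 3 gives 3 pairwise comparisons, and k = 1 then gives a
proportion of 1/3.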

If analysis="dist", the function produces a distance matrix specifying
the number and proportion (p-distance) of synonymous nucleotide
changes in each pairwise sequence comparison in the data set. In the
distance matrix, synonymous p-distance is calculated as the number of
synonymous nucleotide changes observed in each pairwise sequence
comparison divided by the sequence length (number of bases). If a
'dada2'-style sequence occurrence table is provided as input, the
SynDist() function furthermore produces two tables with the mean number
of synonymous variations and mean synonymous p-distances among all
pairwise comparisons of the sequences in each sample in the data set.
(Note: The means will be NA for samples that have 0 or 1 sequence(s).)
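
As a worked illustration of the p-distance calculation (the symbols and
numbers are hypothetical and introduced here for illustration only): if
n_syn is the number of synonymous nucleotide changes in one pairwise
comparison and L is the sequence length in bases, then
\deqn{p_{syn} = \frac{n_{syn}}{L}}{p_syn = n_syn / L}
so that, for example, 3 synonymous changes between two sequences of 150
bases would give a synonymous p-distance of 3/150 = 0.02.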

The SynDist() function includes an option for the user to specify which
codons to compare. This is useful, for example, if some codons contain
gaps and should be excluded from quantitative analysis.

SynDist() translates the supplied DNA sequences to amino acid sequences
using the standard genetic code. Sequences must be aligned in open
reading frame. The function only accepts the following characters in the
sequences: -, a, t, g, c, A, T, G, C.

Nucleotide triplets containing gaps (-) are translated to 'X', similar to
stop codons. Please note that gaps (-) are treated as a distinct character
in p-distance calculations. The function will give warnings if gaps or
stop codons are detected. If you wish to exclude stop codons or gaps from
distance calculations, please use the codon_pos option to specify which
codons to compare.

If you publish data or results produced with MHCtools, please cite both of
the following references:

Roved, J. (2022). MHCtools: Analysis of MHC data in non-model species. CRAN.

Roved, J. (2024). MHCtools 1.5: Analysis of MHC sequencing data in R. In S.
Boegel (Ed.), HLA Typing: Methods and Protocols (2nd ed., pp. 275–295).
Humana Press. https://doi.org/10.1007/978-1-0716-3874-3_18
}
\examples{
seq_file <- sequence_table_SynDist
path_out <- tempdir()
SynDist(seq_file, path_out, input_fasta = NULL,
        codon_pos = c(1, 2, 3, 4, 5, 6, 7, 8), analysis = "dist")
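# A further sketch, assuming the same example table is also suitable for the
# per-position analysis: with analysis = "codon", the function writes the two
# .csv summary tables (per nucleotide position and per codon) to path_out.
SynDist(seq_file, path_out, input_fasta = NULL,
        codon_pos = c(1, 2, 3, 4, 5, 6, 7, 8), analysis = "codon")
# List the files now present in the output folder
list.files(path_out)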
}
\seealso{
For more information about 'dada2', visit
  <https://benjjneb.github.io/dada2/>
}
