Starting in version 2.0-0, OneMap
can also deal with
inbred-based populations, that is, populations that have homozygous
parental lines in the genealogy (F2s, backcrosses, and RILs). As a
consequence, linkage phases do not need to be estimated. Since version
2.3.0, phases are estimated for F2 populations to properly generate
progeny haplotypes not only the recombination fraction.
In this vignette, we explain how to proceed with the analysis in an F2 population. The same procedure can be used for backcrosses and RILs as well, and therefore users should not have any difficulty in analyzing their data. However, there are a number of differences from genetic mapping in outcrossing species; please read the proper vignette.
If you are not familiar with R
, we recommend first the
reading of vignette Introduction
to R. You do not need to be an expert in R to build your linkage
map, but some concepts are necessary and will help you through the
process.
There is a GitHub OneMap
version which is constantly
improved, we strongly recommend all users to try this version.
In Taniguti et al., 2023, we introduced updates to OneMap 3.0 and evaluated various combinations of bioinformatics tools and filters for obtaining high-quality markers and linkage maps. The paper provides practical guidelines based on these results.
The Reads2Map workflow enables you to replicate these tests on your dataset. This comprehensive pipeline integrates tools such as BWA, GATK, FreeBayes, TASSEL, Stacks, updog, polyRAD, superMASSA, GUSMap, and OneMap to perform read alignment, SNP and genotype calling, and linkage mapping.
Once you have identified the best pipeline for your data—using Reads2Map or another approach—and generated a VCF file, you can follow this tutorial to conduct the analysis on your complete dataset.
For F2s, backcrosses, and RILs, two input formats are accepted. The
user can choose between the standard OneMap
file format or
the same raw file used by MAPMAKER/EXP
(Lander et al.,
1987). Therefore, one should have no difficulty in using data sets
already available for MAPMAKER/EXP
when deciding to try
OneMap
.
Both types of raw files can contain phenotypic information, but this will not be used during map construction, that requires only genotypic information (made available by molecular markers).
MAPMAKER/EXP
data fileThe MAPMAKER/EXP
raw file, combined with the map file
produced by OneMap
, can be readily used for QTL mapping
using R/qtl
(Broman et al., 2008) or
QTL Cartographer
(Wang et al., 2010), among others.
Here, we briefly present how to set up this data file. For more
detailed information see the MAPMAKER/EXP
manual (Lincon et
al., 1993), available here.
The first line of your data file should be:
data type xxxx
where xxxx
is one of the following data types:
xxxx |
Population type |
---|---|
f2 backcross |
Backcross |
f2 intercross |
F2 |
ri self |
RIL, produced by selfing |
ri sib |
RIL, produced by sib mating |
The second line should contain the number of individuals in the progeny, the number of markers, and the number of quantitative traits. So, for example, a valid line would be
10 5 2
for a data set with 10 individuals (yes, very small, but this is just an example), 5 markers, and 2 traits evaluated.
Then, the genotype information is included for each marker. The
character *
indicates the beginning of information of a
marker, followed by the marker name. For instance, here is an example of
such a file for an F2 population with 10 individuals, 5 markers, and 2
quantitative traits:
data type f2 intercross
10 5 2
*M1 A B H H A - B A A B
*M2 C - C C C - - C C A
*M3 D B D D - - B D D B
*M4 C C C - A C C A A C
*M5 C C C C C C C C C C
*weight 10.2 - 9.4 11.3 11.9 8.9 - 11.2 7.8 8.1
*length 1.7 2.1 - 1.8 2.0 1.0 - 1.7 1.0 1.1
The codification for genotypes is the following:
Code | Meaning |
---|---|
A |
homozygous for allele A (from parent 1 - AA) |
B |
homozygous for allele B (from parent 2 - BB) |
H |
heterozygous carrying both alleles (AB) |
C |
Not homozygous for allele A (Not AA) |
D |
Not homozygous for allele B (Not BB) |
- |
Missing data for the individual at this marker |
The symbols
option (not included in this example), used
in MAPMAKER/EXP
files, is also accepted (please, see its
manual for details).
The quantitative trait data should come after the genotypic data and
has a similar format, except the trait values for each individual must
be separated by at least one space, a tab, or a line break. A dash
(-
) indicates missing data.
This file must be saved in plain text format using a simple text
editor such as notepad on Microsoft Windows.
Historically, MAPMAKER/EXP
uses the .raw
extension for this file; however, you can use any other extensions, such
as .txt
.
If you want to see more examples about this file type, open
mapmaker_example_bc.raw
and
mapmaker_example_f2.raw
, both available with
OneMap
and saved in the directory extdata
on
your computer, in the folder where you installed OneMap
(use system.file(package="onemap")
to see where it is
located on your computer).
Now, let us load OneMap
:
To save your project anytime, type:
if you are using Windows, otherwise, adapt the code. Notice that you need to specify where to save and the name of the file. You can also use the toolbar, of course.
The OneMap
data file has few differences compared to
MAPMAKER/EXP format. As MAPMAKER/EXP format, the input
OneMap
file is a text file, where the first line indicates
the cross-type and the second line provides information about the number
of individuals, the number of markers, and the number of quantitative
traits. Here, the format also supports keeping physical markers location
information. The followed numbers indicate the presence/absence (1/0) of
chromosome and position information and the presence/absence(1/0) of
phenotypic data.
The third line contains sample IDs. Then, the genotype information is
included separately for each marker. The character *
indicates the beginning of information input for a new marker, followed
by the marker name. Next, there is a code indicating the marker type
according to:
Code | Type |
---|---|
A.H.B |
Codominant marker |
C.A |
Dominant marker for allele B |
D.B |
Dominant marker for allele A |
A.H |
Marker for backcross |
A.B |
Marker for ril self/sib cross |
Finally, after each marker type, comes the genotype data for the
segregating population. Missing data are indicated with the character
-
(minus sign) and an empty space separates the information
for each individual. Positions and phenotype information, if present,
follows genotypic data with a similar structure. Details are found with
the help of function read_onemap
.
Here is an example of such file for 10 individuals, 5 markers (the two zeros in the second line indicate that there is no chromosome information, physical position information), and two phenotypic data, respectively). It is very similar to a MAPMAKER/EXP file, but has additional information about the cross_type.
data type f2 intercross
10 5 0 0 2
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 A.H.B ab a - ab b a ab - ab b
*M2 A.H.B a - ab ab - b a - a ab
*M3 C.A c a a c c - a c a c
*M4 A.H.B ab b - ab a b ab b - a
*M5 D.B b b d - b d b b b d
*fen1 10.3 11.2 11.1 - 9.8 8.9 11.0 10.7 - 10.1
*fen2 42 49 - 45 51 42 28 32 38 40
In case you have physical chromosome and position information:
data type f2 intercross
10 5 1 1 2
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 A.H.B ab a - ab b a ab - ab b
*M2 A.H.B a - ab ab - b a - a ab
*M3 C.A c a a c c - a c a c
*M4 A.H.B ab b - ab a b ab b - a
*M5 D.B b b d - b d b b b d
*CHROM 1 1 1 2 2
*POS 2391 3812 5281 1823 3848
*fen1 10.3 11.2 11.1 - 9.8 8.9 11.0 10.7 - 10.1
*fen2 42 49 - 45 51 42 28 32 38 40
The input file must be saved in text format, with extensions like
.raw
. It is a good idea to open the text files called
onemap_example_f2.raw
, onemap_example_bc
,
onemap_example_riself
(available in extdata
with OneMap
and saved in the directory you installed it) to
see how this file should be. You can see where OneMap
is
installed using the command
system.file(package="onemap")
.
MAPMAKER/EXP
fileOnce you created your data file with raw data, you can use
OneMap
function read_mapmaker
to import it to
OneMap
:
The first argument is the directory where the input file is located, modify it accordingly. The second one is the data file name.
In this example, an object named mapmaker_example_f2.raw
was created. Notice that if you leave the argument dir
blank, the file will be read from your current working
directory. To set a working directory, see Introduction
to R (Importing and Exporting Data).
mapmaker_example_f2 <- read_mapmaker(file= system.file("extdata/mapmaker_example_f2.raw",
package = "onemap"))
For this example, we will use a simulated data set from an F2
population which is distributed along with OneMap
. Because
this particular data set is distributed along with the package, you can
load it by typing:
To see what this data set is about, type:
mapmaker_example_f2
#> This is an object of class 'onemap'
#> Type of cross: f2
#> No. individuals: 200
#> No. markers: 66
#> CHROM information: no
#> POS information: no
#> Percent genotyped: 85
#>
#> Segregation types:
#> AA : AB : BB --> 36
#> Not AA : AA --> 15
#> Not BB : BB --> 15
#>
#> No. traits: 1
#> Missing trait values:
#> Trait_1: 0
As you can see, the data consists of a sample of 200 individuals
genotyped for 66 markers (36 co-dominant (AA
,
AB
or BB
), 15 dominant in one parent
(Not AA
or AA
) and 15 dominant in the other
parent (Not BB
or BB
) with 15% of missing
data. You can also see that there is phenotypic information for one
trait in the data set, that can be used for QTL mapping.
OneMap
raw fileThe same procedure is made for the OneMap
raw file, but,
instead of using the function read_mapmaker
we use
read_onemap
to read the OneMap
format.
In this example, an object named onemap_example_f2.raw
was created. The data set containing the same markers and individuals of
the mapmaker_example_f2.raw
file. Would be a good idea to
open these two files in a text editor and compare them to better
understand the differences between the two kinds of input files. We can
read the onemap_example_f2.raw
using:
onemap_example_f2 <- read_onemap(inputfile= system.file("extdata/onemap_example_f2.raw",
package = "onemap"))
Or, because this particular data are available together with the
OneMap
package:
To see what this data set is about, type:
onemap_example_f2
#> This is an object of class 'onemap'
#> Type of cross: f2
#> No. individuals: 200
#> No. markers: 66
#> CHROM information: no
#> POS information: no
#> Percent genotyped: 85
#>
#> Segregation types:
#> AA : AB : BB --> 36
#> Not AA : AA --> 15
#> Not BB : BB --> 15
#>
#> No. traits: 1
#> Missing trait values:
#> Trait_1: 0
As you can see, the mean difference in the output object is that the
read_onemap
function keeps chromosome and position
information. Because the objects mapmaker_example_f2
and
onemap_example_f2
are practically the same, from now we
will use only onemap_example_f2
.
VCF
fileIf you are working with biallelic markers, as SNPs and indels (only
codominant markers A.H.B), in VCF (Variant Call Format) files, you can
import information from VCF
to OneMap
using
onemap_read_vcfR
function.
With the onemap_read_vcfR
you can convert VCF file
directly to onemap
. The onemap_read_vcfR
function keeps chromosome and position information for each marker at
the end of the raw file.
We will use the same example file vcf_example_f2.vcf
to
show how it works.
Here we use the the vcfR
package internally to help this
conversion. The vcfR
authors mentioned in their tutorials
that RAM memory use is an important consideration when using the
package. Depending of your dataset, the object created can be huge and
occupy a lot of memory.
You can use onemap_read_vcfR
function to convert the VCF
file to onemap
object. The parameters used are the
vcf
with the VCF file path, the identification of each
parent (here, you must define only one sample for each parent) and the
cross type.
vcf_example_f2 <- onemap_read_vcfR(vcf = system.file("extdata/vcf_example_f2.vcf.gz", package = "onemap"),
parent1 = "P1",
parent2 = "P2",
cross = "f2 intercross")
Depending on your dataset, this function can take some time to run.
NOTE: From version 2.0.6 to 2.1.1005,
OneMap
had the vcf2raw
function to convert
vcf
to .raw
. Now, this function is defunct,
but it can be replaced by a combination of onemap_read_vcfR
and write_onemap_raw
functions. See Exporting .raw file from
onemap object session to further information about
write_onemap_raw
.
Before building your linkage map, you should take a look at your data
set. First, notice that by reading the raw data into
OneMap
, an object of classes onemap
and
f2
was produced:
class(onemap_example_f2)
#> [1] "onemap" "f2"
class(vcf_example_f2)
#> f2 intercross
#> "onemap" "f2"
In fact, functions read_mapmaker
and
read_onemap
will produce objects of classes
backcross
, riself
, risib
or
f2
, according to the information in the data file for
inbred-based populations. Therefore, you can use OneMap
’s
version of function plot
to produce a graphic with
information about the raw data. It will automatically recognize the
class of the object and produce the graphic. To see it in action,
try:
The graphic is self-explanatory. If you want to save it, see the help
for function plot.onemap
:
This graphic shows that missing data is somehow randomly distributed;
also, the proportion of dominant markers is relatively high for this
data set. In OneMap
’s notation, codominant markers are
classified as of B type; dominant ones, by C type (for details about
this notation, see the vignette for outcrossing species). You can see
the number of loci within each type using function
plot_by_segreg_type
:
So, as shown before, the object onemap_example_f2
has 36
codominant markers and 30 dominant ones and the
vcf_example_f2
has only codominant markers.
Function create_depth_profile
generates dispersion
graphics with x and y axis representing, respectively, the reference and
alternative allele depths. The function is only available for biallelic
markers in VCF files with allele counts information. Each dot represents
a genotype for mks
markers and inds
individuals. If both arguments receive NULL
, all markers
and individuals are considered. Dots are colored according to the
genotypes present in the OneMap
object
(GTfrom = onemap
) or in the VCF file
(GTfrom = vcf
). An rds file is generated with the data in
the graphic (rds.file
). The alpha
argument
controls the transparency of the color of each dot. Control this
parameter is a good idea when having a big amount of markers and
individuals. The x_lim
and y_lim
control the
axis scale limits, by default, it uses the maximum value of the
counts.
Here is an example of the simulated dataset.
simu_f2_obj <- onemap_read_vcfR(vcf = system.file("extdata/vcf_example_f2.vcf.gz", package="onemap"),
cross = "f2 intercross",
parent1 = "P1", parent2 = "P2")
#> 1 Markers were removed from the dataset because one or both of the parents are heterozygotes, we do not expect heterozygotes parents in F2 populations.
library(vcfR)
vcfR_object <- read.vcfR(system.file("extdata/vcf_example_f2.vcf.gz", package="onemap"))
#> Scanning file to determine attributes.
#> File attributes:
#> meta lines: 8
#> header_line: 9
#> variant count: 26
#> column count: 203
#> Meta line 8 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#> Character matrix gt rows: 26
#> Character matrix gt cols: 203
#> skip: 0
#> nrows: 26
#> row_num: 0
#> Processed variant: 26
#> All variants processed
create_depths_profile(onemap.obj = simu_f2_obj,
vcfR.object = vcfR_object,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_f2.rds")
Selecting genotypes from vcf
(GTfrom = “vcf”) the colors
are separated by VCF genotypes: 0/0
homozygote for the
reference allele, 0/1
heterozygote, and 1/1
homozygote for the alternative allele. Depending on your VCF, you can
also have phased genotypes, which are presented by pipe (|) instead of
the bar (/).
By default, OneMap
sets a error probability of \(10^{-5}\) for every genotype:
head(simu_f2_obj$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 0.00001 9.999900e-01 9.999900e-01 1.000000e-05
#> SNP2_IND1 0.00001 9.999900e-01 9.999900e-01 1.000000e-05
#> SNP3_IND1 1.00000 1.000000e+00 1.000000e+00 1.000000e+00
#> SNP5_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP6_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP7_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
See Taniguti et. al, 2023 Supplementary File 1 for details about the
$error
object format.
For markers from sequencing technology, this value is unrealistic and
generates inflated linkage maps. OneMap
3.0 can consider
three types of genotype probabilities to consider error in the HMM
chain. A single global value (global_error), a matrix with dimensions
(number of individuals) x (number of markers) with genotypes errors
values (genotypes_errors), and a matrix with dimensions (number of
individuals)*(number of markers) x possible genotypes (genotypes_probs).
See details in the function create_probs
:
If you have a VCF file, you can use the function
extract_depth
to obtain the genotypes_errors and
genotypes_probs from the GQ or PL or GL format field:
genotypes_errors <- extract_depth(vcfR.object = vcfR_object,
onemap.object=simu_f2_obj,
vcf.par= "GQ",
parent1="P1",
parent2="P2",
recovering=FALSE)
genotypes_errors[10:50, 1:5]
#> SNP1 SNP2 SNP3 SNP5 SNP6
#> IND12 2.511886e-10 1.584893e-10 2.511886e-10 1.258925e-10 1.258925e-10
#> IND13 NA NA 1.258925e-10 NA NA
#> IND14 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND15 1.258925e-10 1.258925e-10 2.511886e-10 3.981072e-10 1.258925e-10
#> IND16 3.981072e-10 2.511886e-10 NA 2.511886e-10 1.584893e-10
#> IND17 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.584893e-10
#> IND18 1.258925e-10 1.258925e-10 1.584893e-10 1.584893e-10 1.584893e-10
#> IND19 1.258925e-10 1.584893e-10 1.258925e-10 1.584893e-09 1.258925e-10
#> IND20 2.511886e-10 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND21 2.511886e-10 3.981072e-10 1.258925e-10 2.511886e-10 1.258925e-10
#> IND22 1.584893e-09 3.981072e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND23 1.584893e-10 3.981072e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND25 1.258925e-10 1.258925e-10 1.584893e-10 1.258925e-10 1.258925e-10
#> IND26 3.981072e-10 1.258925e-08 1.258925e-10 1.258925e-10 1.258925e-10
#> IND27 NA NA 1.258925e-10 1.258925e-10 3.981072e-10
#> IND28 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND30 NA 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND31 1.258925e-10 1.258925e-10 NA 1.258925e-10 1.258925e-10
#> IND32 1.258925e-10 1.584893e-09 NA 1.258925e-10 1.258925e-10
#> IND33 1.258925e-10 2.511886e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND34 3.981072e-10 1.584893e-09 1.258925e-10 1.258925e-10 1.258925e-10
#> IND35 1.258925e-10 1.258925e-10 NA 1.258925e-10 1.258925e-10
#> IND36 NA NA 1.258925e-10 NA NA
#> IND37 1.258925e-10 1.258925e-10 NA 1.258925e-10 1.258925e-10
#> IND38 3.981072e-10 1.258925e-10 1.258925e-10 2.511886e-10 1.584893e-10
#> IND39 1.258925e-10 2.511886e-10 3.981072e-10 1.584893e-10 1.584893e-09
#> IND40 1.258925e-10 1.258925e-10 1.258925e-10 NA NA
#> IND42 1.584893e-10 1.258925e-10 1.258925e-10 3.981072e-10 2.511886e-10
#> IND43 NA NA 1.258925e-10 3.981072e-10 2.511886e-10
#> IND44 1.584893e-10 3.981072e-10 1.584893e-09 1.584893e-09 1.258925e-10
#> IND45 1.258925e-10 1.258925e-10 NA 1.258925e-10 1.258925e-10
#> IND46 2.511886e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND47 NA NA 2.511886e-10 1.584893e-09 1.584893e-10
#> IND48 1.258925e-10 1.258925e-10 1.258925e-10 NA NA
#> IND49 1.258925e-10 1.258925e-10 NA 1.258925e-10 3.981072e-10
#> IND50 NA NA 1.258925e-10 NA 3.981072e-10
#> IND51 NA NA 1.584893e-09 3.981072e-10 2.511886e-10
#> IND52 3.981072e-10 1.584893e-10 2.511886e-10 1.258925e-10 1.258925e-10
#> IND53 2.511886e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND54 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND55 1.258925e-08 2.511886e-10 1.258925e-10 NA 1.258925e-08
onemap_obj_errors <- create_probs(simu_f2_obj, genotypes_errors = genotypes_errors)
head(onemap_obj_errors$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 1.258925e-10 1.000000e+00 1.000000e+00 1.258925e-10
#> SNP2_IND1 1.258925e-10 1.000000e+00 1.000000e+00 1.258925e-10
#> SNP3_IND1 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> SNP5_IND1 1.000000e+00 5.282977e-11 5.282977e-11 5.282977e-11
#> SNP6_IND1 1.000000e+00 1.327024e-10 1.327024e-10 1.327024e-10
#> SNP7_IND1 1.000000e+00 5.282977e-11 5.282977e-11 5.282977e-11
genotypes_probs <- extract_depth(vcfR_object,
simu_f2_obj,
vcf.par = "PL",
parent1 = "P1",
parent2 = "P2")
genotypes_probs[1:5, ]
#> [,1] [,2] [,3]
#> SNP1_IND1 7.943282e-12 1.00000000 1.258925e-22
#> SNP2_IND1 3.981072e-15 1.00000000 2.511886e-22
#> SNP3_IND1 1.000000e+00 1.00000000 1.000000e+00
#> SNP5_IND1 9.843983e-01 0.01560166 2.472697e-22
#> SNP6_IND1 9.406491e-01 0.05935094 3.744791e-15
onemap_obj_probs <- create_probs(simu_f2_obj, genotypes_probs = genotypes_probs)
head(onemap_obj_probs$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 7.943282e-12 1.00000000 1.00000000 1.258925e-22
#> SNP2_IND1 3.981072e-15 1.00000000 1.00000000 2.511886e-22
#> SNP3_IND1 1.000000e+00 1.00000000 1.00000000 1.000000e+00
#> SNP5_IND1 9.843983e-01 0.01560166 0.01560166 2.472697e-22
#> SNP6_IND1 9.406491e-01 0.05935094 0.05935094 3.744791e-15
#> SNP7_IND1 9.843983e-01 0.01560166 0.01560166 2.472697e-22
According to results from Reads2Map using a global error rate can be the best solution in most cases. However, the software probabilities are useful to filter markers before starting the linkage map building:
hist(genotypes_errors) # Check distribution to define threshold - it will change according to the software used for genotype calling
summary(as.vector(genotypes_errors))
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0 0 0 0 0 0 676
onemap_obj_prob_filt <- filter_prob(onemap_obj_probs, threshold = 0.9)
#> 296 genotypes were converted to missing data.
Now, we can set the genotype probabilities according to the selected global error:
Note: This section is included solely to demonstrate the possibility of adjusting the error probability. This adjustment is not applied in subsequent steps of this tutorial.
If you have multiallelic markers (MNPs) in your VCF file, set the
only_biallelics
to FALSE
. Here, we have the
multiallelic markers in a separated file:
onemap_obj_multi <- onemap_read_vcfR("roses_populations.haps.new.names.vcf", parent1 = "PH", parent2 = "J14-3", cross = "outcross", only_biallelic = FALSE) # Example data not available
onemap_obj_multi
6659 Markers were removed of the dataset because one or both of parents have no informed genotypes (are missing data)
8351 Markers were removed from the dataset because both of parents are homozygotes, these markers are considered non-informative in outcrossing populations.
This is an object of class 'onemap'
Type of cross: outcross
No. individuals: 138
No. markers: 7991
CHROM information: yes
POS information: yes
Percent genotyped: 81
Segregation types:
A.1 --> 830
A.2 --> 1891
B3.7 --> 548
D1.10 --> 1949
D1.9 --> 673
D2.14 --> 641
D2.15 --> 1459
No. traits: 0
This one does not have genotype probabilities information and erroneous multiallelic markers can generate higher impact in the map quality. Therefore, we will use a global error rate of 0.1 for them (value also defined using Reads2Map workflows):
OneMap
objectsIf you have more than one dataset of markers, all from the same
cross-type, you can use the function combine_onemap
to
merge them into only one onemap
object.
In our example, we have two onemap
objects:
onemap_example_f2
(equivalent to
mapmaker_example_f2
) with 66 markers and 200
individualsvcf_example_f2
with 25 biallelic markers and
192 individuals.The combine_function
recognizes the correspondent
individuals by the ID, thus, it is important to define exactly the same
IDs to respective individuals in both raw
files. Compared
with the first file, the second file does not have markers information
for 8 individuals. The combine_onemap
will complete that
information with NA.
In our examples, we have only genotypic information, but the function can also merge the phenotypic information if it exists.
comb_example <- combine_onemap(onemap_example_f2, vcf_example_f2)
comb_example
#> This is an object of class 'onemap'
#> Type of cross: f2
#> No. individuals: 200
#> No. markers: 91
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 84
#>
#> Segregation types:
#> AA : AB : BB --> 61
#> Not AA : AA --> 15
#> Not BB : BB --> 15
#>
#> No. traits: 1
#> Missing trait values:
#> Trait_1: 0
The function arguments are the names of the OneMap
objects you want to combine.
Plotting markers genotypes from the outputted OneMap
object, we can see that there are more missing data -
(black vertical lines) for some individuals because they were missing in
the second file.
It is possible that there are redundant markers in your dataset, especially when dealing with too many markers. Redundant markers have the same genotypic information that others markers, usually because didn’t happen recombination events between each other. They will not increase information on the map but will increase computational effort during the map building. Therefore, it is a good practice to remove them to build the map and, once the map is already built, they can be added again.
First, we use the function find_bins
to group the
markers into bins according to their genotypic information. In other
words, markers with the same genotypic information will be in the same
bin.
bins <- find_bins(comb_example, exact = FALSE)
bins
#> This is an object of class 'onemap_bin'
#> No. individuals: 200
#> No. markers in original dataset: 91
#> No. of bins found: 90
#> Average of markers per bin: 1.011111
#> Type of search performed: non exact
The first argument is the OneMap
object and the
exact
argument specifies if only markers with exact same
information will be at the same bin. Using FALSE
at this
second argument, missing data will not be considered and the marker with
the lowest amount of missing data will be the representative marker on
the bin.
Our example dataset has only two redundant markers. We can create a
new OneMap
object without them, using the
create_data_bins
function. This function keeps only the
most representative marker of each bin from the bins
object.
bins_example <- create_data_bins(comb_example, bins)
bins_example
#> This is an object of class 'onemap'
#> Type of cross: f2
#> No. individuals: 200
#> No. markers: 90
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 84
#>
#> Segregation types:
#> AA : AB : BB --> 60
#> Not AA : AA --> 15
#> Not BB : BB --> 15
#>
#> No. traits: 1
#> Missing trait values:
#> Trait_1: 0
The arguments for create_data_bins
function are the
OneMap
object and the object created by
find_bins
function.
OneMap
objectThe functions onemap_read_vcfR
generates new
OneMap
objects without use a input .raw
file.
Also, the function combine_onemap
manipulates the
information of the original .raw
file and creates a new
data set. In both cases, you do not have an input file .raw
that contains the same information as your current onemap object If you
want to create a new input file with the data set you are working on
after using these functions, you can use the function
write_onemap_raw
.
The file new_dataset.raw
will be generated in your
working directory. In our example, it contains markers from
onemap_example_f2
and vcf_example_f2
data
sets.
Now, it should be interesting to see if markers are segregating
following what is expected by Mendel’s law. You first need to use
function test_segregation
using as argument an object of
class onemap
.
This will produce an object of class
onemap_segreg_test
:
You cannot see the results if you simply type the object name; use
OneMap
’s version of the print
function for
objects of class onemap_segreg_test
:
(Nothing is shown!)
print(f2_test)
#> Marker H0 Chi-square p-value % genot.
#> 1 M1 1:2:1 0.206896552 0.901722662 87.0
#> 2 M2 3:1 0.292682927 0.588506355 82.0
#> 3 M3 3:1 0.124031008 0.724703003 86.0
#> 4 M4 3:1 0.292682927 0.588506355 82.0
#> 5 M5 3:1 0.159763314 0.689374522 84.5
#> 6 M6 3:1 0.620689655 0.430791121 87.0
#> 7 M7 1:2:1 3.185185185 0.203397600 81.0
#> 8 M8 3:1 0.165644172 0.684012345 81.5
#> 9 M9 3:1 0.126984127 0.721579725 84.0
#> 10 M10 1:2:1 1.721893491 0.422761645 84.5
#> 11 M11 3:1 1.390946502 0.238245323 81.0
#> 12 M12 1:2:1 1.678160920 0.432107681 87.0
#> 13 M13 3:1 0.154285714 0.694472964 87.5
#> 14 M14 1:2:1 5.337209302 0.069348924 86.0
#> 15 M15 3:1 0.403292181 0.525393917 81.0
#> 16 M16 1:2:1 2.036144578 0.361290733 83.0
#> 17 M17 1:2:1 1.000000000 0.606530660 81.0
#> 18 M18 1:2:1 3.413793103 0.181427972 87.0
#> 19 M19 3:1 0.279069767 0.597311573 86.0
#> 20 M20 3:1 4.462626263 0.034644194 82.5
#> 21 M21 1:2:1 4.892857143 0.086602329 84.0
#> 22 M22 1:2:1 2.862068966 0.239061489 87.0
#> 23 M23 3:1 0.008032129 0.928587488 83.0
#> 24 M24 3:1 8.649325626 0.003271824 86.5
#> 25 M25 1:2:1 0.686746988 0.709373215 83.0
#> 26 M26 1:2:1 7.022598870 0.029858091 88.5
#> 27 M27 1:2:1 0.108433735 0.947226662 83.0
#> 28 M28 1:2:1 3.319526627 0.190183989 84.5
#> 29 M29 1:2:1 0.161676647 0.922342801 83.5
#> 30 M30 1:2:1 0.607142857 0.738177160 84.0
#> 31 M31 3:1 0.007843137 0.929430415 85.0
#> 32 M32 3:1 0.016949153 0.896416968 88.5
#> 33 M33 1:2:1 0.640718563 0.725888192 83.5
#> 34 M34 3:1 0.235867446 0.627206926 85.5
#> 35 M35 3:1 0.606741573 0.436017310 89.0
#> 36 M36 3:1 0.272727273 0.601508134 88.0
#> 37 M37 1:2:1 3.169491525 0.204999905 88.5
#> 38 M38 1:2:1 3.055555556 0.217017393 90.0
#> 39 M39 3:1 0.163636364 0.685830434 82.5
#> 40 M40 1:2:1 4.872093023 0.087506123 86.0
#> 41 M41 1:2:1 4.253012048 0.119253235 83.0
#> 42 M42 3:1 0.124031008 0.724703003 86.0
#> 43 M43 1:2:1 0.927272727 0.628992237 82.5
#> 44 M44 1:2:1 2.000000000 0.367879441 86.0
#> 45 M45 3:1 0.008032129 0.928587488 83.0
#> 46 M46 1:2:1 0.740112994 0.690695307 88.5
#> 47 M47 1:2:1 0.758620690 0.684333200 87.0
#> 48 M48 1:2:1 2.122807018 0.345969898 85.5
#> 49 M49 3:1 1.190476190 0.275233524 87.5
#> 50 M50 3:1 1.068686869 0.301242240 82.5
#> 51 M51 3:1 0.031746032 0.858586201 84.0
#> 52 M52 1:2:1 2.684848485 0.261211660 82.5
#> 53 M53 3:1 0.017142857 0.895830102 87.5
#> 54 M54 1:2:1 0.772455090 0.679615865 83.5
#> 55 M55 1:2:1 0.655172414 0.720661163 87.0
#> 56 M56 1:2:1 2.310734463 0.314941859 88.5
#> 57 M57 1:2:1 6.159763314 0.045964696 84.5
#> 58 M58 3:1 2.050847458 0.152121494 88.5
#> 59 M59 3:1 0.074074074 0.785494747 81.0
#> 60 M60 3:1 0.050505051 0.822186767 82.5
#> 61 M61 1:2:1 0.655172414 0.720661163 87.0
#> 62 M62 1:2:1 1.047337278 0.592343463 84.5
#> 63 M63 1:2:1 2.147928994 0.341651353 84.5
#> 64 M64 3:1 0.238658777 0.625176499 84.5
#> 65 M65 1:2:1 4.571428571 0.101701392 84.0
#> 66 M66 1:2:1 0.452513966 0.797513128 89.5
#> 67 SNP1 1:2:1 1.278787879 0.527612092 82.5
#> 68 SNP2 1:2:1 1.559523810 0.458515169 84.0
#> 69 SNP3 1:2:1 0.503105590 0.777592404 80.5
#> 70 SNP5 1:2:1 4.450000000 0.108067419 80.0
#> 71 SNP6 1:2:1 2.641975309 0.266871595 81.0
#> 72 SNP7 1:2:1 3.745341615 0.153712576 80.5
#> 73 SNP8 1:2:1 4.164705882 0.124636604 85.0
#> 74 SNP9 1:2:1 2.578313253 0.275503037 83.0
#> 75 SNP10 1:2:1 3.359281437 0.186440949 83.5
#> 76 SNP11 1:2:1 2.395061728 0.301938820 81.0
#> 77 SNP12 1:2:1 4.731707317 0.093869134 82.0
#> 78 SNP13 1:2:1 2.963855422 0.227199291 83.0
#> 79 SNP14 1:2:1 7.123456790 0.028389714 81.0
#> 80 SNP15 1:2:1 3.109090909 0.211285400 82.5
#> 81 SNP16 1:2:1 2.751552795 0.252643368 80.5
#> 82 SNP17 1:2:1 5.850299401 0.053656659 83.5
#> 83 SNP18 1:2:1 3.108433735 0.211354837 83.0
#> 84 SNP19 1:2:1 3.335403727 0.188680181 80.5
#> 85 SNP20 1:2:1 3.867469880 0.144607090 83.0
#> 86 SNP21 1:2:1 3.238095238 0.198087264 84.0
#> 87 SNP22 1:2:1 4.455621302 0.107764105 84.5
#> 88 SNP23 1:2:1 2.847058824 0.240862412 85.0
#> 89 SNP24 1:2:1 1.778443114 0.410975549 83.5
#> 90 SNP26 1:2:1 1.981595092 0.371280460 81.5
This shows the results of the Chi-square test for the expected Mendelian segregation pattern of each marker locus. This depends of course on the marker type, because codominant markers can show heterozygous genotypes. The appropriate null hypothesis is selected by the function. The proportion of individuals genotyped is also shown.
To declare statistical significance, remember that you should
consider that multiple tests are being performed. To guide you in the
analysis, function Bonferroni_alpha
shows the alpha value
that should be considered for this number of loci if applying
Bonferroni’s correction with global alpha of 0.05:
You can subset object f2_test
to see which markers are
distorted under Bonferroni’s criterion, but it is easier to see the
proportion of markers that are distorted by drawing a graphic using
OneMap
’s version of the function plot
for
objects of class onemap_segreg_test
:
The graphic is self-explanatory: p-values were transformed
by using -log10(p-values)
for better visualization. A
vertical line shows the threshold for tests if Bonferroni’s correction
is applied. Significant and non-significant tests are identified. In
this particular example, no test was statistically significant, so none
will be discarded.
Please, remember that Bonferroni’s correction is conservative, and also that discarding marker data might not be a good approach to your analysis. This graphic is just to suggest a criterion, so use it with caution.
You can see a list of markers with non-distorted segregation using
function select_segreg
:
select_segreg(f2_test)
#> [1] "M1" "M2" "M3" "M4" "M5" "M6" "M7" "M8" "M9"
#> [10] "M10" "M11" "M12" "M13" "M14" "M15" "M16" "M17" "M18"
#> [19] "M19" "M20" "M21" "M22" "M23" "M24" "M25" "M26" "M27"
#> [28] "M28" "M29" "M30" "M31" "M32" "M33" "M34" "M35" "M36"
#> [37] "M37" "M38" "M39" "M40" "M41" "M42" "M43" "M44" "M45"
#> [46] "M46" "M47" "M48" "M49" "M50" "M51" "M52" "M53" "M54"
#> [55] "M55" "M56" "M57" "M58" "M59" "M60" "M61" "M62" "M63"
#> [64] "M64" "M65" "M66" "SNP1" "SNP2" "SNP3" "SNP5" "SNP6" "SNP7"
#> [73] "SNP8" "SNP9" "SNP10" "SNP11" "SNP12" "SNP13" "SNP14" "SNP15" "SNP16"
#> [82] "SNP17" "SNP18" "SNP19" "SNP20" "SNP21" "SNP22" "SNP23" "SNP24" "SNP26"
To get a list of distorted ones (none in this example):
It is not recommended, but you can define a different threshold value
by changing the threshold
argument of the function
select_segreg
.
For the next steps will be useful to know the numbers of each marker with segregation distortion, so then you can keep those out of your map building analysis. These numbers refer to the lines where markers are located on the data file.
To access the corresponding number for of these markers you can
change the numbers
argument:
no_dist <- select_segreg(f2_test, distorted = FALSE, numbers = TRUE) #to show the markers numbers without segregation distortion
no_dist
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#> [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
dist <- select_segreg(f2_test, distorted = TRUE, numbers = TRUE) #to show the markers numbers with segregation distortion
dist
#> integer(0)
After visualizing raw data and checking for segregation distortion, let us now estimate recombination fractions between all pairs of markers (two-point tests). This is necessary to allow us to test which markers are linked. At this point, you should pay no attention if markers show segregation distortion or not, that is, simply use all of them.
There are two optional arguments in function rf_2pts
:
LOD
and max.rf
which indicate the minimum LOD
Score and the maximum recombination fraction to declare linkage (they
default to 3.0 and 0.5, respectively).
The default for the recombination fraction is easy to understand,
because if max.rf < 0.5
we could state that markers are
linked. The LOD Score is the statistic used to evaluate the significance
of the test for max.rf = 0.50
. This needs to take into
consideration the number of tests performed, which of course depends on
the number of markers. Function suggest_lod
can help users
to find an initial value to use for their linkage test. For this
example:
Thus, one should consider using LOD = 4.145858
for the
tests. Please, notice that this is just a guide, not a value to take
without any further consideration. For now, we will keep the default
values, but later will show that results do not change in our example by
using LOD = 3
or LOD = 4.145858
.
If you want to see the results for a single pair of markers, say
M12
and M42
, use:
print(twopts_f2, c("M12", "M42"))
#> Results of the 2-point analysis for markers: M12 and M42
#> Criteria: LOD = 3 , Maximum recombination fraction = 0.5
#>
#> rf LOD
#> CC 0.6416905 1.9315611
#> CR 0.3858177 0.0302037
#> RC 0.3858177 0.0302037
#> RR 0.3583095 1.9315611
Since version 2.3.0, we estimate phases for F2 populations too, so here you can see the recombination fractions and LOD values for each possible phase. For RILs and backcross, you will obtain only one value.
This was possible because OneMap
has a version of the
print
function that can be applied to objects of class
rf_2pts
:
However, objects of this type are too complex to print if you do not specify a pair of markers:
print(twopts_f2)
#> This is an object of class 'rf_2pts'
#>
#> Criteria: LOD = 3 , Maximum recombination fraction = 0.5
#>
#> This object is too complex to print
#> Type 'print(object, c(mrk1=marker, mrk2=marker))' to see
#> the analysis for two markers
#> mrk1 and mrk2 can be the names or numbers of both markers
In this example we follow two different strategies:
Using only recombinations information.
Using the recombinations and also the reference genome
information, once our example has CHROM
and
POS
information for some of the markers.
First, we will apply the strategy using only recombinations information. In the second part of this tutorial, we show a way to use also reference genome information.
To assign markers to linkage groups, first, use the function
make_seq
to create a (un-ordered) sequence with all
markers:
Function make_seq
is used to create sequences from
objects of several different classes. Here, the first argument is of
class rf_2pts
and the second argument specifies which
markers one wants to use ("all"
indicates that all markers
will be analyzed). The object mark_all_f2
is of class
sequence
:
If you want to form groups with a subset of markers, say
M1
, M3
and M7
, use:
In this case, it was easy because marker names and order in the
objects (indicated in vector c(1, 3, 7)
) are closely
related, that is, you can easily know the position of markers in the
object once you know their names. However, this is not true for real
data sets, where markers do not have simple names such as
M1
or M2
.
A good example is to use the vector of markers without segregation distortion that we selected when applying the Chi-square tests.
In our example, there are no markers with segregation distortion,
then the object mark_no_dist_f2
is equivalent to
mark_all_f2
.
OneMap
has two functions to perform the markers
grouping. The first presented here is the group
function:
LGs_f2 <- group(mark_all_f2)
#> Selecting markers:
#> group 1
#> ...................................
#> group 2
#> .................
#> group 3
#> ...................................
LGs_f2
#> This is an object of class 'group'
#> It was generated from the object "mark_all_f2"
#>
#> Criteria used to assign markers to groups:
#> LOD = 3 , Maximum recombination fraction = 0.5
#>
#> No. markers: 90
#> No. groups: 3
#> No. linked markers: 90
#> No. unlinked markers: 0
#>
#> Printing groups:
#> Group 1 : 36 markers
#> M1 M3 M4 M6 M7 M9 M12 M13 M17 M23 M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44 M46 M53 M55 M58 M60 M63 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
#>
#> Group 2 : 18 markers
#> M2 M5 M8 M10 M11 M25 M32 M33 M37 M41 M43 M45 M51 M54 M61 M66 SNP24 SNP26
#>
#> Group 3 : 36 markers
#> M14 M15 M16 M18 M19 M20 M21 M22 M24 M28 M38 M39 M47 M48 M49 M50 M52 M56 M57 M59 M62 M64 M65 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23
This will show the linkage groups which are formed with criteria
defined by max.rf
and LOD
. These criteria are
applied as thresholds when assigning markers to linkage groups. If not
modified, the same values used for the object twopts
(from
two-point analysis) will be maintained (so, LOD = 3.0
and
max.rf = 0.5
in this example).
Users can easily change the default values. For example, using LOD
suggested by suggest_lod
(rounded up):
(LGs_f2 <- group(mark_all_f2, LOD = LOD_sug, max.rf = 0.5))
#> Selecting markers:
#> group 1
#> ...................................
#> group 2
#> .................
#> group 3
#> ...................................
#> This is an object of class 'group'
#> It was generated from the object "mark_all_f2"
#>
#> Criteria used to assign markers to groups:
#> LOD = 4.145858 , Maximum recombination fraction = 0.5
#>
#> No. markers: 90
#> No. groups: 3
#> No. linked markers: 90
#> No. unlinked markers: 0
#>
#> Printing groups:
#> Group 1 : 36 markers
#> M1 M3 M4 M6 M7 M9 M12 M13 M17 M23 M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44 M46 M53 M55 M58 M60 M63 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
#>
#> Group 2 : 18 markers
#> M2 M5 M8 M10 M11 M25 M32 M33 M37 M41 M43 M45 M51 M54 M61 M66 SNP24 SNP26
#>
#> Group 3 : 36 markers
#> M14 M15 M16 M18 M19 M20 M21 M22 M24 M28 M38 M39 M47 M48 M49 M50 M52 M56 M57 M59 M62 M64 M65 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23
In our case, nothing different happens. (The parentheses above are
just to avoid typing LGs_f2
in a new row to have the object
printed).
We can see that the markers were assigned to only one linkage group. This division isn’t so trustful, mostly if you are also working with dominant markers.
Another option is the group_upgma
function. It is an
adapted version of MAPpoly grouping
function.
You can define the expected number of groups in the
expected.groups
argument and check how the markers are
split in the plotted dendrogram. Using argument inter=TRUE
you can change interactively the number of groups defined by the red
squares in the graphic.
If you have reference genome information for some of your markers,
you can separate the groups using it and the group_seq
function (further information in Using the recombinations and the reference genome information
)
to add the markers without reference genome information. We can also
confirm the separation of linkage groups in the ordering procedure.
Notice the class of object LGs_f2
and
LGs_upgma
:
After assigning markers to linkage groups, the next step is to order the markers within each group.
First, let us choose the mapping function used to display the genetic map. We can choose between Kosambi or Haldane mapping functions. To use Haldane, type
To use Kosambi’s function:
If none is set, kosambi function is applied.
First, you must extract
it from the object of class
group
. Let us extract the group 1 using function
make_seq
:
The first argument is an object of class group
and the
second is a number indicating which linkage group will be extracted. In
this case, the object LGs_f2
, generated by function
group
, is of class group
. In fact, this
function can handle different classes of objects.
If you type:
LG1_f2
#>
#> Number of markers: 36
#> Markers in the sequence:
#> M1 M3 M4 M6 M7 M9 M12 M13 M17 M23 M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44
#> M46 M53 M55 M58 M60 M63 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
#>
#> Parameters not estimated.
you will see which markers are comprised in the sequence. But notice that no parameters have been estimated so far (the function says Parameters not estimated). This refers to the fact that so far we only attributed markers to linkage groups, but we did not perform any analysis for them as a group - only as pairs. (Does it seem complicated? Do not worry, you will understand the details in a moment).
Notice the class of object LG1_f2
:
To order markers in this group, you can use a two-point based algorithm such as Seriation (Buetow and Chakravarti, 1987), Rapid Chain Delineation (Doerge, 1996), Recombination Counting and Ordering (Van Os et al., 2005) and Unidirectional Growth (Tan and Fu, 2006):
LG1_rcd_f2 <- rcd(LG1_f2, hmm = FALSE)
LG1_rec_f2 <- record(LG1_f2, hmm = FALSE)
LG1_ug_f2 <- ug(LG1_f2, hmm = FALSE)
Argument hmm
defines if the function should run the HMM
chain multipoint approach to estimate the genetic distances given the
marker order provided by the two-points ordering algorithm. We set here
the argument hmm=FALSE
because we just want to obtain the
marker order. We are not yet estimating the genetic distances. We
suggest to use hmm=TRUE
only when you already decided which
order is the best because the HMM chain is the most computationally
intensive step in the map building (mainly if F2 intercross population).
You can use rf_graph_table
to check the ordering quality
(see details below) and make editions in the marker order using
drop_marker
. After, you can use map
or
map_avoid_unlinked
functions to estimate the genetic
distances (check session Map estimation for an
arbitrary order).
The algorithms provided different results (results not printed in this vignette). For an evaluation and comparison of these methods, see Mollinari et al. (2009).
In this particular case, seriation will return an expected error:
Another method that has been demonstrated to be very efficient in
ordering markers uses the multidimensional scaling (MDS) approach.
OneMap
has the mds_onemap
function that makes
an interface with the MDSMap package to apply this approach to order
markers. This is particularly useful if you are dealing with many
markers (up to 20). The method also provides diagnostics graphics and
parameters to find outliers to help users to filter the dataset. You can
find more information in MDSMap
vignette. Our
mds_onemap
does not present all possibilities of analysis
that the MDSMap package presents. See help page ?mds_onemap
to check which ones we implemented.
LG1_mds_f2 <- mds_onemap(input.seq = LG1_f2, hmm = F)
#> Stress: 0.269179553167368
#> Mean Nearest Neighbour Fit: 10.0424504966687
If you only specify the input sequence, mds_onemap will use the
default parameters. It will also generate the MDSMap input file in
out.file
file. You can use out.file
in the
MDSMap package to try other parameters too. The default method used is
the principal curves, know more about using ?mds_onemap
and
reading the MDSMap vignette. Besides
these algorithms use a two-point approach to order the markers, if you
set hmm=TRUE
a multipoint approach is applied to estimate
the genetic distances after the order is estimated. Thus, it can happen
that some markers are not considered linked when evaluated by multipoint
information, and the function will return an error like this:
ERROR: The linkage between markers 1 and 2 did not reach the OneMap default criteria. They are probably segregating independently
You can automatically remove these markers setting argument
rm_unlinked = TRUE
. The marker will be removed, and the
ordering algorithms will be restarted. Warning messages will inform
which markers were removed. If you don’t get warning messages, it means
that any marker needed to be removed. This is our case in this example,
but if you obtain an error or warning running your dataset, you already
know what happened.
NOTE: (new!) If
you are working with f2 intercross mapping population and have many
markers (more than 60), we suggest to first use hmm=FALSE to check the
ordering and after speed up mds
using BatchMap
parallelization approach. See section Speed up analysis with parallelization
for more information.
By now, we ordered our group in several ways, but, which one result in the best order? We can check it by plotting the color scale recombination fraction matrix and see if the generated maps obey the colors patterns expected.
It is possible to plot the recombination fraction matrix and LOD
Scores based on a color scale using the function
rf_graph_table
. This matrix can be useful to make some
diagnostics about the map.
Ordered markers are presented on both axes. Hotter the colors lower the recombination fraction between markers related to each cell. Good map orders have hot colors in the diagonal and it gradually turns to blue at the superior left and inferior right corners. This pattern means that markers that have low recombination fractions are really positioned closely on the map.
Let’s see the maps we have until now:
With default arguments, the graphic cells represent the recombination
fractions. If you change the argument to graph.LOD = TRUE
,
LOD score values are plotted. The color scale varies from red (small
distances and big LODs) to dark blue. You can also change the number of
colors from the rainbow
palette with argument
n.colors
, add/remove graphic main and axis title
(main
and lab.xy
), and shows marker numbers,
instead of names in the axis (mrk.axis
).
We can see that any of the algorithms gave an optimal ordering, there are many red cells out of the diagonal, the color pattern is broke in all graphics, but we need to do an effort to find which one of the algorithms gave the best result. Then, we continue the analysis by doing hand manipulations.
Here, mds
and record
approaches are the
ones that gave results more close to the expected pattern. We can also
see that probably there is more than one group, because of the big gaps.
Let’s see the map generated by the mds
algorithm with more
details using the interactive mode of the rf_graph_table
function:
An interactive version of the graphic will pop up (not shown here) in your internet browser end generated an HTML file in your work directory. Hover the mouse cursor over the cell corresponding to two markers, you can see some useful information about them.
Let’s separate the other groups and order again using the same method:
LG2_f2 <- make_seq(LGs_f2, 2)
LG2_mds_f2 <- mds_onemap(LG2_f2,hmm = FALSE)
#> Stress: 0.116483468692064
#> Mean Nearest Neighbour Fit: 8.68454410966198
LG3_f2 <- make_seq(LGs_f2, 3)
LG3_mds_f2 <- mds_onemap(LG3_f2,hmm = FALSE)
#> Stress: 0.21866828169345
#> Mean Nearest Neighbour Fit: 9.57620631087726
When possible (i.e., when groups have a small number of
markers, in general up to 10 or 11), one should select the best order by
comparing the multipoint likelihood of all possible orders between
markers (exhaustive search). This procedure is implemented in the
function compare
. Although feasible for up to 10 or 11
markers, with 7 or more markers it will take a couple of hours until you
see the results (depending of course on the computational resources
available).
All our groups have more than 7 markers, so using the function
compare
is infeasible. Thus we will apply a heuristic that
shows reliable results. First, we will choose a moderate number of
markers, say 6, to create a framework using the function
compare
, and then we will position the remaining markers
into this framework using the function try_seq
. The way we
choose these initial markers in inbred-based populations is somewhat
different from what we did for outcrossing populations, where there is a
mixture of segregation patterns (see the vignette for details).
In our scenario, we recommend two methods:
Randomly choose a number of markers and calculate the multipoint
likelihood of all possible orders (using the function
compare
). If the LOD Score of the second-best order is
greater than a given threshold, say, 3, then take the best order to
proceed with the next step. If not, repeat the procedure.
Use some two-point based algorithm to construct a map; then, take
equally spaced markers from this map. Then, create a framework of
ordered markers using the function compare
. Next, try to
map the remaining markers, one at a time, beginning with co-dominant
ones (most informative ones), then add the dominant ones.
You can do this procedure manually, in a similar way as done for
outcrossing species (see the vignette for details). However, this
procedure is automated in function order_seq
, which we will
use here (it will take some time to run):
LG1_f2_ord <- order_seq(input.seq = LG1_mds_f2, n.init = 5,
subset.search = "twopt",
twopt.alg = "rcd", THRES = 3)
The first argument is an object of class sequence
(LG1_mds_f2). n.init = 5
means that five markers will be
used in the compare
step. The argument
subset.search = "twopt"
indicates that these five markers
should be chosen by using a two-point method, which will be Rapid Chain
Delineation, as indicated by the argument
twopt.alg = "rcd"
. THRES = 3
indicates that
the try_seq
step will only add markers to the sequence
which can be mapped with LOD Score greater than 3.
Check the order obtained by this procedure:
Markers 5
, 66
, and 2
could not
be safely mapped to a single position (LOD Score > THRES
in absolute value). The output displays the safe
order and
the most likely positions for markers not mapped, where ***
indicates the most likely position, and *
corresponds to
other plausible positions. (If you are familiar with MAPMAKER/EXP, you
will recognize the representation).
To get the safe
order, use:
and to get the order with all markers (i.e., including the ones not mapped to a single position), use:
which places markers 5
, 66
, and
2
into their most likely positions.
Although some old publications presented maps with only
safe
orders, we see no reason not to use the option
force
and recommend it for users. In the next, steps we can
make modifications to the map based on more information.
The order_seq
function can perform two rounds of the
try_seq
step, first using THRES
and then
THRES - 1
as the threshold. This generally results in safe
orders with more markers mapped but takes longer to run. To do this,
type (it will take some time to run):
LG1_f2_ord <- order_seq(input.seq = LG1_mds_f2, n.init = 5,
subset.search = "twopt",
twopt.alg = "rcd", THRES = 3,
touchdown = TRUE)
#> | | | 0% | |= | 2% | |== | 3% | |==== | 5% | |===== | 7% | |====== | 8% | |======= | 10% | |======== | 12% | |========= | 13% | |========== | 15% | |============ | 17% | |============= | 18% | |============== | 20% | |=============== | 22% | |================ | 23% | |================== | 25% | |=================== | 27% | |==================== | 28% | |===================== | 30% | |====================== | 32% | |======================= | 33% | |======================== | 35% | |========================== | 37% | |=========================== | 38% | |============================ | 40% | |============================= | 42% | |============================== | 43% | |================================ | 45% | |================================= | 47% | |================================== | 48% | |=================================== | 50% | |==================================== | 52% | |===================================== | 53% | |====================================== | 55% | |======================================== | 57% | |========================================= | 58% | |========================================== | 60% | |=========================================== | 62% | |============================================ | 63% | |============================================== | 65% | |=============================================== | 67% | |================================================ | 68% | |================================================= | 70% | |================================================== | 72% | |=================================================== | 73% | |==================================================== | 75% | |====================================================== | 77% | |======================================================= | 78% | |======================================================== | 80% | |========================================================= | 82% | |========================================================== | 83% | |============================================================ | 85% | |============================================================= | 87% | |============================================================== | 88% | |=============================================================== | 90% | |================================================================ | 92% | |================================================================= | 93% | |================================================================== | 95% | |==================================================================== | 97% | |===================================================================== | 98% | |======================================================================| 100%
The output is too big to be included here, so please try it to see
what happens. In short, for this particular sequence, the
touchdown
the step could additionally map markers
5
and 66
, but this depends on the dataset. Let
us continue our analysis using the order with all markers as suggested
by the function order_seq
:
(LG1_f2_final <- make_seq(LG1_f2_ord, "force"))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 55 M55 0.00 a | | b a | | b
#> 9 M9 7.42 o | | o a | | o
#> 27 M27 8.20 a | | b a | | b
#> 3 M3 19.85 o | | o a | | o
#> 4 M4 22.96 o | | a o | | o
#> 42 M42 29.13 o | | a o | | o
#> 53 M53 30.37 o | | o a | | o
#> 46 M46 34.76 a | | b a | | b
#> 1 M1 43.31 a | | b a | | b
#> 69 SNP3 48.45 a | | b a | | b
#> 30 M30 51.25 a | | b a | | b
#> 70 SNP5 61.54 a | | b a | | b
#> 7 M7 63.85 a | | b a | | b
#> 72 SNP7 65.66 a | | b a | | b
#> 71 SNP6 70.57 a | | b a | | b
#> 6 M6 77.59 o | | o a | | o
#> 13 M13 82.80 o | | o a | | o
#> 35 M35 89.63 o | | a o | | o
#> 58 M58 96.56 o | | a o | | o
#> 74 SNP9 105.07 a | | b a | | b
#> 73 SNP8 111.16 a | | b a | | b
#> 12 M12 112.76 a | | b a | | b
#> 75 SNP10 114.61 a | | b a | | b
#> 17 M17 119.91 a | | b a | | b
#> 31 M31 123.97 o | | o a | | o
#> 34 M34 132.10 o | | o a | | o
#> 63 M63 132.10 a | | b a | | b
#> 26 M26 139.89 a | | b a | | b
#> 40 M40 143.54 a | | b a | | b
#> 36 M36 150.49 o | | o a | | o
#> 68 SNP2 153.24 a | | b a | | b
#> 44 M44 155.11 a | | b a | | b
#> 67 SNP1 157.77 a | | b a | | b
#> 29 M29 166.35 a | | b a | | b
#> 60 M60 169.70 o | | a o | | o
#> 23 M23 177.05 o | | o o | | a
#>
#> 36 markers log-likelihood: -2468.139
rf_graph_table(LG1_f2_final)
Finally, to check for alternative orders, use the
ripple_seq
function:
The second argument, ws = 5
, means that subsets
(windows) of five markers will be permuted sequentially (5! orders for
each window), to search for other plausible orders. The LOD
argument means that only orders with LOD Score smaller than 3 will be
printed.
The output shows sequences of five numbers, because
ws = 5
. They can be followed by an OK
, if
there are no alternative orders with LOD Scores smaller than
LOD = 3
in absolute value, or by a list of alternative
orders.
In this example, the first two windows showed alternative orders with
LOD smaller than LOD = 3. However, the best order was that obtained with
the order_seq function (LOD = 0.00). If there were an alternative order
more likely than the original, one should check the difference between
them and, if necessary, change the order with (for example) functions
drop_marker
(see Section about using an arbitrary order) and
try_seq
, or simply by typing the new order. For that, use
LG2_f2_final$seq.num
to obtain the original order; then
make the necessary changes (by copying and pasting) and use the function
map to reestimate the genetic map for the new order.
Here we will use the multipoint approach to re-order the LG2. The
approach is the automatic usage of the try
algorithm:
LG2_f2_ord <- order_seq(input.seq = LG2_mds_f2, n.init = 5,
subset.search = "twopt",
twopt.alg = "rcd", THRES = 3,
touchdown = TRUE, rm_unlinked = TRUE)
Get the order with all markers:
(LG2_f2_final <- make_seq(LG2_f2_ord, "force"))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 41 M41 0.00 a | | b a | | b
#> 10 M10 8.48 a | | b a | | b
#> 2 M2 15.92 o | | a o | | o
#> 11 M11 18.70 o | | a o | | o
#> 43 M43 22.22 a | | b a | | b
#> 32 M32 26.58 o | | o a | | o
#> 45 M45 31.50 o | | a o | | o
#> 54 M54 34.03 a | | b a | | b
#> 66 M66 40.90 a | | b a | | b
#> 61 M61 42.58 a | | b a | | b
#> 25 M25 48.19 a | | b a | | b
#> 5 M5 51.79 o | | a o | | o
#> 51 M51 56.09 o | | a o | | o
#> 90 SNP26 64.25 a | | b a | | b
#> 33 M33 65.61 a | | b a | | b
#> 89 SNP24 70.35 a | | b a | | b
#> 37 M37 78.54 a | | b a | | b
#> 8 M8 84.79 o | | a o | | o
#>
#> 18 markers log-likelihood: -1259.53
rf_graph_table(LG2_f2_final)
Our heatmap is already very good with an automatic approach, but I
may want to see what happens if I remove a marker and try to reposition
it. To remove a marker use function drop_marker
:
After, you need to re-estimate the parameters using the same previous
order. For that, use the map
function:
Warning: If you find an error message like:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
It’s because the map
function has a very common name and
you can have in your environment other functions with the same name. In
the case of the presented error, R is using the map
function from purrr
package instead of OneMap
,
to solve this, simply specify that you want the OneMap
function with ::
command from stringr
package:
NOTE: (new!) If
you are working with f2 intercross mapping population and have many
markers (more than 60), we suggest to speed up map
using
BatchMap parallelization approach. See section Speed up analysis with parallelization
for more information.
See what happened:
And try to reposition the marker:
(LG2_temp <- try_seq(input.seq = LG2_edit_map, mrk = 41))
#>
#> LOD scores correspond to the best linkage phase combination
#> for each position
#>
#> The symbol "*" outside the box indicates that more than one
#> linkage phase is possible for the corresponding position
#>
#>
#> Marker tested: 41
#>
#> Markers LOD
#> ======================
#> | |
#> | -5.71 | 1
#> | 10 |
#> | -16.06 | 2
#> | 2 |
#> | -25.02 | 3
#> | 11 |
#> | -25.16 | 4
#> | 43 |
#> | -52.92 | 5
#> | 32 |
#> | 0.00 | 6
#> | 45 |
#> | -26.22 | 7
#> | 54 |
#> | -52.67 | 8
#> | 66 |
#> | -83.46 | 9
#> | 61 |
#> | -69.26 | 10
#> | 25 |
#> | -65.43 | 11
#> | 5 |
#> | -71.72 | 12
#> | 51 |
#> | -62.45 | 13
#> | 90 |
#> | -100.63 | 14
#> | 33 |
#> | -89.63 | 15
#> | 89 |
#> | -78.51 | 16
#> | 37 |
#> | -56.98 | 17
#> | 8 |
#> | -37.40 | 18
#> | |
#> ======================
The result shows us that the best position for this marker (with higher LOD) is the 1 (as before). Let’s now include the marker at this position using:
Check the final map (results not shown):
Print it:
LG2_f2_final
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 41 M41 0.00 a | | b a | | b
#> 10 M10 8.45 a | | b a | | b
#> 2 M2 15.87 o | | a o | | o
#> 11 M11 18.57 o | | a o | | o
#> 43 M43 21.86 a | | b a | | b
#> 32 M32 25.03 o | | o a | | o
#> 45 M45 33.81 a | | o o | | o
#> 54 M54 111.78 b | | a a | | b
#> 66 M66 118.29 b | | a a | | b
#> 61 M61 119.91 b | | a a | | b
#> 25 M25 125.64 b | | a a | | b
#> 5 M5 127.70 a | | o o | | o
#> 51 M51 132.62 a | | o o | | o
#> 90 SNP26 141.95 b | | a a | | b
#> 33 M33 143.29 b | | a a | | b
#> 89 SNP24 148.02 b | | a a | | b
#> 37 M37 156.20 b | | a a | | b
#> 8 M8 162.38 a | | o o | | o
#>
#> 18 markers log-likelihood: -1410.762
This is the final version of the map for this linkage group.
Automatic usage of try algorithm.
LG3_f2_ord <- order_seq(input.seq = LG3_mds_f2, n.init = 5,
subset.search = "twopt",
twopt.alg = "rcd", THRES = 3,
touchdown = TRUE)
A careful examination of the graphics can be a good source of information about how markers were placed.
Now, get the order with all markers:
(LG3_f2_final <- make_seq(LG3_f2_ord, "force"))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 47 M47 0.00 a | | b a | | b
#> 19 M19 7.71 o | | a o | | o
#> 39 M39 8.94 o | | o a | | o
#> 38 M38 15.62 a | | b a | | b
#> 49 M49 23.51 o | | o a | | o
#> 59 M59 23.77 o | | a o | | o
#> 80 SNP15 34.43 a | | b a | | b
#> 76 SNP11 38.32 a | | b a | | b
#> 28 M28 40.17 a | | b a | | b
#> 78 SNP13 43.74 a | | b a | | b
#> 83 SNP18 52.41 a | | b a | | b
#> 14 M14 54.23 a | | b a | | b
#> 85 SNP20 56.64 a | | b a | | b
#> 82 SNP17 60.76 a | | b a | | b
#> 16 M16 67.80 a | | b a | | b
#> 77 SNP12 78.67 a | | b a | | b
#> 65 M65 84.16 a | | b a | | b
#> 79 SNP14 89.34 a | | b a | | b
#> 62 M62 98.49 a | | b a | | b
#> 64 M64 104.83 o | | a o | | o
#> 21 M21 111.27 a | | b a | | b
#> 24 M24 114.08 o | | o a | | o
#> 15 M15 115.79 a | | o o | | o
#> 20 M20 121.55 o | | o a | | o
#> 50 M50 148.19 o | | o a | | o
#> 56 M56 149.51 a | | b a | | b
#> 87 SNP22 161.21 a | | b a | | b
#> 86 SNP21 167.61 a | | b a | | b
#> 88 SNP23 172.76 a | | b a | | b
#> 18 M18 174.33 a | | b a | | b
#> 22 M22 178.43 a | | b a | | b
#> 57 M57 181.67 a | | b a | | b
#> 48 M48 187.45 a | | b a | | b
#> 84 SNP19 195.91 a | | b a | | b
#> 52 M52 198.03 a | | b a | | b
#> 81 SNP16 202.56 a | | b a | | b
#>
#> 36 markers log-likelihood: -2670.04
Check heatmap:
Markers 24
and 64
seem to broke the color
pattern. We will remove them and use try_seq to find better
locations:
LG3_edit <- drop_marker(LG3_f2_final, c(24, 64))
LG3_edit_map <- order_seq(LG3_edit) # We remove several markers maybe it's better to order again
#> | | | 0% | |= | 2% | |== | 3% | |==== | 5% | |===== | 7% | |====== | 8% | |======= | 10% | |======== | 12% | |========= | 13% | |========== | 15% | |============ | 17% | |============= | 18% | |============== | 20% | |=============== | 22% | |================ | 23% | |================== | 25% | |=================== | 27% | |==================== | 28% | |===================== | 30% | |====================== | 32% | |======================= | 33% | |======================== | 35% | |========================== | 37% | |=========================== | 38% | |============================ | 40% | |============================= | 42% | |============================== | 43% | |================================ | 45% | |================================= | 47% | |================================== | 48% | |=================================== | 50% | |==================================== | 52% | |===================================== | 53% | |====================================== | 55% | |======================================== | 57% | |========================================= | 58% | |========================================== | 60% | |=========================================== | 62% | |============================================ | 63% | |============================================== | 65% | |=============================================== | 67% | |================================================ | 68% | |================================================= | 70% | |================================================== | 72% | |=================================================== | 73% | |==================================================== | 75% | |====================================================== | 77% | |======================================================= | 78% | |======================================================== | 80% | |========================================================= | 82% | |========================================================== | 83% | |============================================================ | 85% | |============================================================= | 87% | |============================================================== | 88% | |=============================================================== | 90% | |================================================================ | 92% | |================================================================= | 93% | |================================================================== | 95% | |==================================================================== | 97% | |===================================================================== | 98% | |======================================================================| 100%
LG3_edit_map <- make_seq(LG3_edit_map, "force")
rf_graph_table(LG3_edit_map)
Adding one by one we can see how each one changes the map log-likelihood, map size, and the color pattern in the heatmap and judge if we will remove them.
LG3_edit <- try_seq(LG3_edit_map, 24)
LG3_edit_temp <- make_seq(LG3_edit, 22)
LG3_edit_map <- LG3_edit_temp # include
LG3_edit <- try_seq(LG3_edit_map, 64)
LG3_edit_temp <- make_seq(LG3_edit, 24) # Not included
LG3_f2_final <- LG3_edit_map
Check the final map (not shown):
In our example, we have reference genome chromosome and position information for some of the markers, here we will exemplify one method of using this information to help build the genetic map.
With the CHROM
information in the input file, you can
identify markers belonging to some chromosome using the function
make_seq
with the rf_2pts
object. For example,
assign the string "1"
for the second argument to get
chromosome 1 makers. The output sequence will be automatically ordered
by POS
information.
CHR1 <- make_seq(twopts_f2, "1")
CHR1
#>
#> Number of markers: 9
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
#>
#> Parameters not estimated.
CHR2 <- make_seq(twopts_f2, "2")
CHR3 <- make_seq(twopts_f2, "3")
According to CHROM
information we have three defined
linkage groups, now we can try to group the markers without chromosome
information to them using recombination information. For this, we can
use the function group_seq
:
CHR_mks <- group_seq(input.2pts = twopts_f2, seqs = "CHROM", unlink.mks = mark_all_f2,
repeated = FALSE)
#> Selecting markers:
#> group 1
#> ...................................
#> group 2
#> ...............
#> group 3
#> ......................
#> Selecting markers:
#> group 1
#> ...................................
#> group 2
#> ..........................
#> group 3
#> ...............
#> Selecting markers:
#> group 1
#> .................
#> group 2
#> ..........................
#> group 3
#> ......................
The function works as the function group
but considering
pre-existing sequences. Setting seqs
argument with the
string "CHROM"
, it will consider the pre-existing sequences
according to CHROM
information. You can also indicate other
pre-existing sequences if it makes sense for your study. For that, you
should inform a list with objects of class sequences
, as
the example:
CHR_mks <- group_seq(input.2pts = twopts_f2, seqs = list(CHR1=CHR1, CHR2=CHR2, CHR3=CHR3),
unlink.mks = mark_all_f2, repeated = FALSE)
In this case, the command had the same effect as the previous, because we indicate chromosome sequences, but others sequences can be used.
The unlink.mks
argument receives an object of class
sequence
, this defines which markers will be tested to
group with the sequences in seqs
. In our example, we will
indicate only the markers with no segregation distortion, using the
sequence mark_no_dist
. It is also possible to use the
string "all"
to test all the remaining markers at the
rf_2pts
object.
In some cases, the same marker can group into more than one sequence,
those markers will be considered repeated
. We can choose if
we want to remove or not (FALSE/TRUE
) them of the output
sequences, with the argument rm.repeated
. Anyway, their
numbers will be informed at the list repeated
in the output
object.
In the example case, there are no repeated markers. However, if they exist, it could indicate that their groups actually constitute the same group. Also, genotyping errors can generate repeated markers. Anyway, they deserve better investigations.
We can access detailed information about the results just by printing:
CHR_mks
#> This is an object of class 'group_seq'
#> Criteria used to assign markers to groups:
#> LOD = 3 , Maximum recombination fraction = 0.5
#>
#> No. markers in input sequences:
#> CHR1 : 9 markers
#> CHR2 : 13 markers
#> CHR3 : 2 markers
#>
#> No. unlinked input markers: 66 markers
#>
#> No. markers in output sequences:
#> CHR1 : 36 markers
#> CHR2 : 36 markers
#> CHR3 : 18 markers
#> No. unlinked: 0 markers
#> No. repeated: 0 markers
#>
#> Printing output sequences:
#> Group CHR1 : 36 markers
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 M1 M3 M4 M6 M7 M9 M12 M13 M17 M23 M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44 M46 M53 M55 M58 M60 M63
#>
#> Group CHR2 : 36 markers
#> SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 M14 M15 M16 M18 M19 M20 M21 M22 M24 M28 M38 M39 M47 M48 M49 M50 M52 M56 M57 M59 M62 M64 M65
#>
#> Group CHR3 : 18 markers
#> SNP24 SNP26 M2 M5 M8 M10 M11 M25 M32 M33 M37 M41 M43 M45 M51 M54 M61 M66
#>
#> Unlinked markers: 0 markers
#>
#>
#> Repeated markers: 0 markers
#>
Also, we can access the numbers of repeated markers with:
In the same way, we can access the output sequences:
CHR_mks$sequences$CHR1
#>
#> Number of markers: 36
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 M1 M3 M4 M6 M7 M9 M12 M13 M17 M23
#> M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44 M46 M53 M55 M58 M60 M63
#>
#> Parameters not estimated.
# or
CHR_mks$sequences[[1]]
#>
#> Number of markers: 36
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 M1 M3 M4 M6 M7 M9 M12 M13 M17 M23
#> M26 M27 M29 M30 M31 M34 M35 M36 M40 M42 M44 M46 M53 M55 M58 M60 M63
#>
#> Parameters not estimated.
For this function, optional arguments are LOD
and
max.rf
, which define thresholds to be used when assigning
markers to linkage groups. If none is provided (default), criteria
previously defined for the object rf_2pts
are used.
In our example, all markers without genomic information grouped with all chromosomes, then this approach did not give us more information about grouping and from here we could follow the same strategy did in Ordering markers within linkage groups.
If you have some information about the order of the markers, for
example, from a reference genome or previously published paper, you can
define a sequence of those markers in a specific order (using the
function make_seq
) and then use the function
map
to estimate the final genetic map (based on multi-point
analysis). For example, let us assume that we know that the following
markers are ordered in this sequence: M47
,
M38
, M59
, M16
, M62
,
M21
, M20
, M48
and
M22
. In this case, select them from the two-point analysis,
and use function map
:
Warning: If you find an error message like:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
It’s because the map
function has a very common name and
you can have in your environment another function with the same name. In
the case of the presented error, R is using the map
function from purrr
package instead of OneMap
,
to solve this, simply specify that you want the OneMap
function with ::
command from stringr
package:
library(stringr)
(LG3seq_f2_map <- onemap::map(LG3seq_f2))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 47 M47 0.00 a | | b a | | b
#> 38 M38 14.69 a | | b a | | b
#> 59 M59 24.45 o | | a o | | o
#> 16 M16 38.96 a | | b a | | b
#> 62 M62 49.22 a | | b a | | b
#> 21 M21 56.19 a | | b a | | b
#> 20 M20 65.10 o | | o a | | o
#> 48 M48 73.14 a | | b a | | b
#> 22 M22 81.80 a | | b a | | b
#>
#> 9 markers log-likelihood: -1004.624
NOTE: (new!) If
your map population is F2 intercross and your sequence has many markers
(more than 60), we suggest to speed up map
using BatchMap
parallelization approach. See section Speed up analysis with parallelization
for more information.
This is a subset of the first linkage group. When used this way, the
map
function searches for the best combination of phases
between markers and prints the results.
To see the correspondence between marker names and numbers, use
function marker_type
:
marker_type(LG3seq_f2_map)
#> Marker Marker.name Type
#> 1 47 M47 A.H.B
#> 2 38 M38 A.H.B
#> 3 59 M59 C.A
#> 4 16 M16 A.H.B
#> 5 62 M62 A.H.B
#> 6 21 M21 A.H.B
#> 7 20 M20 D.B
#> 8 48 M48 A.H.B
#> 9 22 M22 A.H.B
If one needs to add or drop markers from a predefined sequence,
functions add_marker
and drop_marker
can be
used. For example, to add markers M18
, M56
and
M50
at the end of LG3seq_f2_map
:
(LG3seq_f2_map <- add_marker(LG3seq_f2_map, c(18, 56, 50)))
#>
#> Number of markers: 12
#> Markers in the sequence:
#> M47 M38 M59 M16 M62 M21 M20 M48 M22 M18 M56 M50
#>
#> Parameters not estimated.
Removing markers M59
and 21
from
LG3seq_f2_map
:
Once all linkage groups were obtained using both strategies, we can
draw a map for each strategy using the function draw_map
.
Since version 2.1.1007, OneMap
has a new version of
draw_map
, called draw_map2
. The new function
draws elegant linkage groups and presents new arguments to personalize
your draw.
If you prefer the old function, we also keep it. Follow examples of how to use both of them.
Draw_map
We can draw a genetic map for all linkage groups using the function
draw_map
. First, we have to create a list of ordered
linkage groups:
Then use function draw_map
for this list:
We also can draw a map for a specific linkage group:
Function draw_map
draws a very simple graphic
representation of the genetic map. More recently, we developed a new
version called draw_map2
that draws a more sophisticated
figure. Furthermore, once the distances and the linkage phases are
estimated, other map figures can be drawn by the user with any
appropriate software.
Draw_map2
The same figures did with draw_map
can be done with the
draw_map2
function. But it has different capacities and
arguments. Here are some examples, but you can find more options on the
help page ?write_map2
.
Let’s draw all three groups built:
draw_map2(LG1_f2_final, LG2_f2_final, main = "Only linkage information",
group.names = c("LG1", "LG2", "LG3"), output = "map.eps")
NOTE: Check the GitHub vignette version to visualize the graphic.
You can include all sequence
objects referring to the
groups as the first arguments. The main
argument defines
the main title of the draw and group.names
define the names
of each group. If no output
file and file extension is
defined, the draw will be generated at your working directory as
map.eps
. The eps extension is only the default option but
there are others that can be used. You can have access to a list of them
on the help page.
We also can draw a map for a specific linkage group:
NOTE: Check the GitHub vignette version to visualize the graphic.
You can also change the default colors using the
col.group
and col.mark
arguments.
With argument tag
you can highlight some markers at the
figure according to your specific purpose.
Warning: Only available for outcrossing and f2 intercross populations.
As already mentioned, OneMap
uses HMM multipoint
approach to estimate genetic distances, a very robust method, but it can
take time to run if you have many markers. In 2017, Schiffthaler et. al
release an OneMap
fork with modifications in CRAN and in
GitHub with the possibility of parallelizing the HMM chain dividing
markers in batches and use different cores for each phase. Their
approach speeds up our HMM and keeps the genetic distances estimation
quality. It allows dividing the job into a maximum of four cores
according to the four possible phases for outcrossing and f2 mapping
populations. We add this parallelized approach to the functions:
map
, mds_onemap
, seriation
,
rcd
, record
and ug
. For better
efficiency it is important that batches are composed of 50 markers or
more, therefore, this approach is only recommended for linkage groups
with many markers.
The parallelization is here available for all types of operational
systems, however, we suggest setting argument
parallelization.type
to FORK
if you are not
using Windows system. It will improve the procedure speed.
Here we will show an example of how to use the BatchMap approach in some functions that requires HMM. For this, we simulated a dataset with a group with 300 markers (we don’t want this vignette to take too much time to run, but usually maps with markers from high-throughput technologies result in larger groups). Before start, you can see the time spent for each approach (See also Session Info) in this example:
Without parallelization (h) | With parallelization (h) | |
---|---|---|
rcd | 0.6801889 | 0.1260436 |
record_map | 1.8935892 | 0.3330297 |
ug_map | 1.1002725 | 0.2256356 |
mds_onemap | 2.1478900 | 0.4443492 |
map | 2.1042114 | 0.6156722 |
simParallel <- read_onemap(system.file("extdata/simParall_f2.raw", package = "onemap")) # dataset available only in onemap github version
plot(simParallel)
# Calculates two-points recombination fractions
twopts <- rf_2pts(simParallel)
seq_all <- make_seq(twopts, "all")
# There are no redundant markers
find_bins(simParallel)
# There are no distorted markers
print(test_segregation(simParallel)) # Not shown
To prepare the data with defined bach size we use function
pick_batch_sizes
. It selects a batch size that splits the
data into even groups. Argument size
defines the batch size
next to which an optimum size will be searched. overlap
defines the number of markers that overlap between the present batch and
next. This is used because pre-defined phases at these overlap markers
in the present batch are used to start the HMM in the next batch. The
around
argument defines how much the function can vary
around the defined number in size
to search for the optimum
batch size.
Some aspects should be considered to define these arguments because if the batch size were set too high, there would be less gain in execution time. If the overlap size would be too small, phases would be incorrectly estimated and large gaps would appear in the map, inflating its size. In practice, these values will depend on many factors such as population size, marker quality, and species. BatchMap authors recommended trying several configurations on a subset of data and select the best performing one.
batch_size <- pick_batch_sizes(input.seq = seq_all,
size = 80,
overlap = 30,
around = 10)
batch_size
To use the parallelized approach you just need to include the arguments when using the functions:
# Without parallelization
rcd_map <- rcd(input.seq = seq_all)
# With parallelization
rcd_map_par <- rcd(input.seq = seq_all,
phase_cores = 4,
size = batch_size,
overlap = 30)
Because we simulate this dataset we know the correct order. We can
use map_overlapping_batches
to estimate genetic distance in
this case. This is equivalent to map
, but with the
parallelized process.
Similarly with map
, using argument
rm_unlinked = TRUE
the function will return a vector with
marker numbers without the problematic marker. To repeat the analysis
removing automatically all problematic markers use
map_avoid_unlinked
:
# Without parallelization
batch_map <- map_avoid_unlinked(input.seq = seq_all)
# With parallelization
batch_map_par <- map_avoid_unlinked(input.seq = seq_all,
size = batch_size,
phase_cores = 4,
overlap = 30)
As you can see in the above maps, heuristic ordering algorithms do not return an optimal order result, mainly if you don’t have many individuals in your population. Because of the erroneous order, generated map sizes are not close to the simulated size (100 cM) and their heatmaps don’t present the expected color pattern. Two of them get close to the color pattern, they are the ug and the MDS method. They present good global ordering but not local. If you have a reference genome, you can use its position information to rearrange local ordering.
Function progeny_haplotypes
generates a data.frame with
progeny phased haplotypes estimated by OneMap
HMM. For
progeny, the HMM results in probabilities for each possible genotype,
then the generated data.frame contains all possible genotypes. If
most_likely = TRUE
the most likely genotype receives 1 and
the rest 0 (if there are two most likely both receive 0.5), if
most_likely = FALSE
genotypes probabilities will be
according to the HMM results. You can choose which individual to be
evaluated in ind
. The data.frame is composed of the
information: individual (ind) and group (grp) ID, position in
centimorgan (pos), progeny homologs (homologs), and from each parent the
allele came (parents).
(progeny_haplot <- progeny_haplotypes(LG2_f2_final, most_likely = TRUE, ind = 2, group_names = "LG2_final"))
#> ind marker grp pos prob progeny.homologs parents allele
#> 1 IND2 M41 LG2_final 0.000000 0 H1 P1 H1_P1
#> 2 IND2 M10 LG2_final 8.446307 0 H1 P1 H1_P1
#> 3 IND2 M2 LG2_final 15.869538 0 H1 P1 H1_P1
#> 4 IND2 M11 LG2_final 18.569906 0 H1 P1 H1_P1
#> 5 IND2 M43 LG2_final 21.855233 0 H1 P1 H1_P1
#> 6 IND2 M32 LG2_final 25.033814 0 H1 P1 H1_P1
#> 7 IND2 M45 LG2_final 33.805625 0 H1 P1 H1_P1
#> 8 IND2 M54 LG2_final 111.777702 1 H1 P1 H1_P1
#> 9 IND2 M66 LG2_final 118.291527 1 H1 P1 H1_P1
#> 10 IND2 M61 LG2_final 119.907276 1 H1 P1 H1_P1
#> 11 IND2 M25 LG2_final 125.638345 1 H1 P1 H1_P1
#> 12 IND2 M5 LG2_final 127.700640 1 H1 P1 H1_P1
#> 13 IND2 M51 LG2_final 132.621357 1 H1 P1 H1_P1
#> 14 IND2 SNP26 LG2_final 141.945625 0 H1 P1 H1_P1
#> 15 IND2 M33 LG2_final 143.293882 0 H1 P1 H1_P1
#> 16 IND2 SNP24 LG2_final 148.015674 0 H1 P1 H1_P1
#> 17 IND2 M37 LG2_final 156.195398 0 H1 P1 H1_P1
#> 18 IND2 M8 LG2_final 162.384077 0 H1 P1 H1_P1
#> 19 IND2 M41 LG2_final 0.000000 1 H1 P2 H1_P2
#> 20 IND2 M10 LG2_final 8.446307 1 H1 P2 H1_P2
#> 21 IND2 M2 LG2_final 15.869538 1 H1 P2 H1_P2
#> 22 IND2 M11 LG2_final 18.569906 1 H1 P2 H1_P2
#> 23 IND2 M43 LG2_final 21.855233 1 H1 P2 H1_P2
#> 24 IND2 M32 LG2_final 25.033814 1 H1 P2 H1_P2
#> 25 IND2 M45 LG2_final 33.805625 1 H1 P2 H1_P2
#> 26 IND2 M54 LG2_final 111.777702 0 H1 P2 H1_P2
#> 27 IND2 M66 LG2_final 118.291527 0 H1 P2 H1_P2
#> 28 IND2 M61 LG2_final 119.907276 0 H1 P2 H1_P2
#> 29 IND2 M25 LG2_final 125.638345 0 H1 P2 H1_P2
#> 30 IND2 M5 LG2_final 127.700640 0 H1 P2 H1_P2
#> 31 IND2 M51 LG2_final 132.621357 0 H1 P2 H1_P2
#> 32 IND2 SNP26 LG2_final 141.945625 1 H1 P2 H1_P2
#> 33 IND2 M33 LG2_final 143.293882 1 H1 P2 H1_P2
#> 34 IND2 SNP24 LG2_final 148.015674 1 H1 P2 H1_P2
#> 35 IND2 M37 LG2_final 156.195398 1 H1 P2 H1_P2
#> 36 IND2 M8 LG2_final 162.384077 1 H1 P2 H1_P2
#> 37 IND2 M41 LG2_final 0.000000 0 H2 P1 H2_P1
#> 38 IND2 M10 LG2_final 8.446307 0 H2 P1 H2_P1
#> 39 IND2 M2 LG2_final 15.869538 0 H2 P1 H2_P1
#> 40 IND2 M11 LG2_final 18.569906 0 H2 P1 H2_P1
#> 41 IND2 M43 LG2_final 21.855233 0 H2 P1 H2_P1
#> 42 IND2 M32 LG2_final 25.033814 0 H2 P1 H2_P1
#> 43 IND2 M45 LG2_final 33.805625 0 H2 P1 H2_P1
#> 44 IND2 M54 LG2_final 111.777702 0 H2 P1 H2_P1
#> 45 IND2 M66 LG2_final 118.291527 0 H2 P1 H2_P1
#> 46 IND2 M61 LG2_final 119.907276 0 H2 P1 H2_P1
#> 47 IND2 M25 LG2_final 125.638345 0 H2 P1 H2_P1
#> 48 IND2 M5 LG2_final 127.700640 0 H2 P1 H2_P1
#> 49 IND2 M51 LG2_final 132.621357 0 H2 P1 H2_P1
#> 50 IND2 SNP26 LG2_final 141.945625 0 H2 P1 H2_P1
#> 51 IND2 M33 LG2_final 143.293882 0 H2 P1 H2_P1
#> 52 IND2 SNP24 LG2_final 148.015674 0 H2 P1 H2_P1
#> 53 IND2 M37 LG2_final 156.195398 0 H2 P1 H2_P1
#> 54 IND2 M8 LG2_final 162.384077 0 H2 P1 H2_P1
#> 55 IND2 M41 LG2_final 0.000000 1 H2 P2 H2_P2
#> 56 IND2 M10 LG2_final 8.446307 1 H2 P2 H2_P2
#> 57 IND2 M2 LG2_final 15.869538 1 H2 P2 H2_P2
#> 58 IND2 M11 LG2_final 18.569906 1 H2 P2 H2_P2
#> 59 IND2 M43 LG2_final 21.855233 1 H2 P2 H2_P2
#> 60 IND2 M32 LG2_final 25.033814 1 H2 P2 H2_P2
#> 61 IND2 M45 LG2_final 33.805625 1 H2 P2 H2_P2
#> 62 IND2 M54 LG2_final 111.777702 1 H2 P2 H2_P2
#> 63 IND2 M66 LG2_final 118.291527 1 H2 P2 H2_P2
#> 64 IND2 M61 LG2_final 119.907276 1 H2 P2 H2_P2
#> 65 IND2 M25 LG2_final 125.638345 1 H2 P2 H2_P2
#> 66 IND2 M5 LG2_final 127.700640 1 H2 P2 H2_P2
#> 67 IND2 M51 LG2_final 132.621357 1 H2 P2 H2_P2
#> 68 IND2 SNP26 LG2_final 141.945625 1 H2 P2 H2_P2
#> 69 IND2 M33 LG2_final 143.293882 1 H2 P2 H2_P2
#> 70 IND2 SNP24 LG2_final 148.015674 1 H2 P2 H2_P2
#> 71 IND2 M37 LG2_final 156.195398 1 H2 P2 H2_P2
#> 72 IND2 M8 LG2_final 162.384077 1 H2 P2 H2_P2
You can also have a view of progeny estimated haplotypes using
plot
. It shows which markers came from each parent’s
homologs. position
argument defines if haplotypes will be
plotted by homologs (stack
) or alleles
(split
). split
option is a good way to better
view the likelihoods of each allele.
OneMap is designed with a nested object structure, ensuring that
every sequence
object contains all the necessary
information to reproduce your analysis. While this structure enhances
recoverability and organization, it can lead to significantly large
objects when saving your data. To address this, we have developed
functions to streamline and optimize the saving process.
my_sequences <- list(LG1_f2_final, LG2_f2_final, LG2_f2_final)
save_onemap_sequences(my_sequences, filename = "f2_final_sequences.RData")
my_sequences <- load_onemap_sequences("f2_final_sequences.RData")
# access the sequences
my_sequences[[1]]
# access the onemap object
onemap.obj <- my_sequences[[1]]$data.name
# access the twopoints object
onemap.obj <- my_sequences[[1]]$twopt
At this point, it should be clear that any potential
OneMap
user must have some knowledge about genetic mapping
and also the R language, because the analysis is not done with
only one mouse click. In the future, perhaps a
graphical interface will be made available to make this software is a
lot easier to use.
We do hope that OneMap
is useful to researchers
interested in genetic mapping in outcrossing or inbred-based
populations. Any suggestions and critics are welcome.
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.2
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] stringr_1.5.1 vcfR_1.15.0 onemap_3.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] polynom_1.4-1 gridExtra_2.3 permute_0.9-7
#> [4] rlang_1.1.4 magrittr_2.0.3 rebus.base_0.0-3
#> [7] e1071_1.7-16 compiler_4.4.1 mgcv_1.9-1
#> [10] gdata_3.0.0 reshape2_1.4.4 vctrs_0.6.5
#> [13] memuse_4.2-3 pkgconfig_2.0.3 shape_1.4.6.1
#> [16] fastmap_1.2.0 backports_1.5.0 labeling_0.4.3
#> [19] utf8_1.2.4 rebus_0.1-3 rmarkdown_2.29
#> [22] heplots_1.7.0 nloptr_2.1.1 purrr_1.0.2
#> [25] xfun_0.49 glmnet_4.1-8 jomo_2.7-6
#> [28] cachem_1.1.0 jsonlite_1.8.9 pan_1.9
#> [31] broom_1.0.7 parallel_4.4.1 cluster_2.1.6
#> [34] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
#> [37] RColorBrewer_1.1-3 smacof_2.1-6 car_3.1-3
#> [40] boot_1.3-30 rpart_4.1.23 jquerylib_0.1.4
#> [43] Rcpp_1.0.13-1 iterators_1.0.14 knitr_1.49
#> [46] base64enc_0.1-3 weights_1.0.4 Matrix_1.7-0
#> [49] nnls_1.5 splines_4.4.1 nnet_7.3-19
#> [52] tidyselect_1.2.1 rstudioapi_0.17.1 abind_1.4-8
#> [55] yaml_2.3.10 viridis_0.6.5 vegan_2.6-8
#> [58] doParallel_1.0.17 codetools_0.2-20 plyr_1.8.9
#> [61] rebus.datetimes_0.0-2 lattice_0.22-6 tibble_3.2.1
#> [64] withr_3.0.2 evaluate_1.0.1 pinfsc50_1.3.0
#> [67] foreign_0.8-86 survival_3.6-4 proxy_0.4-27
#> [70] rebus.numbers_0.0-1 pillar_1.9.0 ggpubr_0.6.0
#> [73] carData_3.0-5 mice_3.16.0 checkmate_2.3.2
#> [76] foreach_1.5.2 ellipse_0.5.0 plotly_4.10.4
#> [79] generics_0.1.3 ggplot2_3.5.1 munsell_0.5.1
#> [82] candisc_0.9.0 scales_1.3.0 minqa_1.2.8
#> [85] gtools_3.9.5 princurve_2.1.6 class_7.3-22
#> [88] glue_1.8.0 lazyeval_0.2.2 Hmisc_5.1-3
#> [91] tools_4.4.1 dendextend_1.19.0 data.table_1.16.4
#> [94] lme4_1.1-35.5 ggsignif_0.6.4 rgl_1.3.1
#> [97] grid_4.4.1 plotrix_3.8-4 ape_5.8
#> [100] tidyr_1.3.1 colorspace_2.1-1 nlme_3.1-164
#> [103] rebus.unicode_0.0-2 htmlTable_2.4.3 Formula_1.2-5
#> [106] cli_3.6.3 fansi_1.0.6 viridisLite_0.4.2
#> [109] dplyr_1.1.4 gtable_0.3.6 rstatix_0.7.2
#> [112] sass_0.4.9 digest_0.6.37 wordcloud_2.6
#> [115] farver_2.1.2 htmlwidgets_1.6.4 htmltools_0.5.8.1
#> [118] lifecycle_1.0.4 httr_1.4.7 mitml_0.4-5
#> [121] MASS_7.3-60.2
Adler, J. R in a Nutshell A Desktop Quick Reference, 2009.
Broman, K. W., Wu, H., Churchill, G., Sen, S., Yandell, B. qtl: Tools for analyzing QTL experiments R package version 1.09-43, 2008.
Buetow, K. H., Chakravarti, A. Multipoint gene mapping using seriation. I. General methods. American Journal of Human Genetics 41, 180-188, 1987.
Doerge, R.W. Constructing genetic maps by rapid chain delineation. Journal of Agricultural Genomics 2, 1996.
Garcia, A.A.F., Kido, E.A., Meza, A.N., Souza, H.M.B., Pinto, L.R., Pastina, M.M., Leite, C.S., Silva, J.A.G., Ulian, E.C., Figueira, A. and Souza, A.P. Development of an integrated genetic map of a sugarcane Saccharum spp. commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases. Theoretical and Applied Genetics 112, 298-314, 2006.
Haldane, J. B. S. The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics 8, 299-309, 1919.
Jiang, C. and Zeng, Z.-B. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101, 47-58, 1997.
Kosambi, D. D. The estimation of map distance from recombination values. Annuaire of Eugenetics 12, 172-175, 1944.
Lander, E. S. and Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84, 2363-2367, 1987.
Lander, E.S., Green, P., Abrahanson, J., Barlow, A., Daly, M.J., Lincoln, S.E. and Newburg, L. MAPMAKER, An interactive computing package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174-181, 1987.
Lincoln, S. E., Daly, M. J. and Lander, E. S. Constructing genetic linkage maps with MAPMAKER/EXP Version 3.0: a tutorial and reference manual. A Whitehead Institute for Biomedical Research Technical Report 1993.
Matloff, N. The Art of R Programming. 2011. 1st ed. San Francisco, CA: No Starch Press, Inc., 404 pages.
Margarido, G. R. A., Souza, A.P. and Garcia, A. A. F. OneMap: software for genetic mapping in outcrossing species. Hereditas 144, 78-79, 2007.
Mollinari, M., Margarido, G. R. A., Vencovsky, R. and Garcia, A. A. F. Evaluation of algorithms used to order markers on genetics maps. Heredity 103, 494-502, 2009.
Oliveira, K.M., Pinto, L.R., Marconi, T.G., Margarido, G.R.A., Pastina, M.M., Teixeira, L.H.M., Figueira, A.M., Ulian, E.C., Garcia, A.A.F., Souza, A.P. Functional genetic linkage map on EST-markers for a sugarcane (Saccharum spp.) commercial cross. Molecular Breeding 20, 189-208, 2007.
Oliveira, E. J., Vieira, M. L. C., Garcia, A. A. F., Munhoz, C. F.,Margarido, G. R.A., Consoli, L., Matta, F. P., Moraes, M. C., Zucchi, M. I., and Fungaro,M. H. P. An Integrated Molecular Map of Yellow Passion Fruit Based on Simultaneous Maximum-likelihood Estimation of Linkage and Linkage Phases J. Amer. Soc. Hort. Sci. 133, 35-41, 2008.
Tan, Y., Fu, Y. A novel method for estimating linkage maps. Genetics 173, 2383-2390, 2006.
Taniguti, C. H.; Taniguti, L. M.; Amadeu, R. R.; Lau, J.; de Siqueira Gesteira, G.; Oliveira, T. de P.; Ferreira, G. C.; Pereira, G. da S.; Byrne, D.; Mollinari, M.; Riera-Lizarazu, O.; Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience, 12, giad092. https://doi.org/10.1093/gigascience/giad092
Van Os H, Stam P, Visser R.G.F., Van Eck H.J. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet 112, 30-40, 2005.
Voorrips, R.E. MapChart: software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93, 77-78, 2002.
Wang S., Basten, C. J. and Zeng Z.-B. Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC, 2010. https://brcwebportal.cos.ncsu.edu/qtlcart/
Wu, R., Ma, C.X., Painter, I. and Zeng, Z.-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology 61, 349-363, 2002a.
Wu, R., Ma, C.-X., Wu, S. S. and Zeng, Z.-B. Linkage mapping of sex-specific differences. Genetical Research 79, 85-96, 2002b.