| Type: | Package | 
| Title: | Tidy Verbs for Dealing with Genomic Data Frames | 
| Version: | 0.1.2 | 
| Description: | Handle genomic data within data frames just as you would with 'GRanges'. This package provides methods to deal with genomic intervals the "tidy way", which makes it simpler to integrate them into the general data munging process. The API is inspired by the popular 'bedtools' and the genome_join() method from the 'fuzzyjoin' package. | 
| URL: | https://github.com/const-ae/tidygenomics | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | dplyr, rlang, purrr, tidyr, fuzzyjoin (≥ 0.1.3), IRanges, Rcpp | 
| Suggests: | testthat, knitr, rmarkdown | 
| RoxygenNote: | 6.1.1 | 
| LinkingTo: | Rcpp | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | yes | 
| Packaged: | 2019-08-08 11:42:50 UTC; ahlmanne | 
| Author: | Constantin Ahlmann-Eltze | 
| Maintainer: | Constantin Ahlmann-Eltze <artjom31415@googlemail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2019-08-08 11:50:02 UTC | 
Cluster ranges represented as two equal-length numeric vectors.
Description
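Cluster ranges represented as two equal-length numeric vectors of starts and ends. Each interval is assigned a cluster id; intervals that overlap or lie at most max_distance apart share the same id.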
Usage
cluster_interval(starts, ends, max_distance = 0L)
Arguments
starts | 
 A numeric vector that defines the starts of each interval  | 
ends | 
 A numeric vector that defines the ends of each interval  | 
max_distance | 
 The maximum distance up to which intervals are still considered part of the same cluster. Default: 0.  | 
Examples
starts <- c(50, 100, 120)
ends <- c(75, 130, 150)
j <- cluster_interval(starts, ends)
j == c(0,1,1)
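The clustering rule can be illustrated with a short base R sketch. This is not the package's Rcpp implementation: cluster_interval_sketch() is a hypothetical name, and two details are assumptions chosen so that the example above is reproduced: ids are zero-based, and a gap exactly equal to max_distance still joins two intervals.

```r
# Hypothetical helper, NOT part of tidygenomics. Assumption: an interval joins
# the current cluster when (its start - running maximum end) <= max_distance.
cluster_interval_sketch <- function(starts, ends, max_distance = 0) {
  stopifnot(length(starts) == length(ends))
  n <- length(starts)
  ids <- integer(n)
  if (n == 0) return(ids)
  o <- order(starts, ends)            # sweep the intervals from left to right
  current_id <- 0L
  running_end <- ends[o[1]]
  ids[o[1]] <- current_id
  for (k in seq_len(n)[-1]) {
    i <- o[k]
    if (starts[i] - running_end > max_distance) {
      current_id <- current_id + 1L   # gap too large: open a new cluster
      running_end <- ends[i]
    } else {
      running_end <- max(running_end, ends[i])  # extend the current cluster
    }
    ids[i] <- current_id
  }
  ids
}

cluster_interval_sketch(c(50, 100, 120), c(75, 130, 150))
cluster_interval_sketch(c(50, 100, 120), c(75, 130, 150), max_distance = 25)
```

Under the inclusive rule assumed here, the gap of 25 between the first and second interval keeps them apart with the default max_distance = 0 but merges all three intervals when max_distance = 25.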
Cluster intervals in a data frame based on chromosome, start and end.
Description
Cluster intervals in a data frame based on chromosome, start and end.
Usage
genome_cluster(x, by = NULL, max_distance = 0,
  cluster_column_name = "cluster_id")
Arguments
x | 
 A dataframe.  | 
by | 
 A character vector with 3 entries naming the chromosome, start and end columns.
For example: by = c("chromosome", "start", "end")  | 
max_distance | 
 The maximum distance up to which intervals are still considered part of the same cluster. Default: 0.  | 
cluster_column_name | 
 The name of the new column that holds the cluster ids.  | 
Value
The data frame with an additional column holding the cluster id of each interval.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
genome_cluster(x1, by=c("chromosome", "start", "end"))
genome_cluster(x1, by=c("chromosome", "start", "end"), max_distance=10)
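Conceptually, genome_cluster() applies the interval clustering separately within each chromosome. The base R sketch below illustrates that idea on the example data; cluster_sweep() and genome_cluster_sketch() are hypothetical names, the inclusive gap rule is an assumption, and the package may number its cluster ids differently.

```r
# Example data from the genome_cluster() example above.
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))

# Sweep one chromosome's intervals left to right; a gap larger than
# max_distance opens a new zero-based cluster (assumed rule).
cluster_sweep <- function(starts, ends, max_distance) {
  o <- order(starts, ends)
  ids <- integer(length(starts))
  current_id <- -1L
  running_end <- -Inf
  for (i in o) {
    if (current_id < 0L || starts[i] - running_end > max_distance) {
      current_id <- current_id + 1L
      running_end <- ends[i]
    } else {
      running_end <- max(running_end, ends[i])
    }
    ids[i] <- current_id
  }
  ids
}

# Hypothetical sketch, NOT tidygenomics::genome_cluster().
genome_cluster_sketch <- function(df, max_distance = 0) {
  chr <- as.character(df$chromosome)
  local_id <- integer(nrow(df))
  for (ch in unique(chr)) {
    rows <- which(chr == ch)
    local_id[rows] <- cluster_sweep(df$start[rows], df$end[rows], max_distance)
  }
  # Combine chromosome and per-chromosome id into one global, zero-based id.
  key <- paste(chr, local_id)
  df$cluster_id <- match(key, unique(key)) - 1L
  df
}

genome_cluster_sketch(x1)
genome_cluster_sketch(x1, max_distance = 10)
```

In this sketch the chr1 interval starting at 260 lies 10 positions after the merged block ending at 250, so it forms its own cluster with max_distance = 0 and joins that block with max_distance = 10.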
Calculate the complement of the intervals in a data frame.
Description
Calculates the complement of the intervals covered by a data frame. It can optionally take a chromosome_size data frame with 2 or 3 columns: the first column holds the chromosome names; with 2 columns, the second holds the chromosome size; with 3 columns, the second and third hold the start and end index on the chromosome.
Usage
genome_complement(x, chromosome_size = NULL, by = NULL)
Arguments
x | 
 A data frame for which the complement is calculated  | 
chromosome_size | 
 A data frame with at least 2 columns that contains
first the chromosome name and then the size of that chromosome. Can be NULL,
in which case the largest value per chromosome from x is used.  | 
by | 
 A character vector with 3 entries naming the chromosome, start and end columns.
For example: by = c("chromosome", "start", "end")  | 
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
genome_complement(x1, by=c("chromosome", "start", "end"))
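The core of a complement computation is to merge the intervals of each chromosome and report the uncovered stretches between them. The base R sketch below shows that step; complement_sketch() is a hypothetical name, coordinates are assumed to be closed integer ranges (the convention used by IRanges, which the package imports), and the regions before the first and after the last interval, which genome_complement() may derive from chromosome_size, are deliberately left out.

```r
# Example data from the genome_complement() example above.
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

# Hypothetical sketch, NOT tidygenomics::genome_complement().
# Assumes closed coordinates, so a gap runs from end + 1 to next start - 1.
complement_sketch <- function(df) {
  pieces <- lapply(split(df, as.character(df$chromosome)), function(d) {
    d <- d[order(d$start), ]
    merged_end <- d$end[1]
    gap_s <- numeric(0)
    gap_e <- numeric(0)
    for (k in seq_len(nrow(d))[-1]) {
      if (d$start[k] > merged_end + 1) {
        # Uncovered positions between the merged block and the next interval.
        gap_s <- c(gap_s, merged_end + 1)
        gap_e <- c(gap_e, d$start[k] - 1)
        merged_end <- d$end[k]
      } else {
        # Overlapping or adjacent interval: extend the current merged block.
        merged_end <- max(merged_end, d$end[k])
      }
    }
    data.frame(chromosome = rep(as.character(d$chromosome[1]), length(gap_s)),
               start = gap_s, end = gap_e)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

complement_sketch(x1)
```

Here chr1 yields the gaps 151-199 and 251-399, while chr2 has a single interval and therefore no internal gap.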
Intersect data frames based on chromosome, start and end.
Description
Intersect data frames based on chromosome, start and end.
Usage
genome_intersect(x, y, by = NULL, mode = "both")
Arguments
x | 
 A dataframe.  | 
y | 
 A dataframe.  | 
by | 
 A character vector with 3 entries which are used to match the chromosome, start and end columns.
For example: by = c("chromosome", "start", "end")  | 
mode | 
 One of "both", "left", "right" or "anti".  | 
Value
The intersected dataframe of x and y with the new boundaries.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))
j <- genome_intersect(x1, x2, by=c("chromosome", "start", "end"), mode="both")
print(j)
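For mode = "both", the new boundaries of an intersected pair are the larger of the two starts and the smaller of the two ends. The base R sketch below reproduces that idea on the example data by pairing rows on the same chromosome with merge() and keeping overlapping pairs. intersect_sketch() is a hypothetical name, closed coordinates are assumed, and the sketch does not cover the "left", "right" or "anti" modes.

```r
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA = LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Hypothetical sketch, NOT tidygenomics::genome_intersect().
intersect_sketch <- function(x, y) {
  # All pairs of rows that share a chromosome.
  pairs <- merge(x, y, by = "chromosome", suffixes = c(".x", ".y"))
  # Closed intervals overlap when each one starts before the other ends.
  hit <- pairs$start.x <= pairs$end.y & pairs$start.y <= pairs$end.x
  pairs <- pairs[hit, ]
  pairs$start <- pmax(pairs$start.x, pairs$start.y)
  pairs$end <- pmin(pairs$end.x, pairs$end.y)
  pairs[, c("chromosome", "id.x", "id.y", "start", "end")]
}

j_sketch <- intersect_sketch(x1, x2)
print(j_sketch)
```

Only two pairs overlap: x1 row 1 with x2 row 1 on chr1 (140-150) and x1 row 4 with x2 row 3 on chr2 (400-415).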
Join intervals on chromosomes in data frames to their closest partner
Description
Join intervals on chromosomes in data frames to their closest partner.
Usage
genome_join_closest(x, y, by = NULL, mode = "inner",
  distance_column_name = NULL, max_distance = Inf, select = "all")
genome_inner_join_closest(x, y, by = NULL, ...)
genome_left_join_closest(x, y, by = NULL, ...)
genome_right_join_closest(x, y, by = NULL, ...)
genome_full_join_closest(x, y, by = NULL, ...)
genome_semi_join_closest(x, y, by = NULL, ...)
genome_anti_join_closest(x, y, by = NULL, ...)
Arguments
x | 
 A dataframe.  | 
y | 
 A dataframe.  | 
by | 
 A character vector with 3 entries which are used to match the chromosome, start and end columns.
For example: by = c("chromosome", "start", "end")  | 
mode | 
 One of "inner", "full", "left", "right", "semi" or "anti".  | 
distance_column_name | 
 The name of the new column that holds the distance.
If NULL (the default), no distance column is added.  | 
max_distance | 
 The maximum distance that is allowed to join 2 entries.  | 
select | 
 A string that is passed on to   | 
... | 
 Additional arguments passed on to genome_join_closest.  | 
Value
The joined dataframe of x and y.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))
j <- genome_join_closest(x1, x2, by=c("chromosome", "start", "end"), mode="left",
                         distance_column_name="distance")
print(j)
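The idea behind a closest join can be sketched in base R: for each interval in x, compute the distance to every interval in y on the same chromosome and keep the minimum, subject to max_distance. closest_sketch() is a hypothetical name, and the distance definition (0 for overlapping intervals, otherwise end-to-start difference) is an assumption; the package may count distances differently, for example off by one.

```r
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA = LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Hypothetical sketch, NOT tidygenomics::genome_join_closest().
closest_sketch <- function(x, y, max_distance = Inf) {
  rows <- lapply(seq_len(nrow(x)), function(i) {
    cand <- y[as.character(y$chromosome) == as.character(x$chromosome[i]), ]
    if (nrow(cand) == 0) return(NULL)
    # Assumed distance: 0 when the intervals overlap, else the gap between them.
    d <- pmax(0, cand$start - x$end[i], x$start[i] - cand$end)
    keep <- d == min(d) & d <= max_distance   # all ties at the minimum are kept
    if (!any(keep)) return(NULL)
    data.frame(id.x = x$id[i], id.y = cand$id[keep], distance = d[keep])
  })
  do.call(rbind, rows)
}

closest_sketch(x1, x2)
closest_sketch(x1, x2, max_distance = 45)
```

With the assumed distance, the closest partners of x1 rows 1 to 4 are x2 rows 1, 1, 3 and 3 at distances 0, 40, 50 and 0; a max_distance of 45 drops the pair at distance 50, much like an inner join would.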
Subtract one data frame from another based on chromosome, start and end.
Description
Subtract one data frame from another based on chromosome, start and end.
Usage
genome_subtract(x, y, by = NULL)
Arguments
x | 
 A dataframe.  | 
y | 
 A dataframe.  | 
by | 
 A character vector with 3 entries which are used to match the chromosome, start and end columns.
For example: by = c("chromosome", "start", "end")  | 
Value
The subtracted dataframe of x and y with the new boundaries.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr1", "chr1"),
                 start = c(120, 210, 300, 400),
                 end = c(125, 240, 320, 415))
j <- genome_subtract(x1, x2, by=c("chromosome", "start", "end"))
print(j)
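Subtraction removes every part of an x interval that is covered by a y interval on the same chromosome, which can split one interval into several pieces. The base R sketch below walks a cursor through the sorted overlapping y intervals. subtract_sketch() is a hypothetical name, and closed integer coordinates are an assumption, so the package's boundaries may differ by one.

```r
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA = LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr1", "chr1"),
                 start = c(120, 210, 300, 400),
                 end = c(125, 240, 320, 415))

# Hypothetical sketch, NOT tidygenomics::genome_subtract().
subtract_sketch <- function(x, y) {
  rows <- lapply(seq_len(nrow(x)), function(i) {
    s <- x$start[i]
    e <- x$end[i]
    same_chr <- as.character(y$chromosome) == as.character(x$chromosome[i])
    hit <- same_chr & y$start <= e & y$end >= s   # y intervals overlapping [s, e]
    ys <- y$start[hit]
    ye <- y$end[hit]
    o <- order(ys)
    ys <- ys[o]
    ye <- ye[o]
    out_s <- numeric(0)
    out_e <- numeric(0)
    cursor <- s                                   # first position not yet handled
    for (k in seq_along(ys)) {
      if (ys[k] > cursor) {                       # keep the uncovered stretch
        out_s <- c(out_s, cursor)
        out_e <- c(out_e, ys[k] - 1)
      }
      cursor <- max(cursor, ye[k] + 1)            # skip the covered stretch
    }
    if (cursor <= e) {
      out_s <- c(out_s, cursor)
      out_e <- c(out_e, e)
    }
    if (length(out_s) == 0) return(NULL)          # fully covered: nothing left
    data.frame(id = x$id[i], chromosome = as.character(x$chromosome[i]),
               start = out_s, end = out_e)
  })
  do.call(rbind, rows)
}

subtract_sketch(x1, x2)
```

In this sketch the first interval (100-150) is split around 120-125 into 100-119 and 126-150, the last one loses 400-415 and becomes 416-450, and the other two are left unchanged.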