The ipaddress R package was heavily influenced by the design of the ipaddress module in the Python Standard Library. For this reason, the package is centered around 3 data classes: ip_address(), ip_network() and ip_interface().
This vignette is styled after the official tutorial for the Python module. It introduces you to these classes, laying the groundwork for the rest of the package functionality.
IP addresses are used to facilitate communications between computers connected to the internet. At the highest level, an IP address is analogous to a mailing address.
It’s important to know there are two versions of the Internet Protocol in wide usage today. IPv4 stores addresses using 32 bits, which provides 4,294,967,296 unique addresses. But given the rapid growth of the internet, this address space was quickly depleted. The replacement protocol (known as IPv6) stores addresses using 128 bits, which provides a far greater number of unique addresses (sufficient for the foreseeable future). The transition to IPv6 is currently ongoing, so it is still very common to see IPv4 addresses.
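As a quick check of the figures above, the size of each address space follows directly from the number of bits. The snippet below is plain base R arithmetic and does not use any function from the ipaddress package:

```r
# Number of unique addresses for each protocol version.
ipv4_count <- 2^32   # IPv4 stores addresses using 32 bits
ipv6_count <- 2^128  # IPv6 stores addresses using 128 bits

format(ipv4_count, big.mark = ",", scientific = FALSE)
#> [1] "4,294,967,296"

# The IPv6 address space is 2^96 times larger than the IPv4 address space.
ipv6_count / ipv4_count == 2^96
#> [1] TRUE
```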
To make IP addresses easier for humans to interpret, they are usually represented as character strings.
IPv4 addresses are written as four decimal groups, each from 0 to 255, separated by periods (e.g. 192.168.0.1). Each group corresponds to 8 bits.
IPv6 addresses are written as eight hexadecimal groups, each from 0000 to ffff, separated by colons (e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334). Each group corresponds to 16 bits. This representation can also be compressed by removing leading zeros and replacing one run of consecutive all-zero groups with a double colon (e.g. 2001:db8:85a3::8a2e:370:7334).
An ip_address() vector is constructed from a character vector of these human-readable strings. It can handle IPv4 and IPv6 addresses simultaneously:
ip_address(c("192.168.0.1", "2001:db8::8a2e:370:7334"))
#> <ip_address[2]>
#> [1] 192.168.0.1             2001:db8::8a2e:370:7334
IP addresses are often stored as integers for convenience, so you might need to encode or decode this format. The ip_to_integer() and integer_to_ip() functions are provided for this purpose. We recommend reading the documentation before use, because these functions work around limitations of the R integer data type and can be slightly counter-intuitive as a result.
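To see why the R integer type is a limitation here, it may help to encode an IPv4 address by hand. The sketch below uses a hypothetical helper, ipv4_to_numeric(), written in base R for illustration only; it is not part of ipaddress and makes no claim about how ip_to_integer() is implemented. It treats the four groups as base-256 digits and stores the result as a double:

```r
# Hypothetical helper (not part of ipaddress): encode a dotted-decimal
# IPv4 string as a single number, treating the 4 groups as base-256 digits.
ipv4_to_numeric <- function(x) {
  groups <- as.numeric(strsplit(x, ".", fixed = TRUE)[[1]])
  sum(groups * 256^(3:0))
}

value <- ipv4_to_numeric("192.168.0.1")
format(value, scientific = FALSE)
#> [1] "3232235521"

# The result does not fit in R's 32-bit signed integer type.
value > .Machine$integer.max
#> [1] TRUE
```

A double represents every 32-bit value exactly, but it cannot do the same for 128-bit IPv6 addresses, which is one reason to rely on the package functions and their documentation rather than a hand-rolled encoding.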
The ip_address() example above looks like we’ve simply stored the character vector. However, the constructor has actually validated each input and stored the native bit representation of each address. The print() function has then converted them back to the human-readable character representation. We can see this in action by passing an invalid address:
ip_address("255.255.255.256")
#> Warning: Invalid value on row 1: 255.255.255.256
#> <ip_address[1]>
#> [1] <NA>
There are two main advantages to storing IP data in their native bit representation:
An IP network is a contiguous range of IP addresses (also known as an IP block). These networks are very important to how address allocation and routing work.
The size of a network is determined by its prefix length. This indicates how many bits are reserved (counting from the left) for the routing prefix address (i.e. the start of the address range). The remaining bits are available for allocation to hosts (so all hosts on a network will share the same prefix bits). This means a network with a larger prefix length is actually a smaller network.
The most common representation of an IP network is called CIDR notation. This shows the routing prefix address and the prefix length, separated by a forward slash. This notation is used by both IPv4 and IPv6 networks.
As an example, the 192.168.0.0/24 network represents the address range from 192.168.0.0 to 192.168.0.255.
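The relationship between prefix length and network size can be checked with a little arithmetic. The helper below, network_size(), is a hypothetical base R function written for illustration and is not part of ipaddress. It assumes a 32-bit IPv4 address unless told otherwise:

```r
# Hypothetical helper (not part of ipaddress): number of addresses in a
# network, given its prefix length and the address width in bits.
network_size <- function(prefix_length, address_bits = 32) {
  2^(address_bits - prefix_length)
}

# 192.168.0.0/24 spans 192.168.0.0 to 192.168.0.255.
network_size(24)
#> [1] 256

# A larger prefix length leaves fewer host bits, so the network is smaller.
network_size(28) < network_size(24)
#> [1] TRUE
```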
An ip_network() vector is constructed from a character vector of these CIDR strings. For example:
ip_network(c("192.168.0.0/24", "2001:db8::/48"))
#> <ip_network[2]>
#> [1] 192.168.0.0/24 2001:db8::/48
An IP network cannot have any host bits set. If host bits were set, this would refer to a specific host on the network and not the network as a whole; this is the purpose of the ip_interface() class (see below).
The ip_network() constructor enforces this rule during input validation. If an input has host bits set, a warning is emitted and NA is returned. However, you can mask out the host bits using strict = FALSE.
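A short sketch of both behaviours, using an input chosen for illustration (192.168.0.1/24, where the final group sets a host bit). The exact wording of the warning is omitted because it may differ between package versions:

```r
library(ipaddress)

# Host bits are set, so validation fails: a warning is emitted and the
# result is NA.
strict_result <- ip_network("192.168.0.1/24")
is.na(strict_result)
#> [1] TRUE

# With strict = FALSE, the host bits are masked out instead.
ip_network("192.168.0.1/24", strict = FALSE)
#> <ip_network[1]>
#> [1] 192.168.0.0/24
```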
We’ve learned about host addresses and how they are grouped into networks. Unsurprisingly then, people often think about an IP address within the context of its network (i.e. storing both pieces of information simultaneously). The ipaddress package refers to this concept as an IP interface.
An IP interface could be represented in many different ways (e.g. two addresses containing the network bits and host bits separately). However, the most common representation is CIDR notation again.
An ip_interface() vector is constructed from a character vector of CIDR strings, just like an ip_network() vector. However, unlike ip_network(), the ip_interface() class retains the host bits.
ip_interface(c("192.168.0.1/10", "2001:db8:c3::abcd/45"))
#> <ip_interface[2]>
#> [1] 192.168.0.1/10       2001:db8:c3::abcd/45
Since this class represents a host on a specific network, most functions will treat an ip_interface() vector like an ip_address() vector. Some exceptions are listed under help(ip_interface).
The address and network components can be extracted using the as_ip_address() and as_ip_network() functions, respectively.
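As an illustration, the sketch below reuses the interfaces from the example above and extracts each component. The values noted in the comments follow from masking the host bits at the given prefix lengths:

```r
library(ipaddress)

x <- ip_interface(c("192.168.0.1/10", "2001:db8:c3::abcd/45"))

# The host addresses, with the prefix lengths dropped:
# 192.168.0.1 and 2001:db8:c3::abcd
addresses <- as_ip_address(x)

# The enclosing networks, with the host bits masked out:
# 192.128.0.0/10 and 2001:db8:c0::/45
networks <- as_ip_network(x)
```

Note that the network components differ from the input strings: a /10 prefix keeps only the first two bits of the second group (168 becomes 128), and a /45 prefix keeps only the first 13 bits of the third group (c3 becomes c0).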
The ipaddress package provides the ip_address() and ip_network() data classes, which represent the most fundamental aspects of IP networking. The majority of functions contained in this package use these classes.
An ip_interface() class is also provided, which is a hybrid class describing a specific host on a specific network. Although most functions treat this class like an ip_address(), the constituent address and network components can be extracted using as_ip_address() and as_ip_network().
Summary of ipaddress classes