Monday, April 21, 2008

Default Address Selection Part 1

If you are familiar with IPv6, you will know that default address selection is an important concept. It is defined in RFC 3484. Due to space constraints, I have split this topic into two parts. The first part is an introduction and shows how to use the feature. The second part will explain the kernel/glibc internals involved in the implementation; I hope to write it soon. It is advisable to read the RFC before proceeding.

To give a brief idea of what default address selection is, take a host that has multiple IPv6 addresses and needs to decide which one to use for communication. Every communication needs a source address and a destination address, and the problem arises when there are several of each to choose from. IPv6 allows a host to configure multiple addresses by default, so an algorithm is needed to sort the candidates. We can broadly classify default address selection into 2 types:

1) Default source address selection
2) Default destination address selection

Say host A wants to communicate with host B (which can be an external or internal system). It needs to know the destination IP of host B. To get it, a DNS query is sent to the configured DNS server and the response is taken as the destination IP. What if the DNS reply has multiple IPs for the same domain name? That is when destination address selection comes into the picture. Now that we have "somehow" selected the destination IP, we need to select an appropriate source IP. A question that might arise is: why do we need to do that? Can't we just pick the first IP from the list of source IPs and start the communication? The answer is no, because an IPv6 address can be link-local, site-local or global. If the destination is a global address and the first source address in the list is link-local, the communication cannot happen because of the scope mismatch. So we need an algorithm that selects a suitable source IP.
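To make the scope idea concrete, here is a minimal sketch that classifies a unicast address as link-local, site-local or global. The helper name unicast_scope and the enum are my own and are not part of any library; the scope values follow RFC 3484 section 3.1, and multicast addresses are deliberately not handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Scope values as used in RFC 3484 section 3.1 (unicast only). */
enum addr_scope {
    SCOPE_INVALID    = -1,   /* text could not be parsed (our own marker) */
    SCOPE_LINK_LOCAL = 0x2,
    SCOPE_SITE_LOCAL = 0x5,
    SCOPE_GLOBAL     = 0xe
};

/* Hypothetical helper: classify a textual IPv6 unicast address by scope. */
enum addr_scope unicast_scope(const char *text)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return SCOPE_INVALID;
    /* RFC 3484 section 3.4 treats the loopback address as link-local. */
    if (IN6_IS_ADDR_LINKLOCAL(&addr) || IN6_IS_ADDR_LOOPBACK(&addr))
        return SCOPE_LINK_LOCAL;
    if (IN6_IS_ADDR_SITELOCAL(&addr))
        return SCOPE_SITE_LOCAL;
    /* Everything else is treated as global in this sketch. */
    return SCOPE_GLOBAL;
}
```

Calling unicast_scope("fe80::2") and unicast_scope("2001::1") gives two different scopes, which is exactly the mismatch described above.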

Another interesting aspect of destination address selection is deciding which address to use if the DNS query returns both an IPv4 and an IPv6 address. Some factor has to decide this selection. More on all this in Part 2 :-) . So, we have a situation where addresses are selected based on certain criteria. By default the criteria are as per the RFC. For most users this should hold good, but what if it needs to be changed? Let's say that by default IPv6 addresses are given higher precedence than IPv4, but the administrator wants IPv4 to have higher precedence. In such cases there needs to be a way to configure source address selection and destination address selection. For this reason the RFC defines user configuration tables for source and destination selection.

Before we go into the configuration tables, let's look at a basic fact: source address selection is implemented in the kernel and destination address selection in glibc. Wonder why? The reason is simple. Glibc implements the name resolution APIs, such as getaddrinfo and the gethostbyname family, which trigger the DNS query. So the API also receives all the replies, and it makes sense to implement the sorting algorithm there.

Let's look at the user configuration tables for both source address selection and destination address selection. There is an interesting article on the subject from the glibc maintainer Ulrich Drepper. You can find the article here.

Basic Requirements:

- Linux Kernel 2.6.24 or higher
- iproute2 utilities compiled for 2.6.24 (Check to see if "#ip help" supports 'addrlabel')
Once we have the prerequisites we are good to go.

User Configuration Table For Source Address Selection :
[root@t6018ab-009124035140 ip]# ./ip addrlabel show
prefix ::1/128 label 0
prefix ::/96 label 3
prefix ::ffff:0.0.0.0/96 label 4
prefix 2001::/32 label 6
prefix 2002::/16 label 2
prefix fc00::/7 label 5
prefix ::/0 label 1

This is the default user configuration table for source address selection. The "label" field is the important part of the table. Every address is assigned the label of the longest prefix in the table that matches it, and RFC 3484 (source address selection, Rule 6) prefers a candidate source address whose label is equal to the label of the destination address. The numeric value of a label has no ordering; only equality matters. For example, both the source ::1 and the destination ::1 map to label 0, so they match.
Let's say we add two entries with the same label (note the /128; with /64 both entries would describe the same prefix, which would also cover the other addresses used below):
prefix 2003:470:1f00:ffff::4/128 label 8
prefix 2003:470:1f00:ffff::5/128 label 8
Source Address Selection List:
2003:470:1f00:ffff::4
2003:470:1f00:ffff::5
2003:470:1f00:ffff::6

Destination Address
2003:470:1f00:ffff::7

Irrespective of the order of the source address list, 2003:470:1f00:ffff::6 will be selected as the source. The destination 2003:470:1f00:ffff::7 matches only the rule "prefix ::/0 label 1", and so does 2003:470:1f00:ffff::6, whereas the other two addresses have label 8. The candidate whose label matches the destination's label wins. We can play around by giving different label values to different prefixes. Since source address selection works in conjunction with destination address selection, we shall look into testing this aspect a little later.
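The way an address is mapped to a label can be sketched in a few lines of C. This is only an illustration of a longest-prefix-match lookup over a copy of the default table printed by "ip addrlabel show" above; it is not the kernel's code, and the names label_entry, default_labels and lookup_label are made up for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* One row of a label table: prefix, prefix length in bits, label. */
struct label_entry {
    const char *prefix;
    int prefixlen;
    int label;
};

/* Copy of the default table printed by "ip addrlabel show". */
static const struct label_entry default_labels[] = {
    { "::1",            128, 0 },
    { "::",              96, 3 },
    { "::ffff:0.0.0.0",  96, 4 },
    { "2001::",          32, 6 },
    { "2002::",          16, 2 },
    { "fc00::",           7, 5 },
    { "::",               0, 1 },
};

/* Return 1 if the first "bits" bits of a and b are equal. */
static int prefix_equal(const struct in6_addr *a,
                        const struct in6_addr *b, int bits)
{
    int full = bits / 8;   /* number of whole bytes to compare */
    int rest = bits % 8;   /* remaining bits in the next byte */

    if (memcmp(a->s6_addr, b->s6_addr, (size_t)full) != 0)
        return 0;
    if (rest != 0) {
        unsigned char mask = (unsigned char)(0xff << (8 - rest));
        if ((a->s6_addr[full] & mask) != (b->s6_addr[full] & mask))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: label of the longest matching prefix,
 * or -1 if the text is not a valid IPv6 address. */
int lookup_label(const char *addr_text)
{
    struct in6_addr addr, prefix;
    int best_len = -1;
    int best_label = -1;
    size_t i;

    if (inet_pton(AF_INET6, addr_text, &addr) != 1)
        return -1;

    for (i = 0; i < sizeof(default_labels) / sizeof(default_labels[0]); i++) {
        const struct label_entry *e = &default_labels[i];

        if (inet_pton(AF_INET6, e->prefix, &prefix) != 1)
            continue;
        /* Keep the entry only if it matches and is more specific. */
        if (e->prefixlen > best_len &&
            prefix_equal(&addr, &prefix, e->prefixlen)) {
            best_len = e->prefixlen;
            best_label = e->label;
        }
    }
    return best_label;
}
```

With this table, lookup_label("2003:470:1f00:ffff::6") falls through to ::/0, while lookup_label("::1") hits the more specific ::1/128 entry.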

User Configuration Table For Destination Address Selection :

The destination address user configuration table is based on a configuration file called gai.conf, which is read from /etc/. Distros do not place the file there for a certain reason; for more information please read the article by Ulrich Drepper mentioned above. On my system the gai.conf file is located at /usr/share/doc/glibc-common-2.6/gai.conf. This file must be copied to /etc/ if you intend to change the default behavior.

A typical gai.conf file



# label
# Add another rule to the RFC 3484 label table. See section 2.1 in
# RFC 3484. The default is:
#
#label ::1/128 0
#label ::/0 1
#label 2002::/16 2
#label ::/96 3
#label ::ffff:0:0/96 4
#label fec0::/10 5
#label fc00::/7 6

#

# precedence
# Add another rule to the RFC 3484 precedence table. See section 2.1
# and 10.3 in RFC 3484. The default is:
#
#precedence ::1/128 50
#precedence ::/0 40
#precedence 2002::/16 30
#precedence ::/96 20
#precedence ::ffff:0:0/96 10
#
# For sites which prefer IPv4 connections change the last line to
#
#precedence ::ffff:0:0/96 100


For destination address selection, the two main criteria to consider are label and precedence. Remember that precedence is used for destination address selection only, whereas label is common to both source and destination address selection. For this reason the two tables (the kernel's addrlabel table and the label lines in gai.conf) must remain in sync for correct results.
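The effect of the precedence lines can be sketched as a lookup too. The code below is an illustration only, not glibc's implementation: it copies the default precedence values from the gai.conf comments above, adds a second table with the last line changed to 100, and returns the precedence of the longest matching prefix. IPv4 addresses are looked up as IPv4-mapped addresses, as RFC 3484 describes. The names prec_entry, rfc_default, prefer_ipv4 and precedence_of are my own. Keep in mind that in the real algorithm precedence is only consulted after the earlier destination rules (such as scope and label matching) have failed to decide.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

struct prec_entry {
    const char *prefix;
    int prefixlen;
    int precedence;
};

/* The defaults shown in the commented gai.conf lines. */
static const struct prec_entry rfc_default[] = {
    { "::1",        128, 50 },
    { "::",           0, 40 },
    { "2002::",      16, 30 },
    { "::",          96, 20 },
    { "::ffff:0:0",  96, 10 },
};

/* Same table with the last line raised to 100, as gai.conf suggests. */
static const struct prec_entry prefer_ipv4[] = {
    { "::1",        128, 50 },
    { "::",           0, 40 },
    { "2002::",      16, 30 },
    { "::",          96, 20 },
    { "::ffff:0:0",  96, 100 },
};

/* Parse IPv6 text, or IPv4 text converted to an IPv4-mapped address. */
static int parse_addr(const char *text, struct in6_addr *out)
{
    struct in_addr v4;

    if (inet_pton(AF_INET6, text, out) == 1)
        return 1;
    if (inet_pton(AF_INET, text, &v4) != 1)
        return 0;
    memset(out, 0, sizeof(*out));
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    memcpy(&out->s6_addr[12], &v4, sizeof(v4));
    return 1;
}

/* Return 1 if the first "bits" bits of a and b are equal. */
static int prefix_matches(const struct in6_addr *a,
                          const struct in6_addr *b, int bits)
{
    int full = bits / 8;
    int rest = bits % 8;

    if (memcmp(a->s6_addr, b->s6_addr, (size_t)full) != 0)
        return 0;
    if (rest != 0) {
        unsigned char mask = (unsigned char)(0xff << (8 - rest));
        if ((a->s6_addr[full] & mask) != (b->s6_addr[full] & mask))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: precedence of the longest matching prefix,
 * or -1 if the text is neither a valid IPv6 nor IPv4 address. */
int precedence_of(const struct prec_entry *table, size_t count,
                  const char *text)
{
    struct in6_addr addr, prefix;
    int best_len = -1;
    int best = -1;
    size_t i;

    if (!parse_addr(text, &addr))
        return -1;

    for (i = 0; i < count; i++) {
        if (inet_pton(AF_INET6, table[i].prefix, &prefix) != 1)
            continue;
        if (table[i].prefixlen > best_len &&
            prefix_matches(&addr, &prefix, table[i].prefixlen)) {
            best_len = table[i].prefixlen;
            best = table[i].precedence;
        }
    }
    return best;
}
```

Comparing precedence_of(rfc_default, 5, "2001::1") with precedence_of(rfc_default, 5, "192.0.2.1") shows why IPv6 is preferred by default, and repeating the second call with prefer_ipv4 shows how the one-line change flips that preference.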

Testing Destination Address Selection

To test the destination address selection algorithm we need a small program. The best test cases are the examples given in RFC 3484; see section 10.2.

Few Requirements:

- Add an entry "multi on" in /etc/host.conf
- Stop the name service caching daemon (service nscd stop)
- Compile the program given below (This will test the result of default address selection)


#include <error.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

char buf[INET6_ADDRSTRLEN];

int
main(int argc, char *argv[])
{
    int err;
    struct addrinfo *ai;
    struct addrinfo hints;
    struct addrinfo *runp;

    if (argc < 2)
        error(EXIT_FAILURE, 0, "usage: %s hostname", argv[0]);

    memset(&hints, '\0', sizeof(hints));
    hints.ai_protocol = IPPROTO_TCP;

    // dummy gethostbyname call so that /etc/host.conf is read
    gethostbyname(argv[1]);

    // getaddrinfo returns the list already sorted by the
    // destination address selection rules
    err = getaddrinfo(argv[1], "", &hints, &ai);
    if (err != 0)
        error(EXIT_FAILURE, 0, "getaddrinfo(%d): %s", err,
              gai_strerror(err));

    // walk the result list in the order getaddrinfo returned it
    for (runp = ai; runp != NULL; runp = runp->ai_next) {
        err = getnameinfo(runp->ai_addr, runp->ai_addrlen, buf,
                          INET6_ADDRSTRLEN, NULL, 0, NI_NUMERICHOST);
        if (err != 0)
            error(EXIT_FAILURE, 0, "getnameinfo(%d): %s", err,
                  gai_strerror(err));

        printf("family:%2d socktype:%2d protocol:%3d addr:%s(%d)\n",
               runp->ai_family, runp->ai_socktype, runp->ai_protocol,
               buf, (int) runp->ai_addrlen);
    }

    freeaddrinfo(ai);
    return 0;
}




Example taken from section 10.2 of the RFC:
Candidate Source Addresses: 2001::2 or fec0::2 or fe80::2
Destination Address List: 2001::1 or fec0::1 or fe80::1
Result: fe80::1 (src fe80::2) then fec0::1 (src fec0::2) then 2001::1 (src 2001::2) (prefer smaller scope)

Destination address selection will be demonstrated using this example from the RFC.
The first step is to add multiple DNS entries for one name on the DNS server. This is a big process, so I will use the /etc/hosts file to keep things simple (it behaves much like DNS server replies).

So add the following in /etc/hosts
fec0::1 rockon
2001::1 rockon
fe80::1 rockon

Add source addresses to the interface
#ip -6 addr add 2001::2 dev eth0
Similarly for fec0::2 and fe80::2

The next step is to make sure that every destination address added to /etc/hosts has a valid route entry. Otherwise the above will not work.
For example, fec0::1 is a destination IP, so the algorithm will choose it only if we have a valid route for it.
#ip -6 route add fec0::1 dev eth0
Similarly add routes for the other destination candidates.

To execute the program
#./a.out rockon
family:10 socktype: 1 protocol: 6 addr:fe80::1(28)
family:10 socktype: 1 protocol: 6 addr:fec0::1(28)
family:10 socktype: 1 protocol: 6 addr:2001::1(28)

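The result shows the order in which the destination addresses are sorted. On Linux, family 10 is AF_INET6, socktype 1 is SOCK_STREAM, protocol 6 is IPPROTO_TCP, and 28 is the size of struct sockaddr_in6. The rest of the examples can be tried out the same way. The destination user configuration table (gai.conf) can be modified to see different results.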


Testing Source Address Selection :


Let's look at how to test the source address selection functionality. The best way to do so is to follow the test cases specified in RFC 3484; see section 10.1.
For testing source address selection, use ping6.

Example taken from section 10.1 of the RFC
Destination: 2001::1
Candidate Source Addresses: 3ffe::1 or fe80::1
Result: 3ffe::1 (prefer appropriate scope)

Configure IPv6 address for interface eth0
#ip -6 addr add 3ffe::1 dev eth0
fe80::1 can be ignored, as you will have a link-local address by default.
Add a valid route to the destination IP.
#ip -6 route add 2001::1 dev eth0
#ping6 2001::1

The result will be "destination unreachable" if 2001::1 doesn't exist, but that is not our concern: the unreachable message shows which source address was selected. This is how one can test the source address selection algorithm. Try out the different examples given in the RFC. Then, by tweaking the user configuration table as described in "User Configuration Table For Source Address Selection", we can modify the behavior.
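Besides reading the ping6 output, a program can also ask the kernel directly which source address it would pick. The sketch below relies on the fact that connect() on a UDP socket sends no packet but makes the kernel choose a route and a source address, which getsockname() then reports. The function name selected_source and the port number 9 are arbitrary choices for this example. Note that a link-local destination would additionally need sin6_scope_id to be set, which this sketch does not do.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write the source address the kernel selects for
 * dst_text into out. Returns 0 on success, -1 on any failure
 * (bad address text, no IPv6 support, no route, ...). */
int selected_source(const char *dst_text, char *out, socklen_t outlen)
{
    struct sockaddr_in6 dst, src;
    socklen_t len = sizeof(src);
    int fd;
    int rc = -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(9);   /* arbitrary port; nothing is sent */
    if (inet_pton(AF_INET6, dst_text, &dst.sin6_addr) != 1)
        return -1;

    fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* connect() on UDP only fixes the peer and picks a source address. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(fd, (struct sockaddr *)&src, &len) == 0 &&
        inet_ntop(AF_INET6, &src.sin6_addr, out, outlen) != NULL)
        rc = 0;

    close(fd);
    return rc;
}
```

With the setup above, calling selected_source("2001::1", buf, sizeof(buf)) with a buffer of INET6_ADDRSTRLEN bytes should fill buf with the address the kernel picked, which can then be compared with the result the RFC example expects.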

Hope this little write-up was useful in understanding how address selection works. In the Part 2 article I will explain how the algorithm works.

Update: Thanks to Brandon for pointing out a mistake in the post. Check comments for details.

3 comments:

Anonymous said...

I realize this is an old post but if you are checking comments I was wondering if you can explain why in the section "User Configuration Table For Source Address Selection" you say that 2003:470:1f00:ffff::6 will pass on the rule "prefix ::/96 label 3". It seems like it will only pass on "prefix ::/0 label 1". Obviously the outcome is the same, but I was just curious if this was a typo or am I misunderstanding something.

varun said...

yes, it was typo or rather carelessness. Thanks for pointing it out, i have made the necessary changes. Hope the post was clear enough? :-)

Anonymous said...

Great post!