This is the same DOM that I described in the first JavaScript tutorial, with some significant modifications. Now you can access any element on a web page through it, but I'm getting ahead of myself. The purpose of this "redesign" was, presumably, to accommodate the new XML standard, and that is where we, too, will begin: by using the DOM to traverse and otherwise fiddle with an XML file.

Before I begin THAT, though, there's a function you'll need that loads an XML file into a web browser's DOM so that you can engage in the aforementioned fiddling. Here it is, in its entirety. Just copy/paste it into a set of script tags and it'll be ready for use. If your browser supports this functionality, it'll return a reference to the XML document (the root node itself is then available through the document's documentElement property). If your browser does NOT support this functionality, or if you asked for an invalid or non-existent XML file, the function should return null so that you can gracefully degrade your program. In other words, you can present the information in a more basic manner or inform your user of what he or she will need in order to view that information. But I digress. Here's the function:

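function loadXMLDoc(dname)
{
	var xmlDoc = null; //local variable, so we don't create a global
	try //Internet Explorer
	{
		xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
	}catch(e)
	{
		try //Firefox, Mozilla, Opera, etc.
		{
			xmlDoc=document.implementation.createDocument("","",null);
		}catch(e)
		{
			alert(e.message);
			return(null); //no way to create an XML document in this browser
		}
	}
	try
	{
		xmlDoc.async=false; //load synchronously: wait until the file is read
		xmlDoc.load(dname); //non-standard method; throws where it is unsupported
		if(xmlDoc.documentElement) //no root element means nothing was loaded
		{
			return(xmlDoc); //the Document; xmlDoc.documentElement is the root node
		}
	}catch(e)
	{
		alert(e.message);
	}
	return(null);
}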
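One caveat about loadXMLDoc: ActiveXObject only exists in Internet Explorer, and the load method is non-standard, so many current browsers will take the null path every time. If you need the same behavior there, the following is a rough sketch of one possible replacement built on the standard DOMParser object and fetch. The names parseXMLString and loadXMLDocAsync are my own invention, not part of any library; like loadXMLDoc, they hand back null when something goes wrong so you can still degrade gracefully.

```javascript
// Hypothetical helpers (my own names): parse XML text with the standard
// DOMParser, and download a file with fetch before parsing it.
// Both return null on failure, the same contract loadXMLDoc uses.

function parseXMLString(text) {
  if (typeof DOMParser === "undefined") {
    return null; // no DOMParser here (very old browser, or Node.js)
  }
  var doc = new DOMParser().parseFromString(text, "application/xml");
  // DOMParser does not throw on malformed XML; it returns a document
  // that contains a <parsererror> element instead.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    return null;
  }
  return doc; // a Document; doc.documentElement is the root node
}

// Assumes an environment with Promise support.
function loadXMLDocAsync(url) {
  if (typeof fetch === "undefined") {
    return Promise.resolve(null);
  }
  return fetch(url)
    .then(function (response) {
      return response.ok ? response.text() : null; // null on 404 and the like
    })
    .then(function (text) {
      return text === null ? null : parseXMLString(text);
    })
    .catch(function () {
      return null; // network error or bad URL
    });
}

// Usage: the document arrives in a callback instead of as a return value.
// loadXMLDocAsync("people.xml").then(function (http) {
//   if (http) { var rootNode = http.documentElement; /* ... */ }
// });
```

The trade-off is that fetch is asynchronous, so the code that uses the document has to live inside the callback rather than on the next line.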

Now onto the meat of this lesson. The function returns a reference to the XML document it loads, so you need only assign its return value to a variable; the document's documentElement property then gives you the root node. From then on, you can use that variable to get at any information in the document. There are many ways to do this, but I'll begin with the simplest: navigating through its child nodes manually. This is done by way of the childNodes array (strictly speaking, an array-like list). Every node has one, though for the lowest-level nodes, such as the text inside an element, it is simply empty. A node's childNodes array holds only its direct children, in the order they appear in the file, so the first one in the array is the first one you'd see when reading the file yourself. For convenience, each node also has two other properties: firstChild and lastChild. These hold the first and last node that a given node is parent to, respectively. Take our XML file and the code below it:

people.xml:
<?xml version="1.0" encoding="UTF-8"?>
<people>
    <person>
        <name prefix="Mr">Joseph Smith</name>
        <age>26</age>
    </person>
    <person>
        <name prefix="Ms">Josephina Smith</name>
        <age>26</age>
    </person>
</people>

var http = loadXMLDoc("people.xml");

var rootNode = http.documentElement;
var firstChildNode = rootNode.firstChild;
var lastChildNode = rootNode.lastChild; 
var childOfFirstChild = firstChildNode.childNodes[0]; 

The first line of code simply loads people.xml into the http variable. The second line places a reference to the root element of the file (which is people) into the rootNode variable. You'll remember that references are basically variables that store the location of an object so that 1) your code looks shorter and cleaner, and 2) the browser doesn't have to spend processor time searching for the object you're asking for every time you want to use it. Anyway, firstChildNode grabs the first child of rootNode, which is the person node (Joseph Smith), and lastChildNode grabs Josephina Smith's person node. Finally, childOfFirstChild grabs the first child node of firstChildNode, which is Joseph Smith's name node.

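But wait! If you're following along with this code in Firefox, you'll find that, sadly, the code doesn't do what I just described. This is because Firefox, following the DOM standard, turns the whitespace between tags (line breaks and indentation) into text nodes of its own, while Internet Explorer throws that whitespace away by default. So in Firefox, rootNode.firstChild is one of those "empty" text nodes rather than a person node. Fortunately, there are contingencies in place to circumvent this. For now, you can either dust off Internet Explorer 7 or just follow along.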
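If you'd rather not dust off IE, you can step over those whitespace nodes yourself. Here's a minimal sketch that relies only on the standard nodeType, firstChild, lastChild, nextSibling and previousSibling properties (a nodeType of 1 means "element"). firstElement and lastElement are hypothetical helper names, not built-ins.

```javascript
// Hypothetical helpers: skip the whitespace-only text nodes that
// standards-based parsers such as Firefox's create between tags.
var ELEMENT_NODE = 1; // nodeType value for element nodes

// Return the first child of `node` that is an element, or null if none.
function firstElement(node) {
  var child = node.firstChild;
  while (child && child.nodeType !== ELEMENT_NODE) {
    child = child.nextSibling; // step past text (and other non-element) nodes
  }
  return child;
}

// Return the last child of `node` that is an element, or null if none.
function lastElement(node) {
  var child = node.lastChild;
  while (child && child.nodeType !== ELEMENT_NODE) {
    child = child.previousSibling; // walk backwards past non-element nodes
  }
  return child;
}
```

With these in place, firstElement(firstElement(http.documentElement)) should land on Joseph Smith's name node in IE and Firefox alike. Newer browsers also build the same idea in as the firstElementChild and lastElementChild properties.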

The enterprising programmer may have seen this simple code, thought he had it all figured out, and tried something like "rootNode.firstChild.firstChild" in order to get to the name node. And he would have been correct (in Internet Explorer, at least): that line does put you on the name node. But how do we get at the information? Sadly, it's not as simple as reading some value property. The property you want is nodeValue, but even that isn't the whole story. The name node is an element, and an element does not store its text in itself (its nodeValue is null). Instead, it has a child text node that stores the information, and it is THAT node's nodeValue that contains it. So, in order to get to the value of the name node, we must do this:

var rootNode = http.documentElement; 
var nameInfo = rootNode.firstChild.firstChild.firstChild.nodeValue;

So, you go from the people node, to the person node, to the name node, to the information-contained-in-the-name-node node, to the VALUE of the information-contained-in-the-name-node node. Confused yet?!? Good, because now we're going to get at that prefix attribute. Attributes are stored in an array-like collection called attributes, which belongs to the element that has them. In this case, the name node itself holds the attributes collection, not the information-contained-in-the-name-node node. I'll just make this simple and print it out for you:

var rootNode = http.documentElement; 
var prefixInfo = rootNode.firstChild.firstChild.attributes[0].nodeValue;

As you can see, you still have to use the nodeValue property to get the information. You'll also want to note that older versions of Internet Explorer can fill an element's attributes array with a VAST number of entries (on HTML elements, every attribute the element could possibly have), and the DOM doesn't promise any particular order for that array anyway, so picking attributes out by position like this is risky. Chances are the attributes you specifically defined will come first, but that will not always be the case (for example, if you add an attribute after the page loads).
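Both of those long chains break as soon as a whitespace node or an unexpected attribute sneaks in. A slightly sturdier approach is to look things up by type and name rather than by position. textOf and attributeValue below are hypothetical helpers of my own, sketched under the assumption that you only have the standard firstChild, nextSibling, nodeType, nodeValue, nodeName and attributes properties to work with.

```javascript
// Hypothetical helpers (my own names, not built-ins).

// Collect the text held in an element's child text nodes (nodeType 3)
// and CDATA sections (nodeType 4). The element's own nodeValue is null,
// which is why we read its children instead.
function textOf(element) {
  var text = "";
  for (var child = element.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 3 || child.nodeType === 4) {
      text += child.nodeValue;
    }
  }
  return text;
}

// Look an attribute up by name instead of by its position in the
// attributes collection, whose order you shouldn't rely on.
function attributeValue(element, name) {
  var attrs = element.attributes;
  for (var i = 0; i < attrs.length; i++) {
    if (attrs[i].nodeName === name) {
      return attrs[i].nodeValue;
    }
  }
  return null; // no attribute with that name
}
```

Given Joseph Smith's name element as nameNode, textOf(nameNode) would give "Joseph Smith" and attributeValue(nameNode, "prefix") would give "Mr". In practice, the standard getAttribute("prefix") method does the attribute lookup for you, and browsers that support DOM Level 3 offer a textContent property that gathers the text of all descendants.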

Now, onto those aforementioned contingencies. The first is a property called nodeType. It stores the "type" of node as an integer: 1 is an element, 2 is an attribute, and 3 is text. This property CAN be used to determine whether the nodeValue of the node you're on can contain information at all. If the nodeType is 1, nodeValue is null. If it is 2, you're on an attribute, and you probably already know that fact. If it is 3, then it MAY contain information, or it could be one of those empty nodes, since the whitespace between tags is stored as text too. So nodeType reliably tells elements apart from text, but it can't tell a "real" text node from an empty one.

The second is nodeName. For an element it stores the tag name, such as, in our case, people, person, name, and age. For any text node it stores "#text", and that's the only thing you'll see on the empty nodes. That makes it just as handy for spotting the empty nodes between elements, with one catch: text nodes that hold real data, like the one inside name, are also called "#text". So, when you only care about the elements, you can simply loop through the child nodes and place references to the ones you want into your own array. Further simplifying this operation are the nextSibling and previousSibling properties, which...well, that's obvious. If you'll remember from the last lesson, siblings are nodes that share the same parent, such as our two person nodes.

var http = loadXMLDoc("people.xml");

var rootNode = http.documentElement; 
var curNode = rootNode.firstChild;
var nodeArray = new Array();
var i = 0;
do
{
	if(curNode.nodeName != "#text") //keep anything that isn't a text node
	{
		nodeArray[i] = curNode;
		i++;
	}
}while(curNode = curNode.nextSibling); //becomes null (false) when there are no siblings left

The first five statements should be familiar: we load the document, get the root node, get the first child of the root node (which is put into curNode), create a new array, and initialize an integer. Then we go into a do-while loop and check the nodeName of curNode. If it doesn't look fake, we place it into the next free slot of our array (nodeArray[i]) and increment the integer. Finally, the while condition assigns curNode a reference to its next sibling. That last line is interesting, though. What happens first is that the assignment is made, and then the assigned value is tested. If curNode now holds an existing node, the loop continues. If, however, there is no next sibling, nextSibling is null, curNode becomes null, and the loop exits. Pretty spiffy, eh?
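One thing to watch: that loop keeps a node only if its nodeName isn't "#text", which would also throw away text nodes holding real data (the one inside a name node, for instance). Here's a sketch of a filter that keeps every element plus any text node containing something other than whitespace. meaningfulChildren is a hypothetical name, and treating whitespace-only text as "fake" is an assumption that fits people.xml but may not fit every file.

```javascript
// Hypothetical helper: return the child nodes of `node` that carry
// content, i.e. every element plus any text node that is not pure
// whitespace. nodeName alone can't separate these, because every
// text node, empty or not, is named "#text".
function meaningfulChildren(node) {
  var result = [];
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 1) {
      result.push(child); // element
    } else if (child.nodeType === 3 && /\S/.test(child.nodeValue)) {
      result.push(child); // text with at least one non-whitespace character
    }
  }
  return result;
}
```

Testing the node before touching it also means this version copes with a node that has no children at all, where the do-while above would fail on a null curNode.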

This may seem simple and efficient, but when you consider that you have to do that for EVERY node from which you want to obtain information, things quickly get pretty complicated. Even if you create a simple recursive function, processing time for what should be a simple operation could get pretty big. Next lesson we'll learn a few methods of circumventing this problem entirely.
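To make that concrete, here's roughly what such a recursive function might look like. It's only a sketch: findElements is my own hypothetical name, and it simply visits every descendant of a node in document order, collecting the elements with a given nodeName.

```javascript
// Hypothetical recursive helper: walk every descendant of `node`,
// depth first in document order, and collect the elements whose
// nodeName matches `name`. Text nodes (including the whitespace ones)
// are skipped because only elements (nodeType 1) are checked or entered.
function findElements(node, name, found) {
  found = found || [];
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 1) {
      if (child.nodeName === name) {
        found.push(child);
      }
      findElements(child, name, found); // recurse into this element's children
    }
  }
  return found;
}
```

findElements(http.documentElement, "age") would return both age nodes. Notice that every call walks every node under its starting point, which is exactly where the processing time goes on a large file.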