Friday, September 17, 2004

Whoa... cracked the myweb problem

I think I managed to crack this puppy! I got some answers in one of the developer forums, and thanks to mod_edir being an open-source project, I was able to track down which function call was giving me grief: NXCreatePathContext, for the curious.

I won't go into how I got to this point, but it seems that cluster nodes with SERVERID set don't work well. When I comment SERVERID out of the autoexec.ncf file, it seems to work like the Faculty side does.

WHY this works is a touch of a mystery, but I can see where things are getting confused. Part of the connect-to-remote-server process involves resolving the cluster-node name. With SERVERID not set, the result looks like this:
    Result Flags: 0x00000040
    Entry ID: 0x000084c8
    Referral Records: 3
        NDS Referral Record #1
            Number of Addresses in Referral - 2
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.5.144 (140.160.5.144)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.5.144 (140.160.5.144)
        NDS Referral Record #2
            Number of Addresses in Referral - 2
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.247.29 (140.160.247.29)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.247.29 (140.160.247.29)
        NDS Referral Record #3
            Number of Addresses in Referral - 2
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.247.27 (140.160.247.27)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.247.27 (140.160.247.27)
And if you compare it to the result you get WITH SERVERID set:
    Result Flags: 0x00000040
    Entry ID: 0x000084c8
    Referral Records: 3
        NDS Referral Record #1
            Number of Addresses in Referral - 3
            (IPX Protocol)
                Network: 0x00370001 (00370001)
                Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
                Socket: 0x0451
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.5.144 (140.160.5.144)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.5.144 (140.160.5.144)
        NDS Referral Record #2
            Number of Addresses in Referral - 3
            (IPX Protocol)
                Network: 0x00008012 (00008012)
                Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
                Socket: 0x0451
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.247.29 (140.160.247.29)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.247.29 (140.160.247.29)
        NDS Referral Record #3
            Number of Addresses in Referral - 3
            (IPX Protocol)
                Network: 0x00008013 (00008013)
                Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
                Socket: 0x0451
            (TCP Protocol)
                Port: 524
                Address Referral: 140.160.247.27 (140.160.247.27)
            (UDP Protocol)
                Port: 524
                Address Referral: 140.160.247.27 (140.160.247.27)
Note that the IPX address comes first in each referral record. I'm thinking that this confuses the querying server (the web-server in this case) somehow. The next action after the resolve is the make-or-break point. When it works, the querying server follows the referral and translates the entryID it got above into the fully-qualified name of the cluster-node. When it fails, the querying server queries the cluster-node (on its cluster IP for that particular resource) for the information and gets told -601 not-found.
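To make the suspicion a bit more concrete, here is a toy Python model of it. This is only a sketch of my guess, not how the NetWare resolver actually behaves: the two address lists are copied from referral record #1 in the decodes above, while the function names and the "take the first address" logic are made up purely for illustration.

```python
from typing import List, Optional, Tuple

# (protocol, address) pairs; the string layout is just for this sketch.
Address = Tuple[str, str]

# Referral record #1 from the decodes above, without and with SERVERID set.
WITHOUT_SERVERID: List[Address] = [
    ("TCP", "140.160.5.144:524"),
    ("UDP", "140.160.5.144:524"),
]
WITH_SERVERID: List[Address] = [
    ("IPX", "00370001:000000000001:0451"),
    ("TCP", "140.160.5.144:524"),
    ("UDP", "140.160.5.144:524"),
]


def first_listed(addresses: List[Address]) -> Optional[Address]:
    """Hypothetical naive client: use whichever address is listed first."""
    return addresses[0] if addresses else None


def first_ip(addresses: List[Address]) -> Optional[Address]:
    """Hypothetical IP-only client: skip anything that is not TCP or UDP."""
    for proto, addr in addresses:
        if proto in ("TCP", "UDP"):
            return (proto, addr)
    return None


if __name__ == "__main__":
    for label, addrs in (
        ("without SERVERID", WITHOUT_SERVERID),
        ("with SERVERID", WITH_SERVERID),
    ):
        print(label, "| naive:", first_listed(addrs), "| ip-only:", first_ip(addrs))
```

If the querying server behaves anything like first_listed, it ends up holding an IPX address whenever SERVERID is set, and a TCP address when it is not. Whether that is really what happens inside the referral handling is still a guess on my part.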

Adding IPX to the querying server didn't help, nor did also putting it on the cluster node (through SCMD, since IPX isn't enabled on the cluster subnet). But with the cluster node forced to go PURE IP, it seems to work now.

Unfortunately, getting rid of SERVERID will break our backup agent. Fortunately(?) our network is slow enough that the improvement from using the agent is very slight. If/when we get gigE in, then we may have to revisit this issue.
