Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

linimon
Synopsis: [patch] 6.0 boot: name resolution broken for daemon startup

Responsible-Changed-From-To: freebsd-bugs->freebsd-rc
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Dec 27 20:52:56 UTC 2005
Responsible-Changed-Why:
Patch addresses a possible problem in rc.

http://www.freebsd.org/cgi/query-pr.cgi?pr=90863
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "[hidden email]"

Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

JoaoBR
The following reply was made to PR conf/90863; it has been noted by GNATS.

From: JoaoBR <[hidden email]>
To: [hidden email], [hidden email]
Cc:  
Subject: Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup
Date: Tue, 27 Dec 2005 20:29:05 -0200

 I think that named is not starting first, so I guess the rc start order is
 wrong, and not that named does not answer queries.
 
 I reported a similar problem here:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=86668
 
 I also believe this touches on the same problem, namely that the timeout is
 too long; see this PR:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/62139
 (even though this appears to be ssh related, the problem is the sshd DNS
 resolve timeout, which is too long)
 
 In my opinion the rc order needs to be corrected, the resolver timeout needs
 to be shorter, and a proper error message on startup would help in
 understanding the problem, because hanging on
 
  starting sendmail .
 
 makes one believe the problem is in the sendmail configuration.
 
 João
 
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-rc
To unsubscribe, send any mail to "[hidden email]"

Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

Garrett Wollman-2
The following reply was made to PR conf/90863; it has been noted by GNATS.

From: Garrett Wollman <[hidden email]>
To: JoaoBR <[hidden email]>
Cc: [hidden email]
Subject: Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup
Date: Tue, 27 Dec 2005 17:51:07 -0500

 <<On Tue, 27 Dec 2005 20:29:05 -0200, JoaoBR <[hidden email]> said:
 
 > I think that named is not starting first, so I guess the rc start order is
 > wrong, and not that named does not answer queries,
 
 No, on my system named definitely is started in the correct order:
 
 wollman@xyz(4)$ echo `rcorder *` | fold -s
 rcconf.sh dumpon initrandom geli gbde encswap ccd swap1 ramdisk early.sh fsck
 root mountcritlocal var cleanvar random adjkerntz atm1 hostname ipfilter ipnat
 ipfs kldxref sppp addswap sysctl serial pccard netif isdnd ppp-user ipfw
 nsswitch ip6addrctl atm2 pfsync pflog pf routing ip6fw network_ipv6 mroute6d
 route6d mrouted routed dhclient NETWORKING devd mountcritremote devfs ipmon
 ramdisk-own newsyslog syslogd savecore SERVERS named ntpdate rpcbind nisdomain
 [...]
 
 The problem seems to be related to the fact that the bge(4) network
 interface in this machine takes a long time to bring the link up.  When
 named starts, it attempts to validate the root zone cache before the
 network link comes up, forks, and returns SERVFAIL (?) to all requests
 until it is finally able to validate.  Older versions of named did not
 daemonize until the root zone cache was validated.
 
 This would not be a problem (that's why a server should always have
 another server after itself in /etc/resolv.conf) except that the stub
 resolver considers any reply (even "no I can't do that now") to be
 authoritative.  If named simply failed to respond to these queries,
 the resolver would fail over to the other server.
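
A toy model may make the failover rule described above easier to follow. This is only a sketch of the behaviour as this message describes it, not the libc resolver's code, and a real resolver may treat error replies differently. The `query_server` function, the `LOCAL_BEHAVIOUR` variable, and the address 192.0.2.53 are hypothetical stand-ins invented for illustration; nothing here touches the network.

```shell
#!/bin/sh
# Toy model of the failover rule described in the message above.
# query_server is a hypothetical stand-in that reports what each
# simulated server would do; no DNS traffic is sent.

# Simulated behaviour per server: "answer", "servfail", or "timeout".
# 127.0.0.1 plays the local named that has not validated its cache yet.
query_server() {
    case "$1" in
        127.0.0.1)  echo "${LOCAL_BEHAVIOUR:-servfail}" ;;
        *)          echo "answer" ;;
    esac
}

# Walk the server list in order. Any reply, even an error, ends the
# search; only silence (a timeout) moves on to the next server.
resolve() {
    for server in "$@"; do
        reply=$(query_server "$server")
        case "$reply" in
            timeout)  continue ;;
            *)        echo "$server:$reply"; return 0 ;;
        esac
    done
    echo "none:timeout"
    return 1
}
```

In this model, `resolve 127.0.0.1 192.0.2.53` prints `127.0.0.1:servfail`, so the second server is never consulted. With `LOCAL_BEHAVIOUR=timeout` set, the same call prints `192.0.2.53:answer`, which is the failover the message says would happen if named stayed silent.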
 
 -GAWollman
 

Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

dougb
Synopsis: [patch] 6.0 boot: name resolution broken for daemon startup

State-Changed-From-To: open->feedback
State-Changed-By: dougb
State-Changed-When: Sat Dec 31 01:50:46 UTC 2005
State-Changed-Why:

This is an interesting problem, and I have several responses. :)

First, if you're sure that the problem is with the bge interface,
I would prefer to see the problem fixed generically there, rather
than in rc.d/named. However, I can see some value for having some
sort of watchdog timer, similar to how it's done in /etc/rc.shutdown,
that ensures named is working, or barks loudly if it's not. I will
give some thought as to how to make that a more generic interface
so that not just rc.shutdown and named can use it. Also, if we do
this I think it should be behind a knob that is off by default.
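
A minimal sketch of what such a generic watchdog could look like, assuming a plain sh helper that boot scripts might source. The function name `wait_until_ready` and its arguments are hypothetical and are not part of rc.subr; the check command that proves named is answering is left to the caller, and the off-by-default knob mentioned above is not modelled here.

```shell
#!/bin/sh
# Hypothetical helper, not part of rc.subr: run a check command until it
# succeeds or a deadline passes, and complain loudly on failure.

# usage: wait_until_ready <timeout_seconds> <interval_seconds> <command...>
wait_until_ready() {
    _timeout=$1
    _interval=$2
    shift 2
    _waited=0
    # Re-run the check, discarding its output, until it exits 0.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$_waited" -ge "$_timeout" ]; then
            echo "WARNING: '$*' still failing after ${_waited}s" >&2
            return 1
        fi
        sleep "$_interval"
        _waited=$((_waited + _interval))
    done
    return 0
}
```

A script would call it as `wait_until_ready 30 1 some_check_command` and decide from the return status whether to continue booting or to print a clear error, which would address the complaint that a silent hang at "starting sendmail" points at the wrong daemon.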

As for the boot order of named, Garrett is right, it starts as
soon as it's possible for it to start. If Greg wants to change
rc.d/sendmail to REQUIRE: named, that's up to him.


Responsible-Changed-From-To: freebsd-rc->dougb
Responsible-Changed-By: dougb
Responsible-Changed-When: Sat Dec 31 01:50:46 UTC 2005
Responsible-Changed-Why:

I will take responsibility for looking at the issue of a more
generic watchdog timer that boot scripts can use.

http://www.freebsd.org/cgi/query-pr.cgi?pr=90863

Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

Garrett Wollman-2
<<On Sat, 31 Dec 2005 01:57:56 GMT, Doug Barton <[hidden email]> said:

> First, if you're sure that the problem is with the bge interface,
> I would prefer to see the problem fixed generically there, rather
> than in rc.d/named.

It's not a problem with bge(4), it's a general problem with network
interfaces that take a long time to bring the link up after they are
initialized.  (I expect to have the same problem with ti(4) on a
machine I'm upgrading right now.)  In this particular case I'm willing
to wait forever, since the machine can't do anything useful until it
has network, but that would be unacceptable for the general case.
Ordinary workstations using DHCP don't see this, because you obviously
can't get a lease until you can communicate with the DHCP server.

What I'd like would be to have a "don't fork until you're really
ready" option for named (or even better, for that to be restored as
the default behavior); servers without a local resolver don't have
this problem, because the stub resolver will retry requests that don't
elicit a response.  I think that's a superior solution to anything
that requires explicit configuration on the part of the sysadmin.

> As for the boot order of named, Garrett is right, it starts as
> soon as it's possible for it to start. If Greg wants to change
> rc.d/sendmail to REQUIRE: named, that's up to him.

sendmail already REQUIRE:s LOGIN, so there's no issue there.

-GAWollman


Re: conf/90863: [patch] 6.0 boot: name resolution broken for daemon startup

Brooks Davis
On Fri, Dec 30, 2005 at 09:22:44PM -0500, Garrett Wollman wrote:

> <<On Sat, 31 Dec 2005 01:57:56 GMT, Doug Barton <[hidden email]> said:
>
> > First, if you're sure that the problem is with the bge interface,
> > I would prefer to see the problem fixed generically there, rather
> > than in rc.d/named.
>
> It's not a problem with bge(4), it's a general problem with network
> interfaces that take a long time to bring the link up after they are
> initialized.  (I expect to have the same problem with ti(4) on a
> machine I'm upgrading right now.)  In this particular case I'm willing
> to wait forever, since the machine can't do anything useful until it
> has network, but that would be unacceptable for the general case.
> Ordinary workstations using DHCP don't see this, because you obviously
> can't get a lease until you can communicate with the DHCP server.
>
> What I'd like would be to have a "don't fork until you're really
> ready" option for named (or even better, for that to be restored as
> the default behavior); servers without a local resolver don't have
> this problem, because the stub resolver will retry requests that don't
> elicit a response.  I think that's a superior solution to anything
> that requires explicit configuration on the part of the sysadmin.

On the whole, daemons should operate on the assumption that the network
will take an arbitrarily long time to come up and that it may come
and go at any time.  A user should be able to boot their laptop while
on an airplane, suspend to disk for landing, boot up again and acquire
a network connection, and have all their daemons work correctly.
Likewise, a copy of FreeBSD running on a virtual server should support
being suspended, copied to a different datacenter, and coming back up
with new addresses.  Obviously we're not there yet in a number of
areas, but this is where we should be heading and we can work on
server/libc behavior in advance of the kernel actually working.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4
