USN Rollback, NTDS General Errors, and Paused NetLogon

Archived from groups: microsoft.public.win2000.active_directory

Hello,

Our Active Directory is on the fritz. We have 7 domain controllers
spread over 3 domains in a single forest. Due to a problem over the
weekend, we restored all of the domain controllers to a point-in-time
backup. Since then, we appear to be in a USN rollback condition (see KB
875495).

We consolidated the FSMO roles and demoted 4 DCs, leaving one DC per
domain. We waited for replication, then added the DCs back.

The DCs continue to get NTDS General Event ID 2103, "The Active
Directory database has been restored using an unsupported restoration
procedure." NetLogon service starts up paused.

Any idea what to do next? Thanks in advance,

J Wolfgang Goerlich


Microsoft Article 875495, "How to detect and recover from a USN
rollback in Windows Server 2003"
http://support.microsoft.com/?kbid=875495
 

Hello.

Can you tell us a bit about how you took your backups, and how you restored
them? Were these image-based backups?
Also, of the DCs that were left after your mass demotion/promotion, which did
you restore and which did you leave alone? I wasn't sure if you restored all
DCs or a subset, or....

Thanks!
~Eric


--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

Hello Eric,

We are on a SAN. Before the upgrade, I rebooted one DC at a time. While
the DC was down, I made a point-in-time backup (actually, a block-level
image) on the SAN. I did this for all of the DCs, one after the other,
with the maximum time between the first and last backup being 18
minutes. No changes were made during those 18 minutes.

We restored the DCs approximately 15 hours later. We had a blackout
period and were able to take all of the DCs down at once. We restored
to the SAN backups. We booted up the PDCs first (which also hold the
GCs), and then the subsequent DCs.

The next day, say about 24 hours after the restore, we diagnosed the
USN rollback. Repadmin /showutdvec showed that the PDCs had the highest
USN. We moved the FSMO roles from the other DCs to the three PDCs. We
then demoted the four DCs, waited for replication (around four hours to
be safe), and then began promoting one DC at a time. The first was
fine. The second and third DC still started with NTDS General Event ID
2103. At this point, adding the fourth back in is on hold.
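
The kind of comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DC names and USN values are made up, the data is typed in by hand rather than parsed from repadmin output, and real up-to-dateness vectors are keyed by invocation ID rather than by server name. The idea is that a DC looks suspect when a partner has recorded a higher USN from it than the DC itself currently holds.

```python
from typing import Dict, List, Tuple

# Hypothetical values: each DC's own highest committed USN after the restore.
local_highest_usn: Dict[str, int] = {
    "ROOT-PDC": 48_000,
    "CHILD1-PDC": 61_500,
    "CHILD2-PDC": 30_200,
}

# Hypothetical up-to-dateness view: observer -> {source DC: highest USN seen from it}.
# Real vectors are keyed by invocation ID; names are used here for readability.
utd_vectors: Dict[str, Dict[str, int]] = {
    "ROOT-PDC": {"CHILD1-PDC": 61_500, "CHILD2-PDC": 30_200},
    "CHILD1-PDC": {"ROOT-PDC": 50_250, "CHILD2-PDC": 30_200},
    "CHILD2-PDC": {"ROOT-PDC": 48_000, "CHILD1-PDC": 61_400},
}


def find_rollback_suspects(
    local: Dict[str, int], vectors: Dict[str, Dict[str, int]]
) -> List[Tuple[str, str, int, int]]:
    """Return (source, observer, source_own_usn, usn_observer_recorded) for every
    case where an observer has seen a higher USN from a source than the source
    currently holds. A partner that is merely behind is normal and not flagged."""
    suspects = []
    for observer, seen in vectors.items():
        for source, seen_usn in seen.items():
            own_usn = local.get(source)
            if own_usn is not None and own_usn < seen_usn:
                suspects.append((source, observer, own_usn, seen_usn))
    return sorted(suspects)


if __name__ == "__main__":
    for source, observer, own, seen in find_rollback_suspects(local_highest_usn, utd_vectors):
        print(f"{source} holds USN {own}, but {observer} has already seen {seen} from it")
```

With these made-up numbers only ROOT-PDC is flagged, because CHILD1-PDC remembers a higher USN from it than it now has. CHILD2-PDC lagging behind CHILD1-PDC is ordinary replication latency and is ignored.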

Appreciate the response. Hope this clarifies the situation.

J Wolfgang Goerlich
 

First, I'd like to start by pointing out that what was done is explicitly
against the "rules" of AD. Rolling back a DC w/o using proper procedures
(either the standard backup/restore procedures, or VSS + the AD writer and
similar mechanisms) lands you exactly where we are now. In some
cases, fixing this can be exceptionally painful. Sometimes the
forest is never the same.
Really, saying that we need to use the appropriate backup/restore procedures
is not just a line. :) We really do, or the replication model suffers.
Replication has no way of knowing that a DC was rolled back (we can't tell
that the drives were swapped like this), which is the case we made our VSS
provider handle. Please, please don't do this again.
The article you cited has info on this. :)
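
To make the replication point concrete, here is a toy model, not AD's actual implementation. It assumes a simplified scheme in which a partner remembers, per source invocation ID, the highest USN it has pulled, and asks only for changes above that mark. The class and method names (`DC`, `image_restore`, `supported_restore`) are invented for the sketch. An image-level rollback keeps the invocation ID, so the source reuses USNs the partner believes it already has. A supported restore is modeled as assigning a new invocation ID, so the partner starts a fresh watermark.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DC:
    name: str
    invocation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    usn: int = 0
    log: list = field(default_factory=list)         # (usn, invocation_id, payload)
    watermarks: dict = field(default_factory=dict)  # source invocation_id -> highest USN pulled

    def write(self, payload):
        """Originate a change locally, stamping it with the next USN."""
        self.usn += 1
        self.log.append((self.usn, self.invocation_id, payload))

    def pull_from(self, source):
        """Request only changes above the watermark held for the source's invocation ID."""
        mark = self.watermarks.get(source.invocation_id, 0)
        own = [(u, p) for (u, inv, p) in source.log if inv == source.invocation_id]
        new = [p for (u, p) in own if u > mark]
        if own:
            self.watermarks[source.invocation_id] = max(mark, max(u for (u, _) in own))
        return new

    def image_snapshot(self):
        return (self.usn, list(self.log))

    def image_restore(self, snap):
        """Disk-level rollback: USN and data go back, invocation ID stays the same."""
        self.usn, self.log = snap[0], list(snap[1])

    def supported_restore(self, snap):
        """Modeled supported restore: same rollback, but with a new invocation ID."""
        self.image_restore(snap)
        self.invocation_id = uuid.uuid4().hex


def demo():
    a, b = DC("A"), DC("B")
    a.write("user1")
    a.write("user2")
    snap = a.image_snapshot()          # image taken at USN 2
    a.write("user3")                   # USN 3
    b.pull_from(a)                     # B's watermark for A is now 3

    a.image_restore(snap)              # A is back at USN 2, same invocation ID
    a.write("user4")                   # reuses USN 3
    after_image = b.pull_from(a)       # B asks for USN > 3 and gets nothing

    a.supported_restore(snap)          # back at USN 2, new invocation ID
    a.write("user5")                   # USN 3 under the new ID
    after_supported = b.pull_from(a)   # B has no watermark for the new ID
    return after_image, after_supported


if __name__ == "__main__":
    print(demo())
```

In the image-restore case, `user4` never reaches B and neither side reports an error, which is the silent divergence that makes a rollback so hard to clean up afterwards.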

Ok, I feel better now that I've gotten that off of my chest.... :)

Back to the real issue at hand.
Are there other domains? When you said you went to just the PDC, was this
the only DC in the entire forest, or were there others around? If there were
others, can you tell us what else is out there?
When the others were down, did you do a metadata cleanup on them? What
procedures were followed?

Thanks!
~Eric


--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

> First, I'd like to start by pointing out that what was done is explicitly
> against the "rules" of AD.

Ah, but it *worked* in the test environment. Note that all DCs were
rolled back simultaneously to backups that occurred with a maximum 18
minute delta and that contained no explicit AD changes. (Had to get it
off of my chest <grin>).

Of course, you are right: this did not work in production. Ok, ok, no
more trying to be fancy.

> Are there other domains?

Three domains total, one forest.

> When you said you went to just the PDC, was this the only DC in the entire forest,
> or were there others around?

We went down to one PDC per domain, three total PDCs online.

> When others were down, did you metadata clean them up?

No, did not do a metadata cleanup. The other DCs removed cleanly. When
the second two DCs came up with problems, I ran an integrity check,
soft recovery, and set the registry for a non-authoritative restore
(BurFlags). This did not help.

Much obliged for the help,

J Wolfgang Goerlich
 

The problem is that you don't always hit every condition in test that could
arise when you do this. In fact, given that your change rate is different
(and internal change rate within AD is different), it won't be the same. So
no matter what you saw in test, it is still considered to be a high risk
operation.

When you were down to just the three DCs, did you have USN rollback there?
Ideal would be going back to bare-bones numbers, ensuring complete end to
end health with them, then building back from there.
Did you take repadmin to this at all and make any modifications
(specifically any with the /sync switch)?

~Eric

--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

> When you were down to just the three DCs, did you have USN rollback there?

Yes, on the PDC in the root domain. The two other PDCs were fine.

> Ideal would be going back to bare-bones numbers, ensuring complete end to
> end health with them, then building back from there.

Agreed. However, we cannot do this w/o losing domain objects. Given the
size of our domains, this is not an option.

> Did you take repadmin to this at all and make any modifications
> (specifically any with the /sync switch)?

No. Should I have?

Thanks again,

J Wolfgang Goerlich
 

The "ideal" comment referred to what you did already: fewer DCs. So you
went back to just the 3 PDCs, which makes life far easier to troubleshoot.

When you had just those 3 DCs, which NC did you get USN rollback for?


--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

Sorry, that question wasn't clear.
More generally, what were the errors when you were in that state of very few
DCs? Do you have the logs from that time that I could look at?


--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

> More generally, what were the errors when you were in that state of very few
> DCs? Do you have the logs from that time that I could look at?

The only errors were on the root domain PDC. On bootup, this logs NTDS
General Event ID 2103 and pauses the NetLogon service. Curiously, the
second DC in the same domain works fine. Even more curious, the second
DC in the child domain gets 2103 even though its PDC works fine.

I have logs but that is basically it.

J Wolfgang Goerlich
 

Can you email me the logs please? Drop the "online" from my address.


--
Eric Fleischman [MSFT]
These postings are provided "AS IS" with no warranties, and confers no
rights.



 

> Can you email me the logs please?

Certainly. The logs are on their way.

J Wolfgang Goerlich
 

I just wanted to comment on this part. I don't need to talk about the other
part, because ~Eric is one of the most qualified people to help with it; if he
can't help you, you are in a world of pain.

Anyway, working in test does not make it OK to do in production. Doing this
kind of thing is still against the "rules". Test is good for things that are
supposed to work but where you still need a little confidence boost, like a
schema change. The fact that your image-based backup of AD worked in test says
only one thing: you were lucky. Though maybe it doesn't even say that, because
you went and did it in production.

You have to keep in mind that AD is a single distributed system. It should not
be thought of as a simple collection of servers. As such, if you have an idea of
backing it up in a way that MS says not to use, at the very least do a complete
shutdown of every system involved prior to the backup, so you are truly in a
dead, nothing-changing state. That gets you some chance of possibly succeeding.

Basically, just because you didn't make any changes doesn't mean changes aren't
being made and replicated. AD is a living system that is constantly updating AD
attributes on its own without any guidance from you. Also, if you have things
like Exchange or other directory-aware apps, they can be making changes as well
that you have no knowledge of.
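
A small worked example of why a staggered set of images is risky even when every DC is rolled back together. It is a sketch with made-up numbers, not a model of real AD internals: DC A is imaged at minute 0, a background change originates on A at minute 5 and replicates to B, and B is imaged at minute 18. The function name and variables are hypothetical.

```python
def staggered_snapshot_demo():
    """Return (A's USN after restore, what B remembers from A, next change hidden?)."""
    a_usn = 100            # A's highest committed USN
    b_seen_from_a = 100    # highest USN B has recorded from A

    a_image = a_usn        # minute 0: A is imaged

    a_usn += 1             # minute 5: a system-originated update lands on A...
    b_seen_from_a = a_usn  # ...and replicates to B

    b_image = b_seen_from_a  # minute 18: B is imaged, already holding A's USN 101

    # Restore both DCs from their images at the same time.
    a_usn = a_image
    b_seen_from_a = b_image

    # A's next change will be stamped a_usn + 1, which B believes it already has.
    next_change_hidden = (a_usn + 1) <= b_seen_from_a
    return a_usn, b_seen_from_a, next_change_hidden


if __name__ == "__main__":
    print(staggered_snapshot_demo())
```

The images disagree about A's state even though nobody made a deliberate change in the window, which is the reason for shutting everything down first if you go this route at all.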


--
Joe Richards Microsoft MVP Windows Server Directory Services
www.joeware.net


jwgoerlich@gmail.com wrote:
>>First, I'd like to start by pointing out that what was done is explicitly
>>against the "rules" of AD.
>
>
> Ah, but it *worked* in the test environment. Note that all DCs were
> rolled back simultaneously to backups that occurred with a maximum 18
> minute delta and that contained no explicit AD changes. (Had to get it
> off of my chest <grin>).
>
> Of course, you are right: this did not work in production. Ok, ok, no
> more trying to be fancy.
>