USN Rollback, NTDS General Errors, and Paused NetLogon

Archived from groups: microsoft.public.win2000.active_directory (More info?)

Hello,

Our Active Directory is on the fritz. We have 7 domain controllers
spread over 3 domains in a single forest. Due to a problem over the
weekend, we restored all of the domain controllers to a point-in-time
backup. Since then, we appear to be in a USN rollback condition (see KB
875495).

We consolidated the FSMO roles and demoted 4 DCs, leaving one DC per
domain. We waited for replication, then added the DCs back.

The DCs continue to get NTDS General Event ID 2103, "The Active
Directory database has been restored using an unsupported restoration
procedure." NetLogon service starts up paused.

Any idea what to do next? Thanks in advance,

J Wolfgang Goerlich


Microsoft Article 875495, "How to detect and recover from a USN
rollback in Windows Server 2003"
http://support.microsoft.com/?kbid=875495
  1.

    Hello.

    Can you tell us a bit about how you took your backups, and how you restored
    them? Were these image-based backups?
    Also, of the DCs that were left after your mass demotion/promotion, which did
    you restore and which did you leave alone? I wasn't sure if you restored all
    DCs or a subset, or....

    Thanks!
    ~Eric


    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.

  2.

    Hello Eric,

    We are on a SAN. Before the upgrade, I rebooted one DC at a time. While
    the DC was down, I made a point-in-time backup (actually, a block-level
    image) on the SAN. I did this for all of the DCs, one after the other,
    with the maximum time between the first and last backup being 18
    minutes. No changes were made during those 18 minutes.

    We restored the DCs approximately 15 hours later. We had a blackout
    period and were able to take all of the DCs down at once. We restored
    to the SAN backups. We booted up the PDCs first (which also hold the
    GCs), and then the subsequent DCs.

    The next day, say about 24 hours after the restore, we diagnosed the
    USN rollback. Repadmin /showutdvec showed that the PDCs had the highest
    USN. We moved the FSMO roles from the other DCs to the three PDCs. We
    then demoted the four DCs, waited for replication (around four hours to
    be safe), and then began promoting one DC at a time. The first was
    fine. The second and third DCs still started with NTDS General Event ID
    2103. At this point, adding the fourth back in is on hold.

    Appreciate the response. Hope this clarifies the situation.

    J Wolfgang Goerlich
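
The check described above (comparing the USNs that partners have recorded against what each DC itself reports) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_rolled_back`, the DC names, and all USN values are invented, and the numbers would have to be copied by hand from `repadmin /showutdvec` on each partner and from each DC's own `highestCommittedUSN`. The rule it applies, that a partner should never have seen a higher USN from a DC than that DC currently reports for itself, follows the comparison described in KB 875495.

```python
def find_rolled_back(own_highest_usn, partner_vectors):
    """Return {dc: [(partner, recorded_usn), ...]} for DCs that look rolled back.

    own_highest_usn: {dc_name: highest USN the DC itself currently reports}
    partner_vectors: {partner_name: {dc_name: highest USN that partner has
                      recorded from dc_name}}

    Simplification: real up-to-dateness vectors are keyed by invocation ID,
    not by server name. Names are used here only to keep the sketch short.
    """
    suspects = {}
    for partner, vector in partner_vectors.items():
        for dc, recorded in vector.items():
            if dc == partner:
                continue  # a DC's entry about itself is not evidence
            own = own_highest_usn.get(dc)
            if own is not None and recorded > own:
                # The partner has seen a USN the DC no longer knows about.
                suspects.setdefault(dc, []).append((partner, recorded))
    return suspects


if __name__ == "__main__":
    # Hypothetical values, invented purely for illustration.
    own = {"ROOT-PDC": 1000, "CHILD-PDC": 5000}
    partners = {
        "CHILD-PDC": {"ROOT-PDC": 1200},  # remembers more than ROOT-PDC has
        "ROOT-PDC": {"CHILD-PDC": 4900},  # normal: partner is slightly behind
    }
    print(find_rolled_back(own, partners))
```

An empty result does not prove the forest is healthy; it only means this one comparison found nothing.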
  3.

    First, I'd like to start by pointing out that what was done is explicitly
    against the "rules" of AD. Rolling back a DC w/o using proper procedures
    (either the standard backup/restore procedures, or VSS plus the AD writer
    and similar mechanisms) lands you exactly where we are here. In some
    cases, this can be exceptionally painful to fix. Sometimes the
    forest is never the same.
    Really, saying that we need to use the appropriate backup/restore procedures
    is not just a line. :) We really do, or the replication model suffers.
    Replication has no way of knowing that a DC was rolled back (we can't tell
    that the drives were swapped like this), which is what we have made our VSS
    provider handle. Please, please don't do this again.
    The article you cited has info on this. :)

    Ok, I feel better now that I've gotten that off of my chest.... :)

    Back to the real issue at hand.
    Are there other domains? When you said you went to just the PDC, was this
    the only DC in the entire forest, or were there others around? If there
    were others, can you tell us what else is out there?
    When the others were down, did you metadata clean them up? What procedures
    were followed?

    Thanks!
    ~Eric


    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.
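
Why replication "has no way of knowing" about an image rollback can be shown with a toy model. The sketch below is not how AD is implemented; the class and function names are invented, and it tracks only originating changes with a per-source watermark. It assumes two well-known behaviours: a partner asks a source only for changes above the highest USN it has already seen from that source's invocation ID, and a supported restore gives the restored database a new invocation ID while an image restore does not.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy DC: a database identity, a local USN counter, and a change log."""
    name: str
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    usn: int = 0
    # Each change is (originating invocation ID, USN, description).
    changes: list = field(default_factory=list)
    # Highest USN already accepted from each originating invocation ID.
    watermarks: dict = field(default_factory=dict)

    def write(self, description):
        """Record an originating change under the next local USN."""
        self.usn += 1
        self.changes.append((self.invocation_id, self.usn, description))

    def pull_from(self, source):
        """Accept only changes above the watermark kept for their origin."""
        accepted = []
        for inv_id, usn, description in source.changes:
            if usn > self.watermarks.get(inv_id, 0):
                accepted.append(description)
                self.watermarks[inv_id] = usn
        return accepted


def image_restore(dc, snapshot_usn):
    """Unsupported rollback: older contents, but the same invocation ID."""
    dc.changes = [c for c in dc.changes if c[1] <= snapshot_usn]
    dc.usn = snapshot_usn


def supported_restore(dc, snapshot_usn):
    """Same older contents, but the database gets a new invocation ID."""
    image_restore(dc, snapshot_usn)
    dc.invocation_id = str(uuid.uuid4())


def demo(restore):
    """Return what DC2 receives for a change DC1 makes after being restored."""
    dc1, dc2 = DomainController("DC1"), DomainController("DC2")
    dc1.write("create user alice")
    dc1.write("create user bob")
    snapshot_usn = dc1.usn              # the backup is taken here
    dc1.write("reset alice password")   # made after the backup
    dc2.pull_from(dc1)                  # DC2 is now current up to USN 3
    restore(dc1, snapshot_usn)
    dc1.write("create user carol")      # reuses USN 3 on DC1
    return dc2.pull_from(dc1)


if __name__ == "__main__":
    print("image restore:    ", demo(image_restore))
    print("supported restore:", demo(supported_restore))
```

In the image-restore case DC2 receives nothing, because the new change carries a USN it believes it has already seen from that database. No error is raised on either side, which is what makes the condition hard to spot.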

  4.

    > First, I'd like to start by pointing out that what was done is explicitly
    > against the "rules" of AD.

    Ah, but it *worked* in the test environment. Note that all DCs were
    rolled back simultaneously to backups that occurred with a maximum 18
    minute delta and that contained no explicit AD changes. (Had to get it
    off of my chest <grin>).

    Of course, you are right: this did not work in production. Ok, ok, no
    more trying to be fancy.

    > Are there other domains?

    Three domains total, one forest.

    > When you said you went to just the PDC, was this the only DC in the entire forest,
    > or were there others around?

    We went down to one PDC per domain, three total PDCs online.

    > When others were down, did you metadata clean them up?

    No, did not do a metadata cleanup. The other DCs removed cleanly. When
    the second two DCs came up with problems, I ran an integrity check,
    soft recovery, and set the registry for a non-authoritative restore
    (BurFlags). This did not help.

    Much obliged for the help,

    J Wolfgang Goerlich
  5.

    The problem is that you don't always hit every condition in test that could
    arise when you do this. In fact, given that your change rate is different
    (and internal change rate within AD is different), it won't be the same. So
    no matter what you saw in test, it is still considered to be a high risk
    operation.

    When you were down to just the three DCs, did you have USN rollback there?
    Ideal would be going back to bare-bones numbers, ensuring complete end to
    end health with them, then building back from there.
    Did you take repadmin to this at all and make any modifications
    (specifically any with the /sync switch)?

    ~Eric

    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.

  6.

    > When you were down to just the three DCs, did you have USN rollback there?

    Yes, on the PDC in the root domain. The two other PDCs were fine.

    > Ideal would be going back to bare-bones numbers, ensuring complete end to
    > end health with them, then building back from there.

    Agreed. However, we cannot do this w/o losing domain objects. Given the
    size of our domains, this is not an option.

    > Did you take repadmin to this at all and make any modifications
    > (specifically any with the /sync switch)?

    No. Should I have?

    Thanks again,

    J Wolfgang Goerlich
  7.

    The "ideal" comment was specific to what you did already: fewer DCs. So you
    went back to just the 3 PDCs, which makes life far easier to troubleshoot.

    When you had just those 3 DCs, what NC did you get USN rollback for?


    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.

  8.

    Sorry that question wasn't clear.
    More generally, what were the errors when you were in that state of very few
    DCs? Do you have the logs from that time that I could look at?


    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.

  9.

    > More generally, what were the errors when you were in that state of very few
    > DCs? Do you have the logs from that time that I could look at?

    The only errors were on the root domain PDC. On bootup, this logs NTDS
    General Event ID 2103 and pauses the NetLogon service. Curiously, the
    second DC in the same domain works fine. Even more curious, the second
    DC in the child domain gets 2103 even though its PDC works fine.

    I have logs but that is basically it.

    J Wolfgang Goerlich
  10.

    Can you email me the logs please? Drop the "online" from my address.


    --
    Eric Fleischman [MSFT]
    These postings are provided "AS IS" with no warranties, and confers no
    rights.

  11.

    > Can you email me the logs please?

    Certainly. The logs are on their way.

    J Wolfgang Goerlich
  12.

    I just wanted to comment on this part. I don't need to talk about the other
    part because ~Eric is one of the most qualified to help with it; if he can't
    help you, you are in a world of pain.

    Anyway, working in test does not make it ok to do in production; doing this
    kind of thing is still against the "rules". Test is good for testing things
    that are supposed to work but where you still need a little confidence boost,
    say a schema change. The fact that your image-based backup of AD worked in
    test only says one thing: you were lucky. Though maybe it doesn't even say
    that, because you went and did it in production.

    You have to keep in mind that AD is a single distributed system. It should not
    be thought of as a simple collection of servers. As such, if you have an idea for
    backing it up in a way that MS says not to use, at the very least do a complete
    shutdown of every system involved prior to the backup so you are truly in a
    dead, nothing-changing state. That gets you some chance of possibly succeeding.

    Basically, just because you didn't make any changes doesn't mean changes aren't
    being made and replicated. AD is a living system that is constantly updating
    attributes on its own without any guidance from you. Also, if you have things
    like Exchange or other directory-aware apps, they can be making changes as well
    that you have no knowledge of.


    --
    Joe Richards Microsoft MVP Windows Server Directory Services
    www.joeware.net
