In a comment on my article about installing updates on Exchange 2010 DAG members Greg asks:
I would like to know how many Exchange admins [ahem] server admins actually follow these steps when they patch. We have a lab where we just patch Server A in a two multi-role server DAG and reboot it, then patch Server B and reboot it, then balance out the databases evenly among them and call it an evening. No problems so far.
To be fair, we didn’t go to Exchange 2010 until SP3, so maybe things are better with SP3, but we don’t bother with any of the procedures above and use Exchange’s built-in intelligence.
Now Greg does say that this is how they patch their lab servers, so it isn’t clear whether this is how they treat their production servers. However, it is still a topic worth discussing.
The short answer, from my point of view, is that running the DAG maintenance scripts is the wisest course of action. I’m sure that anyone who has seen those scripts fail to switch over a database because of some other underlying condition at the time would agree with me.
For anyone else who is just running them because it looks like “the way it is done”, and is curious why this is the wisest course of action, let’s look briefly at the possible consequences of not running the DAG maintenance scripts during server maintenance.
Consider that Exchange Server is an application running on a Windows server. When someone tells that server to shut down or restart, Windows is going to shut down or restart. In most situations the Exchange Server gets little say in the matter.
In the process of Windows Server stopping Exchange services, the Primary Active Manager will try to switch over any active mailbox database copies to another DAG member. In a simple two-member DAG it is pretty obvious where the active database copy will end up. In larger DAGs there is a little more to it. If you want to dive into the full details, read up on the Best Copy Selection process.
If you do read that then you might notice the term “lossy failover”.
A lossy failover is a potential data loss scenario, and the default AutoDatabaseMountDial setting of “GoodAvailability” allows lossy failovers to occur. Exchange Server can automatically attempt to recover data lost in a lossy failover, but that recovery is not guaranteed.
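To make the mount dial concrete, here is a minimal Python sketch of the threshold check. It assumes the copy queue length limits documented for Exchange 2010 (0 logs for Lossless, 6 for GoodAvailability, 12 for BestAvailability). The function names are hypothetical, and the real Active Manager logic is more involved (for example, it first attempts to copy any missing log files from the failed server), so treat this as an illustration only.

```python
# Illustrative model only: this is not Exchange code.
# Limits are the copy queue lengths (in log files) documented for
# each AutoDatabaseMountDial value in Exchange 2010.
MOUNT_DIAL_LIMITS = {
    "Lossless": 0,
    "GoodAvailability": 6,   # the default setting
    "BestAvailability": 12,
}


def can_auto_mount(copy_queue_length: int,
                   mount_dial: str = "GoodAvailability") -> bool:
    """Return True if a passive copy that is this many logs behind
    would be allowed to mount automatically (hypothetical helper)."""
    if copy_queue_length < 0:
        raise ValueError("copy queue length cannot be negative")
    return copy_queue_length <= MOUNT_DIAL_LIMITS[mount_dial]


def is_lossy(copy_queue_length: int) -> bool:
    """Any missing log file means the mounted copy lacks some data."""
    return copy_queue_length > 0


if __name__ == "__main__":
    for missing_logs in (0, 3, 6, 7):
        print(missing_logs,
              "mounts" if can_auto_mount(missing_logs) else "stays dismounted",
              "(lossy)" if is_lossy(missing_logs) else "(lossless)")
```

The point of the sketch is the middle rows: with the default setting, a copy that is a few logs behind still mounts automatically, which is exactly the lossy case.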
For a test lab this is not likely to be a concern, and most of the time I allow my own test lab servers to automatically patch and restart (on separate schedules) without my manual intervention.
From time to time there is a problem as a result of that, which requires me to apply a little brute force and occasionally accept some data loss to bring all databases online again.
In a production environment I would really hate to put myself in that situation, or know that others are putting themselves in that situation simply because they can’t be bothered running the DAG maintenance scripts. My recommendation is to always use the DAG maintenance scripts during patching or other server maintenance on your DAG members.
Note: the DAG maintenance scripts used for Exchange 2010 are not suitable for use with Exchange 2013 and 2016. Instead, you should use the following procedures:
For my two-member DAG, with both servers running the CAS and Mailbox roles, it works better when I just shut down and reboot for Windows Server updates. When I ran the stop and start maintenance scripts, it got stuck and I had to reboot. I only used them for CU updates. For a small environment, it does not take long to bring them back.
Paul,
What about an electrical shutdown where I have to shut down the entire DAG? I only have two servers and keep one of them as the primary for all databases. I was planning simply to shut them down in order and power them back on in the reverse order to make sure there is no failover. Will this work?
Thanks,
Brad
Manually dismount the databases first so they get a clean shutdown. You’ll need to manually mount them the first time when you bring everything back online too.
“We don’t have the infrastructure to support DAG’s with auto-failover”
What does that mean?
For the rest of your question, I don’t see how anything is going to be simpler or better than just moving the active databases to the other DAG member while you perform maintenance.
We don’t have the infrastructure to support DAG’s with auto-failover, but we do run DAG’s and use manual switchover if required. We don’t switch over for minor events under 30 minutes, such as patches and restarts, short power outages, etc. We have the DAG’s set with DAC enabled and failover blocked. In this scenario, are there any concerns with using a script, or with restarting either of the two geographically separated servers individually and letting mail flow from that server stop until the restart is finished?
Thank you
I ran into issues, but these may be a lack of knowledge on my part.
I have a DAG with two servers in it, EXC01 and EXC02. The EXC01 server is the PAM. I ran the maintenance scripts on EXC01 in this order: ran the start script, let it complete its changes, rebooted, and ran the stop script. I then moved to EXC02 and ran into issues. When I tried to run the start script it errored out with a ton of messages. I had assumed I would be able to simply follow the same procedure on both servers.
In the end, as time was short, I was forced to simply reboot EXC02. All the DBs flipped to EXC01, but I suspect I faced a ‘lossy’ change. In addition, and I don’t know if this is normal, but Outlook clients disconnected when EXC01 was rebooted.
Perhaps the clustering is in some way incomplete. I ran both healthchecks on both servers before I began and they were fine.
I have to say, it’s all very well the world moving to PowerShell, and it’s fine when a script works. It’s very uncomfortable to end up in a place where a script is run and it bombs. At the time I had no real idea what it had or had not committed, or how far along it had got.
Anyway, I’m just curious: are there any obvious gotchas as to why, in a two-server DAG, the secondary server may not be happy with the start/stop scripts in the above process?
Cheers
DS
Hi Paul,
Thanks for sharing the detailed tips. I created a presentation on Server Maintenance Tips and would like to share that with you – http://www.slideshare.net/cauvery_varma/ms-exchange-server-database-maintenance-tips
My colleague, who is very knowledgeable with Orchestrator, has written a nice workflow that fully automates patching. Of course it incorporates the DAG maintenance scripts.
So now the full patch cycle can be performed by non-Exchange-aware operations folks… and my Sunday is free again…
Oliver
Oliver,
Would you be willing to share the Orchestrator workflow for the DAG?
Hi Paul,
Hope you are doing great as always.
I totally agree with Jimmy. Thanks to Microsoft and great minds like you, who have made such great scripts and made the life of Exchange admins so easy… 🙂
Thanks,
Kanta Prasad
I recently had issues installing Rollup 2 for Exchange 2010 SP3 when I put the DAG member into maintenance mode. At the point where the rollup stops services, it would just hang. The only way I was able to install the rollup was not to run the maintenance script.
Paul, just a short answer:
In my opinion, using the maintenance scripts is mandatory. I have patched a lot of huge customer Exchange environments and always used the scripts, with the result of a happy, healthy DAG after patch day 🙂
Have fun, Jimmy
We run the DAG maintenance scripts along with disconnecting Forefront from our Exchange server each time we patch (Exchange and OS). Too many times we’ve had to reseed DBs, and a 300 GB DB takes forever!
I’ve gotten it so scripted that almost anyone in the department can run a patch cycle, if they ‘follow-the-bouncing-ball’.
“step1-Start_dagMaintenance.cmd” – runs the ‘startdagmaintenance.ps1’ script.
“step2-StopExchange_disableFF.CMD” – stops the Exchange and FF services (and now SCOM) that need to be stopped to disconnect FF.
“step3-Install_Rollup.cmd” – maps to the install folder of the latest rollup and runs the install. If doing OS patching, run Windows Update.
“step4-reconnectFF.Cmd” – reverses step 2.
“step5-Stop_dagMaintenance.cmd” – reverses step 1 and starts a server reboot.
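The five steps above could be chained by a small wrapper that stops at the first failure and reports how far it got. The following Python sketch is one hypothetical way to do that. The file names are the ones listed in this comment, but the wrapper itself, the helper names `run_cmd` and `run_patch_cycle`, and the assumption that each batch file returns a non-zero exit code on failure are mine, not part of the commenter’s setup.

```python
import subprocess
from typing import Callable, List, Sequence

# Batch files as named in the comment above, in the order they are run.
STEPS = [
    "step1-Start_dagMaintenance.cmd",
    "step2-StopExchange_disableFF.CMD",
    "step3-Install_Rollup.cmd",
    "step4-reconnectFF.Cmd",
    "step5-Stop_dagMaintenance.cmd",
]


def run_cmd(script: str) -> int:
    """Run one batch file through cmd.exe and return its exit code.
    Windows only. Assumes the script signals failure with a non-zero code."""
    return subprocess.run(["cmd", "/c", script]).returncode


def run_patch_cycle(steps: Sequence[str] = STEPS,
                    runner: Callable[[str], int] = run_cmd) -> List[str]:
    """Run the steps in order and stop at the first failure, so a later
    step never runs against a server left in an unknown state."""
    completed: List[str] = []
    for script in steps:
        code = runner(script)
        if code != 0:
            raise RuntimeError(
                f"{script} exited with code {code}; "
                f"completed before failure: {completed}"
            )
        completed.append(script)
    return completed


if __name__ == "__main__":
    # Dry run: substitute a fake runner that only prints the step name.
    def dry_run(script: str) -> int:
        print("would run", script)
        return 0

    run_patch_cycle(runner=dry_run)
```

Passing the runner as a parameter keeps the ordering logic testable on a machine with no Exchange server, and the error message records which steps had already completed when something went wrong.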
Thanks. Early in my Exchange 2010 experience I had an issue and called MS paid support. They told me that I only needed the scripts for Exchange roll-ups and not Windows patches.
I am glad now that I run them in either case. I did have to discover that I needed a second quorum server to make them work in my 2 member DAG. If that appears in the documentation anywhere I have yet to find it.