A rant from an IT Manager about W32.Blaster.Worm

<RANT>
Ok, there have been a few people who don't know better talking about how everyone should have been patched against MSBlaster already, and how any admin who hasn't patched is a moron.

This is a pretty easy statement to make when you are responsible for 1-10 machines, and patching pretty much means hitting Windows Update.

However, life isn't that simple for everyone.  In addition to my developer hat, I also have the (mis?)-fortune of being the IT manager for my company's site of ~200 nodes, with about a dozen production servers and a similar number of dev & qa servers.  We are part of a bigger, global enterprise network consisting of about 60,000 nodes.

Let me say that again.

60,000 nodes.

My site experienced no disruption from W32.Blaster.Worm, because as IT Manager I aggressively patched our production servers three weeks ago, followed by an equally aggressive client patch plan.  It took two entire weeks to plan, test, and completely deploy the RPC patch across our relatively small site.  In fact, we are still playing 'whack-a-mole' as developers and dial-in users continue to bring up unpatched systems in our environments.
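
For the curious, here's a rough illustration of what chasing those boxes down involves (a sketch only, written against plain POSIX sockets; the 192.168.1.x range is a placeholder, not our real addressing): sweep a subnet and flag anything still answering on TCP 135, the RPC endpoint-mapper port that Blaster attacks.  An open 135 doesn't prove a box is unpatched, but it tells you which machines to chase first.

    /* rpcsweep.c - a rough sketch, not production code.  Walks an example
     * /24 and reports hosts that still answer on TCP 135, the RPC endpoint
     * mapper port that Blaster exploits.  Build with: cc rpcsweep.c -o rpcsweep */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Non-blocking connect to ip:135 with a one-second timeout. */
    static int port135_open(const char *ip)
    {
        struct sockaddr_in sa;
        fd_set wfds;
        struct timeval tv;
        int err = 0;
        socklen_t len = sizeof err;
        int fd, rc;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return 0;

        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port   = htons(135);               /* RPC endpoint mapper */
        inet_pton(AF_INET, ip, &sa.sin_addr);

        fcntl(fd, F_SETFL, O_NONBLOCK);
        rc = connect(fd, (struct sockaddr *)&sa, sizeof sa);
        if (rc == 0) {                            /* connected right away */
            close(fd);
            return 1;
        }
        if (errno != EINPROGRESS) {               /* refused, unreachable... */
            close(fd);
            return 0;
        }

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec  = 1;
        tv.tv_usec = 0;
        if (select(fd + 1, NULL, &wfds, NULL, &tv) > 0)
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        else
            err = ETIMEDOUT;                      /* no answer in time */

        close(fd);
        return err == 0;
    }

    int main(void)
    {
        char ip[32];
        int host;

        for (host = 1; host < 255; host++) {
            snprintf(ip, sizeof ip, "192.168.1.%d", host);   /* placeholder subnet */
            if (port135_open(ip))
                printf("%s  TCP 135 reachable - check its patch level\n", ip);
        }
        return 0;
    }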

I've got a pretty big advantage that many IT managers don't have these days.  I have a generous IT budget that allows me to approve large amounts of overtime and software expenditures.  I still spent many, many hours of overtime making certain that we were protected.  End result?  A single computer was infected, ironically just as the user was hitting http://windowsupdate.microsoft.com from behind our firewall.

Not all of our enterprise network was so lucky.  One of our sites suffered several hundred infections.  Our network teams quickly deployed rules at our intra-site router boundaries to block traffic on the RPC ports, at the cost of breaking several enterprise applications, including intra-site and external email.  Basically, our Exchange servers couldn't talk to each other.

Next time you are preaching about admins not deploying the patch of the day, try planning a deployment for 60,000 machines while performing enough testing to ensure that _no_ line-of-business applications are broken by the patch.  I guarantee that you'll have greater respect for hard-working sysadmins.

Also, feel free to preach if you've always written code that is free of buffer overflows or other equally critical security bugs.  Remember that it was a developer who wrote the RPC code that is the root cause of this security issue.
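
To make that concrete, the entire class of bug looks something like the sketch below (a generic C illustration only, not the actual RPC code): one unbounded copy of attacker-controlled data into a fixed-size buffer is all it takes.

    /* overflow.c - a generic illustration of the bug class, NOT the actual
     * RPC code: copying attacker-controlled input into a fixed-size stack
     * buffer without checking its length. */
    #include <stdio.h>
    #include <string.h>

    void vulnerable(const char *request)
    {
        char name[32];
        strcpy(name, request);          /* no length check: a long enough
                                           'request' runs past 'name' and
                                           over the return address */
        printf("hello, %s\n", name);
    }

    /* The boring, correct version: bound the copy to the buffer size. */
    void bounded(const char *request)
    {
        char name[32];
        strncpy(name, request, sizeof name - 1);
        name[sizeof name - 1] = '\0';   /* strncpy won't always terminate */
        printf("hello, %s\n", name);
    }

    int main(void)
    {
        bounded("short and harmless input");
        /* Calling vulnerable() with a few hundred bytes of carefully chosen
         * data is the kind of thing the worm does for you. */
        return 0;
    }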

This doesn't even mention the other part of our business that is governed by FDA regulations, where it can actually become a legal offense to implement an IT change without rigorous testing and documentation.  Three weeks would be lightning fast to deploy a patch in such an environment.  To deploy an untested patch, I would actually risk breaking the law. (Note for my best friend, the potential FDA inspector: all CSV procedures were followed, thank you.)

(Let it be noted here that all of the above rant is my personal opinion, and in no way reflects the opinions of my employer).

</RANT>

9 Comments

  • This is a tough call. Do I understand the difficulties in your job? Absolutely. Can I appreciate that many share in the blame beyond sysadmins? Certainly.



    Does this mean you have no culpability if nearly 4 weeks after a security patch is released you still haven't at least dealt with the servers? Absolutely not. 60,000 nodes.... okay sir, let's trim that down to the boxes directly under your group and responsibility. I dunno.... most likely about 1000 servers? Clustered such that you could roll out such a patch - requiring no reboot, of course, provided you weren't on Win2k SP2 (and you ARE at least that much up to date on your servers, right?) - so you basically have to install such a patch in a logical, PROACTIVE fashion.



    At that point you've at least achieved CYA. All client machines are either the responsibility of the owner or the help desk, not the sysadmin.



    I'm speaking critically because our 120-server site was forced to have downtime over the lunch hour Tuesday because our sysadmin did not do this. And YES, I hold them directly responsible for NOT doing their clearly defined job. (Fortunately it's completely moot. We're small enough that it was only a slight inconvenience to have our ERP system go down for less than 60 minutes.)



    Rigorous testing? It falls into the same class as patch installs.... if it is not the responsibility of the sysadmin to keep the servers safe, then who is?



    Good rant. Much needed. Very true in many ways. Just not 100% accurate IMHO.

  • I think you have to at least understand the problems of dealing with such a large number of machines. Unfortunately, there is a terrible history of patches breaking things. Then there are the cases where the patch is almost impossible to install (remember the MSDE patch needed to allow the patch that would have prevented the last SQL Server problem).



    If Microsoft could, for an extended period of time, produce patches that work and don't break other things, perhaps automatic updates would be a more attractive feature.



    That said, I do think that in this case, normal precautions like properly configured routers would have handled the bulk of this problem.

  • Dave, you are being shortsighted here - I'm not the sysadmin, I'm the IT Manager. I'm ultimately responsible for every single node on our network. I can't just point a finger and call it "the end user's fault."

    Furthermore, deploying a patch into a government regulated environment brings with it paperwork beyond comprehension. You are grossly oversimplifying the problem.

    And I'm not the only one with this problem. A friend of mine is the IT Manager for a network with an SLA (service level agreement) where they pay $100,000 per HOUR of downtime, plus another $100,000 of lost revenue during peak usage. In such an environment, proper testing is critical before deployment. Change is not always as simple as you might believe.

    You mentioned that your shop was able to afford an hour of downtime. Could they afford it if it cost $200,000?

    What if it cost $1,000,000? I also know of a business that cannot upgrade from Sybase SQL Server 4.2 (Yes, it was Sybase before it was MS SQL Server. Microsoft 'appropriated' the Sybase codebase) because such an upgrade would cost them approximately one million dollars per HOUR of downtime. There's a lot more to running systems than Start->Windows Update / Click -Yes- to Reboot now.

  • Jerry,



    Sorry if my comments rubbed you the wrong way. But I don't recall saying that patching, etc. wasn't hard. I'm well aware that it is. What I said was that it was *necessary*, regardless of how hard it is.



    People frequently blame Microsoft for the costs associated with worm and virus outbreaks, and clearly they do bear some responsibility. But so do businesses and individuals, even if there are costs associated with keeping their networks secure.



    Also, the fact that patches can be difficult and/or expensive to roll out and test does not mean that it's not possible to formulate company-wide policies for home workers to try to minimize this kind of outbreak. For one, if employees knew that they were subject to reprimand or termination if a problem with their home machine caused an outbreak within the corporate network, it's reasonable to assume that they'd take better care with the security of their machines.



    Bottom line is that I don't expect IT managers and security personnel to be perfect, but when an organization like the DMV has to shut down because of a worm like this, someone (or multiple someones) hasn't done their job properly. If there are problems that make patching and security policies difficult, find ways to deal with those problems. But I don't accept complaints that nobody understands how difficult the job is. I know it's difficult, and I can sympathize. But I still expect the job to get done.

  • I agree about the cost of downtime, the QA needed, so on and so forth....



    But as IT Manager (sorry for talking like you're a sysadmin) I believe you are then truly misplacing your rant. Why rant at us IT geeks? What did we do?



    The problem is you are either underfunded, understaffed, both... or just plain delinquent in your responsibilities. If I were your supervisor, I would (1) strongly hope it is not the latter and (2) work with you to fully staff/fund your department so this never happens again. And I'd still have to fully investigate - objectively, but most definitely beginning with YOU - why this happened in the first place.



    (Speaking hypothetically of course. I did understand that you personally had no downtime. Thus, you did your job in a tough situation - something to be commended for. I'm also sure that's exactly why you get paid the big bucks.)

  • My rant is aimed at developers spouting off about issues when they have absolutely no idea what they are talking about. Non-programmer sysadmins don't tell developers how to avoid writing the buffer overflows that caused this situation in the first place, and developers without sysadmin experience shouldn't be telling admins how to run their networks.

    I have quite a bit of experience wearing both hats, and I sympathize with both groups. I guess I'm jumping in here because I think that a lot of unfair comments are being made.

  • Jerry wrote:



    "My rant is aimed at developers spouting off about issues when they have absolutely no idea what they are talking about."



    Jerry, how do you know, other than assuming from their comments on the subject, what a developer does or doesn't know about administration? Just because you disagree with someone's conclusions doesn't mean that "they have absolutely no idea what they are talking about."



    If there are factual points that are wrong, feel free to correct them. But being insulting about it just makes it seem like you're taking this way too personally. No one was attacking you in particular, after all.

  • Have you EVER heard of automatic patching systems? Radia and the like? What about domain login scripts for users that do patch updates? What about utilities like DameWare and others that allow you to do remote upgrades? Yes, there is some testing and configuration required, but you don't see network engineers running around patching machines as they stand next to them. They automate code pushes and schedule reloads. In the age of technology, you'd think that people would learn to work smarter, not harder.

  • You have to wonder sometimes if people actually bother to read your post before offering comments, since it seems rather clear to me that you're talking about testing and documentation, not deployment. It almost makes me wonder if some people who keep posting without reading are simply trying to make their blog more visible -- I mean, how many posts should one really make about this issue?
