it’s time to get VIRTUAL
It’s time, kids. Time to get virtual. We have CEO buy-in, the CIO is mad about it (some would say the CIO is mad period but I won’t go there) and we, the folks who do things, are raring to go. The checks have been authorized. The servers have arrived and are sitting pretty in their rack.
There were a few really good reasons to virtualize. One of them would be the five or seven existing servers that are way past end-of-life. Another would be reducing about twenty servers to three, with one failover. Think of the savings on electricity. Think of the cooling savings… on a day like today, when it’s ninety-six outside, it’s difficult to keep the server room cool as it is.
We decided on Dell servers, largely because we’re already a Dell shop and also because a lot of companies were using them already for virtualization. The 2960’s seem to be the default but we heard about a new series that was coming out, with a really hot AMD chip that was specifically set up for virtualization. We purchased four servers (R805?) with dual quad Opteron chips. Because I wasn’t there when they were racked, they got tagged Moe, Larry, Curly, and Shemp. If I were there, they would’ve gotten more interesting names.
Here we are, at T minus one week and counting. Where do I start?
Monday, T minus one week
The Storage Area Network (SAN) was scheduled to be delivered today. For some strange reason, it wasn’t. It was rescheduled for tomorrow. It must’ve been the heat… yeah, that’s it… too hot for SANS to be outside today. We wouldn’t want to get the SAN Police involved, would we?
I had emailed the server specs to the V-consultants, who let us know there were a `few inconsistencies.’ Minor stuff, like we had two HBA’s too many per server. We didn’t need them but could use four more ethernet ports. Now keep in mind that we ordered the servers from their specs and they signed off on them. Plus we had to go up a service level from Dell because of the HBA’s.
My coworker flew immediately into a rage (that used to be my job but I got promoted). He wanted to see heads rolling. My boss, quickly assessing the situation, flew immediately into inaction and declared something must be done: we needed a MEETING. And not just any meeting - we needed a phone conference.
I have worked for some interesting people. One of them I suspect would, upon seeing flames shooting out of the server room, schedule a week-long intensive set of meetings to discuss what we might want to consider doing about it. The answer would undoubtedly turn out to be TO HAVE MORE MEETINGS.
Cue phone conference with all interested parties except v-consultants. Main contractor has all sorts of reasons why they ok’d the servers in our rack. Furthermore, he doesn’t understand why we need extra networking plus two alleged Cisco 3750’s to make everything work.
As this conference continues, I am reminded of an email that just came in. This one’s from the SAN people. For some reason known only to the Virtualization Gods, we are receiving an extra rack. Since there’s no room at the inn for another rack, I instructed them to keep it or tie it to their car and use it as a roof rack. SAN people are now quite disturbed and wondering if we have all the correct power connectors because their rack does.
Someone in the SAN department neglected to inform us that they will require a 220V feed. Fortunately we have what we believe is the correct feed coming out of the UPS we purchased for this rack. Still waiting on SAN people to confirm this is the correct (twisty) connector.
Meanwhile, back at the phone conference, we have confirmed that V-consultants are not on same page with Main Contractor. You know what that means, right? We have to schedule another phone conference, this time with V-consultants too.
Not bad for the first day of the blog, one week in advance of start of work…
Oh wait, I forgot to mention that while we were putting out fires, Men With Ladders were poking about in the ceiling. It would seem that our brand new air conditioning unit, installed two weeks ago, is not working all that well.
Happy Monday, folks.
Second phone conference occurred. Lotsa confused parties but a firm plan for going forward. Dump HBAs, install dual gig port cards instead. Call into Dell to see what they can do with our mis-ordered servers. Find out if 10g ports are VMware compatible - V-consultant can’t find them on the compatibility list. Yippie.
Tuesday - T-minus six days
No additional virtualization news but the air conditioning remains in its questionable state. One brand new air conditioning roof unit - already not working correctly. Two other units down for parts or other reasons. And it may hit close to one hundred today. Building Manglement has instructed us to keep our office door closed so we don’t let the heat out into other areas.
A few hours later, the crack building team (so named for smoking so much crack) discovered the source of the problem: the breaker kept popping because it was underrated. The Head Cracker informed us that this is probably the same problem we had with the old unit (we just put up the new unit). IIRC, Head Cracker was the one who decided we needed the new unit anyway. Meanwhile we spent the day sweating and resetting a breaker.
The SAN, delayed from yesterday, was supposed to be delivered first thing in the morning. Because of this, it arrived at 2pm sharp. Since we informed SANCo that we didn’t want their bloody rack, the unit arrived, pre-racked. There was a bit of a dance on the loading dock because they couldn’t get the unit through the front door and had to remove it from the skid and roll it in manually.
Don’t get me wrong; this is a pretty rack. Way too nice for my server room, in fact. Fortunately we had spent some time cleaning earlier, so we were able to actually roll the bugger into the room. A quick peek inside revealed that it was all wired up, probably by one of those guys who lines his cereal boxes up alphabetically. You know - the kind that puts wire ties every six inches, whether they’re needed or not.
We left it there and went back to our sauna, trying to figure out what we were going to do (besides sweat). I thought maybe we could bring a rack into the office. We could fill it up with old switches and make sure the lights blink rapidly. We can give it an esoteric name and all the executives will think it’s sheer genius, not to mention impressive as hell.
Still no word back from Dell on the servers. And just for fun, I heard that the entire project got pushed back one day because someone or other is off Monday. Now Monday is not a surprise - it has been on the books for over a month. Why all of a sudden does someone remember they’re not going to be working on that day?
–> My wife has noted that, among my many little flaws, I have what she refers to as Unreasonable Expectations. For instance, when I go to a restaurant and order a meal, it invariably shows up differently than I ordered it (and frequently late). I get agitated and my wife reminds me that this is one of my Unreasonable Expectations. So expecting the Grand Project to start on the day we agreed to have it start must be another one of my Unreasonable Expectations.
The day went rapidly downhill after this (see the blog).
Wednesday - T minus six days (again?)
After the 85 degree server room debacle, I figured it might calm down ever so slightly. And it did, for approximately six minutes. You just have to know that bringing up all the servers would not proceed smoothly. And sure enough, our problem child, Microsoft Exchange, took a dump. For no particular reason. You can spend hours combing the logs for the actual reason the server stopped serving but your time would be spent much more effectively making up entertaining stories as to why it failed. One of my favorite MS lines ever is found after the server spontaneously reboots: Server restart was unexpected. No, really? If you’re terminally bored that day, you can look that one up on Technet. Or you could just go to the dentist for some elective dental surgery without anesthetic.
Promptly at 9:04, I sent an email to a friend, asking about lunch. That was the last time I saw my desk.
Also coming in promptly at 9:04 was SanMan, to set up the SAN (the one in the humongous rack we didn’t order). SanMan was an extremely nice and genial fellow who seemed to know what he was doing. This in itself is a little off-putting.
Our first task was power. We were quite satisfied we had it in good quantity. SanMan showed me his plug… wait a minute… the rack’s plug, at which point I dejectedly asked if he had cables, as his 6″ wire was going to need augmenting by approximately ten feet if he hoped to insert it into a compatible socket hooked to Serious Juice. Fortunately they had thought of this and sent a box full of Mega Cables.
Back in my audio days, I would’ve called the connectors twist-locks. In fact I still call them twist-locks because they are twist-locks. When we first rebuilt the network room I had to become familiar with the plug and jack designations so I could order UPSes and ask the nice building folks to install the proper sockets in the wall. Naturally I can’t remember a durn thing, other than we needed something like an L6-30, which translates roughly to a big friggin jack that supplies 208 volts at up to 30 amps.
SanMan threaded his Mega Cable through the back of the racks to me and I got in the traditional network admin position (on my knees) in a vain attempt to plug the cable into the UPS. You know, of course, that it couldn’t simply plug in. No sir, it couldn’t simply plug in. It sure looked like the matching plug, even with my glasses off (the optometrist noted that I can’t see far, so he gave me glasses to help me see near. They don’t work.). SanMan came over and verified that yes, this was the absolutely incorrect jack for the plug. It was probably an L5-20. Either that or an Illudium Q-32 Explosive Space Modulator.
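For anyone else who can't remember a durn thing about plug designations, here is a minimal sketch of the sanity check we should have done before the rack showed up. The nominal ratings for the two connectors are standard NEMA values; the 80% continuous-load figure is a common rule of thumb that I'm assuming here, and the function names are made up for illustration.

```python
# Nominal ratings for the two NEMA locking connectors in this story.
NEMA_LOCKING = {
    "L5-20": {"volts": 125, "amps": 20},
    "L6-30": {"volts": 250, "amps": 30},
}


def plug_fits(plug: str, receptacle: str) -> bool:
    """A locking plug only mates with a receptacle of the same designation."""
    if plug not in NEMA_LOCKING or receptacle not in NEMA_LOCKING:
        raise KeyError("unknown connector designation")
    return plug == receptacle


def continuous_va(supply_volts: float, circuit_amps: float, derate: float = 0.8) -> float:
    """Usable continuous load in volt-amps (assumes an 80% derating rule)."""
    return supply_volts * circuit_amps * derate


# The rack's plug versus the jack we actually had on the UPS.
print(plug_fits("L6-30", "L5-20"))
# Roughly what one 208 V, 30 A feed can carry continuously.
print(continuous_va(208, 30))
```

Looking alike from arm's length, glasses on or off, does not count as compatible.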
So, aside from the fact that we had no dual thirty amp redundant power for our new rack, we were ready to roll. I’ll just tell the CIO that we managed to purchase a green SAN. It’s so green it doesn’t use any power.
My left-hand man set immediately to work, locating the Big Brown Box of Big Black Adapters for Big Black Cables. In what seemed like an instant, but in reality was only forty-five minutes or so, they had sketched out a plan whereby the entire rack would be redundantly powered without drilling any additional holes in my head or calling the electrician.
The electrician, mind you, showed up rather quickly anyway. I suspect it’s because he was literally dragged in by the CIO, who seemed to be rather anxious to have some additional power for his SAN. I only say this because the electrician still had his jammies on. I hereforth and hitherto refuse to speculate on how the CIO found him.
In a statement that could potentially get him barred from whatever group electricians gather in, he stated that he would do anything we wanted, not simply tell us what he could do. We will never again know his like as long as we shall live.
Not to be upstaged, Corporate immediately torpedoed this idea. They didn’t want this guy being used to do the job. Couldn’t we wait a month or six to use their guy. I can only assume that by this, their guy would do it cheaper because he dug ditches during the day but recently found a book in the trash on electrical wiring and swore up and down that he could do it.
It’s like going to the hospital and finding out the lady who just tried to take your blood works mainly emptying trash cans. But that’s the HMOs for you.
SanMan and Left Hand Man worked diligently for the next few hours and by quitting time, we had a fully powered SAN. Apparently SanMan Part Two comes back in a few days to configure the SAN. Why we need two of them I’ll never know.
Where was I during all of this? After diligently attempting to put the Q-32 Explosive Space Modulator into the L5-20, I got hijacked to help find out why a major piece of software wouldn’t work. Mind you, I don’t use the software, I didn’t install it, I don’t know how it works or what it does, but apparently it’s my problem. And to think, I was so confounded happy when they purchased it because they dedicated an entire team to the installation, configuration, and running of the software that didn’t include any of my team.
I am an idiot.
Meanwhile, no answer from Dell on returning server parts. As it had been two days, I figure I’d have to break down and actually call my rep. I don’t like phones, in rather the same sort of way the KKK won’t be voting for Obama this November, so picking up the receiver is a major issue for me.
Once connected to Helpful Dell Rep, she guessed immediately the purpose of the call and promised an answer shortly. Did you know that Dell has seven digit phone extensions? And I’m speaking of Dell Austin, as opposed to Dell India.
Faster than a speeding invoice I received the response from Helpful Dell Rep via email. I hope the poor dear doesn’t feel the same way about phones that I do, or her job is hell on earth. We could return the HBAs and substitute multiport gig nic’s. Unfortunately the 10g internal nic’s weren’t on the VMware hardware compatibility list, so those had to go also. Also unfortunately I spotted a line in the product description that stated `usually ships 1-2 weeks.’ I emailed back, stating that if by `usually ships 1-2 weeks’ they meant `could ship today or tomorrow,’ I’d take four quad-port gig nic’s.
Five minutes later I had a new quote for eight TOE dual-port gig nic’s that would ship today or tomorrow, with next-day shipping. I was repeatedly reminded by Left Hand Man that we needed TOE nic’s although I couldn’t make the definition stick. Apparently it’s TCP Offload Engine (or toe-jam, I forget). They’re ordered.
So that’s it for virtualization today.
Heaven help us.
Friday - T minus four days:
The new NICs arrived, as if they fell from the sky almost as soon as they were ordered. No time to install though, as we’re busy putting out other fires. The server room is so hot, it’s likely to catch fire. There’s some red dude in there with a pitchfork, urging me to leave everything on - it’ll be just fine. Odd, he looks a little like me too.
The Config Guy from San, Inc Webex-ed into the SAN and started setting up the LUNS. At about this point, we started setting up the lunch. He kept calling my coworker, asking questions none of us could answer. Questions that, it seemed, should have been ironed out way in advance, not involving us at all. If we’re training next week, we’re not likely to know the answers to advanced questions about setting up LUNS.
Boss and coworker are both a little put off because they believe we were supposed to get three racks of drives instead of two. I wasn’t paying attention in class that day so I don’t remember. I was using this opportunity to get the teacher to spank me.
By this point we’re all pretty miffed at the people who should have had this taken care of already, so Boss does the only thing he can do - set up another PHONE CONFERENCE for shortly thereafter. Our departmental motto is NO MEETINGS, so this is getting rather irksome.
Friday is our Day of Celebration, where we let off a little steam and get pizza for lunch. The pizza is most of the way to my mouth when Boss informs us the PHONE CONFERENCE is starting now. I’m sure you’ve been in this spot; on the phone and very hungry. You’re quite concerned that chewing anything is going to make you look (sound?) unprofessional.
So I took the only course available: I adjusted my handset so the maximum amount of pizza-chewing audio goodness would leak into the receiver, just for the benefit of the six other people involved. A man’s gotta do what a man’s gotta do, you know.
After the PHONE CONFERENCE, where everyone swore everything would be ok Tuesday, it was time to hastily push lunch down the gullets and go back to the server room, where it was cooling down a bit and there were electricians to argue with.
The SAN rack has quad power feeds, each with one of those mondo 30A twist-locks. Needless to say there wasn’t a single 30A socket to plug one into. The electricians were kind enough to rush in and provide enough electricity to supply every chair in Congress (after they’re convicted, of course).
I run back to my desk to start the BTU calculations for the new air conditioner. Coworker starts investigating new UPSes with 30A sockets. To everyone’s alarm, they’re coming in right around six grand apiece. I figure this might be a good time to replace the entire shebang with one huge freight train of a UPS so I set vendors onto the project. [make sure you get the electrical and cooling requirements of any new equipment - duh]
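The BTU arithmetic itself is nothing fancy. A rough sketch, assuming every watt the gear draws ends up as heat in the room: one watt is about 3.412 BTU per hour, and a "ton" of cooling is 12,000 BTU per hour. The wattages below are invented placeholders, not our real numbers, so plug in the figures from the spec sheets.

```python
from typing import Mapping

BTU_PER_HR_PER_WATT = 3.412   # 1 W dissipated is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000   # one ton of cooling capacity


def heat_load_btu_per_hr(watts_by_device: Mapping[str, float]) -> float:
    """Total heat load, assuming all electrical power becomes heat."""
    return sum(watts_by_device.values()) * BTU_PER_HR_PER_WATT


def cooling_tons(btu_per_hr: float) -> float:
    """Convert a BTU/hr heat load into tons of cooling."""
    return btu_per_hr / BTU_PER_HR_PER_TON


# Hypothetical wattages, for illustration only.
loads = {"esx_hosts": 4 * 500.0, "san_rack": 3000.0, "switches": 400.0}
btu = heat_load_btu_per_hr(loads)
print(f"{btu:.0f} BTU/hr, about {cooling_tons(btu):.2f} tons")
```

This ignores people, lighting, and whatever the sun does to the roof, so treat the answer as a floor rather than a spec.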
Monday - T minus one day and counting…
There is absolutely no virtualization news today, other than one of the consultants forgot about a vacation day and a coworker forgot about a weekday wedding he’s in.
Aside from that, Mrs. Lincoln, how was the play?
Tomorrow’s the big day.
Tuesday - Liftoff!
I was almost surprised to not be accosted by anyone with any complaints about something or other being down (with the possible exception of one of my own) on my way in. I was stupidly early again.
I sent a reminder to the company that we were with consultants all week and restated our contact info. I do this as a matter of form, as all my emails seem to have the Ignore Flag present. Either that or I’ve gone back to typing in Swahili again. The team was in place to deal with the inevitable emergencies as well as the minor tragedies masquerading as inevitable emergencies, driven by high-volume whining.
Everyone showed up on time today, which should have served as a warning flare to all of us. I was very happy to have the consultants on-site because we would finally know what we were going to be doing, as opposed to my boss’ view of things, where we sat around a table and decided which of our servers we’d virtualize first by committee. I figured what the hell… why not let the actual professionals tell us how it was going to proceed (wild thought, I know).
Almost as if on cue, the consultants started discovering things that `should have’ been done already. I’m keeping a mental tally of this and keeping my mouth shut at the same time (difficult if you know me). I’m noticing that most implementations of this scale are fraught with bulldookey: they start late, end late, run late, are missing various parts, and invariably don’t do what they’re supposed to. It’s probably not limited to this scale either. Nothing seems to work the way it should.
Nothing serious was missing and no one let it get to them, at least at that point. Never mind that the SAN wasn’t fully configured and we weren’t entirely sure if SanMan would be Webex-ing in today to put the finishing touches on the install.
The entire process is referred to as a Jump Start. The concept here is that we learn by doing. The teacher fella would be doing no hands-on; we would. And this guy was good. He had a calm demeanor and knew his stuff, which is really reassuring these days. Even when Weird Stuff started happening, he kept his cool and got through things. I was really amused because of his quizzical expression when things got Odd and told him he hadn’t seen anything yet.
The immediate task would be to install ESX Server on the four new servers. This would have to be preceded by downloading ESX Server (downloading?) and license files. VMware’s site is the absolute pits. To download software that I’ve already paid for, I had to register, send a DNA sample, and navigate the entire site looking for the correct download page. They wanted blood but I told them they’d have to settle for a urine sample instead. A man has to put his foot down sometimes, you know.
We wanted this over with ASAP so we could adjourn to our Temporary Administrative Headquarters and remote install the rest. One good reason for this is that the server room closely resembles a cross between a wind tunnel and an airport, due to three cooling devices and three industrial fans.
As we’re powering up servers, my boss comes in, wanting to know what happened. What happened? Nothing - we were installing ESX Server. He was not too impressed by this as the network was apparently down.
`Down’, as we all know, is a relative term. It can mean anything from needing a computer reboot to total power failure. One thing is certain though: it will never be what the person who reports it thinks it is. I started running diagnostics in the absence of a Spock or Scotty to order around. The thing I immediately discovered was that there was nothing to discover… everything was responding as it should.
Another trip around the block proved that whatever had happened decided to leave us. Everything was up. Everything, that is, except the Extremely Expensive Software, which prompted lots of harrumphing and rebooting of servers in different, seemingly random orders to attempt to convince the Extremely Expensive Software to pleeeeeease start up again, ok? Maybe if we paid more for it, it would come up better.
With my boss taking on the role of MVP, we returned to our installing. Not for long, of course, but we certainly tried. It was only an hour or so until the Big Boss popped into the Wind Tunnel to see how things were going with this Somewhat Expensive Project. And by the way, do we know why the network went down yet? Two departments have gone apoplectic and one has developed hives as a result of the network having a thirty second blip today. “They can’t get their work done” was the repeated refrain.
Maybe it’s being married, maybe it’s the mortgage, or maybe I’m just getting old, but it’s becoming just a little bit easier to filter my responses. “As if they’d be working if there was no problem,” was the immediate thought. We gave one person a freshly-imaged pc and within four hours, she had it completely stuffed full of malware. She, of course, had no idea how it got there - she only surfs work-related sites (regardless of what that lying browser history said about her).
Big Boss indicates that we should touch base a little later to discuss this. At this point I’m remembering the email I sent, two hours previous, pleading with people to LEAVE US ALONE so we could learn something. I am not about to provide the Big Boss with a reminder about this, so I said yes. Later on it occurred to me all I had to do was remind them what the consultants were costing us per-hour and we’d probably never hear from them again. For years.
We got ESX Server installed on four servers in barely the amount of time it would take to have them ordered and delivered from Austin, Texas. Proud as hell, we had no choice but to break for breakfast (or as they called it, lunch).
I actually made it back to my desk for a bit and only had to juggle chainsaws for ten minutes. It could’ve been much worse but for the help of the crew.
After the break we made ourselves comfortable in the Temporary Admin Headquarters. We started setting up networking, at which point we discover that we’re short a switch. And since each server will require six connections, we should make it two switches. Oh yeah, there’s the iSCSI side, so make it four switches, gig please.
Nobody was able to provide me with even a half-decent reason why we didn’t know we needed four additional switches. This was probably related to the blank stares we provided to the consultants when they asked us questions about how we intended to do certain things. Coworker told them to think about us as two guys who have only seen VMplayer and never ESX Server.
While this drama is playing out for the assembled masses, SanMan phones: he’s ready to finish the configuration. Coworker takes off to facilitate this. I start to make myself useless by entering the new servers into DNS. Due to lack of time, we were stuck with Moe, Larry, Curly, and Shemp. Well, lack of time plus Coworker’s inability to spell the adult consensual act I had in mind for one of the names (we all have our crosses to bear).
For some reason, we can only ping one of the new servers. We said screw it (technical term) and made host files, solving the problem temporarily. Coworker comes back because SanMan wants to talk to someone who can answer any question on Sans.
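The host-file band-aid is easy enough to script. A minimal sketch, assuming you just want well-formed lines to paste in: the addresses and the domain are hypothetical (192.0.2.0/24 is reserved for documentation), and `hosts_lines` is a name I made up.

```python
import ipaddress


def hosts_lines(entries, domain=""):
    """Build hosts-file lines (address, then names) from a name-to-address map."""
    lines = []
    for name, addr in entries.items():
        ip = ipaddress.ip_address(addr)  # raises ValueError on a mistyped address
        names = [f"{name}.{domain}", name] if domain else [name]
        lines.append(f"{ip}\t{' '.join(names)}")
    return lines


# Hypothetical addresses for the four new hosts.
stooges = {
    "moe": "192.0.2.11",
    "larry": "192.0.2.12",
    "curly": "192.0.2.13",
    "shemp": "192.0.2.14",
}
print("\n".join(hosts_lines(stooges, "example.test")))
```

It is still a temporary fix. Every machine carrying a copy has to be cleaned up once DNS starts behaving.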
We get back to setting up networking and discover more problems with the setup. Consultant Guy had to admit that these were pretty good. He would have to spend the evening figuring out what this was about.
So it looked as if Day Two might be a complete rehash of Day One, hopefully without the input of bosses. We’ll just have to wait and see (and order four gig switches for next-day delivery).
It’s not like we could continue anyway. When we closed the door, the temperature shot up twenty degrees. When we opened it, we had to listen to the most annoying voice in the company, possibly the northern hemisphere. We opted for the voice, which quickly became voices. Somebody was leaving and this was the only possible moment to have a going-away party.
The din got to the point that the servers were complaining, yet the revelers would not be dissuaded. I was most impressed by the way they completely ignored a room full of techs attempting to learn something. That probably would’ve brought them down and we wouldn’t have wanted that, would we? Learning is for nerds anyway.
Wednesday - virtually day two
I suspect if I can figure out how to get myself back to sleeping until the alarm goes off I will avoid a lot of difficulty. Only thirty minutes early still proved to be about thirty minutes too early. I was encouraged not to find the server room door already open. I was discouraged to hear a chorus of people calling my name as I went by. They waved their arms wildly over their heads and I almost felt like a Beatle getting off a plane on their first American tour. Fortunately there was no fainting (I always hit my head on the way down anyway).
I did my best nonchalant wave back, followed by my Miss America wave, and finished with the semi-wave of the pope. All I was missing was the beanie.
So what was it this time? It was Very Expensive Software, of course. It had decided to stop working. We have discovered that if you reboot Very Expensive Software’s three servers in random orders, several times, you can get things back up eventually. So I set about shutting down the servers. Unfortunately the specific sequence was only available on a need-to-know basis and I apparently did not have a need-to-know.
This was rather unfortunate as I was the only person in the building at that moment who could reboot any of the servers. After randomly choosing which server to reboot in which order, I did the necessary prep work (prayer, candles, standing on one leg while rebooting because it was Wednesday, and the usual chicken noises).
For some reason we cannot understand, Very Expensive Software came back up and everybody was happy (temporarily).
Temporarily satisfied with this seeming minor triumph, I went off to my desk to see what other disasters awaited my arrival. Finding none, we all reconvened and got back to the serious business of virtualization.
Consultant had done his homework overnight and found out that connections that show up as 100M instead of 1G need to be fixed at the switch itself. Coworker set about setting the affected ports to stop the bloody auto-negotiating. Shortly thereafter he had given up because the laptop on which he was operating had lost its connection to the network.
At this point, what little color I have in my skin drained completely and my cell phone went off. The ringing is usually followed in short order by cursing, talking angrily at the phone, and occasionally what we like to call Cell Phone Aviation. One look at who was calling told me all I needed to know. Yes, another coworker had called just to let me know that the entire network was down (in case I was interested).
We were talking about going to the network room to document our switch connections and I jokingly told them that this was not the place to be right now. I shot out of the room like my two hundred coworkers do when somebody announces free food in the lunchroom, only I did not run over (or maim) a single one of them.
While running, I am greeted by the warm smile of the CEO and her pit bull, who had her Meeting Pad held at the ready and a serious case of Meeting Face. I smiled, made an only semi-sarcastic statement about this being like meeting death and being dragged back just to watch Americans Idle on tv. They started asking me questions but I was already in the server room, which maliciously and dedicatedly kept up its tendency to sound like an airport.
A few words with the other folks from my department left me with a certain suspicion and a bit of detective work pretty much confirmed it: a switch rebooted, momentarily interrupting the network but everything was back and fine.
Well, everything except Very Expensive Software. I spent the next thirty minutes or so attempting to divine the Correct Startup Order for the second time that morning. The thing is that the order never seems to stay the same between failures. With a current mean time between failure of two hours, it’s starting to annoy me a bit.
In the meantime I received an invitation via email. I hate invitations. I can’t stand parties and social events - I’m a geek, dammit. The thing I can’t stand the most is invitations to meetings, of which this was one. It was for 8-9:30am the following morning. Considering that virtualization training begins at 9, this would seem to be a conflict. It will be an interesting conflict to resolve, as the meeting is with my boss, CEO, the aforementioned Pit Bull, and no doubt some functionary whose sole job is to document the time and scheduling of the followup meeting.
Virtualization is not a cheap prospect. But if I don’t get trained, we don’t virtualize. I don’t ask to be left alone often (aside from daily) but I really need to figure this stuff out.
The rest of the day went swimmingly. We finally got the SAN speaking to all the networking (six per server) at the correct speeds. Everything was working perfectly (which usually only means disaster).
I found that my time with VMplayer came in quite handy. Because of my familiarity, I could relate what Virtual Center was doing in terms of VMplayer. Finally, something I use often actually comes in handy. Was it cold in hell today for some reason?
We made ISOs of various servers (easier than using the cd drives because we were halfway across the building from them) and transferred them to our little farm. Isn’t that bucolic sounding? Our little farm, where we’re raising virtual machines. We went over how to build templates and went about doing it. It was all pretty familiar from making my own machines in VMplayer but cool nonetheless. Our old friend VMtools was brought up and installed and we created our first server in about ten minutes. I’m liking this more already.
We took some time to discuss the process and hardware emulation. It occurred to me that some of our servers have 100M NICs. They’re going to be replaced with virtual 1G NICs, resulting in a performance gain. I like this more and more as I learn more and more.
We did our own demo of VMotion. We took a template of Win2003 Server, created an actual server, then migrated it from one ESX server to another. The server stayed up all through the near ten second move. During a prior demo a tech set up a persistent ping to the vm while he was moving it. It lost exactly one ping and that was it.
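If you want to repeat the persistent-ping trick and get a number instead of eyeballing a terminal, something like this would do. It is only a sketch: it assumes the Linux `ping` flags (`-c` for count, `-W` for timeout in seconds), and all the function names are my own invention.

```python
import subprocess
import time


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request; True if a reply came back (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(host: str, count: int, interval_s: float = 1.0) -> list:
    """Ping the host `count` times, pausing between attempts."""
    replies = []
    for _ in range(count):
        replies.append(ping_once(host))
        time.sleep(interval_s)
    return replies


def count_lost(replies) -> int:
    """Number of echo requests that got no reply."""
    return sum(1 for ok in replies if not ok)


def longest_gap(replies) -> int:
    """Longest run of consecutive lost pings."""
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```

Start `watch()` against the VM's name just before kicking off the migration, then feed the result to `count_lost` and `longest_gap`. One lost ping, as in the demo, would show up as a count of 1 and a gap of 1.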
After some more discussion we adjourned for the day. We very quietly snuck past the main floor, lest we be called in for a group meeting somewhere.
Thursday - 3 Days of Virtualization
I only came in a little early today. No server room open. Nobody waving at me like I am a rock star. Not even one person happy to see me! I attempted to climb the email mountain, slipped and fell back down to the bottom, at which point it was time to go. To the meeting.
I shall detail this on a separate part of the blog when I can type again without trying to grind every single key to a blackish powder. Suffice it to say that I love meetings. Perhaps if I get a doctor’s note I can somehow be excused from them. Besides, someone has to work while everybody’s meeting.
Forty-five minutes late for the training I appeared. Coworker had to sit through a runthrough of everything he had just run through. They felt my pain.
It was an intimate little get-together; just the trainer and us. Boy did we get into it full bore. We built servers, we moved servers, we figured out what had to be put on the ESX servers and on the Virtual Center. We went further into VMotion, which moves the VMs. You can set this to be automatic, semi-automatic, or manual.
I drive a big car with an automatic transmission. Since real men are secure enough to have their gears shifted for them*, why shouldn’t this apply to VMotion? Consultant agreed - he advises his clients to do this almost without exception (naturally we had to know what the exceptions were). [* a fact that was hotly debated on my local linux users group reflector for inside of a month]
I can’t imagine why but I keep thinking of Airplane. It took a few instances before I figured out why… Bosses kept coming in to ask how everyone was doing. Just like Leslie Nielsen kept popping into the cockpit…. “I just wanted you to know that we’re all behind you…”
We went over the massive amount of redundancy and how to plan things out with this in mind. Six NICs on each pc. Two separate switches for the iSCSI communication only. VLANs! Eventually 10G ports!! Three unreleased recordings of Crosby, Stills, Nash, and Young fighting in the dressing room at the Fillmore East! [1971 Mothers Live at the Fillmore - Zappa]
Then we cloned [respectful silence]. If the president ever found out that VMware clients were cloning, we’d have the FDA breaking down doors, carrying automatic weapons, and confiscating entire networks.
The cloning feature is pretty much what you’d expect. The only downside is that the original machine has to be shut down first, making it no good for production machines. I asked about this in terms of moving over our physical servers. I was told to shut the *$&# up and wait til we covered that piece later. It’s not as bad as coworker, though… he kept getting his wrists slapped for mousing around and clicking things incessantly, like Bart Simpson on speed :)
We also covered updating (both servers and machines). Again, a task best left to people who have automatic transmissions. Let the OS or a utility do it for you.
At this point, Bosses pop around again. I’m watching Consultant, who told me they have them everywhere: people who don’t have a clue popping in and asking irrelevant questions. In Dilbert-speak: Pointy-Haired Bosses. Consultant is solid and silent. Coworker is silent (and this fellow would dispute you if you said `good morning’).
Pointy-Hair said CEO wants to know when we’ll have it all done. Consultant states that this is a Jump Start and as such, teaches the client how everything works and gets them started virtualizing. The idea isn’t to have a full production environment by the end of three to five days.
Pointy-Hair says, “Oh,” and asks how many then.
Consultant, who I’m watching carefully for my own entertainment, develops a minute eyebrow twitch. He agrees to do one before he leaves. This was a source of great joy for Pointy-Hair, who immediately convenes a meeting about which server we think we should do first, then a list of which others. We were quite shocked, as most important IT decisions in our company are made at meetings that have other departments asked for their input. And we hear about them afterwards.
Coworker, meanwhile, is still being quiet. It’s like waiting for your wife to explode. His jaw is clenched so tightly that he’s scratching his forehead with his tongue. But at least he’s not saying anything. Consultant’s eyes are rolling like a slot machine. Guests present stare in shock and horror.
Back at the class, Consultant and Coworker are shaking their heads and wearing those baleful looks that generally follow a Pointy Encounter. I got to see my section, which is Monitoring.
And then it hits me: we’re accessing Virtual Center with a Windows laptop. I asked Consultant where the linux client was. He said there isn’t one.
WHAT? The product is linux. Whaddaya mean there’s no linux client?
Consultant advises me to take it up with VMware. I compliment him on his deflection skills.
We continue, still shaking with confusion and sadness. Next week we’re going to go over specifics of planning for moving machines, move some practice machines, learn about backup, and no doubt get invited to more meetings.
Today - virtually finished…
Today was our last day with our beloved VMware consultant. I have to say that this guy really did right by us. He even did some research on some of the things we wanted to do and called some of his coworkers on our behalf. This is the kind of treatment that gets a return engagement.
I was greeted by my systems monitor telling me that the mail server’s SMTP service was down. This is never a good sign, especially at 8:15am. This is also quite interesting, in that the server didn’t seem to be aware that its SMTP service was down. It was SMTPing just fine. Mail was flowing both ways for damn near another five minutes, at which point it simply stopped. A reboot put it back in action with absolutely no indication of why it went down.
We started out our Great Learning Experience frantically looking for a room. Last week’s room had been booked for this week. Apparently we had so many meetings going on, there were no rooms for us, at least according to Support. I had to go around to every room to see if there was anyone scheduled to use it. SURPRISE! Support had neglected to put up the new calendars, so there was no way I could check, even though this is what they requested I do.
Sometimes one wonders why they’re called Support. It doesn’t seem that they do a lot of actual supporting. I dare not mention this, lest I be accused of insensitivity. One simply cannot go around asking for support from Support; it just isn’t done. After all, they’re very busy and can’t possibly do that for you because they don’t have the time today. Last week’s excuse as to why we didn’t get our delivery was because she’s a girl and couldn’t lift those boxes.
Let me attempt to follow the logic here…. Support delivers the boxes and always has. Support can no longer deliver the boxes if they’re female. The last male on the staff just left. We’re screwed.
In the strange universe from whence I came, people are hired for a job after fooling a potential employer in an interview. If one of the requirements is moving boxes, it would seem that the hiring requirements are not being met. As for this lady’s ability to lift boxes, well, she outweighs me and has bigger arms. Methinks something’s afoot. I wonder if I should insist they bring their computers to us before we fix them.
We finally located a room, only having wasted twenty minutes or so. It was about eighty-five in the room, but it was a room nonetheless: we got a fan. We started going over Virtual Backup, which is a great product.
I would be able to tell you lots more about Virtual Backup if a trusted coworker hadn’t appeared and sheepishly asked for help because the mail server was refusing to serve mail.
On the way to my desk I realized that this had to be personal. Of all the days and times to flip out, the mail server chose this one particular day… my last day of training. It was like a Greek chorus of voices saying, “HA - if we can’t prevent you learning something important with meetings, we’ll get you with server outages.”
Everybody was pitching in to try and determine what was going on. The server had been rebooted again but it wasn’t doing anything with the mail. Other than that, it was working perfectly. We suspected Kaspersky for Exchange and called their support (as it would be an exercise in futility to call our Support). They claimed that their program didn’t start because Exchange didn’t start. If I wanted that sort of answer, I would have called Microsoft, thank you. I went back to learning but the crew solved the mystery by uninstalling Kaspersky. It stayed up for the remainder of the day. It also became candidate number ONE for virtualization. More accurately, we’re building a new Exchange server in a virtual machine and migrating the data over. If we do a P2V (physical to virtual), we’re terrified at what we might be importing.
There are a number of ways to virtualize. There’s the aforementioned P2V. There’s a cold boot cd that you can use to do it. There’s the migration route above. You can even restore (import) a VM image to a new VM, but you have to tweak the hardware settings after.
As for backing up, you can use the Virtual Backup, you can use an internal agent for an existing system, or you can clone. The decision, like most others with VMware, is made by looking at the pros and cons. Some routes require the VM to be powered off. Some don’t care. We will more than likely opt for mostly Virtual Backup because it’s integrated and can run on a working VM. You have to be careful with existing agents because if they fire at once, you’re going to choke communications. Lotsa overhead.
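The point about agents firing at once is worth a picture. This is a minimal sketch of staggering start times, assuming you control when each in-guest agent kicks off; the function name, the 30 minute default gap, and the VM names in the usage note are all invented for illustration and have nothing to do with how Virtual Backup schedules its own jobs.

```python
from datetime import datetime, timedelta

def stagger_backups(vm_names, window_start, gap_minutes=30):
    """Give each VM its own start time so no two agents fire together.

    Assumption: a fixed gap is long enough for one backup to get past its
    heaviest traffic before the next begins. Measure before trusting it.
    """
    schedule = {}
    for position, name in enumerate(vm_names):
        schedule[name] = window_start + timedelta(minutes=position * gap_minutes)
    return schedule
```

Calling stagger_backups(["mail", "files", "print"], some_evening_at_ten) hands back one start time per VM, each 30 minutes after the last, instead of three agents hammering the SAN at the same instant.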
You come to realize another benefit when going over backup: backup is actually your disaster recovery plan also. If you take an image of a working server/VM and store it offsite, all you have to do is import it to a new VM in the event of a disaster. Instant disaster recovery. Don’t forget to make backups of your ESX servers too, as well as a copy of all of the install media and license files.
We made a decision that with four new servers, we would set things up so everything would work on two in an emergency. We went over how to set this up and how to watch resources so we could add hardware if necessary.
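The "everything runs on two" rule boils down to simple arithmetic, so here is a rough sketch of the check. It assumes identical hosts and looks only at memory; CPU, storage, and network would need the same treatment, and a real placement also has to fit each VM on a single host, which this sum ignores. The function name and every number in the usage note are hypothetical.

```python
def survives_host_loss(host_memory_gb, vm_memory_gb, hosts_surviving=2):
    """Return (fits, headroom_gb) for running every VM on the surviving hosts.

    host_memory_gb: memory per host (hosts assumed identical).
    vm_memory_gb: mapping of VM name to the memory it needs.
    """
    usable = host_memory_gb * hosts_surviving
    needed = sum(vm_memory_gb.values())
    return needed <= usable, usable - needed
```

With made-up figures of 16 GB per host and VMs needing 8, 4, and 2 GB, two surviving hosts give 32 GB against 14 GB needed, so the check passes with 18 GB to spare. When the headroom starts shrinking toward zero, that is the cue to add hardware.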
The key word in designing a system like this is redundancy. Massive redundancy. No single points of failure anywhere. Currently our only single point of failure, barring the building exploding, is the SAN. We’re working on this now for an offsite project. You can run mirrored SANs if you have the resources, with one offsite for backup/disaster recovery.
Things get interesting with eliminating single points of failure. Each server has six NICs and everything goes to the next point via two different connections. Two switches. Four if you want to physically separate iSCSI traffic. Two separate UPSed power lines. Perhaps a generator if the system is mission-critical. Since our entire block goes out every time it rains, a generator would be nice. People have an annoying habit of driving into electrical poles and transformers by work. One day we were down for four hours after a contractor `helped’ the electric company by hooking something up incorrectly and taking out a transformer.
If an ESX server goes down, it will move the VMs to other servers automatically. You can also kick off the process by putting the server in maintenance mode. All of this can be accomplished via the Virtual Control Center. You can also migrate the machines around manually, if that’s your kind of fun.
Consultant, egged on by coworker, told one of the servers to shut down. We watched it go down, waited a short bit, then watched the only running VM migrate to a different server. Consultant said that was our test of the High Availability system. I told Consultant that this was politically correct-speak for forgetting to put the machine into maintenance mode. Years as a business consultant have no doubt honed his BS skills but nothing could have prepared the guy for us. It took us three days, but we finally got him routinely dropping the F-bomb.
Consultant’s colleague suggested we use Exchange 2008. Coworker kept looking at me out of the side of his eye, waiting for me to explode but I wouldn’t give him the satisfaction. The day an untried, untested MS product goes on one of my servers is the day after income taxes have been repealed. Coworker was most disappointed that I wasn’t going to fight him on eventually adopting it (AFTER it got tested).
There are no doubt many topics that I forgot about, as I write this stuff hours later at home. If you have any questions or comments, please write. If I can get you an answer, I will.
If you have people in to teach or assist you, do it away from your desk, preferably in a place where no one would think to look for you. Make sure there’s sufficient coverage to handle routine matters, as if you weren’t physically in the building. Turn off cell phones and Do Not Disturb the room phone. Have the door welded shut. Turn off pages.
After having finished our jump start, we realized that this is only the first step. Now we have to sit down and figure out the best method to virtualize each server and come up with a very rough timetable. We have to do this early because if the boss gets hold of us, he will insist on a series of meetings, with a virtual committee deciding which servers we’ll virtualize in what order. If his cohorts get involved, we’ll need to bring in representatives from every department to have a say in which servers get migrated when because they need to be included and feel good about the project.
If I think about this a little further I could potentially make this whole thing work for me for once. We get the boss involved and he calls two friends, who each call two friends. The process of getting the committee itself together will take an intensive two month series of meetings (but only if they rush things). We can use that time to go on a well-deserved vacation. When we all get back, they’ll still be working on new names for each VM. We can use the rest of the time to do the other things that have piled up in our absence.
I’m starting to like this idea after all….
more as we figure it out...
A month or so down the road…
Because of the misunderstanding/miscommunication about the type of class we were taking, we’re still ironing things out. We had to correct a lot of misunderstandings and push expectations in several directions.
We got about half of the servers virtualized without too much fanfare. The CIO wanted to make a big deal of virtualization, which I don’t believe was a good idea. I guess he felt he needed to sell the idea. This didn’t work out too well, as I’ll outline.
BULLET POINTS (don’t I sound corporate now?)
- There is such a thing as Too Much Information. No matter how much it hurts, throttle the impulse to explain the benefits of virtualization to anyone other than the check-signers. Involving the rank and file is not a smart idea. Robert A. Heinlein had a proverb about teaching a pig to sing that said `you just wind up annoying the pig.’ Would you really want to explain the benefits of 15k rpm SAN hard drive upgrades to the people from Benefits?
- Think carefully about the entire virtualization process on a per-server basis. Go over each server and look for weird, old, one-off applications that will have to be migrated (if you’re not doing P2V). Call the clueful person from the department that uses the server. Find out interesting info like when is a good time to migrate or switch over to the virtualized server. Depending on the amount of data, it can take quite a while to P2V, move data, or restore to the new server. Can this be done during the day or do you have to do a night or weekend?
- Think about your OS. We had a mix of W2k and W2003 servers. We decided to go pure W2003, resulting in having to build a lot of servers from template, freshly install important apps, and find a way to migrate data. This is an incredible PITA (technical term), especially if your company uses software that’s older than you are and no one is left alive who can operate it.
- Don’t virtualize it… be safe and keep at least one server outside the virtual environment. I like the idea of keeping the PDC on external hardware, as well as the VCC. If there’s any problem in your virtual environment, you’re really sunk if you don’t have anything running externally.
- Make everything at least doubly redundant. If you want to keep things up reliably, you need to eliminate single points of failure. Dual power supplies fed by dual circuits on dual UPSes. Dual paths with dual switches. Think carefully about how to make things in the physical network redundant too. Your external switches should be redundant, as well as their power and feeds. You might want to look into a generator too.
- Overestimate everything. Time, cost, help. Then double it. Let’s face it - a simple plug-in RAID drive replacement that should take five minutes will take five hours. You’ll find all sorts of stuff you didn’t know was broken and don’t know how to fix. When you get the virtual server up, you’ll discover a little-noticed but very important piece of software that you missed or never knew about in the first place.
- Good luck with transferring huge amounts of data, like mail or file servers. You’re going to have to figure out a way to stop time so you don’t lose mail, files, or incremental data. Make this a part of your meeting(s) with other departments. If you state it up front, it will be less painful later on. Others will tell you it’s easier to beg forgiveness than ask for permission.
- Dedicate resources to the project. If you’re in the midst of virtualizing an important server and someone asks you to teach them how to use the scanner, you’re going to lose time (and probably get really aggravated). Trust me on this one: do whatever needs to be done to get yourself assigned to this project alone… nothing else.
- This is a personal thing: Crosstrain. You can never have too many people who know what they’re doing. Any mildly intelligent techie would kill to learn some of this stuff. Withholding knowledge isn’t power. It also looks good on the resume.
- Keeping in mind the paragraph above (Too Much Information), you will have to do some small amount of explaining to people. Completely fail to use the word `virtual.’ Conspicuously endeavor never to utter that word. Don’t say it to any of your coworkers. Never say it to anyone you’re calling for support. They’ll use it as an excuse not to support you. If you have everyone think of virtual servers as physical servers, there’s no re-thinking to be done. Nothing to teach. No difference to them at all.
- If you haven’t played with it, download the free VMplayer or Server and get used to it. This will absolutely help you in your understanding of virtualization on a large scale. I have been running VM’s for quite a while and it made all the difference for me.
- If it ain’t broke, don’t fix it. A teammate went insane over a single file’s fragmentation. I started trembling, knowing that the smallest attempt to `fix’ it would result in blowing up the entire VM.
- I have to give credit where it’s due: Microsoft and others have tools that will help. MS has a utility to transfer all printers and settings from one server to another. This is especially helpful when migrating a print server or terminal server.
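Two of the bullets above (how long a P2V or data move takes, and overestimating everything) lend themselves to a back-of-the-envelope calculation. This is a rough sketch only: the function name is made up, and the efficiency figure is a guess at how much of the rated link speed you really get, not a measured value.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.6):
    """Rough hours to move data_gb over a link rated at link_mbps megabits/s.

    Assumptions: decimal units (1 GB = 8000 megabits) and a flat
    'efficiency' fraction standing in for protocol overhead, disk speed,
    and everything else that keeps a link from running at line rate.
    """
    megabits = data_gb * 8 * 1000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600
```

Run it once for the old 100M NIC and once for 1G to see whether the move fits in a lunch hour or needs a weekend. Then, per the bullet above, double whatever it says.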
In spite of all the dire warnings, we’re finding things to be a lot less scary (and easier) than we thought. Aside from a few oddities, that which gets virtualized works just fine the first time, regardless of how you get there. We haven’t had an instance of wanting to take a server back to physical (yeah, I know, it’s still early….). I’ll get back to you after the Exchange migration :)
