Recently in my workplace we made some pretty serious gear shifts with regard to storage. Being vendor agnostic as I always strive to be I will say that our go-to fabric vendor changed as well as our go-to disk vendor. Since the change things work, but the more I dig into what’s going on under the hood, the more I feel like I’m living in a house of cards.
The change came as the result of two companies merging and coalescing on one hardware platform for all systems. Going in with blinders on, I was absolutely indifferent to the fabric change since I'd worked with the new vendor's gear in the past and taken a liking to it, and the storage vendor had a clean slate to start with, so I had no reservations there either. In erecting our new gear some months ago we worked with staff from the company we merged with to build like-for-like environments, easing the merging of environments down the road. One thing that made me quite suspicious of either the gear we were installing or the practices of the personnel in the other organization was their suggestion to modify buffer-to-buffer (B2B) credits on the storage-side switch ports. Scratching my head, I said that I prefer to leave port settings at default and only make adjustments AFTER we've seen a symptom arise that suggests a need to deviate from default.
Fast forward about 3 quarters and we have 3 new arrays from ACME Storage, plus 2 directors and 6 pizza-box FC switches from ACME SAN. Everything *works*, but now that I've had time to breathe I'm finding things in our environment I don't like. Let's compare counters from the busiest ISL links (primaries for a MetroCluster FCVI), before and after. Note that the before counters were last reset about 18 months prior to the dismantling of that ISL, while the counters for the new ISL port are reset daily by a script run by our peers:
|     | Frames Transmitted | B2B Credit Zero Errors | B2BC0 Percentage |
|-----|--------------------|------------------------|------------------|
| Old | 2,657,260,246,611  | 2,377,700,058          | 0.089%           |
| New | 2,851,542,621      | 1,306,395,534          | 45.8%            |
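For the skeptical, the percentage column is nothing fancier than credit-zero events divided by frames transmitted. A quick sketch, with the numbers copied straight from the table above:

```python
# B2BC0 percentage = B2B credit zero events / frames transmitted.
counters = {
    "Old": (2_657_260_246_611, 2_377_700_058),
    "New": (2_851_542_621, 1_306_395_534),
}
for label, (frames, b2bc0) in counters.items():
    print(f"{label}: {100 * b2bc0 / frames:.3f}%")
# Old comes out around 0.089%, New around 45.8%
```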
Both of these ISLs were configured with staff from the switch manufacturer onsite, per the best practices published by NetApp. As a matter of fact, on the new configuration the B2B credit allocation was padded above and beyond the 150% padding that NetApp recommends. This is the ugliest counter to look at, but other ports have seen similar increases in errors, and seemingly for no good reason. Our production EMR has 8 host ports allocated for both the active and standby nodes, and the standby node is truly standby. Even so, I'm seeing many B2B credit zero errors every second on those host ports, not the storage ports.
Mostly this is a pointless diatribe about changes I'm seeing. I'm truly concerned that I'm going to have to start micro-managing my FC ports in order to maintain performance and keep error counters low. If I reach that point I will no doubt be writing another post titled "Don't buy this vendor's junk unless you like being in the business of keeping the lights on." Watch for that one.
If you know me well you know that I couldn't care less about what badge your gear wears, as long as it works. On the motocross track I started on red, then ended up carrying both yellow and blue in my trailer. I couldn't tell you which color was my favorite; they all have their own special place in my heart and they all did their job well. The reason I never rode green: I didn't like the folks who sold it. That fact will play into the remainder of my diatribe.
Paul (my direct peer at work) and I are trying to make use of the fancy new machine that was set on our floor. HDS came in and configured our VSP. The tech promised a day to set it up; it ended up being 3.5 days to do 2. I don't blame the tech who made the promise, as he isn't the one who ended up doing the work; not his fault. That said, if it takes HDS 2 days to set up a paltry 30T (don't ask how much a paltry 30T costs!!!) usable array, they're never going to make it in datacenters the likes of a big organization like Microsoft, Google or Amazon. You'd never guess this based upon how much they love boasting about how amazing their gear is. Mind you, this is our first Hitachi box. We had to listen to Hitachi staff endlessly beat their own drum about how happy we were going to be getting off IBM and onto Hitachi.
Finally the tech is done with the initial setup and it's ready for us to log in to the box and start creating pools, carving LUNs and presenting disks. Probably the most exciting thing for a storage admin: logging onto a fresh array and digging in.
Direct quote from the Hitachi Storage Navigator User Guide:
To log in to Storage Navigator as the super user:
Call your local service representative to obtain the superuser ID and default password.
Yup, here we are. We have an email string going back and forth with pre-sales and the guys who spun the wrenches last week doing the initial config. Meanwhile, Paul and I are watching our deadline to have this array online creep closer and closer while the brick sits in the datacenter heating our cold air, being very green by wasting 100% of the electricity it's consuming.
Thus far, I'm not leaping out of my chair with happiness because of our new vendor. If you happen to know the username and password to log on to my array, let me know. I certainly don't.
I have to dig far too hard to find this information anytime I want it, so I'm putting it here. I just one-lined a CSV export in PowerCLI to list all of my attached LUNs across my entire VMware environment (only 1 vCenter makes it easy). Each LUN "CanonicalName" that is attached to an XIV array starts with "eui.00173800" regardless of which array the LUN actually lives on. There are 8 hex digits remaining; of those, it appears the first 4 identify the array and the last 4 identify the LUN serial number.
Verifying that is straightforward: the serial number of the XIV that has the most LUNs presented to me shows up as hex digits 9-12 of the euis in my list.
The last 4 don't jibe with LUN serial numbers at first glance, but that's because I chose the very first entry to look at, which has 0000 for digits 13-16. It appears that ESXi must create some sort of dummy LUN 0 for each array; all the following LUNs make sense. It's also worth noting that my RDMs show up in this list.
So the final number looks like this:
In the real world, if I have a LUN with canonical name eui.0017380035bc001d, it's referencing an XIV array with S/N 13756 (35bc hex -> dec) and LUN S/N 29 (001d hex -> dec).
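Since I'd rather not do that slicing by hand every time, here's a minimal sketch of the decode (Python for illustration; the field layout is just what I inferred above, not anything IBM documents):

```python
def decode_xiv_eui(canonical_name: str) -> tuple[int, int]:
    """Split an XIV eui canonical name into (array S/N, LUN S/N).

    Assumes the layout inferred above: 8 hex digits of fixed vendor
    prefix (00173800), then 4 hex digits of array serial, then 4 hex
    digits of LUN serial.
    """
    hexpart = canonical_name.removeprefix("eui.")
    return int(hexpart[8:12], 16), int(hexpart[12:16], 16)

print(decode_xiv_eui("eui.0017380035bc001d"))  # -> (13756, 29)
```

The same conversion in PowerCLI is just a couple of substrings fed through `[convert]::ToInt32($hex, 16)`.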
Last note: if you need this reference you're probably already aware of all this and even noticed it in the example above, but in VMware land we typically use hexadecimal to reference storage addresses, while by default the XIV GUI lists everything (including the array S/N) in decimal. It all has to be converted. The LUN serial numbers can be converted right in the GUI under Tools > Management, but you'll have to convert the array S/N either by hand or in your head if you can hex like a boss.