SAN/LVM question

Hey everyone.
I am working on designing a logging solution for a deployment we have going out in a few months. Right now we have a single storage head end, connected via fibre to a SAN. The plan is to create a number of smaller LUNs on the SAN and then use LVM2 to concatenate them so they appear as one large disk.

The largest volume we expect to see is about 130 TB. So with that in mind, is there any reason I would use GPT vs MBR?

Also, I plan to use XFS as the filesystem due to the possible sizes of some of the volumes (ext3 caps out at 16 TiB with 4K blocks, doesn't it?). Is there any known issue with resizing XFS filesystems on top of LVM2? I haven't been able to find much when searching for the two together.
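For reference, the kind of resize I have in mind is roughly the following (just a sketch; the device, VG/LV names and mount point are placeholders, not our real ones):

```bash
# Add a newly presented LUN as a PV and grow the VG (placeholder names)
pvcreate /dev/mapper/mpathq
vgextend vg_logs /dev/mapper/mpathq

# Grow the LV into the new space, then grow XFS while it is mounted
lvextend -l +100%FREE /dev/vg_logs/lv_logs
xfs_growfs /srv/logs
```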

Any help or advice would be appreciated. Thanks!

Here are a few questions. Why would you want to leave it as one huge volume? You can increase the size of the XFS file system, but wouldn't it make more sense to add LVs to the VG?

Is there anything you hoped to achieve by concatenation? You are increasing your risk with concatenation, as parts of your data may go missing if an underlying LUN fails, especially depending on the RAID types backing those LUNs in your SAN array.

What storage vendor are you using? Have you checked performance on XFS deletes with large files and small files? It is tunable, but XFS is slow on deletes.
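If deletes do turn out to hurt, the usual knobs are a larger log at mkfs time and bigger in-memory log buffers at mount time. A rough sketch, with a hypothetical device path and mount point:

```bash
# Larger on-disk log at mkfs time (hypothetical LV path)
mkfs.xfs -l size=128m /dev/vg_logs/lv_logs

# More and larger in-memory log buffers at mount time
mount -o logbufs=8,logbsize=256k /dev/vg_logs/lv_logs /srv/logs
```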

Single large filesystems are asking for trouble. Remember, if anything goes wrong with that 130TB, it could take a MONTH to repair it.

However, apart from that you CAN do what you are saying you want to do. GPT is only of use when a drive actually has partitions on it... with LVM, you don't need any partitions and can add the whole drive itself as a PV (in case you didn't know that).
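In other words, something as simple as this (hypothetical multipath device name), with no fdisk/parted step and no GPT-vs-MBR decision at all:

```bash
pvcreate /dev/mapper/mpatha   # whole LUN as a PV; no partition table needed
pvs                           # confirm the new PV shows up
```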

The server is going to be used to store log data for 60 days. Basically the goal is to keep logs from numerous servers in one place. The servers that generate the largest amount of data will each have their own filesystem on their own volume group.

The systems which produce a smaller volume of records will be on separate, smaller volumes. The servers are HP DL360 G7s, and the SAN is an HP P2000 G3, connected via Fibre Channel.

On the SAN we have a total of 12 disks (to start), split into two 6-disk RAID 5s. As we add additional disk shelves, we will continue doing the same thing. We're using 7200 RPM 2TB midline drives since this is just going to be used for data warehousing. We have these RAID 5s carved up into 1TB volumes (LUNs), which will each get presented to the storage server. We'll turn them into LVM physical volumes with pvcreate, and then build the 6 separate volume groups out of them.
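Roughly, the layering we're planning looks like this (device and VG names are placeholders; the real device names will come from multipath):

```bash
# Each 1TB LUN from the SAN becomes a PV (placeholder device names)
pvcreate /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc

# Group a set of PVs into one of the six VGs, e.g. the one for the big log producers
vgcreate vg_applogs /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc
```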

Right now we're looking at having 6 separate filesystems running on 6 separate volume groups. The largest could be 25-30TB or more. The smallest will probably be around 1-2TB.

We've based this on load testing, but we don't have the ability to fully load test what the system is meant to handle outside of production, so for certain things (DB archive logs) we can't really get a good estimate. At the moment we're just scaling up the numbers we're seeing, but that's fuzzy math.

---------- Post updated at 08:53 PM ---------- Previous update was at 08:35 PM ----------

This is awesome advice actually. We were running into issues when testing with an LVM partition on the LUN: the partition wasn't showing up, and partprobe wasn't working. The only thing that worked was a reboot, and that won't be acceptable in production. I just tried using the whole LUN instead of a partition and it worked like gangbusters.
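For anyone searching later: the related thing we'll need eventually is getting newly presented LUNs recognized without a reboot, and my understanding is that a rescan of the FC hosts should do it, something along these lines (host entries are placeholders; check your own /sys layout):

```bash
# Rescan each HBA so newly presented LUNs show up without a reboot
for host in /sys/class/scsi_host/host*/scan; do
    echo "- - -" > "$host"
done
```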

My experience with MSA storage (which the P2000 is the next generation of) has not been stellar; I have had a tremendous number of failures. Also, isn't the P2000 an arbitrated-loop system? If you plan on chaining them together, that may mean downtime.

It's too bad you can't do something like use an IBM SVC in front of your storage, as it would make a lot of your issues go away.

I agree wholeheartedly with cjcox. Large file systems on low budgets are asking for problems. I would create a VG for each array and then create LVs for each "client" logging server.
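Something along these lines, with all names and sizes purely illustrative:

```bash
# One VG per array, one LV (and filesystem) per "client" logging server
vgcreate vg_array1 /dev/mapper/mpatha /dev/mapper/mpathb

lvcreate -L 2T -n lv_websrv01 vg_array1
lvcreate -L 1T -n lv_dbsrv01  vg_array1

mkfs.xfs /dev/vg_array1/lv_websrv01
mkfs.xfs /dev/vg_array1/lv_dbsrv01
```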

Also, I would suggest at LEAST one hot spare in your disk groups. Losing a 2TB drive will degrade performance (they are 7200rpm to begin with), and the time to rebuild the RAID 5 needs to be considered: you are talking many hours. If you have a second failure during that window, the data is gone.
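As a rough back-of-the-envelope, assuming an optimistic ~100 MB/s sustained rebuild rate (which busy 7200rpm midline drives often won't reach):

```bash
# 2 TB to re-write at an assumed ~100 MB/s, converted to hours
echo "2000000 / 100 / 3600" | bc -l    # ~5.5 hours best case; far longer under load
```

And that is with the array otherwise idle; with logging traffic hitting it at the same time, the window for a second failure stretches a lot further.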

Also, your 2TB drives are really 1.85TB, and each array you create is about 9TB in size. Carve up your LUNs and treat them as PVs, using a rather large extent size.

Do you have to keep the logs around full time? My suggestion would be to have your server keep the latest data uncompressed and archive it to a compressed format later on, or at least move it onto other drives to keep your file systems smaller.

The P2000 has gotten a bit of a dust-off from the MSA2000 series, thank god. They no longer require downtime to add additional shelves (verified with HP) on any of the platforms.

The logs need to be kept around for at least a few days (our operations team will be firing some scripts off at them for data-mining purposes). After that, though, I don't see a reason why they couldn't be moved to a different location and gzipped (after, say, 30 days).
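The archiving step would probably end up as a simple cron job, something like this (paths, name patterns and retention ages are placeholders, and it compresses in place rather than moving, just to keep the sketch short):

```bash
#!/bin/sh
# Compress logs older than 30 days, then expire anything past the 60-day retention
find /srv/logs -type f -name '*.log'    -mtime +30 -exec gzip -9 {} \;
find /srv/logs -type f -name '*.log.gz' -mtime +60 -delete
```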

Each RAID 5 is currently split up into about 5 different LUNs, so none of the LUNs will be larger than 1TB apiece. Each one will be a PV, and those will get put together into VGs by server type (one each for the two primary server types, then one for DB backups, one for cross-site backups, and finally one VG for the remaining server types). So no single PV should be larger than 1.5TB.

That's a really good point about the spare; I hadn't thought about that when I was designing it. I'm not sure if it will be possible to sacrifice the additional storage space, but if archiving after, say, 30 days is possible, I'm pretty sure that will free up more than enough. We do have 4-hour/24x7 support from HP under our contract, so we should be covered for speedy hardware replacements. Sadly, the SNMP implementation on the P2000 isn't the best.

I actually went to a bunch of vendors, and the issue was that no one had a system that could run on DC power except HP. They tried to sell us a P4000 SAN, which is in fact a P2000 back end with a DL380 on the front, and that's it (it may run their proprietary storage software or something, I'm not sure). So we worked out the parts list and put it together ourselves. Dell has pretty much no DC offerings, and everything from Hitachi fell well outside my budget for the storage requirements. We don't currently have a business relationship with IBM for that to have been considered (we basically have to stick with the approved vendor list).

Any other advice? I appreciate everyone's feedback. It's looking like there are places where my original design was over-engineered and places where it was under-engineered. Sadly, I don't have access to storage specialists who aren't vendors, so peer review of my design can't happen too well in-house :slight_smile: