Hi,
Recently we got a new Oracle T5 server and set it up for our database. For our database files we set up one ZFS filesystem. When I run iostat -xc, the output is as below. As you can see, the value for vdc4 is quite high.
Is this normal? When we run a full database backup, the DB hangs, although the server load is normal during the backup. Could that be related to the ZFS filesystem settings? Hope someone can enlighten me on these.
It has been a long time since I worked with a ZFS filesystem, but I don't think it is unusual for ZFS to consume memory that is otherwise unused as a cache for ZFS disk data.
Reading 33 MB/s and writing 7.5 MB/s may seem high, but with 0 wait time on the device, it doesn't appear to be a problem.
Are you seeing a high swap rate (or any indication that running processes are running poorly due to a lack of available memory)?
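On Solaris you can check that quickly from the shell; a minimal sketch (the intervals are just examples):

# Watch the page scan rate (sr column); sustained non-zero values indicate memory pressure
vmstat 5 5

# Summary of swap space reserved, allocated and available
swap -s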
It is not unusual for ZFS to eat almost all available memory.
You don't want that with a database, even if you are running on ZFS filesystems.
I would not recommend running databases on ZFS filesystems, since it requires a lot of tuning to get right. There is also an unresolved issue of fragmentation, and for large implementations I would avoid ZFS for the DB. ASM is the law.
Are those FC or internal disks?
What patchset are you running (hypervisor & LDom, since I see it is an LDom)?
Can you please tell us the values of the kernel parameters:
Can you post the output of the following command during the problem?
sar -d 2 10
Take a look at avque; I suspect it is very high during the non-responsive period.
If not, your issue possibly resides with arc_max (confirm that the machine is not swapping, as Don suggested). Lower it to a sane value so your database doesn't run out of PGA space (it will start swapping then, causing extreme slowness).
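For what it's worth, on Solaris that cap goes in /etc/system and takes effect after a reboot; a sketch with an example value only (size it to leave room for your SGA/PGA):

# /etc/system: cap the ZFS ARC at 4 GB (4294967296 bytes; example value, tune for your RAM)
set zfs:zfs_arc_max = 4294967296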
In short, you will need multiple zpools on different spindles, with different setups for the various DB functions (REDO, ARCH, DATA), and keep them under 80% full (this is very important).
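Roughly something like this, with hypothetical pool and disk names:

# Separate pools on separate spindles for each DB function (disk names are placeholders)
zpool create datapool mirror c2t0d0 c2t1d0
zpool create redopool mirror c3t0d0 c3t1d0
zpool create archpool mirror c4t0d0 c4t1d0

# Datafiles: recordsize matching the 8k DB block size; redo/arch do fine with the 128k default
zfs create -o recordsize=8k datapool/oradata
zfs create redopool/redo
zfs create archpool/arch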
There is a lot of misunderstanding around this topic. All file systems will eat as much memory as they find useful, not just ZFS; unused memory is wasted memory anyway.
The big differences are:
ZFS memory, including the ARC, is reported as used/unavailable, while other file systems' memory (the buffer cache and the page cache) is reported as free/available.
ZFS memory is released asynchronously and gradually by observing RAM demand, while other file systems' memory is released synchronously and (almost) instantaneously. Where that matters is when an application requests a very large amount of non-pageable memory, as the allocation might fail. The arc_max tuning prevents ZFS from using all the RAM, helping these allocations succeed.
Also: snapshots. Minimize them. More snapshots will mean more I/O. I had issues similar to this a while back, and it all came down to snapshots and zfs_arc_max.
More snapshots = more writing to the delta log, which contributes to greater I/O. If you have complex ZFS setups it's exponential. zfs snapshot -r rpool will snapshot every subordinate fs and will then cause any changes to have to be written to each snapshot. 50 subs, 10 snapshots... you get the idea.
I'm afraid I don't get it. Snapshots are read-only by design, so they cannot be the target of write operations. On the other hand, creating them can have a small overhead, and destroying them might have a bigger one. The latter is to be balanced against the fact that having snapshots reduces the number of I/Os in case of file removal, as the data blocks, being still referenced by the snapshot(s), need not be marked as free.
While admittedly I don't know the specifics of how it works, I do know that a ZFS snapshot is a delta of the FS, so it must be recording those deltas somewhere. The older the snap, the larger it gets, and the more snaps, the more writes.
I can only tell you from practical experience that removing snapshots DOES improve performance.
The delta is not written. The data already exists in the original filesystem.
For instance, you have 4 files sized 20 GB on a zfs filesystem inside a zpool sized 100 GB.
Zpool current space utilization is 80%
For the sake of argument, we have only one filesystem in that zpool.
A snapshot has been made on that zfs filesystem.
You delete 1 of 4 files sized 20 GB.
The zpool will remain at 80%, since the snapshot still references the deleted data; the data is not actually deleted from the zpool.
You issue zfs destroy on the snapshot. This operation actually deletes the data from the zpool.
This is how I understand it, feel free to correct me.
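You can watch this happen on a live system; a quick sketch with made-up pool and file names:

# Snapshot the filesystem, then delete a large file
zfs snapshot datapool/fs1@before
rm /datapool/fs1/bigfile

# Space moves to USEDSNAP instead of being freed: the snapshot still references the blocks
zfs list -o space datapool/fs1

# Only destroying the snapshot actually returns the space to the pool
zfs destroy datapool/fs1@before
zpool list datapool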
As for the ARC:
The problem is it doesn't work well when a program requests a very large memory chunk (such as an Oracle database): it requests the memory, and if it is not granted within a certain time, the system starts swapping.
This is why I avoid ZFS filesystems in general for Oracle databases and use ASM with a limited ZFS ARC.
Take a look at the documentation regarding ZFS and databases. It requires a lot of love and attention.
I'd rather give that love to something else and run ASM.
Chip in a good SSD or a local flash cache card as a CACHE device for Oracle, pin a couple of monster indexes in it, and go get some beer.
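Attaching an L2ARC device is a one-liner, assuming a hypothetical SSD at c1t5d0:

# Add the SSD to the existing pool as a cache (L2ARC) device
zpool add datapool cache c1t5d0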
On the other hand, on a several-TB Solaris Cluster with ZFS acting as an NFS server, I haven't touched that tunable.
The machines work fine with 95% of memory consumed, mostly by filesystems using it as ARC (which is desired).
Any idea how to solve my issue with the DB hanging during backup (RMAN)? I already gave the sar -d output earlier. How do I fine-tune the ZFS parameters, especially arc_max?
Will changing the parameter cause data loss? Please help.
Hi jlliagre,
I have read it and found I need to change some ZFS parameter values. Is it safe to change the recommended parameters? Will that affect the data in that filesystem?
As the file system stores tables and indexes, tune the recordsize setting. It should probably be 8k instead of 128k, but it is too late for the parameter to affect the existing files. Look for "Important Note:" in the white paper for a workaround.
Properly tuning the record size is known to dramatically reduce the number of I/Os in some use cases, although not necessarily in yours.
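Changing it is trivial, but remember it only applies to blocks written afterwards (dataset name is hypothetical):

# Match the dataset recordsize to the Oracle db_block_size (8k here), then verify
zfs set recordsize=8k datapool/oradata
zfs get recordsize datapool/oradata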
It's much worse than that on a server running Oracle database instance(s). The ZFS ARC does not play nice with Oracle databases. At all:
ZFS ARC expands to use all free memory - as 4k pages.
Oracle DB has a transient demand for memory - but it requests large pages (4 MB IIRC).
Entire server comes to an effective screeching halt while VM management is hung coalescing large pages.
Oracle DB releases the large pages, ZFS ARC grabs them and fragments them.
Repeat.
If the server is used just as a database server, limit the ARC to under 1 GB, if not smaller. After rebooting, check that the ARC is actually limited to what you specified; if you go too small, your limit will be ignored.
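A quick way to verify both numbers after the reboot; a sketch using kstat:

# Current ARC size and the effective maximum, in bytes
kstat -p zfs:0:arcstats:size
kstat -p zfs:0:arcstats:c_max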