Just a quick question on this. I've tried to run this a couple of times now; the first time it failed, I increased the swap. I'm not sure that increasing the physical memory will do any good, but I will try later today. Has anyone seen this or anything similar?
SunOS fvssphsun01 5.11 11.3 sun4v sparc sun4v
root@fvssphsun01:~# pkgrepo refresh -s /export/s11repo
Initiating repository refresh.
Apr 18 10:59:06 fvssphsun01 su: 'su root' succeeded for e415243 on /dev/pts/2
pkgrepo: There is not enough memory to complete the requested operation. At least
3GB of virtual memory was in use by this command before it ran out of memory.
You must add more memory (swap or physical) or allow the system to access more
existing memory, or quit other programs that may be consuming memory, and try
the operation again.
root@fvssphsun01:~#
root@fvssphsun01:~# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 16 32G 0.2% 0.1% 7d 21h 17m
fbasphnhhp01 active -n---- 5001 16 32G 0.0% 0.0% 6d 4h 4m
fdbsphnhhp01 active -n---- 5000 48 96G 0.1% 0.1% 6d 3h 57m
root@fvssphsun01:~# swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 303,1 16 8388592 8388592
/dev/zvol/dsk/rpool/swap2 303,3 16 25165808 25165808
/dev/zvol/dsk/rpool/swap 303,1 8388624 25165808 25165808
root@fvssphsun01:~#
Hmmmm.......no, I haven't come across that problem before.
My first thought would be to check user/process resource limits. There seems to be quite enough swap available to avoid it falling over at 3 GB of virtual memory (unless the system is otherwise very heavily loaded). We all know that on Solaris, by default, the root user is unlimited in resources, but some clown might have imposed a limit. I assume that this is not a system installed and configured by you, so you don't know the history?
Like you, at this stage of investigation, I wouldn't believe it is an actual memory shortage causing it.
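One quick way to rule the resource-limit theory in or out is to print the limits from the same root shell that runs pkgrepo. This is a minimal sketch; the guard around plimit is only there so the snippet is harmless on systems that do not have it.

```shell
# Show the limits that apply to the current shell and anything it starts.
# Anything other than "unlimited" for data, stack, or virtual memory is suspect.
ulimit -a

# On Solaris, plimit reports the limits of a running process by PID.
# $$ is the PID of the current shell.
if command -v plimit >/dev/null 2>&1; then
    plimit $$
fi
```

If a limit does show up, it would be worth checking who set it before raising it.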
Again, all well and good, although it took 3 days and 18 hours to download almost 140 GB. This was followed by:
pkgrepo verify -s /export/s11repo
This also ran just dandy (I can't remember how long it took) and was followed by:
root@fvssphsun01:~# pkgrepo refresh -s /export/s11repo
Initiating repository refresh.
pkgrepo: There is not enough memory to complete the requested operation. At least
4GB of virtual memory was in use by this command before it ran out of memory.
You must add more memory (swap or physical) or allow the system to access more
existing memory, or quit other programs that may be consuming memory, and try
the operation again.
You'll notice that we had a 4 GB error, which I thought was suspicious, so I increased the swap: no change. I added another 32 GB of memory: no change. I changed the ZFS ARC cache setting and the 4 GB error became a 3 GB error, even with the additional swap and the additional physical memory.
Here is a quick update on this. I'm still not entirely sure what I did to fix it; the only thing I can think of that may have resolved the problem was rebuilding the package index. After that I was going to run the upgrade again, with an explorer in another shell to get the required data for Oracle. However, the unexpected happened and the repo update worked.
root@fvssphsun01:/var/tmp/p27353277_9621a# pkg search -Hlo value info.cve:
pkg: Search performance is degraded.
Run 'pkg rebuild-index' to improve search speed.
root@fvssphsun01:/var/tmp/p27353277_9621a# pkg rebuild-index
Building new search index 582/582
root@fvssphsun01:/var/tmp/p27353277_9621a# pkg search -Hlo value info.cve:
root@fvssphsun01:/var/tmp/p27353277_9621a# pkgrepo refresh --key /var/pkg/ssl/pkg.oracle.com.key.pem --cert /var/pkg/ssl/pkg.oracle.com.certificate.pem -s https://pkg.oracle.com/solaris/support/ -s /export/s11repo
Initiating repository refresh.
root@fvssphsun01:/var/tmp/p27353277_9621a#
Still not entirely sure what happened, but it all seems to work now.
A little late, but I just want to point out that "swap -l" wasn't the right command to figure out what the virtual memory usage was like on that server.
With non-overcommitting OSes like Solaris, virtual memory is reserved when it is allocated, whether or not it is ever touched, so you can reach an out of (virtual) memory state despite still having plenty of unused swap and RAM.
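For reference, "swap -s" is the Solaris command that reports virtual memory as allocated, reserved, used, and available. The helper below is a rough sketch that condenses its single output line; the name swap_summary is made up for illustration, and the field positions assume the usual layout noted in the comment.

```shell
# Summarise `swap -s` output as used/available virtual memory in GB.
# Assumes the usual Solaris layout:
#   total: <N>k bytes allocated + <N>k reserved = <N>k used, <N>k available
swap_summary() {
    awk '/^total:/ {
        used = $9; avail = $11
        sub(/k$/, "", used); sub(/k$/, "", avail)
        printf "used=%.1fGB available=%.1fGB\n", used / 1048576, avail / 1048576
    }'
}

# Only call swap where it exists (Solaris); elsewhere the function is just defined.
if command -v swap >/dev/null 2>&1; then
    swap -s | swap_summary
fi
```

Running it in a second shell while pkgrepo refresh is working would show whether the "available" figure really drops towards zero before the failure.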
You are probably hitting bugs (or features) of the new KOM (Kernel Object Manager) in 11.3.
Ever since 11.3, I leave a couple of GB that ZFS is not allowed to use (kernel setting, user_reserve_hint).
Otherwise, once you reach memory pressure, things pause and time out badly, affecting the entire operating system and the services on it.
Unfortunately, this is probably related, and in my opinion it just doesn't work properly.
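As a sketch of what such a reservation might look like: I am assuming that "user_reserve_hint" refers to the user_reserve_hint_pct kernel tunable, which is set in /etc/system as a percentage of physical memory to keep for applications. The helper name reserve_hint_line, the 0 to 99 range check, and the value 80 are illustrative assumptions, not a recommendation.

```shell
# Print the /etc/system line for a given percentage of physical memory
# to keep available for applications rather than the ZFS ARC.
# The 0-99 range is assumed here; check the tunables reference for your release.
reserve_hint_line() {
    pct=$1
    case $pct in
        ''|*[!0-9]*) echo "percentage must be a whole number" >&2; return 1 ;;
    esac
    if [ "$pct" -gt 99 ]; then
        echo "percentage must be between 0 and 99" >&2
        return 1
    fi
    echo "set user_reserve_hint_pct=$pct"
}

# Example value only; pick one that suits the workload.
reserve_hint_line 80

# On Solaris, show the value the running kernel currently uses.
if command -v mdb >/dev/null 2>&1; then
    echo "user_reserve_hint_pct/D" | mdb -k
fi
```

The printed line would go into /etc/system by hand and take effect at the next boot.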
Which should have meant that there was at least 32 GB available. Instead of failing at 4 GB, it failed at 3 GB. At that point I decided to call it quits for a little while.
When I had to go back to the problem, the only thing that I'd done was to rebuild the index. This time the command ran, and I'm still not certain what I did.
You were still focused on RAM with memstat while the issue was about virtual memory. Again, while I agree that adding RAM, or freeing RAM previously allocated for ZFS, increases the size of free virtual memory, failing to observe the virtual memory usage didn't help. Something was reserving most or all of the virtual memory available on your system; identifying what was doing it might have helped in understanding why.
Download and create a repository; pulling that 140 GB all-versions Oracle repository will do.
Run a simple find against the repository directory, redirecting the output to a file.
This is a great test case since that repo contains a lot of files.
Observe the memory footprint increase, mostly in the kernel and ZFS lines of memstat.
The new ::komstat will show this in more detail.
After it has finished, run a small C program that allocates memory.
The system will literally pause for a time until the reap process is complete.
This, of course, can cause cluster failures, database failovers, NFS client issues, kernel panics, and similar.
Dangerous stuff, even from a security standpoint.
These are bugs listed on the Oracle site; search for the keywords kom reap.
If I recall correctly, they are fixed in later versions, but I have yet to test that.
In my opinion, limiting and/or sizing this ZFS/kernel behavior is a must under 11.3, under any conditions and on any patch set.
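The reproduction steps above can be sketched roughly as follows. This is only a sketch: the repository path is the one used earlier in the thread, the output file name is a placeholder, the function names are made up, and the final step (the small C program that allocates memory) is left out.

```shell
# Take a kernel memory snapshot on Solaris; does nothing where mdb is absent.
memstat_snapshot() {
    if command -v mdb >/dev/null 2>&1; then
        echo "::memstat" | mdb -k
    fi
}

# Walk every file under a directory, writing the listing to a file,
# then report how many entries were seen.
walk_repo() {
    dir=$1
    listing=$2
    find "$dir" > "$listing" && wc -l < "$listing"
}

REPO=${REPO:-/export/s11repo}                 # path used earlier in the thread
LISTING=${LISTING:-/var/tmp/repo-find.out}    # placeholder output file

if [ -d "$REPO" ]; then
    memstat_snapshot    # before: note the kernel and ZFS lines
    walk_repo "$REPO" "$LISTING"
    memstat_snapshot    # after: compare the same lines
fi
```

Comparing the two snapshots should show where the memory went during the walk, before any allocation test is attempted.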