Solaris Repo Update

gull04 · April 18, 2018, 6:34am

Hi Folks,

Just a quick question on this. I've tried to run this a couple of times now; the first time it failed I increased the swap. I'm not sure that increasing the physical memory will do any good, but I will try later today. Has anyone seen this or anything similar?

SunOS fvssphsun01 5.11 11.3 sun4v sparc sun4v
root@fvssphsun01:~# pkgrepo refresh -s /export/s11repo
Initiating repository refresh.
Apr 18 10:59:06 fvssphsun01 su: 'su root' succeeded for e415243 on /dev/pts/2


pkgrepo: There is not enough memory to complete the requested operation.  At least
3GB of virtual memory was in use by this command before it ran out of memory.
You must add more memory (swap or physical) or allow the system to access more
existing memory, or quit other programs that may be consuming memory, and try
the operation again.
root@fvssphsun01:~#
root@fvssphsun01:~# ldm ls
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    16    32G      0.2%  0.1%  7d 21h 17m
fbasphnhhp01     active     -n----  5001    16    32G      0.0%  0.0%  6d 4h 4m
fdbsphnhhp01     active     -n----  5000    48    96G      0.1%  0.1%  6d 3h 57m
root@fvssphsun01:~# swap -l
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 303,1        16  8388592  8388592
/dev/zvol/dsk/rpool/swap2 303,3        16 25165808 25165808
/dev/zvol/dsk/rpool/swap 303,1   8388624 25165808 25165808
root@fvssphsun01:~#

Regards

Gull04

hicksd8 · April 18, 2018, 9:05am

Hmmmm.......no, I haven't come across that problem before.

My first thought would be to check user/process resource limits. There seems to be quite enough swap available to avoid it falling over at 3GB virtual memory (unless the system is otherwise very heavily loaded). We all know that on Solaris, by default, root user is unlimited in resources but some clown might have imposed some limit. I assume that this is not a system installed and configured by you so you don't know the history??
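
A minimal sketch of that check, assuming a POSIX shell run as the same user that runs pkgrepo. The function name show_mem_limits is made up for illustration, and the commented plimit and prctl lines are Solaris-only follow-ups that are worth checking against the man pages before relying on them.

```shell
# Hypothetical helper: list the shell's limits that can cap address space.
show_mem_limits() {
    # ulimit -a prints every limit; keep only the memory-related rows.
    ulimit -a | grep -i -E 'memory|data|stack|address'
}

show_mem_limits

# Solaris-only follow-ups (assumed to be available on this host):
# plimit $$
# prctl -n process.max-address-space -i process $$
```

Anything other than "unlimited" on the data, stack or address-space rows would be worth a closer look.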

Like you, at this stage of investigation, I wouldn't believe it is an actual memory shortage causing it.

gull04 · April 18, 2018, 9:24am

Hi Dennis,

I built the system, I used the text installer and then pulled down the initial repo from the Oracle site - all good and well so far.

I then ran the pkgrecv as follows:

root@fvssphsun01:~# http_proxy=http://proxyspw.corp.XXXXXXXXX.com:8080 
root@fvssphsun01:~# export  http_proxy 
root@fvssphsun01:~#  pkgrecv --key /var/pkg/ssl/pkg.oracle.com.key.pem --cert /var/pkg/ssl/pkg.oracle.com.certificate.pem -s https://pkg.oracle.com/solaris/support/ -d /export/s11repo '*'

Again all good and well, although it took 3 days and 18 hours downloading almost 140GB. This was followed by:

pkgrepo verify -s /export/s11repo

This also ran just dandy, can't remember how long it took and was followed by;

root@fvssphsun01:~# pkgrepo refresh -s /export/s11repo                           
Initiating repository refresh. 
pkgrepo: There is not enough memory to complete the requested operation.  At least 
4GB of virtual memory was in use by this command before it ran out of memory. 
You must add more memory (swap or physical) or allow the system to access more 
existing memory, or quit other programs that may be consuming memory, and try 
the operation again.

You'll notice that we had a 4GB error, which I thought was suspicious, so I increased the swap - no change. I added another 32GB of memory - no change. I changed the ZFS ARC cache setting and the 4GB error became a 3GB error, even with the additional swap and the additional physical memory.

Regards

Gull04

gull04 · April 23, 2018, 10:00am

Hi Everyone,

Here is a quick update on this. I'm still not entirely sure what I did to fix it - the only thing that I can think of that may have resolved the problem was rebuilding the package index. After that I was going to run the upgrade again, with an Explorer run in another shell to get the required data for Oracle - however the unexpected happened and the repo update worked.

root@fvssphsun01:/var/tmp/p27353277_9621a# pkg search -Hlo value info.cve:
pkg: Search performance is degraded.
Run 'pkg rebuild-index' to improve search speed.
root@fvssphsun01:/var/tmp/p27353277_9621a# pkg rebuild-index
Building new search index                    582/582
root@fvssphsun01:/var/tmp/p27353277_9621a# pkg search -Hlo value info.cve:
root@fvssphsun01:/var/tmp/p27353277_9621a#  pkgrepo refresh --key /var/pkg/ssl/pkg.oracle.com.key.pem --cert /var/pkg/ssl/pkg.oracle.com.certificate.pem -s https://pkg.oracle.com/solaris/support/ -s /export/s11repo
Initiating repository refresh.

root@fvssphsun01:/var/tmp/p27353277_9621a#

Still not entirely sure what happened, but it all seems to work now.

Just thought I would update everyone.

Regards

Gull04

jlliagre · April 23, 2018, 9:47pm

A little late but just wanting to point out that "swap -l" wasn't the right command to figure out what the virtual memory usage was like on that server.

With non-overcommitting OSes like Solaris, you can reach an out of (virtual) memory state despite still having plenty of unused swap and RAM.

The proper command would have been "swap -s".
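
For reference, a rough sketch of reading the summary line that "swap -s" prints. The sample line is made up for illustration, swap_summary is a hypothetical helper, and the field layout (allocated + reserved = used, then available, sizes suffixed with k) is assumed to match what this Solaris release prints.

```shell
# Hypothetical helper: summarize one line of `swap -s` output read from stdin.
swap_summary() {
    awk '{
        for (i = 2; i <= NF; i++) {
            # The size precedes each label, e.g. "2097152k used,"
            if ($i ~ /^used/)      used  = $(i - 1)
            if ($i ~ /^available/) avail = $(i - 1)
        }
        # Adding 0 keeps the leading digits and drops the trailing "k".
        printf "used=%dMB available=%dMB\n", (used + 0) / 1024, (avail + 0) / 1024
    }'
}

# Illustrative sample line, not taken from the server in this thread.
echo "total: 1048576k bytes allocated + 1048576k reserved = 2097152k used, 4194304k available" | swap_summary

# On the real system: swap -s | swap_summary
```

The "reserved" part is the interesting one here: memory that has been reserved but not yet touched counts against virtual memory even though swap devices and RAM still look free.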

Peasant · April 23, 2018, 11:47pm

You are probably hitting bugs or features of the new KOM (kernel object manager) in 11.3.
Ever since 11.3, I leave a couple of GB not to be used by ZFS (kernel, user_reserve_hint).

Otherwise, once you reach memory pressure, stuff pauses and times out badly, affecting the entire operating system and the services on it.

This is probably related unfortunately and it just doesn't work properly in my opinion.

Regards
Peasant.

gull04 · April 24, 2018, 5:28am

Hi Peasant,

I'd already been through that loop, had doubled the memory to 64GB and set the value as follows:

echo "user_reserve_hint_pct/W0t50" | mdb -kw
echo ::memstat | mdb -k
echo ::memstat | mdb -k
echo ::memstat | mdb -k
echo user_reserve_hint_pct/D | mdb -k
echo ::memstat | mdb -k

Which should have meant that there was at least 32GB available. Instead of failing at 4GB, it failed at 3GB. At that point I decided to call it quits for a little while.
When I had to go back to the problem, the only thing that I'd done was to rebuild the index; this time the command ran, and I'm still not certain what fixed it.
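
The 32GB expectation is simple arithmetic on the hint. A small sketch, where reserve_gb is a hypothetical helper and 64 and 50 are the physical memory in GB and the user_reserve_hint_pct value quoted above:

```shell
# Hypothetical helper: memory (GB) the hint asks to keep for user processes.
reserve_gb() {
    # $1 = physical memory in GB, $2 = user_reserve_hint_pct
    echo $(( $1 * $2 / 100 ))
}

reserve_gb 64 50
```

This only describes physical memory set aside for applications; it says nothing about how much virtual memory is already reserved on the system.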

Regards

Gull04

jlliagre · April 24, 2018, 5:39am

You were still focused on RAM with memstat while the issue was about virtual memory. Again, while I agree that adding RAM, or freeing RAM previously allocated for ZFS, increases the size of free virtual memory, failing to observe the virtual memory usage didn't help. Something was reserving most or all of the virtual memory available on your system; identifying what was doing it might have helped in understanding why.
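
One rough way to look for such a process is to rank processes by virtual size. This is only a sketch: top_vsz is a hypothetical helper, and VSZ measures address-space size rather than reserved swap, so treat the result as a starting point and not as proof.

```shell
# Hypothetical helper: print the N largest entries from "VSZ_KB PID COMMAND" lines.
top_vsz() {
    # $1 = number of lines to keep (defaults to 5); input arrives on stdin.
    sort -k1,1nr | head -n "${1:-5}"
}

# Typical use; the trailing "=" suppresses the ps column headers:
# ps -e -o vsz= -o pid= -o comm= | top_vsz 5
```

Comparing that list with the totals from "swap -s" gives a first idea of whether one process or many small ones account for the reservation.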

Peasant · April 24, 2018, 11:06am

Give it a test on 11.3, with 20ish GB of RAM.

Download and create a repository; pulling that 140 GB all-versions Oracle repository will do.
Run a simple find against the repository directory with redirection to a file.
This is a great test case since that repo contains a lot of files.
Observe the memory footprint increase, mostly going to the kernel and ZFS lines in memstat.
The new ::komstat dcmd shows this in more detail.
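
The steps above can be sketched roughly as follows. walk_repo is a hypothetical helper, the paths are assumptions based on the repository location used earlier in this thread, and the mdb lines are left commented out because they need root on a Solaris host.

```shell
# Hypothetical helper: walk a directory tree, save the listing, print the file count.
walk_repo() {
    # $1 = repository directory, $2 = output file (keep it outside $1)
    find "$1" -type f > "$2"
    wc -l < "$2" | tr -d ' '
}

# Example for the repository in this thread (paths assumed):
# echo ::memstat | mdb -k     # before
# walk_repo /export/s11repo /var/tmp/s11repo.files
# echo ::memstat | mdb -k     # after; compare the kernel and ZFS lines
```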

After it is finished, run a small C program allocating memory.
The system will literally pause for a while until the reap process is complete.
This, of course, can cause cluster failures, database failovers, NFS client issues, kernel panics and similar.
Dangerous stuff, even from a security standpoint.

Those bugs are listed on the Oracle site; search for the keyword "kom reap".
If I recall correctly those are fixed in higher versions, but I have yet to test it.

In my opinion, limiting and/or sizing this ZFS/kernel behavior is a must under 11.3, under any conditions and on any patch set.

Regards
Peasant.