[Solved] /var is filling continuously

Hi All,

I have a Solaris 10 machine. Yesterday I patched it with the Solaris 10 patch cluster. Since then the Glance software has been filling /var/core continuously; every few minutes it fills /var to 100%.
Glance runs through /etc/init.d/mwa, which I have already stopped, but core files are still being generated and I don't know from where. How can I find out what is creating them and stop it? Please help.

# ps -ef | egrep -i "mwa|glance"
    root 18647 17055   0 05:18:09 pts/2       0:00 egrep -i mwa|glance
# du -sh /var/core/*
 3.5G   /var/core/core_tstbgp11_glance_0_1984_1315829327_16469
 3.5G   /var/core/core_tstbgp11_glance_0_1984_1315829627_18040
 2.1G   /var/core/core_tstbgp11_glance_0_1984_1315829928_19553
# df -h .
Filesystem             size   used  avail capacity  Mounted on
/dev/vx/dsk/bootdg/var
                        12G    12G     0K   100%    /var
# uname -a
SunOS tstbgp11 5.10 Generic_144488-17 sun4u sparc SUNW,Sun-Fire-V490

Regards

You can disable writing core files to /var/core by using:

coreadm -d global

But you should still investigate why your application keeps crashing...

See: When Glance stop working | System Adm

Hi Yazu,

I tried your steps, but it is still generating core files:

# ps -ef | egrep -i "sco|mwa|ttd|midaemon"
    root 10163 26426   0 06:27:37 pts/1       0:00 egrep -i sco|mwa|ttd|midaemon
# du -sh /var/core/*
 3.5G   /var/core/core_tstbgp11_glance_0_1984_1315833828_9576

Can you try the steps below and see whether you still get core files?

 
/opt/perf/bin/mwa stop
/opt/perf/bin/midaemon -T [If the midaemon is still active]
/opt/perf/bin/ttd -k [If the ttd is still active]
/opt/perf/bin/perfstat
 

This will stop all the performance tools, which you can confirm in the output of the last command.

After this, glance should no longer produce core files, since nothing has been started.

If you still get core files, please collect the output of the command below every minute.

 
ps -ef

Once a core file is generated, you can get the process ID from its name, check which process is crashing, and diagnose it better.
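A minimal sketch of pulling those fields out of the file name. It assumes the names end in `_<time>_<pid>`, which is how the files listed earlier in this thread appear to be laid out; the function names are made up for illustration. It needs a POSIX shell (on Solaris 10 that means ksh or /usr/xpg4/bin/sh, not the legacy /bin/sh).

```shell
# Assumption: core file names end in _<epoch time>_<pid>, as the files
# shown in this thread appear to. Function names are hypothetical.

# Print the PID encoded as the last underscore-separated field.
core_pid() {
    name=${1##*/}            # strip any directory part
    printf '%s\n' "${name##*_}"
}

# Print the epoch timestamp encoded as the second-to-last field.
core_time() {
    name=${1##*/}
    rest=${name%_*}          # drop the trailing _<pid>
    printf '%s\n' "${rest##*_}"
}

# Print the gap in seconds between consecutive core files,
# given in the order they were created.
core_gaps() {
    prev=
    for f in "$@"; do
        t=$(core_time "$f")
        [ -n "$prev" ] && echo $((t - prev))
        prev=$t
    done
}
```

For example, `core_pid /var/core/core_tstbgp11_glance_0_1984_1315829327_16469` prints 16469, which you can then look for in the saved `ps -ef` output. Running `core_gaps` on the three files from the first post prints 300 and 301, so the crashes are about five minutes apart, which hints at something scheduled.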

Regards,
Vishal

Hi Vishal,

I ran all the commands mentioned. /opt/perf/bin/perfstat shows the following:

MeasureWare scope status:
WARNING: scopeux    is not active (MWA data collector)
MeasureWare background daemon status:
(Should always be running when the system is up)
WARNING: ttd        is not active (Transaction Tracking daemon)
MeasureWare server status:
WARNING: alarmgen   is not active (alarm generator)
WARNING: agdbserver is not active (alarm database server)
WARNING: perflbd    is not active (location broker)
WARNING: rep_server is not active (repository server)

But core files are still being generated:

# ls -l /var/core
total 10943792
-rw-------   1 root     root     3772379034 Sep 12 07:16 core_tstbgp11_glance_0_1984_1315836828_24759
-rw-------   1 root     root     1828080968 Sep 12 07:19 core_tstbgp11_glance_0_1984_1315837127_26379
# ps -ef | egrep -i "sco|mwa|ttd|midaemon"
    root 24761     1   0 07:13:31 ?           0:00 /opt/perf/bin/midaemon
    root 26954 26426   0 07:20:06 pts/1       0:00 egrep -i sco|mwa|ttd|midaemon

I killed PID 24761, deleted the core file, and waited 2 minutes. Again a core file was generated. Something is starting midaemon:

# ls
core_tstbgp11_glance_0_1984_1315837428_28015
# ps -ef | egrep -i "sco|mwa|ttd|midaemon"
    root 28586 26426   0 07:24:29 pts/1       0:00 egrep -i sco|mwa|ttd|midaemon
    root 28017     1   0 07:23:31 ?           0:00 /opt/perf/bin/midaemon

Post the output of

crontab -l

No user other than root has access to cron, and I don't see anything related there:

# crontab -l | grep -v "#"
0 19 * * * /usr/local/bin/CLEANER_v1.1.ksh > /var/log/CLEANER.log 2>&1
10 3 * * * /usr/sbin/logadm
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
30 3 * * * [ -x /usr/lib/gss/gsscred_clean ] && /usr/lib/gss/gsscred_clean
25 17 * * * /var/tmp/sys_stat.ksh > /var/tmp/sys_stat.ksh.out 2>&1

Post contents of /etc/inittab

ap::sysinit:/sbin/autopush -f /etc/iu.ap
sp::sysinit:/sbin/soconfig -f /etc/sock2path
smf::sysinit:/lib/svc/bin/svc.startd    >/dev/msglog 2<>/dev/msglog </dev/console
p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog
pt:s1234:powerfail:/usr/lib/svc/method/installupdates lock

Maybe it is under SMF control? Try:

svcs -a | grep -i glance

If you don't see any output, then check all the services reported by svcs -a and look for anything that might be your application.

I couldn't find anything there either:

# svcs -a | egrep -i "scope|mwa|ttd|midaemon|glan"
disabled       21:22:21 svc:/network/rpc/cde-ttdbserver:tcp

Try this:

  1. kill midaemon (or stop it with stop script)
  2. run this code
dtrace -wn 'syscall:::entry/execname=="midaemon"/{system ("ptree %d", ppid);exit(0)}'
  3. wait for midaemon to get respawned. DTrace should then show who started this process.

It gave me the output below:

# dtrace -wn 'syscall:::entry/execname=="midaemon"/{system ("ptree %d", ppid);exit(0)}'
dtrace: description 'syscall:::entry' matched 237 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  0   4488                resolvepath:entry 4146  /bin/sh /usr/local/bb/bin/bbrun /usr/local/bb/ext/bb-system.sh
  23513 /bin/ksh /usr/local/bb/ext/bb-system.sh
    23557 head -11
      23558 glance -adviser_only -syntax /usr/local/bb/tmp/bb-system.tmp.23513 -j 1 -iterat

I got it: a BB script is starting it.

Big Brother is notorious for starting a lot of stuff off in the shadows.....

In your case you might want to review the bb configuration. It is a very useful tool.
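One way to start that review is to list which files under the BB directory mention glance. This is only a sketch: the function name is made up, and /usr/local/bb/ext is simply the directory that shows up in the ptree output above, so adjust it to your install.

```shell
# List files under a directory that contain a given string.
# $1: directory to search, e.g. /usr/local/bb/ext (taken from the ptree output)
# $2: string to look for, e.g. glance
# The function name is hypothetical; find and grep -l are standard.
find_callers() {
    # grep -l prints only the names of files that match;
    # sort makes the listing order predictable.
    find "$1" -type f -exec grep -l "$2" {} \; 2>/dev/null | sort
}
```

Called as `find_callers /usr/local/bb/ext glance`, it should print the scripts that invoke glance, which are the ones to disable or fix in the BB setup.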

Thanks Bartus. You really were a great help (and I learned something too :) ).
I reinstalled glance and it then started working fine.