No Paging Space Available

Whilst performing some tests on LVMs I managed to crash our test box. No real problem, as it is only used by our tech team.

However, I would like to know what actually caused this, as the task being performed at the time was one which I thought would not have any impact.

Using dd I was creating 10 x 2GB files. The filesystem these were being created in was 200GB in size and had 99% free.

The errors I got before the system became unresponsive were:

stdout:

dd: 0511-051 The read failed.
: There is not enough space in the file system.

errpt:

8527F6F4   1024165613 P S SYSVMM         NO PAGING SPACE AVAILABLE

The script being run was quite simple; it counted to 10 and for each count wrote a file to the filesystem:

nohup time dd if=/dev/zero of=/filesystem/file_$count bs=512m count=4 &
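In full, the script was essentially of this form (a minimal reconstruction from the description above, so the exact loop syntax is illustrative):

#!/bin/ksh
# start ten parallel dd writers, each producing a 2GB file (4 x 512MB blocks)
count=1
while [ $count -le 10 ]
do
    nohup time dd if=/dev/zero of=/filesystem/file_$count bs=512m count=4 &
    count=$((count + 1))
done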

The system has 10GB of Memory and a pagespace of 2G. Why would the dd cause the system to become unresponsive?

Can you provide us with the output from a df command, for starters? Are you sure, for instance, that /filesystem was mounted? It could be that you filled the root filesystem by mistake.

Some OSes will create /tmp from memory (and therefore paging/swap space). If you were writing to the wrong place, then this is another option, and when the machine restarts, /tmp will be empty.
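A quick way to check both of those points at once would be something like the following (the paths are just the ones from your post):

df -g /filesystem /tmp
mount | grep filesystem

If /filesystem does not show up in the mount output, the files were being written into a plain directory on the root filesystem.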

There's not much else to go on. Perhaps don't run them in parallel. Maybe memory got exhausted by the processing load.

Might I even guess that this is AIX?

Robin
Liverpool/Blackburn
UK

Hmm, judging from errpt and the format of the error message (0511-051 The read failed), I suppose it is indeed AIX (at least it looks like it). Still, it would help to know the version.

You do know that

nohup time dd if=/dev/zero of=/filesystem/file_$count bs=512m count=4 &

will have all the writer processes run in parallel in the background, don't you? If I had to take a wild guess, I'd ask myself whether the block size might be kept in memory and whether the 10 parallel dd instances maybe taxed memory too much. You might want to run this again with a smaller block size and a higher count to compensate.
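For example, to produce the same 2GB per file with a much smaller buffer (the figures below are just one possible combination):

nohup time dd if=/dev/zero of=/filesystem/file_$count bs=1m count=2048 &

1m x 2048 is the same amount of data as 512m x 4, but each dd then only needs a buffer of roughly 1MB instead of 512MB.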

What rbatte1 suggested (a filesystem that is not mounted) still seems the most likely cause to me. If this can be verified not to be the case, you might consider running "vmstat" in one window and then starting the job again in another, to get a more detailed picture of what is happening.
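Something along these lines, started in a second session before kicking off the job, would show free memory and paging activity as it happens (the 5-second interval is arbitrary):

vmstat 5
lsps -a

vmstat prints a line of memory and paging statistics every five seconds, and lsps -a shows how full the paging spaces get.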

I hope this helps.

bakunin

Thanks for your responses guys.

1) I can confirm that the system is AIX -

5300-12-05-1140

2) The file system is mounted (called lv_test_3):

/dev/fslv12    272629760 272587472    1%        4     1% /lv_test_4

3) I am aware that all the dd processes will run in parallel; that is the aim, as I want to look into the throughput on the LVM (SAN-attached disks) and the benefit (if any) of having the inter-disk policy set to maximum.
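(For reference, the inter-disk policy of the LV can be confirmed with lslv, e.g.

lslv fslv12

using the LV name from the df output above; INTER-POLICY is listed among the attributes.)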

With regard to the block size, you may well be onto something. When I originally had this set to 1m and a count of 2000, the dds ran through without any issue, so your theory of the bs being held in memory might be correct.

I was trying a bs of 512m with the thought that it might be faster than using a bs of 1m.

It certainly has to be held somewhere for dd to send a whole block all in one write().
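Rough arithmetic, assuming each dd holds at least one buffer of bs bytes: 10 processes x 512MB comes to about 5GB of buffer space alone, before counting the kernel, the file cache for the data being written and anything else on the box, against 10GB of real memory backed by only 2GB of paging space. With bs=1m the same ten processes need only about 10MB of buffers, which would explain why that run went through cleanly.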

Good. Sorry for sometimes stating the obvious superfluously; the general experience of long-standing members here is that the painfully obvious is less obvious than one might think to the better part of the audience.

If you are carrying out performance tests you might consider taking the file system driver out of the equation by addressing the raw device instead of the filesystem:

dd if=/dev/zero of=/dev/somelv ....

or even the raw hard disk. I once did exactly that for the same reasons; you can read an account of the risks involved here - afterwards it was quite funny.
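A timed run against the raw (character) device of a scratch LV could look like this (it overwrites whatever is on the LV, so only ever run it against an LV created for the test; the name is a placeholder):

time dd if=/dev/zero of=/dev/rsomelv bs=1m count=2048

On AIX every LV /dev/somelv has a corresponding character device /dev/rsomelv, which bypasses the filesystem and block-device buffering entirely.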

Give the LPAR more memory, then (6GB should suffice). Do not increase the "max" alone in the profile; increase the "Desired" value too. Fork()-ing 10 such processes will be done faster than the hypervisor can react.
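The memory currently assigned to the partition, along with its minimum and maximum values, can be checked from inside AIX with

lparstat -i

before adjusting the profile on the HMC.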

I hope this helps.

bakunin