Shell script failing to read large XML record - urgent, critical help

Hi All,

I have a shell script running on an AIX 5.3 box. It has 7 to 8 "sed" commands piped (|) together. Its input is an XML file containing many records, some of which have more than a hundred tags. The script is taking a huge amount of time, more than 1.5 hrs, to process the whole XML file :( Can anyone tell me what the issue is, and a solution? I am assuming that a lot of swapping is occurring from RAM to disk and that is why it is taking so long???

Can anyone suggest whether I can allocate memory for this script's process at the time of execution???

Thanks in advance!!

Please respond early!! This is critical!! :eek:

To give you sound advice we would need to know what the script is, what your input data is, and what your desired output is. Post it here (not *all* the data, just a representative sample) and we will think about it.

Lacking any info, the best we can do is offer some generalized hints which may or may not help in your specific case. Here is one: the several sed calls can probably be combined into one single script if they are simply piped one into another. This might speed things up considerably.

Another one: maybe you are doing something context-oriented. sed is poor at that, and maybe some of its shortcomings are being papered over with shell constructs. If this is the case you might be better off writing the whole thing in awk, which will be slower than sed at what sed does well, but faster than sed and shell constructs connected in a pipeline.

(For instance: if you are cutting out some part of every line, sed is probably faster than the often-seen "awk '{print $5}'"; but if the part you are cutting is a number and you want all those numbers totalled at the end, then awk is much better than cutting with sed and adding in a shell loop.)
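As a sketch of that second case (the file layout here is made up for illustration: lines whose fifth field is a number), one awk process can do the cutting and the summing in a single pass:

```shell
# Hypothetical sample data: the fifth field of each line is a number.
printf 'a b c d 10\na b c d 32\n' > /tmp/nums.txt

# One awk process extracts field 5 AND accumulates the total --
# no shell loop, no extra process spawned per line.
awk '{ sum += $5 } END { print sum }' /tmp/nums.txt
# prints 42
```

The equivalent sed-plus-shell-loop version would fork at least one process per iteration, which is where the time goes.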

bakunin

Hi

I have all the seds piped in the same script, and my data consists of multiple XML records in one XML file. All the seds together are taking a long time.

I ran one sed command on its own and it took 25 mins... and I have 6 more piped after it.

I don't have much of an option to modify the script... other than something like...

can we allocate memory (RAM) space [because I have a very, very large RAM] so that swapping of the records may be reduced...

can you suggest something!!! plz plz

As bakunin says, post your script so people can see what might be redundant or could be written more efficiently. And again: no, there is no allocating memory for shell scripts on AIX, nor, I think, on other derivatives.
You can check whether your ulimits are set too low. Otherwise, tune your machine with vmo so it won't swap. But that is a separate issue.
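Before tuning anything, it is worth testing the swapping theory. A quick, non-destructive check (assuming a standard shell and that vmstat is installed) is:

```shell
# Print the current shell's resource limits -- look for a suspiciously
# low data, stack, or memory cap that could throttle the pipeline.
ulimit -a

# If vmstat is available, sample memory statistics twice, one second apart.
# On AIX the pi/po columns show pages paged in/out; sustained nonzero values
# there while the script runs would support the swapping theory.
command -v vmstat >/dev/null && vmstat 1 2 || true
```

If pi/po stay at zero while the script is running, swapping is not your problem and the time is going into the processes themselves.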

And please do not spam "plz" - it sounds like a 14 year old in World of Warcraft. Sorry, but couldn't resist.

One suggestion I can make (not a complete answer, though):

if you are using sed commands like

sed 's/abc/def/g' filename | sed 's/rty/uty/g'

it can be combined as

sed 's/abc/def/g;s/rty/uty/g' filename

The second one will be much faster, as there is only one sed command (process) and one pass over the data, with no extra pipe or second process for the kernel to set up.
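You can verify the saving on your own data by timing both forms (the input file and patterns here are placeholders; substitute your real XML file):

```shell
# Hypothetical sample input standing in for the real XML file.
for i in 1 2 3; do echo 'abc rty'; done > /tmp/sample.txt

# Two processes joined by a pipe:
time sh -c "sed 's/abc/def/g' /tmp/sample.txt | sed 's/rty/uty/g' > /dev/null"

# One process applying the same edits in one pass:
time sh -c "sed 's/abc/def/g;s/rty/uty/g' /tmp/sample.txt > /dev/null"
```

Both forms produce identical output; on a large file the single-process form should show a clearly lower elapsed time.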

This is a relative term and not encouraged here.

Every time you execute a pipe you create a child process. This is expensive and consumes a large amount of resources.

You can get past this by creating co-processes, or by setting up cooperating processes. Again, we need to see the "seven" pipes to sed, which are very likely the bottleneck - but that is a guess.
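Without seeing the actual pipeline, one generic way to collapse seven piped seds into a single process is to put all the expressions into one sed script file and run it with -f (the expressions below are placeholders standing in for the real substitutions):

```shell
# Placeholder expressions standing in for the real seven substitutions.
cat > /tmp/edits.sed <<'EOF'
s/abc/def/g
s/rty/uty/g
s/foo/bar/g
EOF

# One sed process applies every edit, in order, in a single pass.
printf 'abc rty foo\n' | sed -f /tmp/edits.sed
# prints: def uty bar
```

A script file also keeps the expressions readable and versionable, which matters once there are seven of them.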

Also,

you should have seen 'n' (where n is quite large) posts here with the following format:

problem definition
sample input
sample output
what they had tried
the problem faced

You could follow at least 3 out of those.

What they are trying to say is that sed -e foo | sed -e bar can in many cases be turned into a single invocation, sed -e foo -e bar, and sometimes this brings significant savings. But it is not always possible; it depends on what foo and bar do.
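For simple independent substitutions (the patterns here are placeholders), the merge is mechanical - the -e expressions are applied in the same left-to-right order the pipe would have been:

```shell
# Two piped seds, two processes:
printf 'cat dog\n' | sed -e 's/cat/cow/' | sed -e 's/dog/hen/'

# One sed, two -e expressions, one process:
printf 'cat dog\n' | sed -e 's/cat/cow/' -e 's/dog/hen/'
# both print: cow hen
```

The cases where the merge changes behavior involve commands that alter line flow (d, n, hold-space tricks), which is why you need to check what each expression does first.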

25 minutes sounds like there is an inefficient regular expression or loop in the script, but as others have said repeatedly, it's hard to say anything concrete until you post the actual code.

Since this was more than a day ago, I guess the urgency waned.

Yes, the issue is no longer urgent...

I worked my way around it using sed.
Thanks!!