Using xargs for multiple functions

krypton · July 9, 2010, 10:26am

Hi Experts,

I am trying to parse some syslog output into a separate file per node using the syntax below, but I am having issues with my xargs statement.

The command which I was intending on using was:

cat syslogs | nawk '/From/ { print $3 }' | uniq | xargs -I {} grep {} syslogs >> {}

Instead of grepping and writing each node's output to its own file, it's dumping all the output into one file.

What adjustments do I need to make to the xargs statement to get this working?

Kind Regards,

K

anon64183241 · July 9, 2010, 10:44am

I think the issue is with the bit after ">>". The shell sets up the redirect before xargs ever runs, so the {} replacement never applies to it. I bet the one file you get is named "{}".

I would do it like this:

$ for i in `cat syslogs | nawk '/From/ {print $3}' | uniq`; do grep "$i" syslogs >> "$i"; done
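
If you would rather stay with xargs, one option is to have xargs start a small shell for each node name, so that the redirect is evaluated inside that shell with the name already filled in. This is only a minimal sketch: the sample log lines and the node names nodeA and nodeB are made up, plain awk stands in for nawk, and sort -u replaces uniq so that each name is handled once.

```shell
# Sketch only: build a tiny sample "syslogs" in a scratch directory.
# The log lines and node names below are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > syslogs <<'EOF'
Message From nodeA link up
Message From nodeB link up
Message From nodeA link down
EOF

# xargs substitutes {} only in the arguments it passes on, so the node name
# is handed to "sh -c" as $1 and the >> redirect is performed by that inner
# shell, not by the shell that started the pipeline.
awk '/From/ { print $3 }' syslogs | sort -u |
    xargs -I {} sh -c 'grep -- "$1" syslogs >> "$1"' sh {}

# Show what ended up in each per-node file.
for f in nodeA nodeB; do
    echo "== $f"
    cat "$f"
done
```

Passing the name as a positional parameter, instead of writing {} inside the quoted script, keeps odd characters in a node name from being interpreted as shell code.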

krypton · July 9, 2010, 10:49am

Hi Malcolm,

I just tried it out, works perfectly.

Thanks for the help,

K

panyam · July 9, 2010, 11:19am

 
for i in `cat syslogs | nawk '/From/ {print $3}' | uniq`; do grep "$i" syslogs >> "$i"; done

Just a suggestion: there is no need for "cat" here. Something like this should be fine:

 
for i in `nawk '/From/ {print $3}' syslogs | uniq`; do grep "$i" syslogs >> "$i"; done

methyl · July 9, 2010, 12:35pm

More points:
1) This is not a good command structure for an open-ended list, because the backticks expand the whole list at once and leave the shell with a potentially massive "for" command to process.
We can turn the "for" into a "while" and remove that issue.
2) We need to "sort" the node selection, because "uniq" only removes adjacent duplicates. Without the sort we will be making the same selection multiple times.
3) We can replace ">>" with ">" because each file is now only written once.
4) The "cat" to a pipeline was harmless and can be quicker at getting records into "awk" than getting "awk" to read the file directly. I've left it out because that is the convention of this board.

Overall this should be quicker and, more importantly, should not give repeated blocks of data because of the missing "sort".

nawk '/From/ {print $3}' syslogs | sort | uniq | while read node
do
      grep "${node}" syslogs > "${node}"
done
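
The reason for the "sort" can be checked with a quick test. This is just a sketch with made-up node names, where nodeB appears twice but not on adjacent lines:

```shell
# Made-up node names; nodeB is duplicated but the copies are not adjacent.
nodes=$(printf 'nodeB\nnodeA\nnodeB\n')

# uniq alone only collapses adjacent duplicates, so nodeB survives twice.
unsorted=$(printf '%s\n' "$nodes" | uniq)

# Sorting first makes the duplicates adjacent; sort -u does both steps at once.
sorted=$(printf '%s\n' "$nodes" | sort | uniq)
sorted_u=$(printf '%s\n' "$nodes" | sort -u)

printf 'uniq only:\n%s\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
printf 'sort -u:\n%s\n' "$sorted_u"
```

With "uniq" on its own, the list still has three entries, so a loop using ">>" would append the nodeB block to its file twice.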

tange · July 9, 2010, 6:16pm

GNU Parallel is your friend:

cat syslogs | nawk '/From/ { print $3 }' | uniq | parallel "grep {} syslogs >> {}"

This even does the grepping in parallel, so if syslogs is bigger than your disk cache this will go considerably faster than reading the file from disk over and over again. The option -j0 may be useful to you as well.

Watch the intro video for GNU Parallel on YouTube: "Part 1: GNU Parallel script processing and execution"