Hello all,
After spending hours searching the web, I decided to create an account here. This is my first post, and I hope one of the experts can help.
I need to solve a grep / sed / xargs / awk problem.
My input file looks like this:
----------------------------------
root@Ubuntu-12:~# cat myfile
article1
data.........x
colour....blue
number.........15
name...smith
month...................july
article2
colour....yellow
number.........423489
something....x
month...................january
article3
colour....orange
number.........7
name....jason
month...................may
value.....4
much
more
lines
root@Ubuntu-12:~#
----------------------------------
This is the code I currently use (example):
grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | \
awk -F . '{print $NF}' ; done | xargs -L5 | \
awk 'BEGIN {printf("%15s %15s %15s %15s %15s\n" ,"Article", "Colours", "Numbers", "Names", "Month")} {printf("%15s %15s %15s %15s %15s\n", $1, $2, $3, $4, $5)}'
Unfortunately the output looks like this:
        Article         Colours         Numbers           Names           Month
       article1            blue              15           smith            july
       article2          yellow          423489         january        article3
         orange               7           jason             may
As we can see, the format is broken because we egrep for 5 values per article. This worked for "article1", but the "name...xx" line is missing in "article2". As a result, "article3" ends up as the 5th column of row 2 rather than the 1st column of row 3.
So xargs -L5 passes misaligned rows to awk, which shifts the table:
grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | awk -F . '{print $NF}' ; done | xargs -L5
article1 blue 15 smith july
article2 yellow 423489 january article3
orange 7 jason may
------------------------------------
Now the question: is there a way to make egrep, when it searches for 5 strings but finds only 4, substitute a placeholder word such as "missing" for the absent one? That would keep xargs -L5 happy and let awk preserve the table format.
Or is there a more efficient way of doing this?
The input file shown above is just an example of a much larger file with hundreds of thousands of lines.
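
One possible approach, offered as a sketch and not run against the real file: do the whole job in a single awk pass, collect the wanted fields per article, and print "missing" for any field an article lacks. The sketch assumes that every record starts with a line matching ^article[0-9]+$ (the + is an assumption, so that article10 and beyond also match), that keys and values are separated by runs of dots, and that values themselves contain no dots. The names sample_input, format_articles, emit and val are made up for illustration. Unlike grep -A5, it reads every line up to the next article header instead of a fixed window of 5 lines.

```shell
#!/bin/sh
# Hypothetical helper: prints the sample data from the post.
sample_input() {
cat <<'EOF'
article1
data.........x
colour....blue
number.........15
name...smith
month...................july
article2
colour....yellow
number.........423489
something....x
month...................january
article3
colour....orange
number.........7
name....jason
month...................may
value.....4
much
more
lines
EOF
}

# Hypothetical helper: reads the given files (or stdin) and prints the table.
format_articles() {
    awk -F. '
        # Print one table row for the current article, then reset its fields.
        function emit(    i, n, keys, v) {
            if (article == "") return          # no article seen yet
            printf "%15s", article
            n = split("colour number name month", keys, " ")
            for (i = 1; i <= n; i++) {
                v = (keys[i] in val) ? val[keys[i]] : "missing"
                printf " %15s", v
            }
            printf "\n"
            split("", val)                     # portable way to empty an array
        }
        BEGIN {
            printf "%15s %15s %15s %15s %15s\n",
                   "Article", "Colours", "Numbers", "Names", "Month"
        }
        # A new article header closes the previous record.
        /^article[0-9]+$/ { emit(); article = $0; next }
        # Remember only the wanted keys; the value is the last dot-separated field.
        article != "" && /^(colour|number|name|month)\./ { val[$1] = $NF }
        END { emit() }
    ' "$@"
}

sample_input | format_articles
```

On the real data this would be called as format_articles myfile. With the sample input, the article2 row should show "missing" in the Names column, and article3 stays in the first column of its own row. Because awk reads the file once and starts no subprocess per line, it should also cope better with hundreds of thousands of lines than the while read / echo / egrep loop.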