I have got this working OK but I am sure there is a more efficient/elegant way of doing it, which I hope you can help me with.
It can be done in whatever is most suitable, e.g. Perl or awk.
Any suggestions are welcome and many thanks in advance.
What I require is to extract the first field, using " as the FS, up to the last . in that field. Sometimes there are several . in that field.
The second field is from the last . to the first ".
The third field is from the first " to the |, with spaces removed.
This output is only required if the third field, using " as the FS, is blank and the second field, up to the |, has data present.
Below is an example of all variants of the data I have, in a file of 800,000+ rows.
This is the output using the above input.
#!/bin/bash
IFS='"'
while read line
do
test1=`echo "$line" | awk -F'"' '{print $1}'`
test2=`echo "$line" | awk -F '[|]' '{print $(NF-1)}' | awk -F'"' 'BEGIN {OFS=","} {print $2}'|awk '{$1=$1;print}'`
test3=`echo "$line" | awk -F'"' '{print $3}'|awk '{$1=$1;print}'`
if [[ -n "${test2}" && -z "${test3}" ]]; then
FID=`echo "${test1}"|awk -F"." '{ gsub(/-/,"",$0); for ( i = NF; i > 0; i-- ) printf("%s ",$i); printf("\n");}'| awk -F" " '{print $1}'`
RIC=`echo "${test1}"|sed -e 's/'.${FID}'//g'`
echo "$RIC , $FID , $test2" >> philout
else
echo "false"
fi
done < head_out_orig_phil
You are calling awk up to 6,400,000+ times, and sed up to 800,000+ times.
With 800,000+ rows you do need awk, but you need only one call to awk for the whole file, not eight calls to awk (including one that does nothing) and one to sed for every line of the file.
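As an illustration of the single-call idea, here is a minimal sketch. It is not necessarily the script discussed in the follow-up below. It assumes each line looks roughly like RIC.FID"value |"rest, which is a guess based on the description, since the sample data is not shown here. The function name extract_fields is hypothetical. Two things differ from the loop above and may need adjusting: the value is taken up to the first | inside the quotes, and hyphens are not stripped from the FID.

```shell
#!/bin/bash
# Hypothetical helper: one awk process handles every line of the input.
# Assumed line layout (a guess, not confirmed by the thread):
#   RIC.FID"value |"rest
extract_fields() {
    awk -F'"' '
    {
        head  = $1      # text before the first "
        value = $2      # text between the first and second "
        rest  = $3      # text after the second "

        sub(/[|].*/, "", value)             # keep only what precedes the first |
        gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim surrounding spaces and tabs
        gsub(/[ \t]/, "", rest)             # blanks only counts as empty

        # Print only when there is a value and nothing after the second "
        if (value == "" || rest != "") next

        # Split head at its last "." (there may be several dots)
        if (match(head, /\.[^.]*$/)) {
            ric = substr(head, 1, RSTART - 1)
            fid = substr(head, RSTART + 1)
        } else {
            ric = head
            fid = ""
        }
        print ric " , " fid " , " value
    }' "$@"
}
```

With the file names used in the loop above, it could be run as: extract_fields head_out_orig_phil > philout. Because spaces and tabs are stripped before the emptiness checks, a field that only looks blank is treated as blank.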
OK, the awk script has a problem: it produces output from the test even when the field appears to contain no characters, so I presume the field holds spaces or tabs.
Can you help with this?
The Perl script works OK, except that it fails on a couple of lines containing the highlighted character. Do you have any ideas? If you could also add some comments to the script, I would appreciate it. My Perl is not too good.