Hi everyone,
I'm looking for suggestions to improve the script below, which I've been working on.
The problem is that I have three separate AWK scripts that I need to apply to inputfile, and scripts (2) and (3) each have to read a temporary file as input (inputfile_temp and inputfile_temp1, respectively).
I would like to join these three scripts into a single AWK script with "inputfile" as its only source file, without using temp files.
inputfile is as follows ($7 is empty):
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,
The script that works (let's call it Divided_Script) is as follows, with each routine briefly explained:
### 1-) Filter to search "pattern1" and "pattern2" within any column in inputfile ###
awk 'BEGIN{FS=OFS=","} /HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/' inputfile > inputfile_temp
### 2-) 2nd filter to exclude lines containing "pattern4", "pattern5" and "pattern6" in column 2 ####
awk 'BEGIN{FS=OFS=","} $2 !~ /pattern4|pattern5|pattern6/' inputfile_temp > inputfile_temp1
### 3-) Set column 7 = column 2, then rename the column 7 header to "NEW_HEADER" ###
awk 'BEGIN{FS=OFS=","} {$7=$2} NR==1{$7="NEW_HEADER"}
### 3.1-) Delete the string "/Sub data1/Sub data2" from column 7 on every line (column 7 now holds a copy of $2) ###
{sub(/\/.*/,"",$7)}
### 3.2-) Print the final output in the desired column order ###
{print $1,$7,$3,$4,$5,$6,$2}' inputfile_temp1 > outputfile
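If the only goal is to get rid of the temp files, the three routines can also be chained with pipes, so each awk reads the previous one's output from standard input. This is a minimal sketch that keeps the three programs exactly as in Divided_Script; it still starts three awk processes, but writes no intermediate files. The here-doc only recreates the sample inputfile shown above so the snippet runs on its own.

```shell
#!/bin/sh
# Recreate the sample inputfile from the post (only needed to run this standalone).
cat > inputfile <<'EOF'
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,
EOF

# Same three routines as Divided_Script, connected with pipes instead of temp files.
# In the third awk, NR==1 is still the header, because the first two keep it.
awk 'BEGIN{FS=OFS=","} /HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/' inputfile |
awk 'BEGIN{FS=OFS=","} $2 !~ /pattern4|pattern5|pattern6/' |
awk 'BEGIN{FS=OFS=","} {$7=$2} NR==1{$7="NEW_HEADER"} {sub(/\/.*/,"",$7)} {print $1,$7,$3,$4,$5,$6,$2}' > outputfile
```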
The desired and correct output, using "inputfile" and "Divided_Script", is:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
But when I try to join the routines into a single AWK script that invokes awk only once (let's call it Unified_Script):
*Basically, I just removed "awk 'BEGIN{FS=OFS=","}" from routines (2) and (3).
### 1-) Filter to search "pattern1" and "pattern2" within any column in inputfile ###
awk 'BEGIN{FS=OFS=","} {/HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/}
### 2-) 2nd filter to exclude lines containing "pattern4", "pattern5" and "pattern6" in column 2 ####
$2 !~ /pattern4|pattern5|pattern6/
### 3-) Set column 7 = column 2, then rename the column 7 header to "NEW_HEADER" ###
{$7=$2} NR==1{$7="NEW_HEADER"}
### 4-) Delete the string "/Sub data1/Sub data2" from column 7 on every line ###
{sub(/\/.*/,"",$7)}
### 5-) Print the final output in the desired column order ###
{print $1,$7,$3,$4,$5,$6,$2}' inputfile > outputfile
However, the output of "Unified_Script" on "inputfile" is wrong. It seems to print the original file merged with the lines processed by routines 3, 4 and 5, but without the filtering that routines 1 and 2 should apply, since lines that don't contain pattern1 or pattern2 show up:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2
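The listing above is consistent with how awk parses Unified_Script. Wrapping routine 1 in braces turns it from a pattern into an action: the regex is evaluated and its result discarded, so nothing is filtered. `$2 !~ /pattern4|pattern5|pattern6/` on a line of its own is a pattern with no action, so awk applies the default action and prints the untouched line whenever it matches. The remaining `{...}` blocks have no pattern, so they transform and print every line. A minimal demonstration on two made-up input lines, `keep` and `drop`:

```shell
#!/bin/sh
# Braced regex: an action whose value is thrown away, so both lines reach the print.
printf 'keep\ndrop\n' | awk '{/keep/} {print "braced: " $0}'

# Bare regex: a pattern, so the action on the same line runs only for matching lines.
printf 'keep\ndrop\n' | awk '/keep/ {print "pattern: " $0}'

# Pattern alone on its line: the default action prints the matching line as-is,
# and the unconditional block on the next line then runs for every line.
printf 'keep\ndrop\n' | awk '/keep/
{print "all: " $0}'
```

The last case prints `keep`, `all: keep` and `all: drop`, which mirrors the duplicated lines in the Unified_Script output.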
I hope somebody can help me join these three scripts so they work as I've explained.
Thanks in advance for any suggestions.
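A minimal sketch of a single awk invocation that should reproduce the Divided_Script output: routines 1 and 2 become the combined pattern of one rule (not statements inside braces), and the column work from routines 3 to 5 runs inside that rule's action, so lines that fail either filter are never modified or printed. Note that `/pattern1/` already matches every line matched by `/pattern2.*pattern1/` or `/pattern1.*pattern2/`, so the first filter is effectively `/HEADER/||/pattern1/`; the original alternatives are kept unchanged here. The here-doc only recreates the sample inputfile so the snippet runs on its own.

```shell
#!/bin/sh
# Recreate the sample inputfile from the post (only needed to run this standalone).
cat > inputfile <<'EOF'
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,
EOF

awk 'BEGIN { FS = OFS = "," }
# Routines 1 and 2 combined into one pattern: a line must pass both filters.
(/HEADER/ || /pattern1/ || /pattern2.*pattern1/ || /pattern1.*pattern2/) &&
$2 !~ /pattern4|pattern5|pattern6/ {
    # Copy column 2 into column 7 and strip everything from the first "/".
    $7 = $2
    sub(/\/.*/, "", $7)
    # Rename the header of column 7 (NR==1 is the header line of inputfile).
    if (NR == 1) $7 = "NEW_HEADER"
    # Print in the desired column order.
    print $1, $7, $3, $4, $5, $6, $2
}' inputfile > outputfile
```

Keeping everything in one pattern-action rule means nothing outside the rule can print, which avoids the default-action duplicates seen in the Unified_Script output.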