Need script for making files based on some conditions.

ROCK_PLSQL · May 6, 2015, 5:40am

Hi All,

I have a text file (code_data.txt) with the following data.

AMAR     AB123456                  XYZ
KIRAN    CB789                        ABC
RAJ      CS78890                       XYZ
KAMESH   A33535335                 ABC
KUMAR    MD678894                   MAT
RITESH   SR3535355                  SAB
RAHUL    PM366536666               SAS
RAMANA   KS566767477747         ABC
DINESH   SA6666464664646         SAB
SUJAN    GD6674474747              SAS
SOM      MD6546474777              XYZ
GANE     MS657869933553666747  XYZ

I have to make three files using this file (code_data.txt) based on the below conditions.

The records that belong to the code "XYZ" should go into one file,
the records that belong to the code "ABC" should go into a second file,
and the remaining records, which belong to all other codes, should go into a third file.

The code column itself should be excluded from all three target files.

The expected files should be as below.

file1_XYZ

AMAR     AB123456               
RAJ       CS78890                
SOM      MD6546474777          
GANE     MS657869933553666747   

file2_ABC

KIRAN     CB789                  
KAMESH   A33535335              
RAMANA   KS566767477747         

file3_REM

KUMAR    MD678894               
RITESH   SR3535355              
RAHUL    PM366536666            
DINESH   SA6666464664646        
SUJAN    GD6674474747

Please help me.

Thanks in advance.

RudiC · May 6, 2015, 6:48am

Please use code tags as required by forum rules!

Any attempts from your side to solve the problem?

ROCK_PLSQL · May 6, 2015, 7:22am

Hi,

Sorry for violating rules.
I have edited my post.

I am new to Unix and have no idea how to split the lines based on their data.

Please help me.

Thanks.

RudiC · May 6, 2015, 7:36am

awk '$3 !~ /ABC|XYZ/ {$3="REM"} {print $1, $2 > "file_" $3}' code_data.txt

If you REALLY need the numbering in the file names (file1_, file2_, file3_), we'll need to revisit this.
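For anyone new to awk, the one-liner above can be spelled out roughly as follows. This is only a sketch under a few assumptions: the scratch directory and the shortened sample data are made up for the demo, the exact string comparison is swapped in for the /ABC|XYZ/ regex (which would also match a code that merely contains ABC), and the parentheses around the file name are added as a precaution.

```shell
# Sketch: split code_data.txt by the code in column 3.
workdir=$(mktemp -d)        # scratch directory, an assumption for this demo
cd "$workdir" || exit 1

# Shortened sample of the data from the first post.
cat > code_data.txt <<'EOF'
AMAR     AB123456     XYZ
KIRAN    CB789        ABC
KUMAR    MD678894     MAT
RAJ      CS78890      XYZ
EOF

awk '
    # Any code other than exactly ABC or XYZ goes to the REM bucket.
    $3 != "ABC" && $3 != "XYZ" { $3 = "REM" }
    # Print name and number only, so the code column is dropped.
    # The parentheses keep the file name expression unambiguous.
    { print $1, $2 > ("file_" $3) }
' code_data.txt
```

Each output file holds name and number separated by a single space, since print joins its arguments with the default output separator.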

ROCK_PLSQL · May 18, 2015, 5:54am

Hi,

This is working fine as per my initial requirement.

awk '$3 !~ /ABC|XYZ/ {$3="REM"} {print $1, $2 > "file_" $3}' code_data.txt

Now the file has been changed from txt to csv,
and the code now comes as the first column in the file.

XYZ;AMAR ;AB123456               
ABC;KIRAN;CB789                        
XYZ;RAJ;CS78890                       
ABC;KAMESH;A33535335                 
MAT;KUMAR;MD678894                   
SAB;RITESH;SR3535355                  
SAS;RAHUL;PM366536666               
ABC;RAMANA;KS566767477747         
SAB;DINESH;SA6666464664646         
SAS;SUJAN;GD6674474747              
XYZ;SOM ;MD6546474777              
XYZ;GANE ;MS657869933553666747

awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > "file_" $1}' code_data.csv

file1_XYZ

;AMAR    ;AB123456               
;RAJ       ;CS78890                
;SOM      ;MD6546474777          
;GANE     ;MS657869933553666747

The initial ";" should not appear in the file.
The file should be as below. It also has to be created in a particular directory.

file1_XYZ

AMAR    ;AB123456               
RAJ       ;CS78890                
SOM      ;MD6546474777          
GANE     ;MS657869933553666747

I tried the code below, but it's not working.

awk '$1 !~ /ABC|XYZ/ {$1="REM"} {gsub(";","",$2);print $2, $3 > $MSD_DIR"file_" $1}' code_data.csv

Please help me.
Thanks.

sea · May 18, 2015, 6:12am

Heya

oIFS="$IFS"
IFS=";"

file_abc=~/abc.csv
file_xyz=~/xyz.csv
file_rest=~/rest.csv

while read id name num;do
	case $id in
	ABC)	out=$file_abc	;;
	XYZ)	out=$file_xyz	;;
	*)	out=$file_rest	;;
	esac
	echo "$name ; $num" >> "$out"
done<code_data.csv
IFS="$oIFS"

Hope this helps

ROCK_PLSQL · May 18, 2015, 6:46am

Hi,

This is giving some syntax error.

awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > $MSD_DIR"file_" $1}' code_data.csv

And also the files should be created in the directory $MSD_DIR

Thanks

RudiC · May 18, 2015, 6:49am

Not only the field order changed, but the field separator as well. Plus, you can't simply use shell variables in awk scripts; you'll have to assign them to awk variables. Try

awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > DIR"/file_" $1}' FS=";" OFS=";" DIR="$MSD_DIR" code_data.csv
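To illustrate the point about shell variables: inside the single quotes the shell never expands $MSD_DIR, and awk reads it as a field reference instead. A minimal sketch of the same idea using -v assignments, which should be equivalent to the trailing FS=... DIR=... assignments above for this purpose. The temporary directory standing in for $MSD_DIR, the awk variable name dir, the exact string comparison and the shortened sample data are all assumptions for the demo.

```shell
# Sketch: hand a shell variable to awk and split a semicolon-separated file.
workdir=$(mktemp -d)
MSD_DIR="$workdir/out"      # stand-in for the real $MSD_DIR (assumption)
mkdir -p "$MSD_DIR"

printf '%s\n' 'XYZ;AMAR;AB123456' 'ABC;KIRAN;CB789' 'MAT;KUMAR;MD678894' \
    > "$workdir/code_data.csv"

# -F sets the input separator, -v OFS the output separator,
# and -v dir copies the shell value into an awk variable.
awk -F';' -v OFS=';' -v dir="$MSD_DIR" '
    $1 != "ABC" && $1 != "XYZ" { $1 = "REM" }
    { print $2, $3 > (dir "/file_" $1) }
' "$workdir/code_data.csv"
```

With the separators set this way, the output lines no longer start with a stray ";" because $2 and $3 are now the real second and third fields.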

subrkann · May 18, 2015, 7:10am

awk -F\; '{ if ( match($1,"XYZ")) print "echo "$2,$3 " >> File_"$1;
            if ( match($1,"ABC")) print "echo "$2,$3 " >> File_"$1;
            if ( ( $1 != "XYZ" ) && ( $1 != "ABC" ) ) print "echo "$2,$3 " >> File_REM"}' code_data.csv | sh -x

ROCK_PLSQL · May 18, 2015, 8:50am

Hi Rudi,

Thanks for the script.

Your script is working fine.

I added a date suffix to the file names, but the resulting names are not correct.
For some files, spaces are added inside the file name.

date_yyyymmdd=$(my_date "" -e"%Y%m%d")
file_format="_${date_yyyymmdd}.csv"
awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv

I got the file names as below.

file_ABC   _20150518.csv
file_XYZ   _20150518.csv
file_MSORT _20150518.csv
file_SDDCCR_20150518.csv

Please help me.

Thanks.

RudiC · May 18, 2015, 9:29am

I don't believe that with your above code snippet you'll have those file names. With

awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1 file_out}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv

I'm getting

file_ABC_20150518.csv
file_REM_20150518.csv
file_XYZ_20150518.csv

so I can't reproduce your problem.

---------- Post updated at 15:29 ---------- Previous update was at 15:27 ----------

Not all awks handle the unparenthesized concatenation > DIR"/file_" $1 file_out for the output file name correctly, though. Should that be the case for you, you'll need to compose the file name in a variable beforehand.
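Composing the name beforehand could look like the sketch below. The variable names code, outfile and suffix are made up for illustration, the date suffix is hard-coded so the result is reproducible, and a temporary directory stands in for $MSD_DIR.

```shell
# Sketch: build the output file name in a variable before redirecting to it.
workdir=$(mktemp -d)
outdir="$workdir/out"              # hypothetical stand-in for $MSD_DIR
mkdir -p "$outdir"

printf '%s\n' 'XYZ;AMAR;AB123456' 'ABC;KIRAN;CB789' 'SAS;RAHUL;PM366536666' \
    > "$workdir/code_data.csv"

awk -F';' -v OFS=';' -v dir="$outdir" -v suffix="_20150518.csv" '
    {
        # Keep ABC and XYZ, send every other code to REM.
        code = ($1 == "ABC" || $1 == "XYZ") ? $1 : "REM"
        # The complete name is assembled first ...
        outfile = dir "/file_" code suffix
        # ... so the redirection target is a plain variable.
        print $2, $3 > outfile
    }
' "$workdir/code_data.csv"

ls "$outdir"
```

Redirecting to a single variable sidesteps the question of how a given awk parses a concatenation after the ">".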

ROCK_PLSQL · May 18, 2015, 9:47am

Hi Rudi,

Maybe you tried with the old csv file, and that is why you were not able to reproduce this.

Can you please try with this file?

XYZ;AMAR ;AB123456               
ABC;KIRAN;CB789                        
XYZ;RAJ;CS78890                       
ABC;KAMESH;A33535335                 
MAT;KUMAR;MD678894                   
SAB;RITESH;SR3535355                  
SAS;RAHUL;PM366536666               
ABC;RAMANA;KS566767477747         
SAB;DINESH;SA6666464664646         
SAS;SUJAN;GD6674474747              
XYZ;SOM ;MD6546474777              
XYZ;GANE ;MS657869933553666747
MSORT;DINESH;SA6666464664646         
MSORT;SUJAN;GD6674474747              
MSORT;SOM ;MD6546474777 
SDDCCR;DINESH;SA6666464664646         
SDDCCR;SUJAN;GD6674474747              
SDDCCR;SOM ;MD6546474777

Thanks

sea · May 18, 2015, 9:53am

Must it be awk?
Any specific reason why this wouldn't work?

Cheers

RudiC · May 18, 2015, 9:58am

I get these files with your new file above:

file_ABC_20150518.csv
file_MSORT_20150518.csv
file_REM_20150518.csv
file_SDDCCR_20150518.csv
file_XYZ_20150518.csv
-->  file_ABC_20150518.csv:
KIRAN;CB789                        
KAMESH;A33535335                 
RAMANA;KS566767477747         
-->  file_MSORT_20150518.csv:
DINESH;SA6666464664646         
SUJAN;GD6674474747              
SOM ;MD6546474777 
-->  file_REM_20150518.csv:
KUMAR;MD678894                   
RITESH;SR3535355                  
RAHUL;PM366536666               
DINESH;SA6666464664646         
SUJAN;GD6674474747              
-->  file_SDDCCR_20150518.csv:
DINESH;SA6666464664646         
SUJAN;GD6674474747              
SOM ;MD6546474777
-->  file_XYZ_20150518.csv:
AMAR ;AB123456               
RAJ;CS78890                       
SOM ;MD6546474777              
GANE ;MS657869933553666747

Are you aware that there are many trailing spaces in many lines of your new file? It doesn't hurt for the task we're tackling here, but it might be unnecessary.

ROCK_PLSQL · May 18, 2015, 10:00am

Hi,

I have added two more codes, "MSORT" and "SDDCCR", to my CSV file "code_data.csv".
I have to generate separate files for these codes as well.

file_ABC_20150518.csv
file_XYZ_20150518.csv
file_MSORT_20150518.csv
file_SDDCCR_20150518.csv
file_REM_20150518.csv

awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1 file_out}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv

Why did you get only the three files below? There are supposed to be five files.

file_ABC_20150518.csv
file_REM_20150518.csv
file_XYZ_20150518.csv

Thanks,

RudiC · May 18, 2015, 10:10am

I can find three problems:

IFS is set to the empty string; the semicolon should be quoted.
There are two esacs but no done (for the while...do).
"$out" is overwritten for every line, not appended to.

If these are corrected, that snippet works as required.
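A compact variant of the shell loop that avoids all three pitfalls might look like this. It is a sketch only: the output file names, the scratch directory and the sample data are assumptions, and setting IFS on the read command itself is a swapped-in alternative to saving and restoring the global IFS.

```shell
# Sketch: split a semicolon-separated file with a plain shell loop.
workdir=$(mktemp -d)        # scratch directory, an assumption for this demo
cd "$workdir" || exit 1

printf '%s\n' 'XYZ;AMAR;AB123456' 'ABC;KIRAN;CB789' \
              'MAT;KUMAR;MD678894' 'XYZ;RAJ;CS78890' > code_data.csv

# The quoted ';' is set for the read command only, so the global IFS
# is untouched and nothing has to be restored afterwards.
while IFS=';' read -r id name num; do
    case $id in
        ABC) out=abc.csv ;;
        XYZ) out=xyz.csv ;;
        *)   out=rest.csv ;;
    esac
    # >> appends, so earlier records for the same code are kept.
    printf '%s;%s\n' "$name" "$num" >> "$out"
done < code_data.csv
```

Because of the >>, running the loop twice would append the records twice, so any old output files should be removed first.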

---------- Post updated at 16:10 ---------- Previous update was at 16:08 ----------

Who are you talking to? I showed that the required five files are generated and have the correct contents.

gandolf989 · May 18, 2015, 10:21am

I would use grep rather than awk. But if your professor wants awk,
then you need to use awk...

grep    "ABC" code_data.txt                 > file1.csv
grep    "XYZ" code_data.txt                 > file2.csv
grep -v "ABC" code_data.txt | grep -v "XYZ" > file3.csv

sea · May 18, 2015, 10:30am

Oops, you're right, sorry.
It's fixed now.
To add these new 'conditions', simply add a modified line similar to the existing ones for ABC and XYZ.

hth

ROCK_PLSQL · May 18, 2015, 11:51am

Hi All,

I ran this script and got the files listed below it.

awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} 
{print $2, $3 > DIR"/file_" $1 file_out}
' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv

file_ABC   _20150518.csv
file_XYZ   _20150518.csv
file_MSORT _20150518.csv
file_SDDCCR_20150518.csv
file_REM   _20150518.csv

What am I doing wrong?

Please help me.

Thanks.

RudiC · May 18, 2015, 12:34pm

You may have <TAB>s in your input file that won't be removed as the field separator is ";". Post the output of od -ctx1 for a few lines of your data file.
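Until that dump is posted this is only a guess, but if the code column is padded with spaces or tabs, trimming it before building the file name should remove the gaps. A sketch, assuming a code column padded to six characters (which would match the file names shown) and a hard-coded date suffix; the variable names and sample rows are made up.

```shell
# Sketch: inspect the raw bytes, then trim the code before using it.
workdir=$(mktemp -d)        # scratch directory, an assumption for this demo
cd "$workdir" || exit 1

# Hypothetical input: codes padded with spaces, one with a tab.
printf '%s\n' 'ABC   ;KIRAN;CB789' 'MSORT ;SOM;MD6546474777' \
              'SDDCCR;SUJAN;GD6674474747' 'MAT   ;KUMAR;MD678894' > code_data.csv
printf 'XYZ\t;AMAR;AB123456\n' >> code_data.csv

# Dump the first line byte by byte; padding shows up as spaces or \t.
head -n 1 code_data.csv | od -c

awk -F';' -v OFS=';' -v suffix="_20150518.csv" '
    {
        code = $1
        gsub(/^[ \t]+|[ \t]+$/, "", code)    # strip leading/trailing blanks
        # Anchored match, so only the exact codes get their own file.
        if (code !~ /^(ABC|XYZ|MSORT|SDDCCR)$/) code = "REM"
        outfile = "file_" code suffix
        print $2, $3 > outfile
    }
' code_data.csv
```

If the od output shows other bytes, such as a carriage return at the end of each line, the character class in the gsub call would need to be extended accordingly.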