Hi All,
I have a text file (code_data.txt) with the following data.
AMAR AB123456 XYZ
KIRAN CB789 ABC
RAJ CS78890 XYZ
KAMESH A33535335 ABC
KUMAR MD678894 MAT
RITESH SR3535355 SAB
RAHUL PM366536666 SAS
RAMANA KS566767477747 ABC
DINESH SA6666464664646 SAB
SUJAN GD6674474747 SAS
SOM MD6546474777 XYZ
GANE MS657869933553666747 XYZ
I have to create three files from this file (code_data.txt) based on the conditions below.
The records that belong to the code "XYZ" should go into one file,
the records that belong to the code "ABC" should go into a second file,
and the remaining records, belonging to all other codes, should go into a third file.
The code itself should be excluded from the three target files.
The expected files should be as below.
file1_XYZ
AMAR AB123456
RAJ CS78890
SOM MD6546474777
GANE MS657869933553666747
file2_ABC
KIRAN CB789
KAMESH A33535335
RAMANA KS566767477747
file3_REM
KUMAR MD678894
RITESH SR3535355
RAHUL PM366536666
DINESH SA6666464664646
SUJAN GD6674474747
Please help me.
Thanks in advance.
RudiC
May 6, 2015, 6:48am
2
Please use code tags as required by forum rules!
Any attempts from your side to solve the problem?
Hi,
Sorry for violating rules.
I have edited my post.
I am new to Unix and have no idea how to split the lines based on the data.
Please help me.
Thanks.
RudiC
May 6, 2015, 7:36am
4
awk '$3 !~ /ABC|XYZ/ {$3="REM"} {print $1, $2 > "file_" $3}' code_data.txt
If you REALLY need the file numbering, we need to readdress.
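Should the numbered names from the first post (file1_XYZ, file2_ABC, file3_REM) be needed, one possible approach is a small lookup table in awk. This is only a sketch: the name[] array and the temporary working directory are assumptions made for illustration, and the sample input is just the first five records of code_data.txt.

```shell
# Sample input: the first five records from the original post.
workdir=$(mktemp -d)
cat > "$workdir/code_data.txt" <<'EOF'
AMAR AB123456 XYZ
KIRAN CB789 ABC
RAJ CS78890 XYZ
KAMESH A33535335 ABC
KUMAR MD678894 MAT
EOF

cd "$workdir" || exit 1
awk '
BEGIN {
    # Hypothetical mapping from code to numbered output file name.
    name["XYZ"] = "file1_XYZ"
    name["ABC"] = "file2_ABC"
}
{
    # Exact lookup of the code; anything not in the table goes to the remainder file.
    out = ($3 in name) ? name[$3] : "file3_REM"
    print $1, $2 > out
}' code_data.txt
```

Because the code is looked up as an exact array key, a code such as XABC would land in file3_REM, whereas the regex /ABC|XYZ/ would treat it as ABC.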
Hi,
This is working fine as per my initial requirement.
awk '$3 !~ /ABC|XYZ/ {$3="REM"} {print $1, $2 > "file_" $3}' code_data.txt
Now the file has changed from txt to csv,
and the code now comes as the first column in the file.
XYZ;AMAR ;AB123456
ABC;KIRAN;CB789
XYZ;RAJ;CS78890
ABC;KAMESH;A33535335
MAT;KUMAR;MD678894
SAB;RITESH;SR3535355
SAS;RAHUL;PM366536666
ABC;RAMANA;KS566767477747
SAB;DINESH;SA6666464664646
SAS;SUJAN;GD6674474747
XYZ;SOM ;MD6546474777
XYZ;GANE ;MS657869933553666747
awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > "file_" $1}' code_data.csv
file1_XYZ
;AMAR ;AB123456
;RAJ ;CS78890
;SOM ;MD6546474777
;GANE ;MS657869933553666747
The initial ";" should not appear in the file.
The file should be as below. It has to be created in a particular directory.
file1_XYZ
AMAR ;AB123456
RAJ ;CS78890
SOM ;MD6546474777
GANE ;MS657869933553666747
I tried the code below, but it's not working.
awk '$1 !~ /ABC|XYZ/ {$1="REM"} {gsub(";","",$2);print $2, $3 > $MSD_DIR"file_" $1}' code_data.csv
Please help me.
Thanks.
sea
May 18, 2015, 6:12am
6
Heya
oIFS="$IFS"
IFS=";"
file_abc=~/abc.csv
file_xyz=~/xyz.csv
file_rest=~/rest.csv
while read id name num;do
case $id in
ABC) out=$file_abc ;;
XYZ) out=$file_xyz ;;
*) out=$file_rest ;;
esac
echo "$name ; $num" >> "$out"
done<code_data.csv
IFS="$oIFS"
Hope this helps
Hi,
This is giving some syntax error.
awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > $MSD_DIR"file_" $1}' code_data.csv
And also the files should be created in the directory $MSD_DIR
Thanks
RudiC
May 18, 2015, 6:49am
8
Not only the field order changed, but the field separator as well. Plus, you can't simply use shell variables in awk scripts; you'll have to assign them to awk variables. Try
awk '$1 !~ /ABC|XYZ/ {$1="REM"} {print $2, $3 > DIR"/file_" $1}' FS=";" OFS=";" DIR="$MSD_DIR" code_data.csv
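The same idea can also be written with -v, which assigns awk variables before any input is read. The following is a minimal sketch, not a replacement for the command above: the temporary directory standing in for $MSD_DIR and the three sample records are assumptions made so the snippet runs on its own.

```shell
MSD_DIR=$(mktemp -d)    # stand-in for the real target directory
cat > "$MSD_DIR/code_data.csv" <<'EOF'
XYZ;AMAR;AB123456
ABC;KIRAN;CB789
MAT;KUMAR;MD678894
EOF

# -F sets the input field separator; each -v hands a value to an awk variable.
awk -F';' -v OFS=';' -v DIR="$MSD_DIR" '
$1 != "ABC" && $1 != "XYZ" { $1 = "REM" }   # exact comparison instead of a regex match
{ print $2, $3 > (DIR "/file_" $1) }         # parentheses keep the redirection target unambiguous
' "$MSD_DIR/code_data.csv"
```

Inside the single-quoted awk program, $MSD_DIR would be read by awk as "field number MSD_DIR" rather than as the shell variable, which is why the value has to be handed over explicitly.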
awk -F\; '{ if ( match($1,"XYZ")) print "echo "$2,$3 " >> File_"$1;
if ( match($1,"ABC")) print "echo "$2,$3 " >> File_"$1;
if ( ( $1 != "XYZ" ) && ( $1 != "ABC" ) ) print "echo "$2,$3 " >> File_REM"}' code_data.csv | sh -x
Hi Rudi,
Thanks for the script.
Your script is working fine.
I added some more for file format but it's not giving correct file name.
For some files some spaces are added in the file name.
date_yyyymmdd=$(my_date "" -e"%Y%m%d")
file_format="_${date_yyyymmdd}.csv"
awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv
I got the file names as below.
file_ABC _20150518.csv
file_XYZ _20150518.csv
file_MSORT _20150518.csv
file_SDDCCR_20150518.csv
Please help me.
Thanks.
RudiC
May 18, 2015, 9:29am
11
I don't believe that with your above code snippet you'll have those file names. With
awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1 file_out}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv
I'm getting
file_ABC_20150518.csv
file_REM_20150518.csv
file_XYZ_20150518.csv
so I can't reproduce your problem.
---------- Post updated at 15:29 ---------- Previous update was at 15:27 ----------
Not all awks handle the concatenation > DIR"/file_" $1 file_out for the output file name correctly, though. Should that be the case for you, you'll need to compose the file name beforehand into a variable.
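Composing the name into a variable first could look like the sketch below. The variable name fname, the fixed date stamp, the temporary directory used for $MSD_DIR, and the anchored regex are assumptions for illustration only.

```shell
MSD_DIR=$(mktemp -d)             # stand-in for the real target directory
file_format="_20150518.csv"      # fixed stamp for this sketch; the thread derives it from the date
cat > "$MSD_DIR/code_data.csv" <<'EOF'
XYZ;AMAR;AB123456
MSORT;SUJAN;GD6674474747
MAT;KUMAR;MD678894
EOF

awk -F';' -v OFS=';' -v DIR="$MSD_DIR" -v file_out="$file_format" '
$1 !~ /^(ABC|XYZ|MSORT|SDDCCR)$/ { $1 = "REM" }   # anchored, so only whole codes match
{
    fname = DIR "/file_" $1 file_out               # build the complete file name first
    print $2, $3 > fname                           # then redirect to the plain variable
}' "$MSD_DIR/code_data.csv"
```

With the name held in a single variable there is nothing left for awk to parse after the > sign, so the result no longer depends on how a particular awk groups the concatenation.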
Hi Rudi,
Maybe you tried with the old csv file; that would be why you were not able to reproduce this.
Can you please try with this file?
XYZ;AMAR ;AB123456
ABC;KIRAN;CB789
XYZ;RAJ;CS78890
ABC;KAMESH;A33535335
MAT;KUMAR;MD678894
SAB;RITESH;SR3535355
SAS;RAHUL;PM366536666
ABC;RAMANA;KS566767477747
SAB;DINESH;SA6666464664646
SAS;SUJAN;GD6674474747
XYZ;SOM ;MD6546474777
XYZ;GANE ;MS657869933553666747
MSORT;DINESH;SA6666464664646
MSORT;SUJAN;GD6674474747
MSORT;SOM ;MD6546474777
SDDCCR;DINESH;SA6666464664646
SDDCCR;SUJAN;GD6674474747
SDDCCR;SOM ;MD6546474777
Thanks
sea
May 18, 2015, 9:53am
13
Must it be awk?
Any specific reason why this wouldn't work?
Cheers
RudiC
May 18, 2015, 9:58am
14
I get these files with your new file above:
file_ABC_20150518.csv
file_MSORT_20150518.csv
file_REM_20150518.csv
file_SDDCCR_20150518.csv
file_XYZ_20150518.csv
--> file_ABC_20150518.csv:
KIRAN;CB789
KAMESH;A33535335
RAMANA;KS566767477747
--> file_MSORT_20150518.csv:
DINESH;SA6666464664646
SUJAN;GD6674474747
SOM ;MD6546474777
--> file_REM_20150518.csv:
KUMAR;MD678894
RITESH;SR3535355
RAHUL;PM366536666
DINESH;SA6666464664646
SUJAN;GD6674474747
--> file_SDDCCR_20150518.csv:
DINESH;SA6666464664646
SUJAN;GD6674474747
SOM ;MD6546474777
--> file_XYZ_20150518.csv:
AMAR ;AB123456
RAJ;CS78890
SOM ;MD6546474777
GANE ;MS657869933553666747
Are you aware that there are many trailing spaces in many lines of your new file? It doesn't hurt for the task we're tackling here, but it might be unnecessary.
Hi,
I have added some more codes, "MSORT" and "SDDCCR", to my CSV file "code_data.csv".
I have to generate separate files for these codes as well.
file_ABC_20150518.csv
file_XYZ_20150518.csv
file_MSORT_20150518.csv
file_SDDCCR_20150518.csv
file_REM_20150518.csv
awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"} {print $2, $3 > DIR"/file_" $1 file_out}' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv
Why did you get only the three files below? There are supposed to be five files.
file_ABC_20150518.csv
file_REM_20150518.csv
file_XYZ_20150518.csv
Thanks,
RudiC
May 18, 2015, 10:10am
16
I can find three:
the IFS is set to the empty string; the semicolon should be quoted.
there's two esacs but no done (for the while...do).
"$out" is overwritten for every line, not appended.
If these are corrected, that snippet works as required.
---------- Post updated at 16:10 ---------- Previous update was at 16:08 ----------
rock_plsql:
Why did you get only the three files below? There are supposed to be five files.
file_ABC_20150518.csv
file_REM_20150518.csv
file_XYZ_20150518.csv
Thanks,
Who are you talking to? I showed that the required five files are generated and do have the correct contents.
I would use grep rather than awk. But if your professor wants awk,
then you need to use awk...
grep "ABC" code_data.txt > file1.csv
grep "XYZ" code_data.txt > file2.csv
grep -v "ABC" code_data.txt | grep -v "XYZ" > file3.csv
sea
May 18, 2015, 10:30am
18
Oops, you're right, sorry.
It's fixed now.
To add these new 'conditions', simply add a modified line similar to the existing ones for ABC and XYZ.
hth
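As a rough sketch of what the extended loop could look like with the new codes added: the output file names and the temporary directory are assumptions, and the listed codes are collapsed into one case pattern instead of one line per code.

```shell
workdir=$(mktemp -d)
cat > "$workdir/code_data.csv" <<'EOF'
ABC;KIRAN;CB789
MSORT;SUJAN;GD6674474747
SDDCCR;DINESH;SA6666464664646
MAT;KUMAR;MD678894
EOF

# IFS is set only for the read command, so it needs no save and restore.
while IFS=';' read -r id name num; do
    case $id in
        ABC|XYZ|MSORT|SDDCCR) out="$workdir/file_$id" ;;   # one file per listed code
        *)                    out="$workdir/file_REM" ;;   # everything else
    esac
    printf '%s;%s\n' "$name" "$num" >> "$out"               # append, do not overwrite
done < "$workdir/code_data.csv"
```

Since the loop appends with >>, the output files should be removed or emptied before a second run, otherwise the records accumulate.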
Hi All,
I ran the script below and got the following file names.
awk '$1 !~ /ABC|XYZ|MSORT|SDDCCR/ {$1="REM"}
{print $2, $3 > DIR"/file_" $1 file_out}
' FS=";" OFS=";" DIR="$MSD_DIR" file_out="$file_format" code_data.csv
file_ABC _20150518.csv
file_XYZ _20150518.csv
file_MSORT _20150518.csv
file_SDDCCR_20150518.csv
file_REM _20150518.csv
What am I doing wrong?
Please help me.
Thanks.
RudiC
May 18, 2015, 12:34pm
20
You may have <TAB>s in your input file that won't be removed as the field separator is ";". Post the output of od -ctx1 for a few lines of your data file.
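If the od output does show padding in the first field, one way to cope is to strip it before the code is used in the file name. The sketch below assumes that the stray characters are spaces or tabs around the code; the padded sample records, the fixed date stamp, and the temporary directory standing in for $MSD_DIR are made up for illustration.

```shell
MSD_DIR=$(mktemp -d)             # stand-in for the real target directory
file_format="_20150518.csv"      # fixed stamp for this sketch
# Hypothetical input: the first field is padded with a space or a tab.
printf 'ABC ;KIRAN;CB789\nXYZ\t;RAJ;CS78890\nSDDCCR;SOM;MD6546474777\n' > "$MSD_DIR/code_data.csv"

awk -F';' -v OFS=';' -v DIR="$MSD_DIR" -v file_out="$file_format" '
{
    gsub(/^[ \t]+|[ \t]+$/, "", $1)                      # strip blanks and tabs around the code
    if ($1 !~ /^(ABC|XYZ|MSORT|SDDCCR)$/) $1 = "REM"     # anchored: only whole codes match
    fname = DIR "/file_" $1 file_out                      # compose the file name up front
    print $2, $3 > fname
}' "$MSD_DIR/code_data.csv"

ls "$MSD_DIR"
```

Padding of the code to a fixed width would also explain why only file_SDDCCR, the longest code, came out without a gap in the listing above.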