Search and replace in a CSV file

I have a CSV file where I want the second column to be replaced with the value 1.

Source file
919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,2,2011-07-11T22:36:16.000+05:30,
917417384969,2,2011-07-11T22:33:26.000+05:30,

Final file

919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,1,2011-07-11T22:36:16.000+05:30,
917417384969,1,2011-07-11T22:33:26.000+05:30,

Kindly let me know the most efficient way to do this with sed or awk.

Thanks

 
$ nawk -F, 'BEGIN{OFS=","}{$2=1;print}' input.txt
919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,1,2011-07-11T22:36:16.000+05:30,
917417384969,1,2011-07-11T22:33:26.000+05:30,

Or with sed:

sed 's/,./,1/' inputfile

If you have more than one digit in the second column, then try this:

 
sed 's/,\([0-9]*\),/,1,/' input.txt

Micheal, how does the sed select the second field in your above command?

 
,.  --> find the first comma followed by any one character
,1  --> replace with comma 1 (,1)

We are not using g (global) here, so it replaces only the first occurrence.
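A quick illustration of that first-occurrence behaviour (the echo line is just made-up sample data):

 
$ echo '111,2,3,' | sed 's/,./,1/'
111,1,3,

Only the ",2" after the first comma is touched; the ",3" further along is left alone.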
 

,. means the first comma followed by only one character. If it has four characters then it would be ,.... right?

Here is another file where I want to change the second column to 1010

9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,5004,20120120004809,

Try my suggestion from post #4:

 
$ sed 's/,\([0-9]*\),/,1010,/' input.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1010,20120120004809,


characters then ,.... right? ---> Yes, you are correct.


g is not required here; with g it would also rewrite later numeric fields that sit between two commas, not just the second one.
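To see the difference, compare on a made-up row with several numeric fields:

 
$ echo '111,5004,333,444,' | sed 's/,\([0-9]*\),/,1010,/'
111,1010,333,444,
$ echo '111,5004,333,444,' | sed 's/,\([0-9]*\),/,1010,/g'
111,1010,333,1010,

With g the 444 field also gets rewritten; 333 escapes only because its leading comma was already consumed by the first match.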

Hi Kamaraj,

In your sed, it replaces every second-column value with the replacement value, but I want to replace only the rows whose second-column value matches, not the others.

Also, the sed output has both changed and unchanged entries. Is there a way to capture only the entries that were matched and changed by the sed?

Thanks in advance

 
$ cat test.txt
9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,
 
$ sed '/5004/ s/,\([0-9]*\),/,1010,/' test.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1000,20120120004809,
9014961516,1010,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

If you want to check only the 2nd column, then use the awk below:

 
 
$ nawk -F, -v OFS="," '$2=="5004" {$2="1010"}1' test.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1000,20120120004809,
9014961516,1010,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

Sorry man, I don't have the nawk package.. Can I use awk instead?

Yes.. try with awk.

What OS are you using?

Post the output of the below command:

 
uname -a
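For reference, the same one-liner with plain awk is just the earlier nawk command with the name swapped; the stock awk on most Linux systems (gawk or mawk) handles it fine:

 
$ awk -F, -v OFS="," '$2=="5004" {$2="1010"}1' test.txt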

The awk works as well, but I want the output to contain only the entries which were replaced, not all the entries. Is that possible?

I am not clear. Post some sample data and the expected output.
Do you want to replace only the first occurrence as well?

 
$ awk -F, -v OFS="," '{if(a==0 && $2=="5004"){$2="1010";a=1;print}else{print}}' test.txt
9788691010,1010,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,
 
$ cat test.txt
9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

Input file

919925623336,1,2011-07-11T23:33:31.000+05:30,
919099951769,2,2011-07-11T23:34:48.000+05:30,
918306699170,2,2011-07-11T23:32:20.000+05:30,

Expected output file

919099951769,1,2011-07-11T23:34:48.000+05:30,
918306699170,1,2011-07-11T23:32:20.000+05:30,

I want the output to contain only the entries which were replaced, not the entries which were left unchanged.

Thanks..

Sorry for the confusion..

$ awk -F, -v OFS=, '$2==2{$2=1;print $0}' test.txt
919099951769,1,2011-07-11T23:34:48.000+05:30,
918306699170,1,2011-07-11T23:32:20.000+05:30,
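For the record, a sed equivalent (a sketch assuming a POSIX sed, using -n together with the p flag so only substituted lines are printed) would be:

 
$ sed -n 's/^\([^,]*\),2,/\1,1,/p' test.txt

which prints the same two lines as the awk above.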

Works perfectly....bye for now


Hi Kamaraj,

I actually run the awk on multiple files.. and export the output to one big file. Is there a way to run awk and export the output to the same file on which awk is running?

Regards,

Hi,

Try this,

awk -F, -v OFS=, '$2==2{$2=1;a[++n]=$0}END{for(i=1;i<=n;i++){print a[i] > FILENAME}}' input_file

FILENAME gives you the input file name; since the printing happens in the END block, the file has already been read completely before it is overwritten.
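If you would rather not rely on rewriting the input via FILENAME, a simpler sketch is a shell loop with a temporary file (the *.csv glob below is only an illustration; use whatever pattern matches your files):

 
for f in *.csv; do
    awk -F, -v OFS=, '$2==2{$2=1;print}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done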

Cheers,
Ranga:)
