Search and replace in a CSV file

I have a CSV file where I want the second column to be replaced with the value 1.

Source file
919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,2,2011-07-11T22:36:16.000+05:30,
917417384969,2,2011-07-11T22:33:26.000+05:30,

Final file

919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,1,2011-07-11T22:36:16.000+05:30,
917417384969,1,2011-07-11T22:33:26.000+05:30,

Kindly let me know the most efficient way to do this with sed or awk.

Thanks

 
$ nawk -F, 'BEGIN{OFS=","}{$2=1;print}' input.txt
919568760477,1,2011-07-11T22:34:27.000+05:30,
919557735692,1,2011-07-11T22:36:16.000+05:30,
917417384969,1,2011-07-11T22:33:26.000+05:30,

Or with sed:

sed 's/,./,1/' inputfile

If you have more than one digit in the second column, then try this:

 
sed 's/,\([0-9]*\),/,1,/' input.txt

Micheal, how does the sed select the second field in your above command?

 
,.  --> find the first comma followed by any one character
,1  --> replace with comma 1 (,1)

We are not using g (global) here, so it replaces only the first occurrence.
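A quick illustration of that first-occurrence behaviour (the echo line is just made-up sample data):

 
$ echo '111,2,3,' | sed 's/,./,1/'
111,1,3,

Only the ",2" after the first comma is touched; the ",3" further along is left alone.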
 

,. means the first comma followed by only one character. If it has four characters then it would be ,.... right?

Here is another file where I want to change the second column to 1010

9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,5004,20120120004809,

Try my suggestion from post #4:

 
$ sed 's/,\([0-9]*\),/,1010,/' input.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1010,20120120004809,


characters then ,.... right? ---> Yes, you are correct.


g is not required here; with g it would also rewrite later numeric fields that sit between two commas, not just the second one.
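To see the difference, compare on a made-up row with several numeric fields:

 
$ echo '111,5004,333,444,' | sed 's/,\([0-9]*\),/,1010,/'
111,1010,333,444,
$ echo '111,5004,333,444,' | sed 's/,\([0-9]*\),/,1010,/g'
111,1010,333,1010,

With g the 444 field also gets rewritten; 333 escapes only because its leading comma was already consumed by the first match.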

Hi Kamaraj,

In your sed, it replaces every second-column value with the replacement value, but I want to replace only the rows whose second-column value matches, not the others.

Also, the sed output has both changed and unchanged entries. Is there a way to capture only the entries that were matched and changed by the sed?

Thanks in advance

 
$ cat test.txt
9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,
 
$ sed '/5004/ s/,\([0-9]*\),/,1010,/' test.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1000,20120120004809,
9014961516,1010,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

If you want to check only the 2nd column, then use the awk below:

 
 
$ nawk -F, -v OFS="," '$2=="5004" {$2="1010"}1' test.txt
9788691010,1010,20120120002804,
9705645060,1010,20120120004753,
9014961516,1000,20120120004809,
9014961516,1010,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

Sorry man, I don't have the nawk package.. Can I use awk instead?

Yes.. try with awk.

What OS are you using?

Post the output of the below command:

 
uname -a
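For reference, the same one-liner with plain awk is just the earlier nawk command with the name swapped; the stock awk on most Linux systems (gawk or mawk) handles it fine:

 
$ awk -F, -v OFS="," '$2=="5004" {$2="1010"}1' test.txt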

The awk works as well, but I want the output to contain only the entries which were replaced, not all the entries. Is that possible?

I am not clear. Post some sample data and the expected output.
Do you want to replace only the first occurrence as well?

 
$ awk -F, -v OFS="," '{if(a==0 && $2=="5004"){$2="1010";a=1;print}else{print}}' test.txt
9788691010,1010,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,
 
$ cat test.txt
9788691010,5004,20120120002804,
9705645060,5004,20120120004753,
9014961516,1000,20120120004809,
9014961516,5004,20120120004809,
9014961516,1000,20120120004809,
9014961516,1000,20120120004809,

Input file

919925623336,1,2011-07-11T23:33:31.000+05:30,
919099951769,2,2011-07-11T23:34:48.000+05:30,
918306699170,2,2011-07-11T23:32:20.000+05:30,

Expected output file

919099951769,1,2011-07-11T23:34:48.000+05:30,
918306699170,1,2011-07-11T23:32:20.000+05:30,

I want the output to contain only the entries which were replaced, not the entries which were left unchanged.

Thanks..

Sorry for the confusion..

$ awk -F, -v OFS=, '$2==2{$2=1;print $0}' test.txt
919099951769,1,2011-07-11T23:34:48.000+05:30,
918306699170,1,2011-07-11T23:32:20.000+05:30,
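For the record, a sed equivalent (a sketch assuming a POSIX sed, using -n together with the p flag so only substituted lines are printed) would be:

 
$ sed -n 's/^\([^,]*\),2,/\1,1,/p' test.txt

which prints the same two lines as the awk above.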

Works perfectly....bye for now


Hi Kamaraj,

I actually run the awk on multiple files.. and export the output to one big file. Is there a way to run awk and export the output to the same file on which awk is running?

Regards,

Hi,

Try this,

awk -F, -v OFS=, '$2==2{$2=1;a[++n]=$0}END{for(i=1;i<=n;i++){print a[i] > FILENAME}}' input_file

FILENAME gives you the input file name; since the printing happens in the END block, the file has already been read completely before it is overwritten.
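If you would rather not rely on rewriting the input via FILENAME, a simpler sketch is a shell loop with a temporary file (the *.csv glob below is only an illustration; use whatever pattern matches your files):

 
for f in *.csv; do
    awk -F, -v OFS=, '$2==2{$2=1;print}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done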

Cheers,
Ranga:)
