how to remove tab space only in the column of a specific row

redse171 · September 19, 2012, 1:22pm

Hi,
I need help removing the tab-delimited space in $2 of a specific row. My files look like this:

file1.txt

No_1    4    139    156
No_1    5    161    205
No_4    91    227    212
No_19   254    243    263
No_19   645   249    258
No_19   101  2492    2635
No_90   8    277    288

file2.txt

ID    L_254
NAME    L_254
START    39644
END    37193
LINE    unknown
TYPE    R
N    37736-37861
@@
ID    L_101
NAME   L_101
START    314257
END    312432
LINE    unknown
TYPE    R
@@
ID    L_8
NAME   L_8
START    3196078
END    3194948
LINE    unknown
TYPE    R
@@

I used a script like this to update my file2.txt with the START and END values from file1.txt.

FNR==NR{b[$2]=$3;f[$2]=$4; OFS="\t"; next}
$1=="ID" {id=substr($2,index($2,"_")+1)}
id in b {$2=($1=="START")?b[id]:(($1=="END")?f[id]:$2)}
1

My output is tab separated. The code above works great for updating the values, but the problem is the line after each '@@'. I don't want a tab-separated column after each '@@'; it should be treated as the end of the line. It should be just @@ instead of @@\t. Thanks in advance.
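
A likely cause, offered as a sketch rather than a confirmed diagnosis: the third rule assigns to $2 on every line of a record whose id is in b, including the @@ line. Assigning to a field makes awk rebuild the line with OFS, so the single-field @@ becomes @@ followed by a tab. Guarding the rule with NF > 1 leaves single-field lines alone. The wrapper name update_coords is made up for illustration, and OFS is moved into a BEGIN block for readability.

```shell
# update_coords FILE1 FILE2: hypothetical wrapper around the script from the post
update_coords() {
  awk '
    BEGIN { OFS = "\t" }
    # first file: remember the START and END values for each id
    FNR == NR { b[$2] = $3; f[$2] = $4; next }
    $1 == "ID" { id = substr($2, index($2, "_") + 1) }
    # NF > 1 skips single-field lines such as @@, so they are never rebuilt with OFS
    id in b && NF > 1 { $2 = ($1 == "START") ? b[id] : (($1 == "END") ? f[id] : $2) }
    1
  ' "$1" "$2"
}
```

It would be called as `update_coords file1.txt file2.txt > file2.updated`, where file2.updated is just an example output name.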

DGPickett · September 19, 2012, 2:40pm

sed 's/^\(your_first_pattern\)TAB\(your_end_pattern\)$/\1\2/' input_file >output_file

TAB is a literal tab; your_first_pattern and your_end_pattern must be regular expressions that define and accept what you want to keep.
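
As a concrete sketch of that template, assume the line to fix is exactly @@ followed by one tab, so the first pattern is @@ and the end pattern is empty. Typing a literal tab inside a command is error-prone, so here it is generated with printf and kept in a shell variable. The names tab and strip_marker_tab are illustrative and not part of the thread.

```shell
tab=$(printf '\t')   # a literal tab character

# strip_marker_tab: read stdin, drop the tab that trails a lone @@ marker
strip_marker_tab() {
  # \(@@\) captures the marker, ${tab}\$ matches the tab at end of line, \1 keeps only the marker
  sed "s/^\(@@\)${tab}\$/\1/"
}
```

For example, `strip_marker_tab < input_file > output_file` leaves every other line, including the tab-separated ones, unchanged.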

redse171 · September 19, 2012, 4:03pm

Hi DGpickett,

Thanks for your response. I believe your code works on one file only. I have thousands of separate files from which I need to remove the tab. Is there any other way to do this? Thanks.

DGPickett · September 19, 2012, 4:10pm

As long as the pattern does not change, this is quick and robust:

 
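find /top_directory_path -name your_pattern -type f | while IFS= read -r f
do
  sed ... "$f" >"$f.new"
  # replace the original only if sed changed something and both files are non-empty
  if ! cmp -s "$f" "$f.new" && [ -s "$f" ] && [ -s "$f.new" ]
  then
    mv "$f" "$f.old"
    mv "$f.new" "$f"
  else
    rm "$f.new"
  fi
done

The file name is quoted as "$f" throughout so that names containing spaces or metacharacters are handled correctly.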

redse171 · September 19, 2012, 4:45pm

Hi,

Can you please explain your code to me? I am not really familiar with sed. I tried your code a couple of times, but I got an infinite loop in my shell.

DGPickett · September 19, 2012, 5:01pm

sed is a stream editor: it reads its input line by line, applies the script to each line, and writes the result. When you use it on pipes, it never needs a temp file and never runs out of space on huge data sets.

The command s/pattern/new_data/ is a substitution. The pattern is a regular expression (regex), slightly extended so that \( \) picks up part of the match and \number puts it back down in the replacement. In your case, the regex can identify the lines where the tab must be removed and where in the line the tab is. To handle every @@TAB, you can simply scrub that sequence as many times as it appears on any line:

s/@@TAB/@@/g
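
A minimal sketch of running that substitution on one file and checking the result, assuming the tab is supplied through a shell variable instead of being typed literally. The helper names scrub_tabs and count_marker_tabs are hypothetical.

```shell
tab=$(printf '\t')   # a literal tab character

# scrub_tabs IN OUT: copy IN to OUT with every "@@<tab>" turned into "@@"
scrub_tabs() {
  sed "s/@@${tab}/@@/g" "$1" > "$2"
}

# count_marker_tabs FILE: print how many lines still contain "@@<tab>"
count_marker_tabs() {
  grep -c "@@${tab}" "$1"
}
```

After `scrub_tabs file2.txt file2.new`, running `count_marker_tabs file2.new` should print 0 if every trailing tab was removed.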