How to remove spaces from a file selectively?

Sharma331 · October 1, 2014, 6:18am

Hi i have a file in which i am doing some processing.

The code is as follows:

#!/bin/ksh
grep DATA File1.txt >> File2.txt
sed 's/DATA//' File2.txt | tr -d � � >> File4.xls

As you can see my output is going in a xl file.The output consist of four columns/feilds out of which the first column has Date and time which looks like:

12 Sep 2014 14:00 EST

so if i give tr -d ' ' It works fine for other feilds( because they have values without internal space ) but it deletes spaces for my Date and time feild which looks like :

12Sep201414:00EST

Is there a way where i can delete only the spaces in rest of the feild leaving my first column?

Akshay_Hegde · October 1, 2014, 6:24am

We need some more input from you, please post sample input (fields other than date & time stamp if any ) as well as expected output .

Thanks

Sharma331 · October 1, 2014, 6:52am

Input file File1:

xyz                                  Date                   Name        Age           State
------------------------------------------------------------------------------------
DATA                12 Sep 2014 14:00 EST        Alia             22           Chennai
SAND                12 Sep 2014 14:00 EST        Alia             22           Chennai
DATA                12 Sep 2014 14:00 EST        Ashika            22       Chennai
DATA                12 Sep 2014 14:00 EST        Bosky             22        Chennai
DATA                12 Sep 2014 14:00 EST        Swati             22         Chennai

Input file File2:

xyz                                  Date                   Name        Age           State
------------------------------------------------------------------------------------
DATA                12 Sep 2014 14:00 EST        Alia             22           Chennai
DATA                12 Sep 2014 14:00 EST        Ashika            22       Chennai
DATA                12 Sep 2014 14:00 EST        Bosky             22        Chennai
DATA                12 Sep 2014 14:00 EST        Swati             22         Chennai

As you can see there are some extra leading and trailling spaces.

input file File4(removes DATA)(xls file)

 Date                               Name        Age           State
-------------------------------------------------------------------------------
12Sep201414:00EST        Alia              22         Chennai
12Sep201414:00EST        Ashika          22         Chennai
12Sep201414:00EST        Bosky           22         Chennai
12Sep201414:00EST        Swati            22         Chennai

I dont want my date column to be edited with space i just want name age and state column to be removed the trailing and leading spaces.

RudiC · October 1, 2014, 7:09am

How about

sed 's/ //7g; s/DATA //p; 1s/^[^ ]* //p; 2!d' file1
Date Name Age State
------------------------------------------------------------------------------------
12 Sep 2014 14:00 EST Alia22Chennai
12 Sep 2014 14:00 EST Ashika22Chennai
12 Sep 2014 14:00 EST Bosky22Chennai
12 Sep 2014 14:00 EST Swati22Chennai

RavinderSingh13 · October 1, 2014, 7:17am

sharma331:

Input file File1:
xyz Date Name Age State
------------------------------------------------------------------------------------
DATA 12 Sep 2014 14:00 EST Alia 22 Chennai
SAND 12 Sep 2014 14:00 EST Alia 22 Chennai
DATA 12 Sep 2014 14:00 EST Ashika 22 Chennai
DATA 12 Sep 2014 14:00 EST Bosky 22 Chennai
DATA 12 Sep 2014 14:00 EST Swati 22 Chennai

Input file File2:
xyz Date Name Age State
------------------------------------------------------------------------------------
DATA 12 Sep 2014 14:00 EST Alia 22 Chennai
DATA 12 Sep 2014 14:00 EST Ashika 22 Chennai
DATA 12 Sep 2014 14:00 EST Bosky 22 Chennai
DATA 12 Sep 2014 14:00 EST Swati 22 Chennai

As you can see there are some extra leading and trailling spaces.

input file File4(removes DATA)(xls file)
Date Name Age State
-------------------------------------------------------------------------------
12Sep201414:00EST Alia 22 Chennai
12Sep201414:00EST Ashika 22 Chennai
12Sep201414:00EST Bosky 22 Chennai
12Sep201414:00EST Swati 22 Chennai

I dont want my date column to be edited with space i just want name age and state column to be removed the trailing and leading spaces.

Hello Sharma331,

Following may help you in same, please use code tags while posting commands and codes in posts.

awk -vvar=" " 'NR==1{$1="";print} NR==2{print}  NR>2{$1="";V=$(NF-2) var $(NF-1) var $NF;{gsub(/EST.*/,"EST",$0);gsub(/[[:space:]]/,X,$0);print $0 OFS V}}' Input_file

Output will be as follows.

 Date Name Age State
------------------------------------------------------------------------------------
12Sep201414:00EST Alia 22 Chennai
12Sep201414:00EST Ashika 22 Chennai
12Sep201414:00EST Bosky 22 Chennai
12Sep201414:00EST Swati 22 Chennai

Thanks,
R. Singh

ctsgnb · October 1, 2014, 8:06am

sed '2!{s/[^ ]* //};1!{s/ //;s/ //;s/ //;s/ //;}' yourfile

Using Rudi's 5g stuff :

sed '2!{s/[^ ]* //;};1!{s/ /#/g;s/#/ /5g;s/#//g;}' yourfile

# cat tst
xyz Date Name Age State
------------------------------------------------------------------------------------
DATA 12 Sep 2014 14:00 EST Alia 22 Chennai
SAND 12 Sep 2014 14:00 EST Alia 22 Chennai
DATA 12 Sep 2014 14:00 EST Ashika 22 Chennai
DATA 12 Sep 2014 14:00 EST Bosky 22 Chennai
DATA 12 Sep 2014 14:00 EST Swati 22 Chennai
 
# sed '2!{s/[^ ]* //;};1!{s/ /#/g;s/#/ /5g;s/#//g;}' tst
Date Name Age State
------------------------------------------------------------------------------------
12Sep201414:00EST Alia 22 Chennai
12Sep201414:00EST Alia 22 Chennai
12Sep201414:00EST Ashika 22 Chennai
12Sep201414:00EST Bosky 22 Chennai
12Sep201414:00EST Swati 22 Chennai

Just another one :

# sed '2!{s/[^ ]* //;};1!{s/ /#/5g;y/ #/- /;}' tst
Date Name Age State
------------------------------------------------------------------------------------
12-Sep-2014-14:00-EST Alia 22 Chennai
12-Sep-2014-14:00-EST Alia 22 Chennai
12-Sep-2014-14:00-EST Ashika 22 Chennai
12-Sep-2014-14:00-EST Bosky 22 Chennai
12-Sep-2014-14:00-EST Swati 22 Chennai

RudiC · October 1, 2014, 8:42am

Leaving out the code tags you hid all those multiple spaces from us thus leading to inappropriate solutions. With that new info, try

sed -r 's/ +/ /g; s/DATA //p; 1s/^[^ ]* //p; 2!d' file1
Date Name Age State
------------------------------------------------------------------------------------
12 Sep 2014 14:00 EST Alia 22 Chennai
12 Sep 2014 14:00 EST Ashika 22 Chennai
12 Sep 2014 14:00 EST Bosky 22 Chennai
12 Sep 2014 14:00 EST Swati 22 Chennai

Sharma331 · October 1, 2014, 8:50am

okay lets make my question very simple.
I have an excel sheet in which i just want to sed one space which is coming in the second column data.

column1        column2      column3      column4
date is here    _alia

The underscore in red color indicates the space. So now i just want to delete this one space from the entire second column.

RudiC · October 1, 2014, 9:00am

How would you propose to distinguish that space from the one(s) before? In awk default, ALL spaces are the field separators, so they would be gone anyhow. In sed, there's no fields, so you would need to know the space count in order to remove that one...

Sharma331 · October 1, 2014, 9:03am

rudic:

Leaving out the code tags you hid all those multiple spaces from us thus leading to inappropriate solutions. With that new info, try

sed -r 's/ +/ /g; s/DATA //p; 1s/^[^ ]* //p; 2!d' file1
Date Name Age State
------------------------------------------------------------------------------------
12 Sep 2014 14:00 EST Alia 22 Chennai
12 Sep 2014 14:00 EST Ashika 22 Chennai
12 Sep 2014 14:00 EST Bosky 22 Chennai
12 Sep 2014 14:00 EST Swati 22 Chennai

It gives an error to me saying sed -r is not recognized

---------- Post updated at 08:03 AM ---------- Previous update was at 08:02 AM ----------

The space count is one for that column which i want to remove

RudiC · October 1, 2014, 9:04am

Try

sed 's/  */ /g; s/DATA //p; 1s/^[^ ]* //p; 2!d' file

Sharma331 · October 1, 2014, 11:23am

Ok so its very simple i have a text file okay which has four columns like below:

column1          column2     column3      column4

16 jan 1994       _alia1        22        chennai 
17 jan 1994       _alia2        22        chennai 
18 jan 1994       _alia3        22        chennai 
19 jan 1994       _alia4        22        chennai

and from this file i want to remove that one space which is coming in name and move it to another file which is an excel sheet so that it comes like below:

column1          column2     column3      column4

16 jan 1994       alia1        22        chennai 
17 jan 1994       alia2        22        chennai 
18 jan 1994       alia3        22        chennai 
19 jan 1994       alia4        22        chennai

---------- Post updated at 10:23 AM ---------- Previous update was at 08:11 AM ----------

can we do this?? please reply i need help with this

Don_Cragun · October 1, 2014, 2:19pm

First, let me be very clear: If you have a text file, feeding it through awk , grep , sed , and/or tr is not going to magically convert that text file into the proprietary format of a Microsoft Excel Spreadsheet!

If you have the text file you have shown us in post #12 in this thread, you can create the output you say you want in that post as a text file using the command:

sed '3,$s/  / /' file.txt

Sharma331 · October 6, 2014, 6:08am

Hi Don,

sorry for the confusion and the mess created by me.
I have only the xls file which has this data

column1          column2     column3      column4

16 jan 1994       _alia1        22        chennai 
17 jan 1994       _alia2        22        chennai 
18 jan 1994       _alia3        22        chennai 
19 jan 1994       _alia4        22        chennai

Underscores means spaces. so there is only one leading space before the data in column 2 and the position of the space is shown above.
and i want to remove space from the second column to make it look like below:

column1          column2     column3      column4

16 jan 1994       alia1        22        chennai 
17 jan 1994       alia2        22        chennai 
18 jan 1994       alia3        22        chennai 
19 jan 1994       alia4        22        chennai

Don_Cragun · October 6, 2014, 6:28am

sharma331:

Hi Don,

sorry for the confusion and the mess created by me.
I have only the xls file which has this data
column1          column2     column3      column4

16 jan 1994       _alia1        22        chennai 
17 jan 1994       _alia2        22        chennai 
18 jan 1994       _alia3        22        chennai 
19 jan 1994       _alia4        22        chennai
Underscores means spaces. so there is only one leading space before the data in column 2 and the position of the space is shown above.
and i want to remove space from the second column to make it look like below:
column1          column2     column3      column4

16 jan 1994       alia1        22        chennai 
17 jan 1994       alia2        22        chennai 
18 jan 1994       alia3        22        chennai 
19 jan 1994       alia4        22        chennai

NO! What you have shown us is a text file not a binary Microsoft Excel spreadsheet file. An underscore and a space are very different characters. If you are saying that you have converted a space to an underscore to indicate which one of the nine spaces between 1994 and alia is to be deleted; I say it doesn't matter whether you delete the 2nd of those nine spaces (which the sed script I provided does) or the 9th of those nine spaces. Either way, you get the output you requested (as long as the input you showed us above is an accurate representation of the text file you have as an input file). Did you try the sed command I suggested, or have you determined by looking at it that it won't do what you want?

Please try it before deciding it won't work. The sed script I suggested produces exactly the output you said you want when I run it!

Sharma331 · October 6, 2014, 7:36am

don cragun:

NO! What you have shown us is a text file not a binary Microsoft Excel spreadsheet file. An underscore and a space are very different characters. If you are saying that you have converted a space to an underscore to indicate which one of the nine spaces between 1994 and alia is to be deleted; I say it doesn't matter whether you delete the 2nd of those nine spaces (which the sed script I provided does) or the 9th of those nine spaces. Either way, you get the output you requested (as long as the input you showed us above is an accurate representation of the text file you have as an input file). Did you try the sed command I suggested, or have you determined by looking at it that it won't do what you want?

Please try it before deciding it won't work. The sed script I suggested produces exactly the output you said you want when I run it!

I tried your code it doesn't work for me..
and i have a excel file with .xls extension. I am doing all the data processing in text file and finally transferring it to the xls file.

the xls file has that space.

Underscore is used to show the space its not the actual underscore.. I have used it to show the space,the leading space.

RudiC · October 6, 2014, 7:48am

Again (see post#9): How would you propose to distinguish that space from the one(s) before? Are those spaces or <TAB>s?

If it is a REAL .xls file (which is hard to believe reading all the posts), we cannot work on it with text tools. You need to run EXCEL and remove that column's leading spaces (or maybe the perl excel module?).

Sharma331 · October 6, 2014, 8:33am

You are correct i was talking about the one leading space inside the xls file.

---------- Post updated at 07:33 AM ---------- Previous update was at 07:18 AM ----------

my actual code is below:

#!/bin/ksh


tempsql=$EXPRS_ROOT/logs/temp1.txt
tempsql1=$EXPRS_ROOT/logs/temp2.txt
outputFile=$EXPRS_ROOT/logs/temp3.xls
outputFileName="RT Claim Dashboard Usage Report Cumulative.xls"

grep DATA $tempsql >> $tempsql1

sed 's/DATA//' $tempsql1 >> $outputFile

uuencode $outputFile "$outputFileName"  | mail -s "Report Cumulative" "alia@gmail.com"

exit 0

RudiC · October 6, 2014, 8:42am

Just calling a file "something.xls" does not make it a (binary) EXCEL file. What you provided above is producing text files, and in those you can't distinguish one space from the other, and removing the first of nine is as good as removing the last (as Don Cragun said). If his proposal doesn't fit your needs, I'm afraid you're out of luck.

Sharma331 · October 6, 2014, 9:02am

thanks..
But it do send me a mail with a proper xls file with four columns.