Unix Shell scripting, removing hex 0d 0a

mrsindhe87 · December 28, 2010, 4:28am

hi,
I have a file with data like this :

5963491,11926750,Policy Endorsement 1
Policy Endorsement 2
Policy Endorsement 3
Policy Endorsement 4
Policy Endorsement 5
Policy Endorsement 6
Policy Endorsement 7
5963492,11926751,Product[0].Quote Options[0].CoPolicy[0].CoLobs[0].CoLob[0].LwPolInput[0].LwXsLayerInfoRpt[0].LwXsLayerInfo[1].LwXsLayerCode
5963492,11926752,2
5963493,11926753,Product[0].Quote Options[0].SW Selected Forms[2].SW Selected Form Default

I want it to be like this:

5963491,11926750,Policy Endorsement 1 Policy Endorsement 2 Policy Endorsement 3 Policy Endorsement 4 Policy Endorsement 5 Policy Endorsement 6 Policy Endorsement 7
5963492,11926751,Product[0].Quote Options[0].CoPolicy[0].CoLobs[0].CoLob[0].LwPolInput[0].LwXsLayerInfoRpt[0].LwXsLayerInfo[1].LwXsLayerCode
5963492,11926752,2
5963493,11926753,Product[0].Quote Options[0].SW Selected Forms[2].SW Selected Form Default

There are a few lines of data which need to be appended to the previous line. That is, every line should start with a 7 or 8 digit number.
When I look at the hex dump of the file (od -x file 1), it contains 0d 0a. I want it to be only 0d. dos2unix wasn't much help. I'd appreciate your help. Thanks.

cabrao · December 28, 2010, 5:18am

awk 'ORS=/ Endorsement [0-6]$/?FS:RS' file

birei · December 28, 2010, 6:19am

Hi,

Using 'sed':

$ cat infile
5963491,11926750,Policy Endorsement 1
Policy Endorsement 2
Policy Endorsement 3
Policy Endorsement 4
Policy Endorsement 5
Policy Endorsement 6
Policy Endorsement 7
5963492,11926751,Product[0].Quote Options[0].CoPolicy[0].CoLobs[0].CoLob[0].LwPolInput[0].LwXsLayerInfoRpt[0].LwXsLayerInfo[1].LwXsLayerCode
5963492,11926752,2
5963493,11926753,Product[0].Quote Options[0].SW Selected Forms[2].SW Selected Form Default
$ sed -n '1 H; 2,$ { /^[^0-9]\{7,8\}/ H; /^[0-9]\{7,8\}/ { x; s/^\n//; s/\n/ /g; p} }; $ { /^[0-9]\{7,8\}/ {x;p} }' infile
5963491,11926750,Policy Endorsement 1 Policy Endorsement 2 Policy Endorsement 3 Policy Endorsement 4 Policy Endorsement 5 Policy Endorsement 6 Policy Endorsement 7
5963492,11926751,Product[0].Quote Options[0].CoPolicy[0].CoLobs[0].CoLob[0].LwPolInput[0].LwXsLayerInfoRpt[0].LwXsLayerInfo[1].LwXsLayerCode
5963492,11926752,2
5963493,11926753,Product[0].Quote Options[0].SW Selected Forms[2].SW Selected Form Default

Regards,
Birei

mrsindhe87 · December 29, 2010, 6:46am

Hi, what I meant is that I have a lot of data following the 7 digit number, like this:

5834563,11133336,djkfhdfksdkfl
aaaaahhh 12 No
5834564,11133337,iorueureir rierei rere
qqqqrerr r
ruerirei reoprixm cm reopie jkldjas  kls
woewio

I want all the data to be appended to the line that starts with the preceding 7 digit number, like this:

5834563,11133336,djkfhdfksdkfl aaaaahhh 12 No
5834564,11133337,iorueureir rierei rere qqqqrerr r ruerirei reoprixm cm reopie jkldjas  kls woewio

Thanks. (Please note that the 1st column value can be either a 7 or an 8 digit number.)
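
A minimal shell sketch of the joining rule described above, assuming a record starts with a 7 or 8 digit number followed by a comma, and that any trailing carriage return (0d) should be dropped. The function name join_records is made up for illustration, and the digit pattern is written out in full because not every awk supports {7,8} intervals.

```shell
# join_records: append every line that does not start a new record to the
# previous record, separated by a single space. Reads the named files, or
# standard input when no file is given.
join_records() {
    awk '
        { sub(/\r$/, "") }                  # drop a trailing CR (0d), if any
        /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]?,/ {
            if (have) print rec             # flush the previous record
            rec = $0; have = 1
            next
        }
        { rec = (have ? rec " " : "") $0; have = 1 }   # continuation line
        END { if (have) print rec }
    ' "$@"
}
```

Usage would be something like: join_records infile > outfile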

m.d.ludwig · December 29, 2010, 7:33am

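Since I like slicing and dicing my files with Perl:

$/ = "\r\n";    # input record separator: lines end in CR LF (0d 0a)
$\ = "\r";      # output record separator: print appends a bare CR (0d)

my $line = undef;

while (<>) {
    chomp;      # removes the trailing "\r\n"

    # A new record starts with a 7 or 8 digit number and a comma.
    if (m{^\d{7,8},}) {
        print $line if defined $line;
        $line = $_;
        next;
    }

    # Continuation line: append it to the current record with a space.
    $line .= " $_";
}

print $line if defined $line;

will generate:

5963491,11926750,Policy Endorsement 1 Policy Endorsement 2 Policy Endorsement 3 Policy Endorsement 4 Policy Endorsement 5 Policy Endorsement 6 Policy Endorsement 7<CR>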
5963492,11926751,Product[0].Quote Options[0].CoPolicy[0].CoLobs[0].CoLob[0].LwPolInput[0].LwXsLayerInfoRpt[0].LwXsLayerInfo[1].LwXsLayerCode<CR>
5963492,11926752,2<CR>
5963493,11926753,Product[0].Quote Options[0].SW Selected Forms[2].SW Selected Form Default<CR>

Note that I explicitly set the output end-of-line character to "\r", since I am running Linux on an x86-based system. YMMV on a Mac OS based system.

methyl · December 29, 2010, 7:37am

Very strange request.

Please show a representative sample of the before and after files, displayed with "sed" to make control codes visible.

sed -n l filename
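
Along the same lines, a small sketch for checking which line endings a file really has, assuming a POSIX shell with od and awk. The function names show_bytes and count_cr_lines are made up for illustration.

```shell
# show_bytes FILE [N]: dump the first N bytes (default 64) as characters,
# so that \r (hex 0d) and \n (hex 0a) become visible.
show_bytes() {
    od -c -N "${2:-64}" "$1"
}

# count_cr_lines FILE: print how many lines end in a carriage return.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}
```

If count_cr_lines prints 0, the file has no 0d 0a line endings and the problem lies elsewhere.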

zedex · December 30, 2010, 1:43am

Oops, I read your second reply after posting this! You can still go through this reply.
--------------------------------------------------------------------------------

0d 0a - hex representation of carriage return + line feed (CR LF, \r\n). The 0d (\r) is the Control-M character (^M).

I faced the same kind of problem, where I had to remove only the \r and keep the \n. If it's only a one-time thing, then open the file in vim:

# In command mode
:se list       # list all characters
:se ff=unix    # set file format to unix, will remove \r from \r\n
:w!            # save the file
:se ff?        # verify file format

or

perl -pi.back -e 's/\x0d\x0a/\x0a/g' <file_name>

The above should do the trick. I did something like that, but I cannot recall it completely, so you can play with the Perl expression and see if it works. BTW, take a backup of the file in some other directory before you run this.
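
If Perl is not at hand, a plain shell sketch that does roughly the same with tr, assuming the goal is Unix (0a only) line endings. The function name strip_cr is made up for illustration. Unlike the Perl expression, this removes every carriage return in the file, not only those directly in front of a line feed.

```shell
# strip_cr FILE: write FILE to standard output with all carriage returns
# (hex 0d) removed, which turns 0d 0a line endings into plain 0a.
strip_cr() {
    tr -d '\r' < "$1"
}
```

It writes to standard output, so the original file stays untouched: strip_cr file > file.unix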

mrsindhe87 · December 30, 2010, 4:25am

Hi guys, thanks a lot for your response. Let me tell you the complete requirement:
File 1:
111,10
112,20
113,30
114,40

File 2:
111,51,jklfsdfj
dkfld
111,52,dadfdl das
112,53,ewuei ewi ewop
wqopie ew
112,54,aaa aa[1] qq
113,55,ee ee[4] rr[6]
ew 1
ewe 4

My task is to produce the output as:
10 jklfsdfj dkfld dadfdl das
20 ewuei ewi ewop wqopie ew aaa aa[1] qq
30 ee ee[4] rr[6] ew 1 ewe 4
That is, whenever column1 of file1 matches with column1 of file2,
output has to be column2 of file1 and column3 of file2 corresponding to this match.
Hence I used join. Like this:
join -1 1 -2 1 -o 1.2 2.3 file1 file2 >file3
But the output is erroneous, since the data in file2 does not follow a consistent pattern.

@ludwig: Thank you, but I need the code as a shell script.

methyl · December 30, 2010, 5:36am

Please show carriage-return and line-feed characters in the before and after examples. If you are dealing with a non-unix format text file it will require special techniques.

m.d.ludwig · December 30, 2010, 9:36am

On classic Mac OS (before OS X), the end-of-line character is \r only.
Or this may be for some legacy app.
Stranger things have happened.

methyl · December 30, 2010, 10:20am

@m.d.ludwig
Hmm. I specialise in data conversion.
Unfortunately the O/P has not revealed the file format, the Operating System or any consistent data samples.
This thread is going nowhere fast.