Shell program file handling

system · February 13, 2008, 3:17am

Hi,

I have a file as follows:

QWASEDRF1234567890098765     abc@quebex.com            000000000-932333       678394-56=3   9033894
QWASEDRF1234567890098765     abc@quebex.com            000000000-932333       678394-56=3   9033894
OPIUYTREE0986666544443322     dcsx@olivaa.net              123456678-rererrrde      09655584-35=6   77778
OPIUYTREE0986666544443322     dcsx@olivaa.net              123456678-rererrrde      09655584-35=6   77778

I need to extract the lines that occur more than once.

I have tried uniq -d filename. It works correctly on files whose lines contain no spaces.

However, the lines in our file have lots of spaces in between.

Can anybody help me with this?

Thanks

agn · February 13, 2008, 3:43am

cat filename | tr -s '[:space:]' | uniq -d

That should help.

HPAVC · February 13, 2008, 3:48am

Before you run uniq -d, do you want to alter the input so that it has less white space? Your example doesn't seem to suffer from this problem, or you're not showing the lines that differ in their white space.

Nonetheless, "tr --squeeze-repeats" might be the answer for something like this:

$ cat >tmp
one two
one      two
three four
three four
^d

$ cat tmp | tr -s " " | uniq -d
one two
three four
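
One caveat with the pipeline above: uniq -d only compares adjacent lines, so duplicates that are separated by other lines go unnoticed unless the input is sorted first. A minimal sketch, using made-up sample lines in the same shape as the tmp file (the variable names are only for illustration):

```shell
# Sample input: the duplicates are deliberately NOT adjacent.
input=$(printf '%s\n' 'one two' 'three four' 'one      two' 'three four')

# Squeeze runs of spaces, sort so equal lines become neighbours,
# then keep one copy of every line that occurs more than once.
result=$(printf '%s\n' "$input" | tr -s ' ' | sort | uniq -d)
printf '%s\n' "$result"
```

Without the sort step, uniq -d would report nothing for this input, because no two equal lines sit next to each other.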

system · February 13, 2008, 4:11am

I have a file as follows:

KILOPJUY23415000000003537   roaringrat@hscd_com                                                                                  2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E                              000000000000016.03 000000000000000.00 CBB00010000000920
KILOPJUY23415000000003537   roaringrat@hscd_com                                                                                  2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E                              000000000000016.03 000000000000000.00 CBB00010000000920
AQWSEDFR298099000003537   poiuty@hpoiyuj-usa_com                                                                                2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C                                000000000000051.66 000000000000040.00 CBB00010000000906
AQWSEDFR298099000003537   poiuty@hpoiyuj-usa_com                                                                                2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C                                000000000000051.66 000000000000040.00 CBB00010000000906

The above two methods are not working here. Please help.

HPAVC · February 13, 2008, 4:28am

Squeeze those characters. Are they tabs or something? You said spaces; if it's some other whitespace, then you need to squeeze accordingly.
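
If you're not sure what the whitespace really is, one way to look is sed's l command, which prints each line with non-printing characters made visible. A small sketch on a made-up line (not your data) that contains a tab, a trailing space and a carriage return:

```shell
# Made-up sample: "one", a tab, "two", a trailing space, a carriage return.
# sed -n l prints the line unambiguously: tab as \t, carriage return as \r,
# and a $ marking the real end of the line.
visible=$(printf 'one\ttwo \r\n' | sed -n l)
printf '%s\n' "$visible"
```

Running your own file through sed -n l should show whether two lines that look identical really differ in tabs, trailing spaces or line endings. Note that long lines are folded in this output.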

system · February 13, 2008, 4:32am

It's all spaces, not tabs.

How do I squeeze them?

manas_ranjan · February 13, 2008, 4:49am

Try this out:

awk '{ a[$1]=$0 }
END{
n = asort(a)
for (i=1;i<=n;i++) print a[i]
}' <filename>

system · February 13, 2008, 5:06am

I am getting an error when using this asort logic.

manas_ranjan · February 13, 2008, 5:14am

Can you tell us what error you are getting with asort? For me it works fine for your requirement.
It would be better if you could post the script and the error you are seeing.

manas_ranjan · February 13, 2008, 5:18am

let
cat duplicate.txt
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
then
$ awk '{ a[$1]=$0 }
> END{
> n = asort(a)
> for (i=1;i<=n;i++) print a[i]
> }' duplicate.txt
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894

radoulov · February 13, 2008, 5:21am

asort is a GNU Awk extension ...
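
Where gawk is not available, the duplicated lines can also be found without asort. This is a minimal sketch using only standard awk features; the sample file name and its contents are made up for illustration:

```shell
# Hypothetical sample file (name and contents are assumptions).
sample="${TMPDIR:-/tmp}/dup_sample.$$"
printf '%s\n' \
  'AAA   x@example.com    1' \
  'BBB   y@example.com    2' \
  'AAA   x@example.com    1' \
  'CCC   z@example.com    3' > "$sample"

# seen[$0]++ counts how often each whole line has appeared so far.
# The pattern is true only on the second occurrence, so every
# duplicated line is printed exactly once, in input order.
dups=$(awk 'seen[$0]++ == 1' "$sample")
printf '%s\n' "$dups"

rm -f "$sample"
```

This compares whole lines, so the input does not need to be sorted, and lines that differ only in the amount of white space still count as different.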

system · February 13, 2008, 5:32am

I am using a shell script:
if cat old_extract.dat
then
awk '{ a[$1]=$0 }
END{
n = asort(a)
for (i=1;i<=n;i++) print a[i]
}' old_extract.dat
fi

and I am getting this error:

awk: 0602-553 Function asort is not defined.
The input line number is 5. The file is old_extract.dat.
The source line number is 3.

manas_ranjan · February 13, 2008, 5:46am

You made a mistake: using awk and cat together doesn't make any sense.
Simply execute it like this:
awk '{ a[$1]=$0 }
END{
n = asort(a)
for (i=1;i<=n;i++) print a[i]
}' old_extract.dat

That is, if old_extract.dat is the detailed data file containing the records.
Please follow the steps mentioned in my previous post on this same issue.

Please practice different combinations of looping and their usage, so that you will be able to see why awk (asort) throws an error when used inside the cat command. Best of luck.

system · February 13, 2008, 5:52am

I executed this as well:

$ awk '{ a[$1]=$0 }
> END{
> n = asort(a)
> for (i=1;i<=n;i++) print a[i]
> }' old_extract.dat
awk: 0602-553 Function asort is not defined.
The input line number is 5. The file is old_extract.dat.
The source line number is 3.

manas_ranjan · February 13, 2008, 6:04am

What is your awk version? asort is a GNU awk extension, as radoulov already said.