system
February 13, 2008, 3:17am
1
Hi,
I have a file like this:
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
I need to extract the lines that occur more than once.
I have tried uniq -d filename. It works correctly on files whose lines contain no spaces, but the lines in this file have lots of spaces in between.
Can anybody help me with this?
Thanks
agn
February 13, 2008, 3:43am
2
tr -s '[:space:]' < filename | uniq -d
That should help.
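One caveat about the tr approach: uniq -d only compares adjacent lines, so it misses repeats that are not next to each other in the file. Here is a minimal sketch, assuming the duplicates can be scattered and separated by runs of spaces or tabs. It normalizes the blanks and sorts before calling uniq -d. The function name print_dups and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one copy of every line that occurs more than once
# in file "$1", treating any run of spaces/tabs as a single space.
print_dups() {
    # tr maps each blank to a space and squeezes runs of spaces;
    # sort puts identical lines next to each other so uniq -d can find them.
    tr -s '[:blank:]' ' ' < "$1" | sort | uniq -d
}

# Demo with made-up data: the repeats differ only in spacing and are not adjacent.
sample=$(mktemp)
printf 'one   two\nthree four\none two\nthree    four\nfive six\n' > "$sample"
print_dups "$sample"    # prints "one two" and then "three four"
rm -f "$sample"
```

The squeeze does not remove leading or trailing blanks, so two lines that differ only there are still treated as different.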
HPAVC
February 13, 2008, 3:48am
3
So before you run uniq -d, you want to alter the input so the lines have less white space? Your example doesn't seem to suffer from this problem, or you're not showing us the lines that actually have too much white space.
Nonetheless, "tr --squeeze-repeats" might be the answer. Something like this:
$ cat >tmp
one two
one two
three four
three four
^d
$ cat tmp | tr -s " " | uniq -d
one two
three four
system
February 13, 2008, 4:11am
4
I have a file like this:
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
The above two methods don't work here. Please help.
HPAVC
February 13, 2008, 4:28am
5
You need to squeeze those characters. Are they tabs or something? You said spaces, but if it's some other whitespace, you need to squeeze accordingly.
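Before you decide what to squeeze, it helps to see exactly which characters sit between the fields. This sketch uses the standard od -c dump. The sample line is invented. A carriage return (\r), such as a file created on Windows would carry, is only one possibility worth checking for.

```shell
#!/bin/sh
# Dump a line character by character so tabs (\t), carriage returns (\r)
# and runs of spaces become visible. The sample text is made up.
printf 'abc  def\tghi\r\n' | od -c

# On the real data, dump just the first two lines of the file
# ("filename" is a placeholder):
# head -n 2 filename | od -c
```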
system
February 13, 2008, 4:32am
6
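It's all spaces, not tabs.
How do I squeeze them?
Try this out:
awk '{ a[$1] = $0 }            # keep the last line seen for each distinct first field
END {
    n = asort(a)               # GNU awk: sort the values and re-index them 1..n
    for (i = 1; i <= n; i++) print a[i]
}' <filename>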
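Note that this script keeps one line per distinct first field. It does not report which lines repeat. For the original goal (lines that occur more than once, whatever the spacing), here is a sketch that needs only POSIX awk features and no asort. The function name dup_lines and the sample input are hypothetical.

```shell
#!/bin/sh
# Print each line that occurs more than once, once, after collapsing blanks.
# Assigning $1 = $1 makes awk rebuild $0 from its fields joined by the
# default OFS (a single space), which also drops leading/trailing blanks.
dup_lines() {
    awk '
        { $1 = $1 }        # normalize the spacing of the current line
        ++seen[$0] == 2    # print a line the moment it is seen for the second time
    ' "$@"
}

# Demo on made-up data; prints "a b" and then "c d".
printf '  a   b\nc d\na b  \nc d\nc   d\n' | dup_lines
```

Lines come out in the order in which they reach their second occurrence, not sorted. Pipe the result through sort if order matters.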
system
February 13, 2008, 5:06am
8
I am getting an error when I use this asort logic.
Can you tell us what error you are getting with asort? For me it works fine for your requirement.
It would also help if you posted the script and the error you are having problems with.
Let's say the file looks like this:
$ cat duplicate.txt
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
Then:
$ awk '{ a[$1]=$0 }
> END{
> n = asort(a)
> for (i=1;i<=n;i++) print a[i]
> }' duplicate.txt
AQWSEDFR298099000003537 poiuty@hpoiyuj-usa_com 2008-02-020000823.15 0011676 00017.00 2008-03-01HJORTH,ROGER C 000000000000051.66 000000000000040.00 CBB00010000000906
KILOPJUY23415000000003537 roaringrat@hscd_com 2008-02-020014243.99 0000758 00284.00 2008-03-01SEMENZA, CATHY E 000000000000016.03 000000000000000.00 CBB00010000000920
OPIUYTREE0986666544443322 dcsx@olivaa.net 123456678-rererrrde 09655584-35=6 77778
QWASEDRF1234567890098765 abc@quebex.com 000000000-932333 678394-56=3 9033894
asort is a GNU Awk extension ...
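Because asort is GNU-only, one portable way to get roughly the same output from any awk is to collect one line per first field in awk and let sort handle the ordering. This is a sketch, not an exact equivalent. The function name last_per_key is hypothetical, and the ordering may differ from gawk's asort in edge cases.

```shell
#!/bin/sh
# Portable take on the asort script: keep the last line seen for each distinct
# first field, print them in no particular order (for-in order is unspecified),
# then sort the result. LC_ALL=C requests plain byte-order sorting.
last_per_key() {
    awk '{ a[$1] = $0 } END { for (k in a) print a[k] }' "$@" | LC_ALL=C sort
}

# Demo on made-up data; prints "a 1" and then "b 2".
printf 'b 2\na 1\nb 2\na 1\n' | last_per_key
```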
system
February 13, 2008, 5:32am
12
I am using a shell script:
if cat old_extract.dat
then
awk '{ a[$1]=$0 }
END{
n = asort(a)
for (i=1;i<=n;i++) print a[i]
}' old_extract.dat
fi
and I am getting this error:
awk: 0602-553 Function asort is not defined.
The input line number is 5. The file is old_extract.dat.
The source line number is 3.
You made a mistake by using cat and awk together like that; it doesn't make any sense.
Simply run it like this:
awk '{ a[$1]=$0 }
END{
n = asort(a)
for (i=1;i<=n;i++) print a[i]
}' old_extract.dat
assuming old_extract.dat is the data file that contains the records.
Please follow the steps in my previous post on this same issue.
Please practice different combinations of loops and their usage, so that you will understand why awk (asort) throws an error when used inside the cat command. Best of luck!
system
February 13, 2008, 5:52am
14
I ran it this way too:
$ awk '{ a[$1]=$0 }
> END{
> n = asort(a)
> for (i=1;i<=n;i++) print a[i]
> }' old_extract.dat
awk: 0602-553 Function asort is not defined.
The input line number is 5. The file is old_extract.dat.
The source line number is 3.
What is your awk version? asort is a GNU awk extension, as radoulov already said.
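Here is one way to check from the shell, as a sketch. GNU awk identifies itself with a banner starting "GNU Awk" in its --version output, while other awks typically reject that option or print a different banner. The candidate command names gawk and awk are assumptions about what is installed, and find_gawk is a hypothetical helper.

```shell
#!/bin/sh
# Print the name of an installed awk that supports asort (GNU awk), if any.
# stdin comes from /dev/null so an awk that misreads --version cannot wait
# for input; its error output is discarded.
find_gawk() {
    for cmd in gawk awk; do
        if command -v "$cmd" >/dev/null 2>&1 &&
           "$cmd" --version </dev/null 2>/dev/null | grep -q 'GNU Awk'; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

if g=$(find_gawk); then
    echo "GNU awk found as: $g (asort is available)"
else
    echo "No GNU awk found; avoid asort or install gawk"
fi
```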