uniq, sort -u or similar, but only between { }

fugitivus · November 7, 2014, 2:44am

Hi!
I am trying to remove duplicate entries in a text file, but only between delimiters,
as in the example below. I don't know how to do that with sort or similar tools.

input:

{
   aaa
   aaa
}
{
   aaa
   aaa
}

output:

{
   aaa
}
{
   aaa
}

I would be grateful for any help!
Regards, Fugitivus

saps19 · November 7, 2014, 3:18am

Please post whatever you have tried so far.

fugitivus · November 7, 2014, 3:29am

I haven't tried anything yet. I read the man pages for sort and uniq but found nothing that showed how to do this, and I don't know whether uniq or sort can do such things at all.
I am just beginning to do things like this in shell scripts.
That's why I posted here: to get a hint about what I should look for.

Regards, Fugitivus

RavinderSingh13 · November 7, 2014, 4:41am

Hello Fugitivus,

The following may help you with this. Let's say we have the following input file.
Input file:

cat tst4567
{
   aaa
   bbb
   aaa
   ccc
}
{
   aaa
   aaa
}

Then we can use the following code:

awk '/\{/ {print $0} !/\{/ && !/\}/{gsub(/[[:space:]]/,A,$0);X[$0]} /\}/ {for(u in X){print u;delete X}{print $0}}' tst4567

This gives the following output:

{
aaa
bbb
ccc
}
{
aaa
}

Hope this helps; kindly let me know if you have any questions.

EDIT: Adding one more scenario: if the file also has some text between the brace blocks, the following may help you.

 cat tst4567
{
   aaa
   bbb
   aaa
   ccc
}
ads
dvsah
{
   aaa
   aaa
}

The following code may help with that:

awk '/\{/ {B=1;print $0} {if(!/\{/ && !/\}/ && B==1){gsub(/[[:space:]]/,A,$0);X[$0]}} /\}/ {B=0;for(u in X){print u;delete X}{print $0}} {if(B==0 && ($0 !~ /\{/) && ($0 !~ /\}/)){print $0}}' tst4567

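The same solution in non-one-liner form:

awk '
# an opening brace: enter the block (B=1) and print the brace line
/\{/ {B=1;print $0}
# inside a block: remove all whitespace (A is never set, so it is an
# empty string) and remember the line as a key of array X
{if(!/\{/ && !/\}/ && B==1)
        {gsub(/[[:space:]]/,A,$0);X[$0]}
}
# a closing brace: leave the block, print the collected keys (for-in
# order is not guaranteed) and empty X, then print the brace line
/\}/ {B=0;
        for(u in X){print u;delete X}
        {print $0}
     }
# outside any block: print the line unchanged
{if(B==0 && ($0 !~ /\{/) && ($0 !~ /\}/))
        {print $0}
}' tst4567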

The output will be as follows:

{
aaa
bbb
ccc
}
ads
dvsah
{
aaa
}
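
A note on the gsub above: it removes all whitespace, so the indentation is lost in the output, and it also removes spaces in the middle of a line. If the goal is only to treat "   aaa" and "aaa" as the same entry, one option is to compare a trimmed copy of each line while printing the original line. The sketch below assumes each { and } stands alone on its own line and trims only leading and trailing blanks and tabs; dedup_blocks_trimmed is a made-up name, not an existing tool. Because it prints lines as they arrive, it also keeps the input order.

```shell
#!/bin/sh
# Hypothetical helper: drop repeated lines inside each { } block, comparing
# lines with leading/trailing blanks removed but printing them unchanged.
# Assumes "{" and "}" each stand alone on their own line (no nesting).
dedup_blocks_trimmed() {
    awk '
    $1 == "{" && NF == 1 { inblock = 1; split("", seen); print; next }
    $1 == "}" && NF == 1 { inblock = 0; print; next }
    inblock {
        key = $0
        sub(/^[ \t]+/, "", key)    # trim leading blanks/tabs
        sub(/[ \t]+$/, "", key)    # trim trailing blanks/tabs
        if (key in seen) next      # already printed in this block
        seen[key] = 1
    }
    { print }                      # kept block lines and text outside blocks
    ' "$@"
}

printf '{\n   aaa\n   bbb\n   aaa\n   ccc\n}\nads\ndvsah\n{\n   aaa\n   aaa\n}\n' | dedup_blocks_trimmed
```

On the extended sample this keeps the same lines as the output above, but with the original three-space indentation.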

Thanks,
R. Singh

fugitivus · November 7, 2014, 5:15am

Thank you for your help! It seems that I have to learn awk.
Does anyone know a good, easy-to-understand awk tutorial?
Regards, Fugitivus

RavinderSingh13 · November 7, 2014, 5:23am

Hello,

Glad the code was helpful. The following may help you learn awk:

http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawkA4.pdf

You can also refer to the O'Reilly books on the subject. Hope this helps. This forum is also one of the best places to learn Unix: you can always search for your questions here using the search option, and if you have written some code and have questions about it, you are always welcome to come and ask. You can also help other people with their questions; that way we help each other learn, which is the primary purpose of this forum. Enjoy learning!

Thanks,
R. Singh

RudiC · November 7, 2014, 5:29am

For your original scenario, this might do:

awk     '/{/            {delete T}
         T[$0]++        {next}
         1
        '  file
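
How the middle line works: T[$0]++ returns the count before incrementing it, so it is 0 (false) the first time a line appears in the current block and non-zero (true) for every repeat; repeats hit next and are never printed, while the lone 1 prints everything else. Below is the same idea with comments, as a sketch: the function name dedup_blocks is made up, the brace is escaped (see MadeInGermany's note further down about { and } in EREs), and split("", T) is used to empty the array because whole-array delete is a widespread extension that older POSIX editions did not require.

```shell
#!/bin/sh
# Hypothetical helper name; same logic as the three-line awk above.
dedup_blocks() {
    awk '
    /\{/      { split("", T) }   # new block: forget lines seen so far
    T[$0]++   { next }           # count was already > 0: a repeat, skip it
    1                            # print every line not skipped
    ' "$@"
}

printf '{\n   aaa\n   aaa\n}\n{\n   aaa\n   aaa\n}\n' | dedup_blocks
```

Because T is only emptied at a {, lines between blocks are also checked against the block before them, which is what the extended version below takes care of.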

For RavinderSingh's extended scenario, try:

awk     '/{/            {L=1; delete T}
         /}/            {L=0}
         L && T[$0]++   {next}
         1
        '  file

@RavinderSingh: for (u in X) doesn't preserve the order of the lines in the input file; try it with quite a few more lines...
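
A sketch of one way to keep the collect-then-print-at-} structure while preserving the input order: record each first occurrence in an array indexed 1..n and print it with a numeric loop instead of for (u in X), whose traversal order is unspecified. Like the original, it assumes no nested blocks; dedup_blocks_buffered is a made-up name. Unlike the gsub version, it does not strip whitespace, so the indentation is kept.

```shell
#!/bin/sh
# Hypothetical helper: buffer each { } block, then print its distinct lines
# in first-seen order when the closing brace arrives.
dedup_blocks_buffered() {
    awk '
    /\{/ { inblock = 1; n = 0; split("", seen); print; next }
    /\}/ {
        for (i = 1; i <= n; i++) print order[i]   # numeric loop: input order
        n = 0; inblock = 0; print; next
    }
    inblock {
        if (!($0 in seen)) { seen[$0] = 1; order[++n] = $0 }
        next                                       # printed later, at the brace
    }
    { print }                                      # text outside the blocks
    ' "$@"
}

printf '{\n   ccc\n   aaa\n   bbb\n   aaa\n}\n' | dedup_blocks_buffered
```

One design consequence of buffering: if the file ends inside a block that is never closed, its lines are never printed; an END rule that runs the same loop would flush them.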

MadeInGermany · November 7, 2014, 7:34am

awk '
/\}/ {block=0; delete S}
(block==1) {
  if ($0 in S) {next} else {S[$0]}
}
{print}
/\{/ {block=1}
' input

The array S stores every line encountered so far in the current block (each line is $0, i.e. the current input line). At the end of the block the array is deleted, so each block is handled separately.
The following variant stores only the previous line in a simple variable S, so it suppresses only adjacent duplicate lines (which is also all that uniq does).

awk '
/\}/ {block=0; found=0}
(block==1) {
  if (found==1 && $0==S) {next} else {S=$0; found=1}
}
{print}
/\{/ {block=1}
' input

The following input file demonstrates the difference:

{
   aaa
   bbb
   aaa
}

aaa
bbb
aaa

{
   aaa
   aaa
   ccc
}

NB: awk uses EREs, in which { and } have a special meaning (interval expressions such as a{2,3}), so they should be written as \{ and \}.
With that change, my first sample is in principle identical to Rudi's second sample.
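
To see the difference on a shortened version of the sample input, here are the two scripts above wrapped in shell functions; the function names dedup_any and dedup_adjacent are made up for this comparison, and the awk code is unchanged.

```shell
#!/bin/sh
# First script: array S holds every line seen in the block, so any repeat is dropped.
dedup_any() {
    awk '
    /\}/ {block=0; delete S}
    (block==1) {
      if ($0 in S) {next} else {S[$0]}
    }
    {print}
    /\{/ {block=1}
    ' "$@"
}
# Second script: S holds only the previous line, so only adjacent repeats are dropped.
dedup_adjacent() {
    awk '
    /\}/ {block=0; found=0}
    (block==1) {
      if (found==1 && $0==S) {next} else {S=$0; found=1}
    }
    {print}
    /\{/ {block=1}
    ' "$@"
}

sample='{
   aaa
   bbb
   aaa
}
aaa
bbb
aaa'
printf '%s\n' "$sample" | dedup_any
printf '%s\n' "$sample" | dedup_adjacent
```

Inside the block, dedup_any drops the second "   aaa", while dedup_adjacent keeps it because it does not directly follow another "   aaa". The three lines after the closing brace are outside any block and pass through both functions unchanged.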

pamu · November 7, 2014, 10:38am

 $ cat file
{
   aaa
   bbb
   aaa
   ccc
}
ads
dvsah
{
   aaa
   aaa
}


$  awk '/\{/ {f=NR} f && !A[f,$0]++ {print} !f{print} /\}/{f=0}' file
{
   aaa
   bbb
   ccc
}
ads
dvsah
{
   aaa
}
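
How this one works, spelled out with comments: f=NR gives each block its own id (the line number of its {), and A[f,$0] uses a composite key (awk joins the two subscripts with SUBSEP), so repeats are only counted within the same block and the array never has to be emptied. The function name below is made up; the awk program is the one-liner above, only reformatted.

```shell
#!/bin/sh
# Hypothetical helper name; same program as the one-liner above.
dedup_by_block_id() {
    awk '
    /\{/              { f = NR }   # block id = line number of its opening brace
    f && !A[f, $0]++  { print }    # key is (block id, line): print first occurrence only
    !f                { print }    # outside any block: print unchanged
    /\}/              { f = 0 }    # closing brace (already printed) ends the block
    ' "$@"
}

printf '{\n   aaa\n   bbb\n   aaa\n   ccc\n}\nads\ndvsah\n{\n   aaa\n   aaa\n}\n' | dedup_by_block_id
```

The trade-off is memory: A keeps one entry per distinct line of every block until the end of the input, whereas the answers that empty their array at each block boundary only hold the current block.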