Sort and summarise between patterns

mac-arrow · November 21, 2018, 4:55am

Hi!

I have a text file whose lines I would like to sort, summarise and count within each block delimited by the pattern "-- Current Database".

This is my text file:

-- Current Database: `city`
New York
Chicago
Las Vegas
San Francisco
-- Current Database: `country`
United States
Mexico
Portugal
Mexico
Mexico
Norway
-- Current Database: `name`
Kevin Hart
Caroline
Max
Kevin Hart
-- Current Database: `phone`
669874223
236897556
478896542
669874223
-- Current Database: `addres`
menk st 
guitar st 15

And I would like the output to look like this:

-- Current Database: `city`
1 Chicago
1 Las Vegas
1 New York
1 San Francisco
-- Current Database: `country`
3 Mexico
1 Norway
1 Portugal
1 United States
-- Current Database: `name`
1 Caroline
2 Kevin Hart
1 Max
-- Current Database: `phone`
1 236897556
1 478896542
2 669874223
-- Current Database: `addres`
1 guitar st 15
1 menk st

I know that cat file.txt | sort | uniq -c would sort, summarise and count all lines, but I don't know how to do it between patterns. I also tried the "split" command, but I couldn't get the result I was expecting.

Could somebody help me?

Thanks

disedorgue · November 21, 2018, 9:50am

Hi,
Maybe like this:

awk '/^-- Current Database:/ {A++}{print A" "$0}' file.txt | LC_COLLATE=C sort | uniq -c | sed 's/ *\([0-9]\+\) [0-9]\+ /\1 /;/-- Current Database:/s/^[0-9]\+ //'

Regards.

mac-arrow · November 21, 2018, 10:22am

WOW!! it worked!! Thank you so much!

Just one more thing... How could I sort the results from the highest count to the lowest instead of alphabetically?

Like this:

-- Current Database: `country`
3 Mexico
1 Norway
1 Portugal
1 United States
-- Current Database: `name`
2 Kevin Hart
1 Caroline
1 Max
-- Current Database: `phone`
2 669874223
1 236897556
1 478896542

Regards

disedorgue · November 21, 2018, 10:50am

It's a bit harder, and the result is a bit different (highest number first, then highest alphabetically):

awk '/^-- Current Database:/ {A--}{print A" "$0}' /tmp/file.txt | LC_COLLATE=C sort | uniq -c | sed '/-- Current Database:/s/[0-9]\+/Z/' | LC_COLLATE=C  sort -rn -k2,2 | sed 's/ *Z -[0-9]* //;s/  \+\|-[0-9]\+//g'
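The tie-breaking difference can be seen on a single block. This is a minimal sketch, not the command above: the helper names `count_lines` and `countries` are hypothetical, the input is the `country` block from the first post, and `LC_ALL=C` is used so the byte-order comparison does not depend on the local environment. A plain reverse numeric sort also reverses the order of equal counts, while giving the count and the text separate keys keeps equal counts alphabetical.

```shell
# Hypothetical helper: count duplicate lines read from stdin.
count_lines() {
  LC_ALL=C sort | uniq -c
}

# The lines of the `country` block from the first post, without the header.
countries() {
  printf '%s\n' 'United States' Mexico Portugal Mexico Mexico Norway
}

# Highest count first; equal counts come out in reverse alphabetical order,
# because -r also applies to the final whole-line comparison.
countries | count_lines | LC_ALL=C sort -rn

# Highest count first (key 1, numeric, reversed); equal counts stay in
# alphabetical order (key 2 onwards, not reversed).
countries | count_lines | LC_ALL=C sort -k1,1nr -k2
```

On this sample the first command lists United States, Portugal, Norway after Mexico; the second lists Norway, Portugal, United States.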

Regards.

RudiC · November 21, 2018, 5:27pm

Try also

awk '
/-- /   {if (cmd) close (cmd)
         print
         cmd = "sort | uniq -c | sort -k1,1r -k2"
         next
        }
        {print | cmd
        }

END     {close (cmd)
        }
' file

mac-arrow · November 22, 2018, 4:03am

Thank you @disedorgue for your help! This command mixes up some lines from different databases, but don't worry.
I think your first answer will also help me with the task I needed to do.

------ Post updated at 09:03 AM ------

Hi @RudiC

I tried yours, but it's giving me some errors:

awk: /-- /   {if (cmd) close (cmd) print cmd = "sort | uniq -c | sort -k1,1r -k2" next} {print | cmd} END {close (cmd)}
awk:                               ^ syntax error
awk: /-- /   {if (cmd) close (cmd) print cmd = "sort | uniq -c | sort -k1,1r -k2" next} {print | cmd} END {close (cmd)}
awk:                                                                              ^ syntax error

regards

Don_Cragun · November 22, 2018, 4:55am

Please try RudiC's code the way he presented it.

You introduced syntax errors when you combined lines that RudiC had as separate lines in post #5 in this thread.
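If the script really has to fit on one line, each statement inside a block needs a semicolon where the line break used to be. Here is a sketch of RudiC's program written that way, wrapped in a hypothetical function `per_block_counts` and fed a small made-up sample instead of the real file. The `fflush()` call is not in RudiC's post; it is added on the assumption that some awk implementations buffer the header line and would otherwise print it after sort's output.

```shell
# RudiC's program on a single line: statements separated by semicolons.
# fflush() is an addition (see above), not part of the original post.
per_block_counts() {
  awk '/-- / {if (cmd) close(cmd); print; fflush(); cmd = "sort | uniq -c | sort -k1,1r -k2"; next} {print | cmd} END {close(cmd)}'
}

# Hypothetical sample input (not the original file).
printf '%s\n' \
  '-- Current Database: `country`' Mexico Norway Mexico \
  '-- Current Database: `name`' Max |
per_block_counts
```

Each header is printed as is, and the lines that follow it are counted and listed with the highest count first.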

mac-arrow · November 23, 2018, 4:04am

Hi @disedorgue

Could you explain those lines to me? I'm not sure what each of them does. Thank you.

awk '/^-- Current Database:/ {A++}{print A" "$0}' 
sed 's/ *\([0-9]\+\) [0-9]\+ /\1 /;/-- Current Database:/s/^[0-9]\+ //'

Regards

disedorgue · November 23, 2018, 10:58am

Hi,

awk '/^-- Current Database:/ {A++}{print A" "$0}' ==> puts a tag (a number followed by a space) at the start of every line. All lines of a given Database block get the same tag, so the sort does not mix the lines of one block with those of another.

sed 's/ *\([0-9]\+\) [0-9]\+ /\1 /;/-- Current Database:/s/^[0-9]\+ //' ==> removes the tag added by the previous awk and, on the header lines, also the count added by uniq -c.
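To see what each stage does, the pipeline can be run one step at a time on a small input. This is a minimal sketch with some assumptions: the function names (`sample`, `tag_blocks`, `count_blocks`, `strip_tags`) and the two-block sample are hypothetical, GNU sed is assumed for `\+`, and `LC_ALL=C` is used in place of `LC_COLLATE=C` because `LC_ALL`, when set in the environment, overrides `LC_COLLATE`.

```shell
# Hypothetical two-block sample (not the original file).
sample() {
  printf '%s\n' \
    '-- Current Database: `city`' 'New York' 'Chicago' 'New York' \
    '-- Current Database: `name`' 'Max' 'Caroline'
}

# Stage 1: prefix every line with the number of the block it belongs to.
tag_blocks() {
  awk '/^-- Current Database:/ {A++} {print A" "$0}'
}

# Stage 2: sort and count. The block number at the start of each line keeps
# the blocks apart, and "-" sorts before letters and digits in the C locale,
# so each header stays on top of its block.
count_blocks() {
  tag_blocks | LC_ALL=C sort | uniq -c
}

# Stage 3: drop the block number, and drop the count on header lines.
strip_tags() {
  count_blocks | sed 's/ *\([0-9]\+\) [0-9]\+ /\1 /;/-- Current Database:/s/^[0-9]\+ //'
}

sample | tag_blocks     # lines look like "1 New York", "2 Max"
sample | count_blocks   # uniq -c adds a padded count in front of the tag
sample | strip_tags     # final form: headers bare, data lines as "count text"
```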

Regards.

mac-arrow · November 23, 2018, 11:00am

Thank you so much again!!