Convert list of numbers to text range

chrissycc · December 14, 2016, 10:32am

Hi,

I'd like to take a list of numbers (with a prefix) and convert to a range, for example:

cn001
cn004
cn016
cn017
cn018
cn019
cn020
cn021
cn031
cn032
cn038
cn042
cn043
cn044
cn045

What I'd like is to reduce this down to the following (it's slurm output!):

cn[001,004,016-021,031-032,038,042-045]

Cheers,

Chris

rbatte1 · December 14, 2016, 11:41am

Dear chrissycc,

I have a few questions to pose in response first:

Is this homework/assignment? There are specific forums for these.
What have you tried so far?
What output/errors do you get?
What OS and version are you using?
What are your preferred tools? (C, shell, perl, awk, etc.)
What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, what have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

Kind regards,
Robin

chrissycc · December 14, 2016, 12:32pm

Ok, apologies....

Not homework/assignment, just a task I want to complete at work to make reporting on some Slurm scheduler output a bit more manageable.

CentOS 7, awk the tool of choice.

I have tried the following, which seems to be getting somewhere, just having a mental block on how to get over the finishing line:

cat test | sed 's/cn//g' | \
awk '{ \
if (length(PRE) == 0)  printf "%s", $1 ; \
else if ($1>PRE+1) printf ",%s", $1 ; \
else if ($1=PRE+1) printf "-" ; \
PRE=$1 ; } \
END { print }'

Producing:

001,004,016-----,031-,038,042---,048048

(I put an additional cn048 at the end as a test, because I got some weird output from another batch of test data using this technique. Otherwise it is the data above.)

Cheers,

Chris
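
Two details in the attempt above would account for that output. `$1=PRE+1` is an assignment rather than a comparison, so it is true for any non-zero value and a "-" gets printed for every consecutive line. `END { print }` then prints the last record a second time, hence `048048`. Below is a minimal sketch of one way to finish that approach. It assumes the prefix has already been stripped (as the `sed` step does) and that the input is sorted; `collapse_runs`, `start`, `prev` and `emit` are made-up names, not part of the original attempt.

```shell
# collapse_runs: hypothetical helper; reads bare numbers, one per line, sorted.
collapse_runs() {
  awk '
    NR == 1            { start = prev = $1; next }   # first value opens the first run
    $1 + 0 == prev + 1 { prev = $1; next }           # "==" compares; "=" would assign to $1
                       { emit(); start = prev = $1 } # gap: close the old run, open a new one
    END                { if (NR) { emit(); print "" } }  # close the last run exactly once
    function emit() {
      printf "%s%s", sep, start                      # run start, comma-separated after the first
      if (prev != start) printf "-%s", prev          # add the run end only if the run has one
      sep = ","
    }
  '
}

# Usage, mirroring the pipeline above:
#   sed 's/cn//g' test | collapse_runs
```

Deferring the "-" until a run is known to have ended means only its last member is printed, and printing the stored strings with `%s` keeps the leading zeros.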

RudiC · December 14, 2016, 3:23pm

How about

awk '
NR == 1         {PFX = substr ($1, 1, match ($1, /[0-9]/)-1)
                 printf "%s[", PFX
                 LEN = length($1) - length(PFX)
                 PRE = -1E9 
                }

                {sub (PFX, "")
                }

$1 == PRE + 1   {SEQ = 1
                }
$1  > PRE + 1   {if (SEQ) printf "-%0*d", LEN, PRE
                 printf "%s%0*d", DL, LEN, $1
                 DL = ","
                 SEQ = 0
                }

                {PRE = $1
                }

END             {printf "-%0*d]\n", LEN, $1
                }
' file
cn[001,004,016-021,031-032,038,042-045]

chrissycc · December 16, 2016, 4:25am

Hi RudiC - thanks that is almost perfect.....

The "almost" is because, with a slightly different dataset where the final number isn't the end of a sequence, that number becomes a sequence of its own, e.g.

Input:

cn001
cn004
cn016
cn017
cn018
cn019
cn020
cn021
cn031
cn032
cn038
cn042
cn043
cn044
cn045
cn048

Output:

cn[001,004,016-021,031-032,038,042-045,048-048]

I'm guessing I just need to drop an if statement in after END, so I'll have a look at that and report back....

---------- Post updated 16-12-16 at 09:25 AM ---------- Previous update was 15-12-16 at 11:15 AM ----------

Thanks for the help RudiC. My final code which seems to work (I've run it through various test cases):

awk '
NR == 1         {PFX = substr ($1, 1, match ($1, /[0-9]/)-1)
                 printf "%s[", PFX
                 LEN = length($1) - length(PFX)
                 PRE = -1E9 
                }

                {sub (PFX, "")
                }

$1 == PRE + 1   {SEQ = 1
                }
$1  > PRE + 1   {if (SEQ) printf "-%0*d", LEN, PRE
                 printf "%s%0*d", DL, LEN, $1
                 DL = ","
                 SEQ = 0
                }

                {PREPRE = PRE
                 PRE = $1
                }

END             {if ($1 > PREPRE + 1) { printf "]\n" }
                 else { printf "-%0*d]\n", LEN, $1 }
                }
' file

RudiC · December 16, 2016, 4:40am

Glad it works. How about this somewhat shorter approach:

END             {if (SEQ) printf "-%0*d", LEN, $1
                 printf "]\n"
                }
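
For reference, here is how that shorter END block might slot into the complete script. This is a sketch, not a verbatim copy of anything posted above, and it makes a few assumptions. The wrapper name `compress_nodes` and the variables `N` and `FMT` are additions for illustration. The format string is built once instead of using the `*` width, in case an awk does not support it. `END` uses `PRE` rather than `$1`, so it does not rely on the last record still being available there. Input is assumed to be sorted, one name per line, with a prefix that contains no regex metacharacters.

```shell
# compress_nodes: hypothetical wrapper; reads node names from files or stdin.
compress_nodes() {
  awk '
    NR == 1 { PFX = substr($1, 1, match($1, /[0-9]/) - 1)  # text before the first digit
              LEN = length($1) - length(PFX)                # digit width, for zero padding
              FMT = "%0" LEN "d"                            # e.g. "%03d"
              printf "%s[", PFX
              PRE = -1E9 }

            { sub(PFX, "")                                  # leave only the number in $1
              N = $1 + 0 }                                  # force a numeric value

    N == PRE + 1 { SEQ = 1 }                                # still inside a run
    N >  PRE + 1 { if (SEQ) printf("-" FMT, PRE)            # gap: close the open run
                   printf("%s" FMT, DL, N)
                   DL = ","
                   SEQ = 0 }

            { PRE = N }

    END     { if (NR) { if (SEQ) printf("-" FMT, PRE)       # close a run still open at EOF
                        print "]" } }
  ' "$@"
}

# Usage:
#   compress_nodes file
```

Because `SEQ` is only set while the current number continues a run, a trailing isolated entry such as cn048 is printed on its own and the closing bracket follows directly.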