convoluted code

Hi,
I have been thinking about how to script this, but I have no clue at all.
Could someone please help me out or give me some idea on this?
I would like to group the lines that share the same first field, joining their second fields with commas.
Let's say I have the following input.

aa c1
aa c2
aa c3
cc d1
dd e1
dd e2
ee f1

I would like the output to be like this.

aa c1,c2,c3
cc d1
dd e1,e2
ee f1

Could this be easily done with a bash script?
Or should I try a Perl script instead?
I'm a beginner in both bash and Perl.
Thank you.

***************************
Try this

first="Y"                
while read a b           
do                       
if [ "$first"  =  "Y"  ] 
then                     
   first="N"             
   prev=$a               
   echo "$a $b\c"        
else                     
if [ "$a" != "$prev" ]   
then                     
   echo "\n$a $b\c"      
   prev=$a               
else                     
   echo ",$b\c"          
fi                       
fi                       
done

This assumes that the input file is sorted.

********************************************************
Sorry.. don't understand..
what is a, b , F ..?
Anyway i found a short solution.
${input} is the filename for the input file.

for m in `cat ${input} | awk '{print $1}' | sort | uniq `
do
        var=`grep "^${m} " ${input} | awk '{print $2}' | tr '\n' ',' | sed '$s/,$//'`
        echo "${m} ${var}"
done 

@jgt
Very good.

I love the way that the first script uses Bourne echo and never terminates the last line.
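
For anyone following along, the same loop can be sketched with printf instead of echo "\c", which should behave the same in any POSIX shell and also ends the last line. The function name group_sorted and the sample data are illustrative assumptions, not part of the original script, and the input is still assumed to be sorted on the first field.

```shell
#!/bin/sh
# Sketch: the while-read loop rewritten with printf.
# Assumes the input is already sorted on the first field.

group_sorted() {
    prev=
    started=no
    while read -r a b
    do
        if [ "$started" = no ]
        then
            started=yes
            printf '%s %s' "$a" "$b"        # very first line
        elif [ "$a" != "$prev" ]
        then
            printf '\n%s %s' "$a" "$b"      # key changed: start a new line
        else
            printf ',%s' "$b"               # same key: append to the line
        fi
        prev=$a
    done
    # Terminate the final line, but only if something was printed.
    if [ "$started" = yes ]
    then
        printf '\n'
    fi
}

# Hypothetical sample data matching the question.
result=$(printf 'aa c1\naa c2\naa c3\ncc d1\ndd e1\ndd e2\nee f1\n' | group_sorted)
printf '%s\n' "$result"
```

With a real file it would be called as group_sorted < "$input".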

The "short solution" relies on the exponential multi-pass technique and utilises a magic Shell with unlimited command line length. Until looking at unix.com I had never seen the for var in <open-ended list> construct ever ... and still wonder which course/manual/book/rumour it comes from?
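
By way of contrast with the multi-pass approach, here is a minimal single-pass sketch using awk. It reads the data once and, like the first script, assumes the input is sorted on the first field. The function name group_lines and the sample data are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: single-pass grouping with awk.
# Assumes lines with the same first field are adjacent (sorted input).

group_lines() {
    awk '
        NR > 1 && $1 == prev { line = line "," $2; next }  # same key: append
        NR > 1               { print line }                # key changed: flush
                             { prev = $1; line = $1 " " $2 }
        END                  { if (NR > 0) print line }    # flush last group
    '
}

# Hypothetical sample data matching the question.
result=$(printf 'aa c1\naa c2\naa c3\ncc d1\ndd e1\ndd e2\nee f1\n' | group_lines)
printf '%s\n' "$result"
```

With a real file it would be called as group_lines < "$input", so the file is opened once rather than once per distinct key.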

Footnote: For anybody following this thread, this is a discussion thread.
@jgt is expert and definitely did not ask this question.

I must confess that I was confused until that footnote clued me in to the fact that I wasn't in the shell scripting forum anymore (got here via the 'new posts' link). :slight_smile:

I don't know for certain, but my first impulse is to blame GNU Bash. I've seen that idiom recommended as the correct way to work around the fact that the final command in a bash pipeline runs in a subshell.

From the Advanced Bash-Scripting Guide's Bash Gotchas:

# Loop piping troubles.
#  This example by Anthony Richardson,
#+ with addendum by Wilbert Berendsen.


foundone=false
find $HOME -type f -atime +30 -size 100k |
while true
do
   read f
   echo "$f is over 100KB and has not been accessed in over 30 days"
   echo "Consider moving the file to archives."
   foundone=true
   # ------------------------------------
     echo "Subshell level = $BASH_SUBSHELL"
   # Subshell level = 1
   # Yes, we're inside a subshell.
   # ------------------------------------
done
   
#  foundone will always be false here since it is
#+ set to true inside a subshell
if [ $foundone = false ]
then
   echo "No files need archiving."
fi

# =====================Now, here is the correct way:=================

foundone=false
for f in $(find $HOME -type f -atime +30 -size 100k)  # No pipe here.
do
   echo "$f is over 100KB and has not been accessed in over 30 days"
   echo "Consider moving the file to archives."
   foundone=true
done
   
if [ $foundone = false ]
then
   echo "No files need archiving."
fi

Regards,
Alister

@jgt: Really funny... :smiley:

Although it may be good to note that command line length does not really play a role in this case, since "for" is a shell keyword and not an external command, so the length of the list should be more or less limited by memory...
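
Memory aside, the for-over-command-substitution form also splits its list on whitespace, so a file name containing a space arrives as two words. A rough sketch of one alternative is to write the list to a temporary file and redirect it into a while read loop: there is no pipe, so in current POSIX shells such as bash, dash and ksh the loop runs in the current shell and the variable survives. The scratch directory, the file names and the use of mktemp (widely available, but not required by POSIX) are assumptions for the demo, and names containing newlines would still break it.

```shell
#!/bin/sh
# Sketch: read find's output line by line without a pipe.

workdir=$(mktemp -d)              # hypothetical scratch directory
: > "$workdir/old report.txt"     # a name with a space in it
: > "$workdir/notes.txt"

list=$(mktemp)
find "$workdir" -type f > "$list"

foundone=false
count=0
while IFS= read -r f
do
    echo "found: $f"
    foundone=true
    count=$((count + 1))
done < "$list"                    # redirection, not a pipe

rm -rf "$workdir" "$list"

if [ "$foundone" = false ]
then
    echo "No files need archiving."
fi
```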

It is not only bash; dash and the Bourne shell do this too. It is also not specified by POSIX, so ksh is the exception.

So many posts have come from users who hit this problem, on AIX to name but one of the O/S versions and variants (and, for the benefit of @jlliagre, that includes many versions of SunOS/Solaris!), that I would love the for var in <open-ended list> syntax to become a syntax error.

Somehow I knew that this thread was going to be fun.

Great posts @jgt, @alister and @Scrutinizer.

I agree with you. ksh is exceptional. :wink:

My post isn't intended to suggest that bash (or dash or the Bourne sh) violates the standard -- I'm aware that POSIX allows a shell to execute each component of a pipeline in either the current environment or a subshell (ksh is compliant in this respect); the post is merely a response to methyl's rumination on the popularity of the idiom, identifying one popular resource which promotes it.

In my opinion, the ksh approach makes life a little easier. But, if I depend on it in a script, I always explicitly require ksh in the shebang.
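
A script that cares can also probe the behaviour at run time rather than rely on the shebang alone. This is only a sketch, and the variable names probe and where are made up for the example. Since POSIX permits either result, neither one is a bug.

```shell
#!/bin/sh
# Sketch: report where this shell runs the last stage of a pipeline.

probe=unchanged
echo changed | read -r probe      # only sticks if read runs in this shell

if [ "$probe" = changed ]
then
    where="current shell (ksh-style)"
else
    where="subshell (Bourne/bash/dash-style)"
fi
echo "last pipeline stage runs in: $where"
```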

Regards,
Alister

You weren't that wordy about it at the time; you simply said your first impulse was to "blame GNU bash". That's a bit of a sore point -- BASH, as a shell many new users start with, has a low value in some admins' eyes -- the habit of new users calling all shell scripting "bash scripts" doesn't help -- and seems to get blamed for everything from their own bugs to poor weather as a result. Even though it's just doing what a Bourne shell does. Blame him if you must. :wink:

Perhaps, but it's not something that can really be changed now. Bourne is what it is, and changing the ordering would change what happens in a lot of scripts because it cuts both ways -- suddenly scripts could be setting variables in the main shell they never did before...

I bet BASH'll add it as one of its many fiddly shopt options someday.

Indeedelydoodely..:slight_smile: bash 4.2 and up has:

shopt -s lastpipe
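
A small sketch of what that option changes, assuming bash 4.2 or later run as a script. As far as I know lastpipe only takes effect while job control is off, so it will not show at an interactive prompt. The flag lastpipe_ok and the sample data are made up for the demo; in a shell without the option the script still runs and just reports it.

```shell
#!/bin/bash
# Sketch: with lastpipe the final pipeline stage runs in the current
# shell, so a counter set inside the loop is still set afterwards.

count=0
if shopt -s lastpipe 2>/dev/null
then
    lastpipe_ok=yes
else
    lastpipe_ok=no                # not bash, or bash older than 4.2
fi

printf 'aa c1\naa c2\ncc d1\n' |
while read -r a b
do
    count=$((count + 1))
done

echo "lastpipe=$lastpipe_ok lines counted: $count"
```

Without lastpipe, bash leaves count at 0 because the loop ran in a subshell.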

Upon further reflection, I have decided to blame methyl. :slight_smile:

Regards,
Alister