Script to change first line of files in directory

I need a script that takes the filename of every file in a directory and replaces whatever is on the first line of each file with that file's name. There may or may not be anything on that line, but I want it to end up the same as the file name. Most of the script tools I have used are non-destructive, so I'm not sure how to go about this. I guess I could write to a temp file, delete the original, and then rename the temp, but that seems rather crude.

My understanding is that this is what the sed c command is for (something like sed '1 c filename' file.txt), but I'm not sure of the exact usage or how I would get the filename into the command.

Any suggestions as to where I could start?

LMHmedchem

for i in * ; do
  first=$(head -1 $i)
  head=$(echo $head | sed 's/ /_/')
  if [ "X$head" -eq "X" ] ; then
    head="blank"
  fi
  while [ -e $head ];do
    head="$head.1"
  done
  echo "mv $i $head" # if this looks good on the first pass, then uncomment the next line and try again
  #mv $i $head
done

I put this in a file in the directory with the files, added #!/usr/bin/bash to the first line, and ran it. I get an endless output of,

head: cannot open `=$ head.1' for reading: No such file or directory

It is stuck in a loop and I have to kill it.

Am I running this correctly?

LMHmedchem

With all due respect to Skrynesaver, that script is severely braindamaged.

The line that's giving you that error looks to be an assignment but the assignment operator cannot have whitespace around it. It appears to be an attempt to generate a unique filename, but the result, if the syntax were correct, would be a filename with a string of .1.1.1.1.1.1 appended.

The variable $first is set on the second line but it's never used. I think it was intended to be used in place of the second occurrence of $head on the third line.

The sed invocation will replace only the first space it encounters, so a first line like 'foo bar baz' becomes 'foo_bar baz'. If a space survives, the unquoted use of $head in the -e test and the mv command will implode.

Bugs aside, I believe Skrynesaver misunderstood your request. It appears that the script is an attempt to take the contents of the first line in a file and use that to rename the file, instead of using the filename to modify the first line in the file.
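
For what it's worth, if the goal really were to rename each file to its own first line, a debugged version of that idea would look something like this (untested, and again, not what you asked for):

for i in *; do
    first=$(head -n 1 "$i")                          # first line of the file
    first=$(printf '%s\n' "$first" | sed 's/ /_/g')  # turn every space into an underscore
    [ -z "$first" ] && first=blank                   # handle an empty first line
    while [ -e "$first" ]; do                        # don't clobber an existing file
        first=$first.1
    done
    echo mv -- "$i" "$first"                         # drop the echo once the output looks right
done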

Regards,
Alister

---------- Post updated at 01:42 PM ---------- Previous update was at 01:19 PM ----------

Perhaps this will meet your needs:

cd "$1"
for f in *; do
    [ ! -f "$f" ] && continue
    if [ -s "$f" ]; then
        printf %s\\n 1c "$f" . w q | ed -s "$f"
    else
        printf %s\\n "$f" > "$f"
    fi
done

It takes one argument, the path to the directory to work on. It would be prudent to test it on a dummy directory with a few sample files.
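
For anyone unfamiliar with ed: the printf prints each of its arguments on a line of its own, so ed reads the following command script from the pipe, where <filename> stands for the value of "$f" -- replace line 1 with the filename, write the file back, and quit:

1c
<filename>
.
w
q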

Regards,
Alister

I would suggest one small change: add some error checking to the cd command. If the user mistypes the path, it will work on all files in the current directory which is probably not what is intended.

if ! cd "${1:-no-such-directory}"     # also quote on the off chance that something in the path has spaces
then
   echo "abort: could not switch to '$1' or parameter was missing"
   exit 1
fi

This script will accept a filename and do the renaming, but it is rather awkward, and I would need to generate a list of files in the directory, which I guess is no big deal.

#!/usr/bin/bash
   FILENAME=$1
   sed "1 c\\$FILENAME" $FILENAME > TEMP
   rm -f $FILENAME
   cp -f TEMP $FILENAME
   rm -f TEMP
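
I suppose I would just call it in a loop over the directory, something like this (untested; the script name here is just whatever I end up calling the file above):

for f in *.mol; do
    ./mol_filename2firstline.sh "$f"
done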

LMHmedchem

Bah! Where's the fun in that? :wink:

You are correct, of course. Better safe than sorry, especially when the damage can be so severe.

However, I think the parameter expansion in cd "${1:-no-such-directory}" is misguided. As unlikely as it may be to exist, no-such-directory is a valid directory name. In my opinion, it's a bad idea to replace an absent or empty parameter with anything in this instance.

Regards,
Alister

---------- Post updated at 02:36 PM ---------- Previous update was at 02:25 PM ----------

It's much safer, and more appropriate, for that replacement to occur when $1 is referenced in the echo statement.
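
In other words, something along these lines (the wording of the message is only an example):

if ! cd "$1"
then
    echo "abort: could not switch to '${1:-(no directory given)}'"
    exit 1
fi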

Regards,
Alister

Sorry, I didn't see these last two posts; I will give the code a try.

LMHmedchem

---------- Post updated at 03:01 PM ---------- Previous update was at 02:42 PM ----------

Well, using the script posted by alister, I get the following error,

line 14: ed: command not found

line 14 is,

printf %s\\n 1c "$f" . w q | ed -s "$f"

If I add the code from agama, I get the following error,

line 5: cd: ./test_script/: No such file or directory
abort: could not switch to './test_script/' or parameter was missing

This is the code I am using,

#!/usr/bin/bash

# mol_filename2firstline_2.sh 

cd "$1"

if ! cd "${1:-no-such-directory}"     # also quote on the off chance that something in the path has spaces
then
   echo "abort: could not switch to '$1' or parameter was missing"
   exit 1
fi

for f in *; do
    [ ! -f "$f" ] && continue
    if [ -s "$f" ]; then
        printf %s\\n 1c "$f" . w q | ed -s "$f"
    else
        printf %s\\n "$f" > "$f"
    fi
done

run as,

./mol_filename2firstline_2.sh test_script/

I did stick my code into the loop to see what would happen, and it does work.

#!/usr/bin/bash
# mol_filename2firstline_3.sh 
# accepts a directory name dir/ and replaces the first line of all .mol files
# with the filename

cd "$1"

for f in *.mol; do
   sed "1 c\\$f" $f > TEMP
   cp -f TEMP $f
   rm -f TEMP
done

It is rather slow at 1.4s for 20 files.

LMHmedchem

What operating system are you using that it doesn't have ed? I'm curious.

You don't want to cd twice.

Regards,
Alister

You only need the cd command in the if statement. Your directory is a relative path (it doesn't start with a slash), and that is why it's erroring: there isn't a ./xxxx directory inside the directory you already switched to earlier in the script.

You are also wasting effort copying the file back in your script (the extra I/O adds to the latency).

#!/usr/bin/bash

# mol_filename2firstline_2.sh 
### unneeded cd "$1"

if ! cd "${1:-no-such-directory}"     # also quote on the off chance that something in the path has spaces
then
   echo "abort: could not switch to '$1' or parameter was missing"
   exit 1
fi

ls | while read f              # for *   doesn't handle filenames with spaces or large numbers of files. 
do
    [ ! -f "$f" ] && continue

    printf %s\\n "$f" >"$f.new"
    if ! sed '1d' "$f" >>"$f.new"
    then
        rm "$f.new"
    else
        mv "$f.new" "$f"
    fi
done



I'm running cygwin under windows xp. I know that cygwin has ed, but I may not have the package installed.

I didn't recognize that the cd command was going to execute twice.

I ran your code on a set of 900 files and it takes ~45s, compared to 1m6s for my code, so this works pretty well.

The files I will use this on always have the extension .mol, so I added that to my code for some extra security in case there happen to be other files in the directory. Using your code, I could do ls *.mol | etc., but would that run into the same issue you were trying to avoid by using ls in the first place (spaces, etc.)? I suppose logic could be added to the loop to test whether the filename ends in .mol and skip it if not.

Something like,

#!/usr/bin/bash

# mol_filename2firstline_4.sh 

if ! cd "${1:-no-such-directory}"     # also quote on the off chance that something in the path has spaces
then
   echo "abort: could not switch to '$1' or parameter was missing"
   exit 1
fi

ls | while read f              # for *   doesn't handle filenames with spaces or large numbers of files. 
do
    [ ! -f "$f" ] && continue

   EXTENSION=`echo ${f: -4}`

    if [ "$EXTENSION" == ".mol" ]; then
      printf %s\\n "$f" >"$f.new"
      if ! sed '1d' "$f" >>"$f.new"
      then
         rm "$f.new"
      else
         mv "$f.new" "$f"
      fi
   else
     echo $f # alert user if files other than .mol were found
   fi

done

LMHmedchem

You are correct, adding the globbing to the ls command would have the same possible consequences as using the glob on the for.

Your solution is on the right track; you don't need the echo to assign the value to EXTENSION, and since you are using bash, you don't even need to create the variable.

EXTENSION="${f: -4}"     # if you prefer to keep the variable, just assign it directly no need for backtics
ls | while read f              # for *   doesn't handle filenames with spaces or large numbers of files. 
do
    [ ! -f "$f" ] && continue

    if [[ "$f" == *".mol"  ]]; then
      printf %s\\n "$f" >"$f.new"
      if ! sed '1d' "$f" >>"$f.new"
      then
         rm "$f.new"
      else
         mv "$f.new" "$f"
      fi
   else
     echo $f # alert user if files other than .mol were found
   fi
done

Both bash and ksh support pattern matching (not to be confused with regular expression matching) inside the [[...]] construct. This allows you to test the contents of $f without needing to chop it up.
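
To make the distinction concrete, here is a rough sketch (bash's =~ operator does regular expression matching, which is a different animal with different syntax):

[[ "$f" == *.mol ]] && echo "pattern match: ends in .mol"

re='\.mol$'
[[ "$f" =~ $re ]] && echo "regex match: ends in .mol"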

If you want to ensure that the script has been given a viable working directory, then $1 should be tested to confirm that it's non-empty and a directory. Blindly substituting a string in its place can be dangerous.

You are mistaken about the for f in * ... construct. Since pathname expansion occurs after field splitting, the expansion of * is absolutely safe with regard to IFS characters (by default, space, tab, and newline).

Ironically, the statement you replaced it with does not handle spaces correctly in all cases. If ls prints a filename with leading or trailing spaces, read will discard those, yielding either a non-existent file or a different file. Further, if the filename ended with a backslash, that backslash would be stripped and the next filename in the list, if any, would be appended. while IFS= read -r f fixes both issues.

Demonstration:

$ touch 'a1\' a2 '   spaces   '
$ # CORRECT RESULT
$ for f in *; do printf ':%s:\n' "$f"; done
:a1\:
:a2:
:   spaces   :
$ # ERRONEOUS HANDLING OF LEADING/TRAILING WHITESPACE AND TRAILING BACKSLASH
$ ls | while read f; do printf ':%s:\n' "$f"; done
:a1a2:
:spaces:
$ # FIX ONLY THE BACKSLASH
$ ls | while read -r f; do printf ':%s:\n' "$f"; done
:a1\:
:a2:
:spaces:
$ # FIX ONLY THE SPACES
$ ls | while IFS= read f; do printf ':%s:\n' "$f"; done
:a1a2:
:   spaces   :
$ # FIX BOTH
$ ls | while IFS= read -r f; do printf ':%s:\n' "$f"; done
:a1\:
:a2:
:   spaces   :

You are correct in that there may be a limit to how many files the pathname expansion in the for-loop list can handle, but if it's sufficient (it usually is) for the task at hand, it's the simplest and safest method.

Regards,
Alister


I've always thought it was the other way round -- my twisted reasoning was that the shell had to expand the glob before it could split fields; otherwise, all files resulting from * would be treated as a single token.

Very embarrassing -- I wasn't even thinking along the lines of leading/trailing spaces.

Appreciate your pointing these out, and the samples. Fortunately, I'm always in learning mode :slight_smile:

The world would be a better place if more of us approached it with that attitude.

Regards,
Alister