Help with Bash Shell Script

khanvader · July 8, 2009, 4:32pm

I am fairly new to Unix scripting. I need to read the first column of a tab-delimited file. The first column is an id number, so I need to read it line by line and create another file containing only the unique id numbers. For example, if I have the following in my input file:
12345 ABC
12345 XYZ
45678 ABC
12345 BVC
78906 ABC
12345 ABC

Here I need to create a file that should have the following:
12345
45678
78906

Thanks for the help in advance.

vgersh99 · July 8, 2009, 4:40pm

nawk '!a[$1]++{print $1}' myFile
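
For readers unfamiliar with the idiom, here is a rough sketch of what the one-liner does, spelled out with comments. It assumes plain awk behaves the same as nawk for this purpose; unique_ids and the temporary sample file are hypothetical names used only for illustration.

```shell
# unique_ids FILE: print each distinct first-column value once, in first-seen order.
# Uses plain awk here; swap in nawk where that is the available implementation.
unique_ids() {
  awk '{
    # seen[$1] is unset (treated as 0) the first time an id appears, so !seen[$1]
    # is true; the post-increment then marks the id as seen for all later lines.
    if (!seen[$1]++) print $1
  }' "$1"
}

# Demo on the sample data from the first post (tab-delimited).
sample=$(mktemp)
printf '12345\tABC\n12345\tXYZ\n45678\tABC\n12345\tBVC\n78906\tABC\n12345\tABC\n' > "$sample"
result=$(unique_ids "$sample")
rm -f "$sample"
echo "$result"
```

Unlike sort -u, this keeps the ids in the order they first appear in the file.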

khanvader · July 8, 2009, 4:56pm

Vgarsh,

Thanks for the quick response, and yes, this works exactly how I wanted. I don't understand every piece of the command yet, but I can look those up.

At the command line this is working great, but if I wanted to create a bash shell script to run as part of a cron job, how would the file reading part work?

vgersh99 · July 8, 2009, 5:21pm

$ echo 'Vgarsh' | sed 's/a/e/'
Vgersh

I guess I don't understand the question. Just create a script, like so:

#!/bin/ksh
myFile='/absolute/path/to/the/file/to/be/parsed'
nawk '!a[$1]++{print $1}' "${myFile}"

I'm not sure what else should be in the script - outputting the unique numbers from a script invoked from cron does not make much sense. You probably want to do something else with those unique numbers.

khanvader · July 8, 2009, 5:34pm

Sorry for misspelling your username

Yes, you are right: the purpose of the cron job is not just to output a file of unique numbers. I wanted to try that part myself too, but here is what I am really trying to do.

We will get a tab-delimited file whose first column holds id numbers (as I mentioned earlier), and we need to create one file per unique id number. For the example I used earlier, we would create 3 files and move the matching records into each one (idnumber1.txt, idnumber2.txt, etc.). The id numbers will mostly repeat, but there could always be two or three different ones, so we create a file for each id number and move its records into it.
In other words, the idea is to split the file on the value of the first column. The script can then be scheduled as a cron job to run at different times and look for a new input file.

vgersh99 · July 8, 2009, 6:10pm

#!/bin/ksh
myFile='/absolute/path/to/the/file/to/be/parsed'
nawk '{
  if (out) close(out)
  out=$1 ".txt"
  print >> out }' "${myFile}"
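
To see what the script above produces, here is a small self-contained run on the sample data from the first post. It is only a sketch: it assumes plain awk in place of nawk, closes the output after every line instead of before the next one, and uses a hypothetical helper name (split_by_first_column) and a throwaway temporary directory.

```shell
# split_by_first_column FILE: append each line of FILE to "<first column>.txt"
# in the current directory. Assumes plain awk is available in place of nawk.
split_by_first_column() {
  awk '{
    out = $1 ".txt"
    print >> out   # append, so repeated ids accumulate in the same file
    close(out)     # release the handle right away so many ids cannot exhaust open files
  }' "$1"
}

# Demo in a temporary directory with the sample records (tab-delimited).
workdir=$(mktemp -d)
printf '12345\tABC\n12345\tXYZ\n45678\tABC\n12345\tBVC\n78906\tABC\n12345\tABC\n' > "$workdir/input.tsv"
( cd "$workdir" && split_by_first_column input.tsv )

count_12345=$(wc -l < "$workdir/12345.txt" | tr -d ' ')
count_45678=$(wc -l < "$workdir/45678.txt" | tr -d ' ')
count_78906=$(wc -l < "$workdir/78906.txt" | tr -d ' ')
rm -rf "$workdir"
echo "12345.txt: $count_12345, 45678.txt: $count_45678, 78906.txt: $count_78906"
```

Because the redirection is >> (append), running the split twice on the same input adds every record a second time, which is one reason to move the input file away once it has been processed.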

khanvader · July 10, 2009, 1:15pm

Thanks for the code...here is my current script and I have some issues that maybe you or someone can point out:

#!/bin/bash
for i in $(ls cliaDir/)
do
  myFile=$i
  echo "$myFile"
  nawk '{
     if (out) close(out)
     out=$1 ".txt"
     print >> out }' "${myFile}"
  mv $i processedDir/
done

cliaDir/ holds the file that needs to be split on the first id column (which the code is doing fine).
The mv command seems to copy the file into processedDir/ rather than move it: the original is still in cliaDir/ when it should no longer be there.

Another problem is that when I run the script again to process the same file that is in cliaDir/, I get the following error:
test.csv
nawk: can't open file test.csv
source line number 4

vgersh99 · July 10, 2009, 1:28pm

First of all, there's no need for 'ls' - this is useless.

No, 'mv' does the MOVE - that's why it's called 'mv' and not 'cp'.
If you want to 'copy', use 'cp'.

Hmm.... I don't see any mention of 'test.csv' in the script. Maybe somebody just 'mv'-ed that file while the script was running.
Uncomment the 'set -x' to see the script debug output.

#!/bin/bash
#set -x
for i in cliaDir/*
do
  myFile=$i
  echo "$myFile"
  nawk '{
     if (out) close(out)
     out=$1 ".txt"
     print >> out }' "${myFile}"
  mv "$i" processedDir/
done
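
A short demonstration of why the glob form behaves differently from $(ls cliaDir/), as a sketch run in a throwaway temporary directory. The variable names from_ls and from_glob are hypothetical. With ls, the loop variable holds a bare file name, so nawk and mv look for it in the current directory instead of in cliaDir/; that would plausibly explain both the "can't open file test.csv" message and the original staying in cliaDir/.

```shell
# Build a scratch layout that mirrors the thread: cliaDir/ containing test.csv.
demo=$(mktemp -d)
mkdir "$demo/cliaDir"
: > "$demo/cliaDir/test.csv"
start_dir=$PWD
cd "$demo" || exit 1

# Command substitution on ls yields bare names, without the directory part.
for i in $(ls cliaDir/); do from_ls=$i; done

# A glob yields paths that include the directory, so they resolve from here.
for i in cliaDir/*; do from_glob=$i; done

cd "$start_dir" || exit 1
rm -rf "$demo"
echo "from ls:   $from_ls"     # test.csv
echo "from glob: $from_glob"   # cliaDir/test.csv
```

The ls form also splits file names on whitespace, which the glob form avoids as long as the variable is quoted when used.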

khanvader · July 10, 2009, 3:43pm

Actually...the script is working for me now. The only change I made was the one you suggested: not using 'ls' in the for loop. mv is what I needed, not cp, so I am clear on that; it was not working for me for some reason, but it is now.

Now, with the same script, can you help me check whether a file exists in cliaDir/? In other words, the script should check whether there is a file in cliaDir/ and not run the rest if there is none.
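
One possible way to make that check, as a minimal sketch rather than a definitive answer: loop over the glob and test each entry with -f. The helper name has_files is hypothetical, and the demo uses a temporary directory in place of cliaDir/.

```shell
# has_files DIR: succeed if DIR contains at least one regular file.
# When the glob matches nothing it stays as the literal pattern "DIR/*",
# which fails the -f test, so an empty directory correctly reports failure.
has_files() {
  for f in "$1"/*; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# Demo: check an empty directory, then the same directory with one file in it.
dir=$(mktemp -d)
if has_files "$dir"; then empty_check=found; else empty_check=none; fi
: > "$dir/test.csv"
if has_files "$dir"; then full_check=found; else full_check=none; fi
rm -rf "$dir"
echo "empty: $empty_check, with file: $full_check"
```

In the script itself this could be used at the top as: has_files cliaDir || exit 0, so the loop is skipped when there is nothing to process.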

khanvader · July 13, 2009, 11:57am

Here is the code I have working so far, with good help from the experts. It picks up the file from the directory and splits it into files by first column. However, I need to skip the first row, since it holds the column headers.

#!/bin/bash
#set -x
for i in cliaDir/*
do
  myFile=$i
  echo "$myFile"
  nawk '{
     if (out) close(out)
     out=$1 ".txt"
     print >> out }' "${myFile}"
  mv "$i" processedDir/
  echo "here is the file name $i"
done

I would appreciate any ideas.

vgersh99 · July 13, 2009, 12:22pm

#!/bin/bash
#set -x
for i in cliaDir/*
do
  myFile=$i
  echo "$myFile"
  nawk 'FNR>1{
     if (out) close(out)
     out=$1 ".txt"
     print >> out }' "${myFile}"
  mv "$i" processedDir/
  echo "here is the file name $i"
done
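
The only change in the script above is the FNR>1 pattern, which skips the first record of each input file. A small sketch of how it differs from NR>1, assuming plain awk in place of nawk and two hypothetical sample files that each start with a header row:

```shell
# Two tab-delimited files, each beginning with a header line.
tmp=$(mktemp -d)
printf 'id\tname\n12345\tABC\n' > "$tmp/a.tsv"
printf 'id\tname\n45678\tXYZ\n' > "$tmp/b.tsv"

# FNR restarts at 1 for every input file, so each header is skipped.
with_fnr=$(awk 'FNR>1 {print $1}' "$tmp/a.tsv" "$tmp/b.tsv")

# NR keeps counting across files, so only the very first header is skipped.
with_nr=$(awk 'NR>1 {print $1}' "$tmp/a.tsv" "$tmp/b.tsv")

rm -rf "$tmp"
echo "FNR>1:"; echo "$with_fnr"
echo "NR>1:";  echo "$with_nr"
```

Since the loop runs nawk once per file, NR>1 would behave the same in this script; FNR>1 simply stays correct if several files are ever passed in one call.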

khanvader · July 16, 2009, 6:15pm

This is working great for me, except that I am not able to move the split files into some other directory.

Within nawk, the split by first column works great, but the files are saved in the directory I run the script from. I would like to send them to some other directory. How can I achieve that?

vgersh99 · July 16, 2009, 6:45pm

#!/bin/ksh

targetDir='/path/to/some/other/dir'
for i in cliaDir/*
do
  myFile=$i
  echo "$myFile"
  nawk -v target="${targetDir}" '
     FNR>1{
     if (out) close(out)
     out=target "/" $1 ".txt"
     print >> out }' "${myFile}"
  mv "$i" processedDir/
  echo "here is the file name $i"
done
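
A self-contained sketch of the -v technique used above, which hands a shell value to awk as a variable so the output path can be built inside the awk program. It assumes plain awk in place of nawk; split_into_dir, the sample data and the temporary directories are hypothetical and only there for illustration.

```shell
# split_into_dir FILE TARGET_DIR: skip the header row, then append each record
# to TARGET_DIR/<first column>.txt.
split_into_dir() {
  awk -v target="$2" 'FNR > 1 {
    out = target "/" $1 ".txt"   # target is an ordinary awk variable set by -v
    print >> out
    close(out)
  }' "$1"
}

# Demo: one input file with a header, split into a separate output directory.
work=$(mktemp -d)
mkdir "$work/out"
printf 'id\tname\n12345\tABC\n45678\tXYZ\n12345\tBVC\n' > "$work/input.tsv"
split_into_dir "$work/input.tsv" "$work/out"

lines_12345=$(wc -l < "$work/out/12345.txt" | tr -d ' ')
lines_45678=$(wc -l < "$work/out/45678.txt" | tr -d ' ')
header_file=no
[ -e "$work/out/id.txt" ] && header_file=yes
rm -rf "$work"
echo "12345.txt: $lines_12345, 45678.txt: $lines_45678, id.txt created: $header_file"
```

The target directory has to exist before the run, since awk creates the output files but not their parent directory. Note also that awk interprets backslash escapes in a -v value, so a path containing backslashes would need extra care.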

khanvader · July 17, 2009, 12:00pm

vgersh99...thanks again for your help...it works great for me now

Ankzz · August 19, 2009, 4:00am

I am new to Unix scripting...
Still, one suggestion: just delete the files that you are confident have already been processed.