Adding a column to a text file based on file name

rlapate · May 23, 2009, 3:53am

Dear all,

Does anyone know how I could add a column of numbers (1s, or 2s, ..., or 6s) to two-column, tab-delimited text files, where the number to be added depends on the file name?

Currently, each of my text files has two columns, so the column with the repeated number would be the third. The text files are in groups of six, each group in its own directory, and the easiest way to get at their naming (which determines which number should be added) is "ls": it always lists them in the right order. The first file listed should get a third column of 1s (one per row; the number of rows varies from file to file), the second file listed a third column of 2s, and so on.

I am still quite new to programming, and my intuition tells me I should use the command "ls" and then pipe "|" that to a certain command that would execute "if the file is the first one, add a column of ones, if file is the second one, a column of twos..." etc.

Thank you so much for any feedback!

Regina

vidyadhar85 · May 23, 2009, 4:04am

What have you tried so far?

cfajohnson · May 23, 2009, 4:11am

rlapate:

Dear all,

Does anyone know how I could add a column of numbers (1s, or 2s, ..., or 6s) to two-column, tab-delimited text files, where the number to be added depends on the file name?

Currently, each of my text files has two columns, so the column with the repeated number would be the third. The text files are in groups of six, each group in its own directory, and the easiest way to get at their naming (which determines which number should be added) is "ls": it always lists them in the right order. The first file listed should get a third column of 1s (one per row; the number of rows varies from file to file), the second file listed a third column of 2s, and so on.

You don't need ls for that; in fact, it's usually the wrong way to do it. Use filename expansion instead.

And no, there is no need to pipe ls into another command; loop through the files with filename expansion:

n=1
for file in [whatever pattern works]
do
  awk -v num=$n '{ $(NF+1) = num; print }' "$file" > "$file.new"
  n=$(( $n + 1 ))
done

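The awk script might have to be tweaked, depending on the exact format of the files. For example, assigning to a field makes awk rebuild the line with its output field separator, which is a single space by default; to keep the output tab-delimited, set both FS and OFS to a tab.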
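
To make the loop concrete, here is a minimal sketch that runs in a scratch directory. The file names (a.txt, b.txt), the *.txt pattern, and the sample rows are assumptions made up for the demonstration, and setting FS and OFS to a tab is one possible tweak for tab-delimited input, not the only one.

```shell
#!/bin/sh
# Scratch directory with two hypothetical tab-delimited files.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '10\t20\n11\t21\n' > a.txt
printf '30\t40\n' > b.txt

n=1
for file in *.txt        # the pattern expands to the matching names, sorted
do
  # FS and OFS are both a tab, so the appended field is tab-separated too.
  awk -F '\t' -v OFS='\t' -v num="$n" '{ $(NF+1) = num; print }' "$file" > "$file.new"
  n=$(( $n + 1 ))
done

# Show the results: every row of a.txt.new ends in 1, every row of b.txt.new in 2.
cat a.txt.new b.txt.new
```

The originals are left untouched; once the .new files look right they can be moved over the originals.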

devtakh · May 23, 2009, 4:16am

You might want to do something like:

i=1
for file in *.txt
do
  # Append the counter as a tab-separated third column, then replace the original.
  awk -F "\t" -v OFS="\t" -v n="$i" '{ print $0, n }' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
  i=`expr $i + 1`
done

-Devaraj Takhellambam

cfajohnson · May 23, 2009, 4:24am

You don't want to use an external command for simple arithmetic:

i=$(( $i + 1 ))

vidyadhar85 · May 23, 2009, 4:35am

You don't need the extra "$" inside the arithmetic expansion:

i=$((i+1))

cfajohnson · May 23, 2009, 4:42am

Yes I do. I want my code to be legible.
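
For reference, the three forms discussed in the last few posts give the same result when the variable holds a plain integer. A minimal sketch; the variable names a, b, and c exist only for this illustration.

```shell
#!/bin/sh
i=1
a=$(( $i + 1 ))     # explicit "$" inside the arithmetic expansion
b=$((i+1))          # bare variable name, also accepted by POSIX shells
c=`expr $i + 1`     # external command: same value, but starts an extra process
echo "$a $b $c"
```

Which of the first two to use is a matter of style; both avoid starting the extra process that expr needs.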

rlapate · May 23, 2009, 11:32am

Thank you all so much for your quick replies! Both cfajohnson's and devtakh's code did exactly what I was looking for.

I understand awk is a powerful tool for manipulating text files. If you could recommend a good source for beginners, that would be much appreciated; and/or if you could let me know what these two lines are doing, for the next time I have to deal with a similar problem... Thank you and have a great day!

awk -v num=$n '{ $(NF+1) = num; print }' "$file" > "$file.new"
n=$(( $n + 1 ))

Regina

Rhije · May 23, 2009, 11:44am

awk -v num=$n '{ $(NF+1) = num; print }' "$file" > "$file.new"
n=$(( $n + 1 ))

So, the -v num=$n is making awk set the variable "num" to the value of $n (originally set to 1 before the loop). So the first time the loop runs, it sets num to 1.

Within the awk block you see $(NF+1). NF is the number of fields in the current record, so if the record has 5 fields, NF is 5. $( ) evaluates the expression inside it and accesses that field: $(NF) is the last field, and $(NF+1) is the field just past it (the 6th field of a 5-field record). Here the block assigns the value of num to that new field, which appends a column; num stays the same for every line of one file and goes up by one with each iteration of the loop. It then prints the entire record (print on its own is the same as print $0).

"$file" > "$file.new" reads from $file and writes to $file.new. awk performs all of its actions on the content of $file without changing that file; the modified lines go to $file.new instead.

The last part is n=$(( $n + 1 )), which just increments n by 1 (so it increases by 1 every time the loop is entered).
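
To see the effect of $(NF+1) on a single line, here is a small sketch. The input line (x and y separated by a tab) and the value 3 are made up for the illustration.

```shell
#!/bin/sh
# Feed one hypothetical two-field line to awk and report NF before and after
# the assignment, then print the rebuilt record.
out=$(printf 'x\ty\n' |
  awk -v num=3 '{ print "before: NF=" NF; $(NF+1) = num; print "after: NF=" NF; print }')
echo "$out"
```

NF goes from 2 to 3, and the final line comes out as "x y 3". Note that the tab has become a space: assigning to a field makes awk rebuild the record with its output field separator, which is a space unless OFS is set.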

Well... I hope that's enough information!

rlapate · May 23, 2009, 11:46am

Yes this is fantastic. Thanks a million!

ghostdog74 · May 23, 2009, 9:11pm

just do it inside awk

awk '{ print $0,n++ > "file.new"}' file

cfajohnson · May 23, 2009, 9:17pm

That increments the number for every line in one file, not once for each file in a set of files.

ghostdog74 · May 23, 2009, 9:22pm

ah, thanks, my bad.

awk 'FNR==1{n++}{print $0,n> "file_new"}' file*
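
As written, that one-liner sends the lines of every input file to the single output "file_new". If one output per input is wanted, a possible variant builds the output name from FILENAME. This is a sketch under assumptions: the names file1 and file2, the ".new" suffix, and the sample rows are invented for the demonstration.

```shell
#!/bin/sh
# Scratch directory with two hypothetical tab-delimited files.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '10\t20\n11\t21\n' > file1
printf '30\t40\n' > file2

# FNR resets to 1 at the start of each input file, so n counts files.
# FILENAME is the file currently being read; each input gets its own ".new" output.
awk -v OFS='\t' 'FNR == 1 { n++ } { print $0, n > (FILENAME ".new") }' file*

cat file1.new file2.new
```

One caveat: an empty input file never reaches FNR == 1, so it would not advance the counter, whereas the shell loop earlier in the thread counts every file.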