Fetching 1st Column and Last n Columns

I am trying to fetch the 1st column and the last 10 columns. The code I am using works, but afterwards the output file is space delimited, not tab delimited.

 awk 'BEGIN {OFS="\t"}{printf("%s\t",$1)}{for(i=NF-9; i<=NF; i++) {printf("%s\t",$i)};printf "\n" } ' inputfile

I then have to use the code below to make the file tab delimited.

 awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}'

Please help

Please let me know what is wrong in the code that fetches the 1st column and the last 10 columns.

Hello Nina2910,

Thank you for showing your efforts in the forum. It would be even better if you could show us a sample Input_file and the expected output too. Could you please try the following and let me know if it helps? I haven't tested it, though.

awk 'BEGIN {OFS="\t"}{for(i=NF-9; i<=NF; i++){Q=Q?Q OFS $i:$i};print $1,Q;Q=""}' Input_file
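For readers following along, here is the same one-liner spelled out over several lines (functionally equivalent, and equally untested):

awk 'BEGIN {OFS="\t"}
{
    Q = ""                         # reset the accumulator for each line
    for (i = NF-9; i <= NF; i++)
        Q = (Q ? Q OFS $i : $i)    # join the last 10 fields with OFS (<TAB>)
    print $1, Q                    # print the first field, then the joined tail
}' Input_file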

Thanks,
R. Singh


I'd be very surprised if the output were space delimited as you explicitly print the <TAB> char...
Post (or better: attach) your input and output files.


@Rudi

Input File

NAME    1       2       3       4       5       6       7       8       9       10      11      12      13      14      15      16
A       10      20      30      40      50      60      70      80      90      100     110     120     130     140     150     160
B       10      20      30      40      50      60      70      80      90      100     110     120     130     140     150     160
C       10      20      30      40      50      60      70      80      90      100     110     120     130     140     150     160
D       10      20      30      40      50      60      70      80      90      100     110     120     130     140     150     160
E       10      20      30      40      50      60      70      80      90      100     110     120     130     140     150     160

I checked the input file for being tab delimited (no output from the test below means every line has exactly 17 tab-separated fields):

 awk -F "\t" 'NF != 17' Inputfile.txt

Then I use the code below to get the 1st and the last 10 columns:

 awk 'BEGIN {FS="\t"}{printf("%s\t",$1)}{for(i=NF-9; i<=NF; i++) {printf("%s\t",$i)} printf "\n"} ' Inputfile.txt >> Inputfile1.txt 

Inputfile1.txt

NAME    7       8       9       10      11      12      13      14      15      16
A       70      80      90      100     110     120     130     140     150     160
B       70      80      90      100     110     120     130     140     150     160
C       70      80      90      100     110     120     130     140     150     160
D       70      80      90      100     110     120     130     140     150     160
E       70      80      90      100     110     120     130     140     150     160

I then check whether the above is tab delimited, but the test prints every line, so apparently it is not:

awk -F "\t" 'NF != 11' Inputfile1.txt

@Ravinder Thank you so much. Your code is working fine, but I am not sure what is wrong with mine.

awk 'BEGIN {FS="\t"}{printf("%s\t",$1)}{for(i=NF-9; i<=NF; i++) {printf("%s\t",$i)} printf "\n"} ' file | od -tx1
0000000 4e 41 4d 45 09 37 09 38 09 39 09 31 30 09 31 31
0000020 09 31 32 09 31 33 09 31 34 09 31 35 09 31 36 09
0000040 0a 41 09 37 30 09 38 30 09 39 30 09 31 30 30 09
0000060 31 31 30 09 31 32 30 09 31 33 30 09 31 34 30 09
0000100 31 35 30 09 31 36 30 09 0a 42 09 37 30 09 38 30
0000120 09 39 30 09 31 30 30 09 31 31 30 09 31 32 30 09
0000140 31 33 30 09 31 34 30 09 31 35 30 09 31 36 30 09
0000160 0a 43 09 37 30 09 38 30 09 39 30 09 31 30 30 09
0000200 31 31 30 09 31 32 30 09 31 33 30 09 31 34 30 09
0000220 31 35 30 09 31 36 30 09 0a 44 09 37 30 09 38 30
0000240 09 39 30 09 31 30 30 09 31 31 30 09 31 32 30 09
0000260 31 33 30 09 31 34 30 09 31 35 30 09 31 36 30 09
0000300 0a 45 09 37 30 09 38 30 09 39 30 09 31 30 30 09
0000320 31 31 30 09 31 32 30 09 31 33 30 09 31 34 30 09
0000340 31 35 30 09 31 36 30 09 0a

proves it IS <TAB> delimited. What you can see is that you have one <TAB> too many at the end of each line (each line ends 09 0a, i.e. <TAB><newline>).
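A field count makes the extra delimiter visible (same style of check as above, run against Inputfile1.txt as before):

awk -F "\t" '{print NF}' Inputfile1.txt

This should print 12 for every line: the 11 real fields plus an empty 12th field created by the trailing <TAB>, which is exactly why the NF != 11 test matched every line.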


Look into expand / unexpand to convert between tabs and spaces. And if the number of columns stays constant, you can just do cat <file> | cut -f1,7-16 (-d defaults to <TAB>).
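If such a conversion is ever needed, a minimal sketch (file names are placeholders):

expand Inputfile1.txt > spaces.txt       # <TAB>s become runs of spaces (default tab stops)
unexpand -a spaces.txt > tabs.txt        # -a converts all qualifying space runs back to <TAB>s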

R


If we go back to your original code:

awk 'BEGIN {OFS="\t"}{printf("%s\t",$1)}{for(i=NF-9; i<=NF; i++) {printf("%s\t",$i)};printf "\n" } ' inputfile

I think you will find that the output is tab delimited, but your test is failing because you have 12 fields on each output line (with the last field being empty) instead of 11 fields. Two trivial changes to your code will fix the problem:

awk 'BEGIN {OFS="\t"}{printf("%s",$1)}{for(i=NF-9; i<=NF; i++) {printf("\t%s",$i)};printf "\n" } ' inputfile

or, since nothing in your awk script uses OFS:

awk '{printf("%s",$1)}{for(i=NF-9; i<=NF; i++) {printf("\t%s",$i)};print ""}' inputfile

Of course, there is also the brute force:

awk 'BEGIN{OFS="\t"}{print $1,$(NF-9),$(NF-8),$(NF-7),$(NF-6),$(NF-5),$(NF-4),$(NF-3),$(NF-2),$(NF-1),$NF}' inputfile

or:

awk '{print $1,$(NF-9),$(NF-8),$(NF-7),$(NF-6),$(NF-5),$(NF-4),$(NF-3),$(NF-2),$(NF-1),$NF}' OFS='\t' inputfile
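Any of the above can be verified with the same field-count test used earlier in the thread; no output means every line now has exactly 11 tab-delimited fields:

awk 'BEGIN {OFS="\t"}{printf("%s",$1)}{for(i=NF-9; i<=NF; i++) {printf("\t%s",$i)};printf "\n" } ' inputfile | awk -F "\t" 'NF != 11'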

@Don Thank you for fixing the code and explaining the issue.

Actually, I need to extract the last 100 columns, and the number of columns changes every day; that's why I can't use:

awk '{print $1,$(NF-9),$(NF-8),$(NF-7),$(NF-6),$(NF-5),$(NF-4),$(NF-3),$(NF-2),$(NF-1),$NF}'

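A parameterized sketch of the same loop avoids hard-coding the count (untested; n and the file name are placeholders, and it assumes every line has at least n+1 fields):

awk -v n=100 'BEGIN {OFS="\t"}
{
    out = $1                            # start with the first field
    for (i = NF-n+1; i <= NF; i++)      # the last n fields
        out = out OFS $i
    print out
}' inputfile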

Thank you for help and fixing the issue in my code. It is working fine now :slight_smile:

Is it always a fixed number of columns in each file? If so, you could use cut:-

tab=$(printf "\t")                                             # Just defining the tab character for clarity

columns=$(set - $(head -1 input_file | tr " " "_") ; echo $#)  # Count number of columns today.  The tr eliminates spaces, just for counting columns

((start_col=$columns-99))                                      # Count back 99 columns (else you will get an extra one)

cut -d "$tab" -f1,${start_col}- input_file                     # Note the trailing hyphen on the field/column definition
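As a concrete check against the 17-column sample file earlier in the thread (where the last 10 columns were wanted rather than 100, so subtract 9 instead of 99), this reduces to:

cut -f 1,8- Inputfile.txt                                      # start_col = 17 - 9 = 8; prints NAME plus columns 7-16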

Robin


Hello rbatte1,

Thank you for your nice code, sir. May I add my humble view here (I hope I am correct; kindly correct me if I am wrong)?
I think if we need to print the last 100 columns, then the final line in the above code could be the following.

cut -d "$tab" -f${start_col},${columns}- Input_file

I haven't tested it though.

Thanks,
R. Singh

Hi Ravinder,
No. The list that is the option-argument to the cut -b list, -c list, and -f list options is a comma-separated set of specifiers in one of four forms (illustrative examples follow the list):

  • number: the byte, character, or field at position number (with the 1st item on a line being number 1) is output,
  • number-: the byte, character, or field at position number and every byte, character, or field following it on that line are output,
  • num1-num2: byte, character, or field numbers num1 through num2, inclusive, are output, and
  • -number: byte, character, or field numbers 1 through number, inclusive, are output.
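For example (field numbers are arbitrary, input_file is a placeholder, and -d is omitted, so the delimiter is the default <tab>):

cut -f 3    input_file     # field 3 only
cut -f 3-   input_file     # field 3 through the end of the line
cut -f 3-5  input_file     # fields 3 through 5
cut -f -3   input_file     # fields 1 through 3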

Since the request in this thread is to print the 1st field and the last 100 fields, and Robin's code sets columns to the number of fields on the first line of the file named input_file (assuming that the field delimiter in input_file is a <tab> character), that would be:

cut -f 1,$((columns - 99))- input_file

(the -d delimiter option is not needed since the default delimiter in cut is the <tab> character).

Hi Robin,
The command:

columns=$(set - $(head -1 input_file | tr " " "_") ; echo $#)  # Count number of columns today.  The tr eliminates spaces, just for counting columns

can be changed to:

IFS= read -r line < input_file                 # read the first line verbatim (no field splitting)
columns=$(IFS=$tab; set -- $line; echo $#)     # split on <tab> and count the positional parameters

using just shell built-ins, without needing to exec head and tr. Note also that the standards explicitly state that:

set - arg...

produces unspecified results starting in the 2004 revision of the standard; that form was deprecated in the 1992 edition of the standard with the preferred form being:

set -- arg...
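The double-hyphen form matters when the first argument itself begins with a hyphen; a small illustration (values are arbitrary):

set -- -x foo    # "--" ends option processing, so $1 becomes "-x" rather than enabling the -x option
echo "$#"        # prints 2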

Hello Don,

Many thanks for the comments (again); I'm delighted to have a way to save a few processes. I imagine the cost of running what I had in a loop, reading a large file and counting columns for each record, would be quite expensive. This is very useful.

Regarding the standards change, I was unaware of it, so this is really useful and will hopefully stop me from failing in the future.

Kindest regards,
Robin