chr1 100 200 + gene_name "alpha"; protein_name "alpha"; level 2; tag "basic"; info "known";
chr1 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; level 9; tag "basic"; info "uknown";
chr1 310 320 + gene_name "alpha"; protein_name "alpha-4"; level 2; info "known";
chr1 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; tag "basic"; info "valid";
The input file above has more than 1 million rows and mixes field separators. I know that awk can print selected columns, but the number of columns also varies from row to row, so I can't select the fields I want by fixed position.
I would like to print only certain parts of each row, using grep to match each wanted field up to its terminating semicolon. From each row I need chr, start, stop, symbol (the fourth column), gene_name, protein_name and info.
My expected output is:
chr1 100 200 + gene_name "alpha"; protein_name "alpha"; info "known";
chr1 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; info "uknown";
chr1 310 320 + gene_name "alpha"; protein_name "alpha-4"; info "known";
chr1 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; info "valid";
How do I print only the matched fields, each up to its semicolon?
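For reference, here is a minimal sketch of the kind of extraction described above. It uses awk rather than grep, because `grep -o` prints every match on its own line and would lose the per-row grouping. The function name `extract_fields` is made up for illustration. The sketch assumes that the first four whitespace-separated fields are always present, that every attribute value is double-quoted and ends with a semicolon, and that no other attribute key ends in `gene_name`, `protein_name` or `info`.

```shell
# Hypothetical helper: keep the first four fields plus the gene_name,
# protein_name and info attributes, in the order they appear in the row.
# Reads the files given as arguments, or standard input if there are none.
extract_fields() {
  awk '{
    # Positional part: chr, start, stop, symbol.
    out = $1 " " $2 " " $3 " " $4
    rest = $0
    # Repeatedly find the next wanted key "value"; pair and append it.
    while (match(rest, /(gene_name|protein_name|info) "[^"]*";/)) {
      out = out " " substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out
  }' "$@"
}
```

It could be run as `extract_fields input.txt > output.txt`, where `input.txt` is a placeholder for the real file name. Rows that lack one of the wanted attributes are still printed, just without that attribute.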