grep -v Following < inputfile > outputfile
Thanks.
How should I start learning shell scripting/awk programming better? Any book?
Thanks again.
In addition to the grep Corona688 provided, you could also add another output file to the awk script I provided, or add an option to the script to control whether or not marker lines should be included in the tro.txt output file, or just always leave out the markers in the tro.txt output file.
Hi,
I have two files:
11.txt showing two patterns:
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C
TER
ENDMDL
c.txt
Number of groups: 40 3.95
Group: 0 Branches: 1
0 001
Centre: 001 Nodes: 1
Group: 1 Branches: 1
0 002
Centre: 002 Nodes: 1
Group: 2 Branches: 6
0 009
1 004
2 008
3 007
4 005
5 006
Centre: 006 Nodes: 6
ENDMDL appears many times in 11.txt. I wish to retrieve the pattern that corresponds to the value of Id. That is, if I give 004 (an Id from group 2) as input, it should output the fourth repeat from 11.txt, ending with ENDMDL.
Id004.txt
Group2: Id 004
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C
TER
ENDMDL
So, for each Id value in c.txt, I want to retrieve the repeat with that number from 11.txt. Please guide me on how to do this.
Also, I wish to retrieve these patterns into individual files based on their Id, group, and centre. For example:
group0.txt contains all patterns with Id
group1.txt contains all patterns with Id
group2.txt contains all patterns with Id
One file containing patterns with corresponding to centre ID
Id001.txt
Id002.txt
Id009.txt
............
............
Thanks
This is the third or fourth problem you have posted to this thread. Reading through the thread it is getting hard to determine which problem is being addressed by some of the comments.
I have shown you how to read 11.txt, accumulate the entries in it for each set of lines ending with an ENDMDL line, and print selected entries from the accumulated list. You know what files you want to create and what you want in them, so why don't you try putting together an awk script to do that and let us know what isn't working?
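As a reminder of that technique, here is a minimal sketch (not the earlier script itself). It assumes entries are numbered from 1 in the order they appear in the file. The demo directory, the want variable, and the entry2.txt output name are made up for illustration; the sample lines are a shortened copy of the data quoted above.

```shell
# Scratch directory so the real 11.txt is not overwritten (hypothetical name).
mkdir -p demo

# Two short entries in the 11.txt layout, each closed by an ENDMDL line.
cat > demo/11.txt <<'EOF'
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
TER
ENDMDL
EOF

# want is a hypothetical variable naming the entry to print (1-based).
awk -v want=2 '
BEGIN { rc = 1 }                  # rc = number of the entry being read
{ r[rc] = r[rc] $0 "\n" }         # append this line to the current entry
$0 == "ENDMDL" { rc++ }           # an ENDMDL line closes the current entry
END { printf("%s", r[want]) }     # print only the requested entry
' demo/11.txt > demo/entry2.txt
```

Changing want to 1 should select the first entry instead; an entry number that does not exist simply prints nothing.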
From your description of groups, centres, and IDs, I have no idea how many files you want created nor what is supposed to be in each of them. I also don't see any use for the lines starting with Centre: in your c.txt file; they just have the characters Centre: followed by the Id of the last Branch in the Group that they follow, followed by the characters Nodes:, followed by the number of branches listed on the preceding Group: line. What is the difference between a Node and a Branch? What is the difference between a Group and a Centre?
If you can't do this awk script yourself, you're going to have to give us a lot more detail specifying the exact list of the files you want produced in response to the snippet from c.txt you provided, along with the data that you want written into those files.
Thanks
I will post it in a new thread with more detail.
Hi,
The script at #15 is working great.
I have two questions related to it.
(1) What if I only want the patterns from 11.txt for rows whose field 1 is divisible by 100 (that is, no entry is written if $1 % 100 != 0), all written to a single file, no.txt?
(2) Also, is it possible to number the rows (those whose 1st field is divisible by 100 and that are used for retrieving patterns from 11.txt) and also to number the patterns retrieved from 11.txt?
Shall I use the following code for (1)?
no=${1:-no.txt} # name of file for no entry if $1%100 != 0
awk -v no="$no" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
if($0 == "ENDMDL") rc++
next}
{ # If we got to here, we are reading lines from the 2nd file.
# Determine exact, truncated, and rounded entry numbers.
if (substr($1, length($1) - 5) == "00.000") {
# $1 ends in 00.000; no truncation or rounding needed.
entry = substr($1, 1, length($1) - 6)
round = trunc = 0
} else {
# $1 is not evenly divisible by 100; calculate rounded and truncated
# values.
entry = 0
round = sprintf("%.0f", $1 / 100)
trunc = substr($1, 1, length($1) - 6)
}
# Write the appropriate entry
# to each output file.
printf("%s", r[entry]) > no
}
}'
11.txt o.txt
Thanks.
No. I assume that you tried running this awk script and got an error saying that your opening "{"s didn't match your "}"s. Since you moved the filenames to be processed to a line of their own, if the awk script had run it would have tried to read both input files from standard input (not from 11.txt and o.txt). And, instead of skipping over lines that had $1 that did not end in 00.000, it would have written an entry for the 0th element in 11.txt. In this case you would get what you want since r[0] is an empty string and writing it to the file no wouldn't have done anything.
A corrected and simplified version of this script would be something like:
awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {     # Reading the 1st file: accumulate entries ending with ENDMDL.
        r[rc] = r[rc] $0 "\n"
        if($0 == "ENDMDL") rc++
        next
}
{       # If we got to here, we are reading lines from the 2nd file.
        if (substr($1, length($1) - 5) == "00.000") {
                # $1 ends in 00.000; write the entry corresponding to this
                # line to the output file.
                entry = substr($1, 1, length($1) - 6)
                printf("%s", r[entry]) > no
        }
}' 11.txt o.txt
Yes, it is possible to number entries from 11.txt and to number rows from o.txt, but you'll have to specify what you mean by that by showing the exact output that you want to appear in no.txt when using your 11.txt and the following instead of your version of o.txt:
100.000
2010.000
1000.000
If you're talking about adding a tag line to the output specifying the entry # from 11.txt and the line number from o.txt, you have seen examples of how to produce tag lines in earlier scripts I have provided (including the script you stripped down to produce the script above). The entry number from 11.txt being printed is specified by the variable entry and the line number from o.txt producing an output line is specified by the variable FNR.
One way to add a tag doing this would be to change the last printf in the above script from:
printf("%s", r[entry]) > no
to:
printf("The following entry from line %d is for Branch %d:\n%s",
FNR, entry, r[entry]) > no
If you want each line of output in no.txt to include the Branch #, that is also easy to do, but it changes the code where entries are accumulated from 11.txt instead of changing the printf at the end of the script. If you want each line of output in no.txt to include the Branch # and the line # from o.txt, that can also be done, but it will involve changing the way the script accumulates and prints entries from 11.txt.
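A minimal sketch of the first variant (Branch # on every output line) might look like the following. It assumes the Branch # is the same number as the entry number in 11.txt, as in the tag line above. The "Branch N: " prefix format, the demo directory, and the sample o.txt values are made up for illustration.

```shell
# Scratch directory so real input files are not overwritten (hypothetical name).
mkdir -p demo

# Two short entries in the 11.txt layout.
cat > demo/11.txt <<'EOF'
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
TER
ENDMDL
EOF

# Sample o.txt: 250.000 does not end in 00.000 and should be skipped.
printf '%s\n' 100.000 250.000 200.000 > demo/o.txt

awk -v no="demo/no.txt" 'BEGIN { rc = 1 }
FNR == NR {
        # The change is here: prefix each stored line with its entry number
        # while accumulating, instead of storing the bare line.
        r[rc] = r[rc] "Branch " rc ": " $0 "\n"
        if ($0 == "ENDMDL") rc++
        next
}
substr($1, length($1) - 5) == "00.000" {
        # $1 ends in 00.000; the digits before that are the entry number.
        entry = substr($1, 1, length($1) - 6)
        printf("%s", r[entry]) > no
}' demo/11.txt demo/o.txt
```

The final printf is unchanged from the script above; only the accumulation step differs. Adding the o.txt line number to every line as well would need the lines of each entry stored separately so the prefix can be built at print time.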
Thanks.
I will try and let you know.
It's working:
printf("The following entry from line %d is for Branch %d:\n%s",
FNR, entry, r[entry]) > no
But what if I want to print the full line as well as the branch? Also, I want a serial number.
Required output:
(001) The following entry from entry 5 "print full line here" is for branch 2711:
# Branch 2711 is printed here
(002) The following entry from entry 9 "print full line here" is for branch 2716:
# Branch 2716 is printed here
(003) The following entry from entry 13 "print full line here" is for branch 2916:
# Branch 2916 is printed here
Then, using another file (2.txt, having one column of serial numbers), I wish to retrieve from the above output the branches corresponding to the values in 2.txt. For example, I want to retrieve 002 from the above output:
Required output:
(002) The following entry from entry 9 "print full line here" is for branch 2716:
# Branch 2716 is printed here
Please guide.
Thanks
With all of the examples I've provided you in both of the active threads you started titled "Help in awk/bash", you should be able to replace the awk printf statement:
printf("The following entry from line %d is for Branch %d:\n%s",
FNR, entry, r[entry]) > no
with one that will produce the output you want.
You know how to create a variable to count the number of lines you've written (e.g., outcnt), you know how to increment that variable before retrieving its value (++outcnt), you know how to use a printf format specifier to print a value as a 3 digit decimal value with leading zero fill (%03d), you know that in awk $0 is the contents of the current line, and you know how to use a printf format specifier to print a variable as a string (%s).
The only thing you might be missing is how to print a double quote character in a printf format string (since you want the full line to be printed between double quote characters). You do that by escaping each double quote you want to print with a backslash character. An example doing that is:
printf("Print a \"%s\" string\n", "quoted")
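As a small demonstration of those pieces working together on unrelated input (not the statement needed for no.txt), the sketch below numbers two made-up input lines. The demo directory, the tags.txt file name, and the wording of the output line are illustrative assumptions.

```shell
# Scratch directory for the demonstration output (hypothetical name).
mkdir -p demo

# ++outcnt increments the counter before its value is used, %03d zero-fills
# it to three digits, %d prints the input line number (FNR), and \"%s\"
# prints the whole current line ($0) between double quotes.
printf '%s\n' alpha beta |
awk '{ printf("(%03d) line %d is \"%s\"\n", ++outcnt, FNR, $0) }' > demo/tags.txt

cat demo/tags.txt
# (001) line 1 is "alpha"
# (002) line 2 is "beta"
```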
Please show me that all of the time I've put into providing samples for you is helping you learn how to use awk by trying this one on your own and then showing us what you've done!
Thanks.
It's working. I modified the way of printing and got the required output.
Great.
Have you figured out how to use awk, sed, or the shell to extract entries listed in 2.txt from the output you just produced?
I modified the code in the printf statement for the first output, and that helped me get the second output. Yippee!
Thanks a lot to you.
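The poster's final code was not shown. For later readers, one possible way to do the extraction step with a second awk pass is sketched below. It assumes every record in the tagged output starts with a "(NNN) " serial number line, as in the required output above, and that 2.txt holds one serial number per line. The demo directory and the picked.txt file name are made up, and the sample records are the placeholder lines from the required output, not real data.

```shell
# Scratch directory so real files are not overwritten (hypothetical name).
mkdir -p demo

# Tagged output in the layout shown in the required output above.
cat > demo/no.txt <<'EOF'
(001) The following entry from entry 5 "print full line here" is for branch 2711:
# Branch 2711 is printed here
(002) The following entry from entry 9 "print full line here" is for branch 2716:
# Branch 2716 is printed here
(003) The following entry from entry 13 "print full line here" is for branch 2916:
# Branch 2916 is printed here
EOF

# Serial numbers to extract, one per line.
printf '%s\n' 002 > demo/2.txt

awk '
FNR == NR { want[$1 + 0]; next }    # 1st file: remember serials as numbers
/^\([0-9]+\) / {                    # a tag line starts a new record
        # Strip the parentheses from "(NNN)" and compare numerically.
        keep = (substr($1, 2, length($1) - 2) + 0) in want
}
keep                                # print lines of the wanted records only
' demo/2.txt demo/no.txt > demo/picked.txt
```

Comparing the serial numbers as numbers means 2 and 002 in 2.txt select the same record.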