Hi, I have multiple files that I want to process with awk as follows:
Read each line in each file. If any of the columns matches "PVALUE=" followed by a number, print the line if that number is greater than 0.05.
I did the following:
ls *.txt | xargs -I @ -P15 sh -c "awk {if /PVALUE=(\d+)/)){if ($1 > 0.05){print $_}}}' @ - >@.fail"
but I get an error message:
sh: -c: line 0: syntax error near unexpected token `('
I would appreciate suggestions for correcting the one-liner so that it accommodates all of the above.
I'm not at all sure that I understand what you're trying to do here. And saying you're getting an error message without showing us the exact error you're getting leaves us with a lot of guesswork. But whatever you do is going to need matched single quotes surrounding your awk script, and xargs is not likely to replace @ inside a double-quoted string with the filenames you seem to want.
But even after you get the quotes fixed (and get rid of the unneeded sh -c and the extra set of double quotes needed for that), you still have several syntax errors in your awk script. Please show us a sample of the data in one of these text files and the output you are hoping this awk script will produce from that sample input file.
Your first post implies that there are multiple fields in your text files, and that you want to examine every column, so we also need to know what your text files are using as field delimiters.
And, especially since you're depending on non-standard options in the xargs utility you're using, we also need to know what operating system and shell you're using.
xargs won't help you here. It will put the file names found onto the command line as parameters to awk, like the shell does in the above proposal. In either case, awk will work on that input stream, writing ALL results to stdout. If you want the output split by input file name, you need to redirect within awk.
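To illustrate redirecting within awk, here is a minimal sketch. The sample files and their contents are made up, and the ".fail" suffix is borrowed from the first post; the only point is that FILENAME lets a single awk process route output per input file.

```shell
# Hypothetical sample data in a scratch directory (names and layout are assumptions).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a\tPVALUE=0.5\nb\tPVALUE=0.01\n' > one.txt
printf 'c\tPVALUE=0.2\n' > two.txt

# One awk process reads every file named on its command line.
# FILENAME holds the name of the file currently being read, so each
# matching line is written to "<input file>.fail" instead of stdout.
awk '/PVALUE=/ { print > (FILENAME ".fail") }' *.txt
```

With a large number of input files, the output files should also be closed with close() as awk moves from one input to the next, since awk keeps every redirection target open until it exits.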
The 1st post in this thread explicitly requested that every field in your input files be searched for PVALUE=number. But the sample data provided never shows more than one such string on an input line and, on lines that do have something matching that pattern, it always appears in the last field on the line. We also have no indication of whether or not the sample data provided in post #5 in this thread is representative of the actual data that needs to be processed. From the code samples posted, it appears that the submitter wants one output file produced for each input file that contains matched lines. The submitter also seems to want 15 copies of awk running in parallel (which only makes sense if those 15 awk commands won't be thrashing the CPU and/or the disk).
Assuming that parallel processing won't really help much here (and might actually slow down processing), avoiding xargs completely, and assuming that an input line may contain more than one of the patterns above, I would try something more like:
Note that this doesn't create an output file for every input file; it only creates an output file if one or more lines in the corresponding input file meets your criteria.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
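For readers following along, here is a minimal sketch of the kind of awk-only approach described in this post. It is not necessarily the code that was originally posted. It assumes whitespace-separated fields, that a matching field starts with PVALUE=, and the ".fail" output suffix from the first post; the sample files are made up.

```shell
# Hypothetical sample data in a scratch directory (an assumption, not the poster's data).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'id1 x PVALUE=0.5\nid2 PVALUE=0.01 PVALUE=0.2\nid3 y PVALUE=0.05\n' > one.txt
printf 'id4 z PVALUE=0.001\n' > two.txt

awk '
FNR == 1 {                        # first line of each new input file
    if (out != "") close(out)     # release the previous output file
    out = FILENAME ".fail"        # output name for the current input file
}
{
    for (i = 1; i <= NF; i++)     # examine every field on the line
        if ($i ~ /^PVALUE=/) {
            v = substr($i, 8)     # text after the 7 characters "PVALUE="
            if (v + 0 > 0.05) {   # force a numeric comparison
                print > out       # the output file is created on first match only
                next              # print each line at most once
            }
        }
}' *.txt
```

With this sample, one.txt.fail should receive the id1 and id2 lines, and no two.txt.fail should be created because nothing in two.txt exceeds 0.05.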
OK. I completely misunderstood your example. I thought your field separator was <semicolon>, but now I'm guessing that <tab> is your field separator, and <semicolon> is a subfield separator in your third field.
And you are wrong. The code I suggested produces a separate output file for each input file that contains lines that meet your criteria.
Using your updated description (but assuming that no <semicolon> characters appear anywhere in the 1st two fields in your input files AND assuming that a single <tab> character separates the first three fields), my code adjusted for your new description of the problem is:
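A minimal sketch of that adjustment, under the assumptions just listed (tab-separated fields, semicolon-separated subfields in the third field). It is not necessarily the code that was originally posted; the sample file, the DP= and AF= subfield names, and the ".fail" suffix are made up or borrowed for illustration.

```shell
# Hypothetical sample data in a scratch directory (layout guessed from the description).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'chr1\t100\tDP=10;PVALUE=0.3;AF=0.5\nchr1\t200\tDP=7;PVALUE=0.02\n' > one.txt

awk -F '\t' '
FNR == 1 {                                # new input file: switch output files
    if (out != "") close(out)
    out = FILENAME ".fail"
}
{
    n = split($3, part, ";")              # break the 3rd field into subfields
    for (i = 1; i <= n; i++)
        if (part[i] ~ /^PVALUE=/ && substr(part[i], 8) + 0 > 0.05) {
            print > out                   # keep the whole original line
            next                          # print each line at most once
        }
}' *.txt
```

With this sample, only the first line (PVALUE=0.3) should end up in one.txt.fail.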
WHY do you insist on xargs? You have received some proposals working entirely without it, although they may be somewhat off target as the target is not THAT clear. perl, sed, awk: they all will do what (we think) you need on an input stream of the desired file names.