I have 2 files: the first one has 3 fields separated by |||, and the 2nd one is plain text.
I want to copy the lines from the first file if the 2nd field is present anywhere in the text file. This is what I've tried, but I'm new to awk and shell scripting in general, so it's kinda broken.
#!/bin/awk -f
BEGIN
{
    FS=" ||| ";
}
FILENAME=="./input/text"
{
    # for all lines
    for (i; i < NR; i++) {
        # for all records on the current line
        for (j, j < NF; j++) {
            # if the 2nd field is seen in the text, increase its count
            if ($2 == $j) {
                seen[$2]++;
            }
        }
    }
}
# If the 2nd expression was seen at least once, print its corresponding line
{
    if ( seen[$2] > 0 ) {
        print $1 " \|\|\| " $2 " \|\|\| " $3;
    }
}
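Two things in the script above stop it before the logic even runs. In awk, the opening "{" of an action has to be on the same line as its pattern ("BEGIN {", "FILENAME == ... {"); otherwise BEGIN is a syntax error and the FILENAME test becomes a separate rule. Also, an FS longer than one character is treated as a regular expression, where "|" means alternation, so " ||| " does not split on the literal separator. A minimal sketch of just the field splitting, putting each pipe in a bracket expression (the function name second_field is hypothetical):

```shell
# Print the 2nd field of lines formatted as "a ||| b ||| c".
# FS is a regex when longer than one character, so each "|" is wrapped
# in [ ] to make it a literal pipe instead of alternation.
# Note that BEGIN and its "{" are on the same line.
second_field() {
    awk 'BEGIN { FS = " [|][|][|] " }
         { print $2 }' "$@"
}
```

For example, printf 'x ||| abc kek ||| y\n' | second_field prints "abc kek".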
The first sed command just collapses " ||| " into "|" to make the awk statement easier; the last sed changes "|" back to " ||| ".
$(echo "$line" | awk -F "|" '{print $2}')
This part extracts the middle field of each line of your input file, for example:
abc kek
def foo
..
..
An easier way, perhaps, is to write that output to a variable and grep for that variable; it's just shorter.
So, line by line, you grep file2 for the middle field of file1 (-q is for quiet, so grep prints nothing). If grep's return code is 0, the string was found, and the whole line is printed.
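A minimal sketch of a loop along these lines, with the two files passed as arguments (the function name filter_with_grep and the argument order are assumptions made for the sketch). It follows the same steps: sed collapses " ||| " to "|", awk pulls out the middle field, grep -q tests for it, and a final sed restores " ||| ".

```shell
# Usage: filter_with_grep FILE1 FILE2
# Keep the lines of FILE1 ("a ||| b ||| c") whose middle field occurs in FILE2.
# One awk and one grep process are started for every line of FILE1.
filter_with_grep() {
    sed 's/ ||| /|/g' "$1" |
    while IFS= read -r line; do
        # Middle field of the current line, e.g. "abc kek"
        field=$(printf '%s\n' "$line" | awk -F '|' '{ print $2 }')
        # -q: no output, only the exit status (0 = found).
        # The field is used as a basic regular expression here.
        if grep -q -- "$field" "$2"; then
            printf '%s\n' "$line"
        fi
    done |
    sed 's/|/ ||| /g'
}
```

Like the description above, it assumes no field contains a "|" of its own, since the last sed turns every "|" back into " ||| ".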
With this method, you have to call grep and awk 500 times if there are 500 lines in file1. It gets even less efficient if file2 has many lines, because each of those 500 greps rescans all of file2.
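To avoid starting processes per line, a single awk can read the text file into memory first and then check each line of the "|||" file against it, so both files are read exactly once. A minimal sketch, assuming the text file fits in memory and is not empty (NR == FNR only reliably identifies the first file when it has at least one line); the function name and argument order are hypothetical. It uses awk's index(), a plain substring search, so characters such as ? | + - in the 2nd field are matched literally.

```shell
# Usage: filter_with_awk TEXTFILE DATAFILE
# Print the lines of DATAFILE ("a ||| b ||| c") whose 2nd field occurs
# anywhere in TEXTFILE.
filter_with_awk() {
    awk -F ' [|][|][|] ' '
        # First file (NR == FNR): collect all of its lines in one string.
        NR == FNR { text = text $0 "\n"; next }
        # Second file: a pattern with no action prints the line when true.
        # index() does no regex matching, so nothing needs escaping.
        index(text, $2) > 0
    ' "$1" "$2"
}
```

The line order of the output follows DATAFILE, and a line is printed once no matter how often its field occurs in the text.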
First of all, thanks a bunch; that script lets me get rid of a lot of unnecessary data in my files. I've been using the script from ghostdog74, and it works great for the most part, but it gets stuck when the expression contains special characters like the ones listed below (and probably some others).
? | + -
Is there any way to force those characters to be treated as literal text?
I'm thinking about escaping all special characters by adding a "\" before each of them, and then undoing that change when the processing is done. Is that the best approach?
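Escaping does work, and there is nothing to undo if only the copy used as the pattern is escaped while the original line is printed unchanged. A minimal sketch, assuming the pattern is used as a POSIX extended regex (grep -E, or awk's ~ with a dynamic regex); the function names are hypothetical. When grep is doing the matching, grep -F (fixed strings) avoids escaping altogether. A leading "-" is a different problem: grep reads it as an option, which the "--" end-of-options marker prevents.

```shell
# Escape the characters that are special in a POSIX extended regex so the
# string is matched literally. "-" is only special inside [ ], so it is
# left alone.
ere_escape() {
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

# Alternative without any escaping: -F makes grep treat the pattern as a
# fixed string, and "--" stops a leading "-" from being read as an option.
contains_literal() {
    grep -q -F -- "$1" "$2"
}
```

For example, ere_escape 'a+b?' prints a\+b\? which grep -E then matches only against the literal text "a+b?".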
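use strict;
use warnings;

my %hash;
# file2.txt holds the "a ||| b ||| c" lines; key each line by its trimmed 2nd field
open my $fh, "<", "file2.txt" or die "Cannot open file2.txt: $!";
while (<$fh>) {
    chomp;
    my @tmp = split(/\|{3}/, $_);
    $tmp[1] =~ s/^\s*//;
    $tmp[1] =~ s/\s*$//;
    $hash{$tmp[1]} = $_;
}
close $fh;
# Scan the text after __DATA__ and print each stored line whose key occurs in it
while (<DATA>) {
    foreach my $item (keys %hash) {
        # \Q...\E quotes regex metacharacters (? | + etc.) so $item matches literally
        if (/\Q$item\E/) {
            print $hash{$item}, "\n";
            delete $hash{$item};    # print each line only once
        }
    }
}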
__DATA__
bla bla bla jkl bla bla bla
def foo test 123. abc kek jkl