Awk script to create new file based on previous line

Need help creating a script that does the following:

  • Sort a file
  • Compare the previous line's last field with the current line's last field
  • If they are the same, print the output to a file
  • If they are different, print the output to a new file
  • The script should keep creating new files whenever the previous line's last field does not match the current line's last field

i.e.

Input file has the following:

001901070|S|DOE|JOHN|A|19630219|11|10|1|20021119|99991231|UNK|UNK||1005190567|IMM_20090228
001901070|S|DOE|JOHN|A|19630219|11|15|0|20021119|20031001|PMC|U0983AA||1005190567|IMM_20090131
001901070|S|DOE|JOHN|A|19630219|11|15|1|20031125|20090801|AVB|U1269AA||1005190567|IMM_20090131
001901070|S|DOE|JOHN|A|19630219|11|15|1|20061030|20070801|PMC|UNK||1005190567|IMM_20090131
001901070|S|DOE|JOHN|A|19630219|11|29|1|20021213|99991231|UNK|FAV065||1005190567|IMM_20090228
001901070|S|DOE|JOHN|A|19630219|11|29|2|20030122|20030205|MIP|FAV065||1005190567|IMM_20090330
001901070|S|DOE|JOHN|A|19630219|11|29|3|20030211|20030711|MIP|FAV065||1005190567|IMM_20090630
001901070|S|DOE|JOHN|A|19630219|11|29|5|20070215|20070819|MIP|FAV103||1005190567|IMM_20090630
001901070|S|DOE|JOHN|A|19630219|11|29|9|20090325|20090921|MIP|UNK||1005190567|IMM_20090630
001901070|S|DOE|JOHN|A|19630219|11|9|1|20021119|20121116|UNK|UNK||1005190567|IMM_20090330

The field separator is a "|"

What have you tried so far?

I am able to print the line if the previous variable is different, but I am not able to:

  • Create a new file if the content differs
  • Write to the existing file if they are the same

I need to find a way to keep creating files as well as appending to the current file when the fields match. This is the only thing I have so far:

awk -F '|' '$16 != prev {print; prev=$16}' Test_File.dat

If I change the comparison to $16 == prev, I get no output at all. I have already sorted the file, so there should be some entries that are the same.
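Looking at it again, I suspect the == version prints nothing because prev is only ever assigned inside the action block, and that block only runs when the fields already match, which they never do against the initial empty value. A rough sketch of what I mean, always updating prev and splitting matching from non-matching lines (same.txt and diff.txt are just placeholder names; this still doesn't create one file per group):

awk -F '|' '
  $NF == prev { print >> "same.txt"; next }   # last field repeats the previous line
  { print >> "diff.txt"; prev = $NF }         # new value: remember it
' Test_File.dat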

That wasn't much fun. Took me a while to clue into the fact that awk arrays are associative, not numeric. Sorting was a pain until I made the element name a combo of the last field and a record number to guarantee uniqueness.

Without GAWK the function asorti() is unavailable, so I'm hoping you have GAWK. I'm a little unsure about your description of where the records should end up, but I think you mean: all the records that share the same last field go into one unique file.

cat input13.txt | gawk -F '|' '
BEGIN {
  i = 1;
}
{
  # Key each record by its last field plus a record number, so the
  # array keys are unique but still group by last field when sorted.
  idx = $16 i;
  l[idx] = $0;
  ++i;
}
END {
  n = asorti(l, ni);           # sort the keys into array ni (GAWK only)
  prev = "";
  for (j = 1; j <= n; ++j) {
    outline = l[ni[j]];
    x = split(outline, al, "|");

    file = "out_" al[x] ".txt";
    if (al[x] != prev) {
      print "Unique " al[x];
      print outline > file;    # first record for this field: create the file
      prev = al[x];
    } else {
      print "Same   " al[x];
      print outline >> file;   # same field as the previous record: append
    }
  }
}'

Your data generates the following files...

out_IMM_20090131.txt
out_IMM_20090228.txt
out_IMM_20090330.txt
out_IMM_20090630.txt

Each file contains records that match the last field.

There's an easier way to do this and you can skip the sort. Here's the pseudo code...

# Remove all out_ files.
# Foreach record of input
#    File is record "out_" field[last] ".txt"
#    Append record to file

...which turns into...

rm out_*.txt > /dev/null 2>&1
cat input13.txt | gawk -F '|' '
{
  file = "out_" $16 ".txt";
  print $0 >> file
}'
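One caveat worth noting: most non-GNU awks limit the number of simultaneously open output files, so if the real input has many distinct last-field values the append version can die with a "too many open files" error. A sketch that closes each file after writing avoids that, at some speed cost (works with any awk, since it needs no asorti):

awk -F '|' '
{
  file = "out_" $NF ".txt";
  print $0 >> file;
  close(file);   # release the descriptor so we never exceed the open-file limit
}' input13.txt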

Hi,
Try this.

sort -t "|" -k 16 Test_File.dat | awk -F "|" '{print > $NF}'
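A quick way to sanity-check it on a toy input (three fields instead of sixteen here, so the sort key becomes -k 3):

printf 'a|1|X\nb|2|Y\nc|3|X\n' > Test_File.dat
sort -t "|" -k 3 Test_File.dat | awk -F "|" '{print > $NF}'
# creates a file X holding the two X records and a file Y holding the Y record

Note the output files are named exactly by the last field, with no "out_" prefix.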

Thanks
Guru.


Thanks to everyone who replied. I was able to use the one-liner posted by Guru, and it worked perfectly. I appreciate all the time and effort everyone put into this.