I have a file like below. It has 500K lines.
I want to extract TAG_IDs shown in single quote at the end to copied to another file. As if I had copied the TAG_IDs using block select (Column Select) in modern text editor
$ cat file.txt
UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN ('Span', 'SpanUnitOfMeasure') AND TAG_ID = '1-19KYTMQY';
UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN ('Span', 'SpanUnitOfMeasure') AND TAG_ID = '1-19KZQV5B';
UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN ('Span', 'SpanUnitOfMeasure') AND TAG_ID = '1-19N01VZQ';
.
.
.
.
.
expected output in the output file (with or without single quotes is fine)
In awk language $ means value of any field(we could setup field's separator as per our need like in my solution I have made them to '; and in Rudi's case it is default which is space), so $16 means first field of the line and NF denotes the number of fields in a line, which will print the value of last field
Following is the explanation for solution in more details.
I- Field first will always start from left only.
II- Though you could read fields either from left to right OR right to left.
III- Let's take a look after setting field separator to -F"'|;" and go through all the fields in the line by a loop to see what are the values we have:
awk -F"'|;" '{for(i=1;i<=NF;i++){print "Field number=",i,"Field value=",$i}}' Input_file
Field number= 1 Field value= UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN (
Field number= 2 Field value= Span
Field number= 3 Field value= ,
Field number= 4 Field value= SpanUnitOfMeasure
Field number= 5 Field value= ) AND TAG_ID =
Field number= 6 Field value= 1-19KYTMQY
Field number= 7 Field value=
Field number= 8 Field value=
Field number= 1 Field value= UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN (
Field number= 2 Field value= Span
Field number= 3 Field value= ,
Field number= 4 Field value= SpanUnitOfMeasure
Field number= 5 Field value= ) AND TAG_ID =
Field number= 6 Field value= 1-19KZQV5B
Field number= 7 Field value=
Field number= 8 Field value=
Field number= 1 Field value= UPDATE TAGREF SET TAG_VAL = NULL WHERE ATTR_NAME IN (
Field number= 2 Field value= Span
Field number= 3 Field value= ,
Field number= 4 Field value= SpanUnitOfMeasure
Field number= 5 Field value= ) AND TAG_ID =
Field number= 6 Field value= 1-19N01VZQ
Field number= 7 Field value=
Field number= 8 Field value=
IV- So here you could see above 6th field's value is the one which you need to print.
Let me know if you have any more queries on same.
I didn't know that awk can have multiple field delimiters . So, -F"'|;" means , the field delimiter can be either a single quote or a semicolon .
Why was semicolon included as the delimiter ? Because, the following command without semicolon as the delimiter works fine too.