Bash: Pulling first and last character in string

petfyp · December 11, 2015, 6:00pm

I am writing a bash script that will find all references to the entries of the "Well_List" in the "Comp_File".

I am filtering a Well_List that contains the following:

 TEST_WELL_01
 TEST_WELL_02
 TEST_WELL_11
 TEST_WELL_22
 GOV_WELL_1
 GOV_WELL_201
 PUB_WELL_57
 PUB_WELL_82
 . 
 .

Comparison File contains (Comp_File):

 /drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
 /drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/Government_Well_Mnemonics.dlis
 /drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/LL3_0893780002.LIS
 /drive/t/Asset/13_Wells/Test_Well-011/LAS_and_LIS/LL3_9367000.LIS
 /drive/r/Asset/13_Wells/Test_Well-01/LAS_and_LIS/LL3_9367000.LIS
 /drill/t/Cnt-143/Wells/Gov_Well-01/Drilling/Government_Well_Mnemonics.dlis
 /drive/w/Asset/13_Wells/Public_Well_82/County_82/NEUT&DN.LIS
 .
 .

I've gone through many of the existing submissions on your webpage, but can't find what I need. Unfortunately, I'm fairly new to sed, which I figured would be needed here. I can perform each task individually (see code below), but need to combine both to perform the search I need in "Comp_File".

Last field of Well_List line:

for x in `cat Well_List`; do echo "$x" | awk -F_ '{print $NF}'; done

First character:

for x in `cat Well_List`; do echo "$x" | awk '{print substr($0,1,1)}'; done

However, I couldn't work out how to make my script treat "Test_Well_1" and "Test_Well_01" as the same well; both belong under the same Well Name.
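One way to make those two spellings compare equal is to reduce every name to a canonical key before comparing. The following is a minimal sketch, not something posted in the thread: `normalize_well` is a hypothetical helper name, and it assumes the last field (after the final "_" or "-") is always a number. It does not handle abbreviations such as PUB versus Public.

```shell
#!/bin/sh
# Hypothetical helper: reduce a well name to a canonical key so that
# "Test_Well_1", "Test_Well-01" and "TEST_WELL_01" all compare equal.
# The body runs in a subshell "( ... )" so its variables stay local.
normalize_well() (
    # Upper-case the name and treat "-" like "_".
    name=$(printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/-/_/g')
    prefix=${name%_*}      # everything before the last "_"
    num=${name##*_}        # the last field, e.g. "01"
    # Drop leading zeros but keep at least one digit.
    num=$(printf '%s\n' "$num" | sed 's/^0*\([0-9]\)/\1/')
    printf '%s_%s\n' "$prefix" "$num"
)

normalize_well Test_Well_1      # prints TEST_WELL_1
normalize_well Test_Well-01     # prints TEST_WELL_1
normalize_well TEST_WELL_01     # prints TEST_WELL_1
normalize_well Gov_Well-201     # prints GOV_WELL_201
```

Applying the same function to both the Well_List entry and the directory name taken from a Comp_File path gives two strings that can be compared with a plain `=` test.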

Each comparison result will be placed in an individual file that will be titled after the "Well_List" Well Name (example: TEST_WELL_01.txt)

I hope I didn't confuse anyone with this. I've spent most of today trying to figure this out...

Could anyone provide some assistance in getting this done? If additional information is needed, please let me know.

Thanks!

durden_tyler · December 11, 2015, 7:13pm

If Perl is an option in your system, then here's a solution:

$ 
$ # list the files in current directory
$ ls -1
compare_and_print.pl
comp_file
well_list
$ 
$ # show the contents of the "well_list" and "comp_file" files
$ cat well_list
TEST_WELL_01
TEST_WELL_02
TEST_WELL_11
TEST_WELL_22
GOV_WELL_1
GOV_WELL_201
PUB_WELL_57
PUB_WELL_82
$ 
$ cat comp_file
/drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
/drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/Government_Well_Mnemonics.dlis
/drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/LL3_0893780002.LIS
/drive/t/Asset/13_Wells/Test_Well_011/LAS_and_LIS/LL3_9367000.LIS
/drive/r/Asset/13_Wells/Test_Well_01/LAS_and_LIS/LL3_9367000.LIS
/drill/t/Cnt-143/Wells/Gov_Well_01/Drilling/Government_Well_Mnemonics.dlis
/drive/w/Asset/13_Wells/Public_Well_82/County_82/NEUT&DN.LIS
$ 
$ # show the Perl program to compare these files and append results
$ cat compare_and_print.pl
#!/usr/bin/perl -w
use strict;

# A hash to store the file names read from "well_list"
my %files;
my $file1 = "well_list";
my $file2 = "comp_file";

# Read "well_list" and set up a hash key that can be compared easily with
# "comp_file". For a line like "TEST_WELL_01", the hash key would be
# "test_well_1" i.e. lower case for the text joined with the integer value of
# the last number. The value of this key is the actual line itself, since we
# need this to create the file of this name. Thus, after reading line
# "TEST_WELL_01" the hash would look like this:
#     $files{"test_well_1"} = "TEST_WELL_01"
open(FH, "<", $file1) or die "Can't open $file1: $!";
while (<FH>) {
    chomp;
    my $val = $_;
    /^(.*)_(.*?)$/;
    my $key = lc($1)."_".int($2);
    $files{$key} = $val;
}
close(FH) or die "Can't close $file1: $!";

# Now read "comp_file". Split each line on "/" character. Then transform the
# token at index 5, the well name (index 0 is the empty string before the
# leading "/"), as per the transformation rule above. That is, lower case the
# text part joined with the integer value of the last number.
# If this key exists in the hash %files, then open the file name derived
# from the hash value and append the line we just read into the file.
open(FH, "<", $file2) or die "Can't open $file2: $!";
while (<FH>) {
    chomp;
    my $line = $_;
    my @tokens = split("\/");
    my ($txt, $num) = $tokens[5] =~ /^(.*)_(.*?)$/;
    my $cmp = lc($txt)."_".int($num);
    if (defined $files{$cmp}) {
        print "Well found, appending to file: ", $line, "\n";
        open(FH1, ">>", $files{$cmp}) or die "Can't open $files{$cmp}: $!";
        print FH1 $line."\n";
        close(FH1) or die "Can't close $files{$cmp}: $!";
    } else {
        print "Well not found               : ", $line, "\n";
    }
}
close(FH) or die "Can't close $file2: $!";
$ 
$ # Run the Perl program
$ perl compare_and_print.pl
Well found, appending to file: /drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
Well found, appending to file: /drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/Government_Well_Mnemonics.dlis
Well found, appending to file: /drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/LL3_0893780002.LIS
Well found, appending to file: /drive/t/Asset/13_Wells/Test_Well_011/LAS_and_LIS/LL3_9367000.LIS
Well found, appending to file: /drive/r/Asset/13_Wells/Test_Well_01/LAS_and_LIS/LL3_9367000.LIS
Well found, appending to file: /drill/t/Cnt-143/Wells/Gov_Well_01/Drilling/Government_Well_Mnemonics.dlis
Well not found               : /drive/w/Asset/13_Wells/Public_Well_82/County_82/NEUT&DN.LIS
$ 
$ 
$ # now check if new files were created by the Perl program
$ ls -1
compare_and_print.pl
comp_file
GOV_WELL_1
GOV_WELL_201
TEST_WELL_01
TEST_WELL_11
well_list
$ 
$ # check the contents of the new files
$ cat GOV_WELL_1
/drill/t/Cnt-143/Wells/Gov_Well_01/Drilling/Government_Well_Mnemonics.dlis
$ 
$ cat GOV_WELL_201
/drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/Government_Well_Mnemonics.dlis
/drill/t/Cnt-143/Wells/Gov_Well_201/Drilling/LL3_0893780002.LIS
$ 
$ cat TEST_WELL_01
/drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
/drive/r/Asset/13_Wells/Test_Well_01/LAS_and_LIS/LL3_9367000.LIS
$ 
$ cat TEST_WELL_11
/drive/t/Asset/13_Wells/Test_Well_011/LAS_and_LIS/LL3_9367000.LIS
$ 
$

petfyp · December 11, 2015, 7:28pm

Thanks durden_tyler for the response.

Your script is great! I've been wanting to learn Perl and will definitely try this out (and research the commands). Once I understand it better, I could probably walk through it with my teammates.

Unfortunately, this script will also be used by my co-workers, who, like me, don't know much about Perl. If we need to update the script, we need to know how to read it. Do you have any good references I could use to learn Perl (a good book, URL, etc.)?

Truly, thank you. Your script will not go to waste. I promise you that!

durden_tyler · December 11, 2015, 8:14pm

Sure, you're welcome!
If you have never programmed in Perl, then it might take some time to become comfortable with the language. But if you're going to do complex file processing, then learning a scripting language (any language: awk, Perl, Python, Ruby etc.) might be a good investment of your time.

A good way to learn Perl would be to start small and understand one concept at a time. For example,

the shebang line (first line)
the purpose of "use strict"
hashes, arrays, scalars,
how files are read, how they are written
regular expression syntax
functions like "lc", "int"
operators like "defined" etc.

The Perl documentation is the best place to get information about these individual bits:
(1) perldoc.perl.org

Simon Cozens's online book is good for understanding programming in Perl (only the first 6 chapters should be enough):
(2) Beginning Perl (free) - www.perl.org

The books by O'Reilly are well regarded:
(3) Learning Perl
(4) Intermediate Perl

And for the advanced Perl programmer:
(5) Programming Perl

Cookbooks are very interesting as well - they provide just enough information to get the job done. An online Perl cookbook is:
(6) PLEAC-Perl

And the published book is:
(7) Perl Cookbook, 2nd edition

This website has a list of Perl tutorials on the Internet and also mentions which ones are good and which ones are bad:
(8) Perl Tutorial Hub

However, besides reading about Perl, you'll also need to actually write programs to understand it.

=================

Having said that, there should be other forum members here who could post awk or plain Bash scripts for your problem, so keep an eye on this space.

RudiC · December 12, 2015, 6:07am

Not sure I fully understood what you're after, but you could use this as a starting point:

awk -F/ '
FNR==NR         {n = split ($1, T, "_")
                 sub (T[n] "$", sprintf ("%d", T[n]), $1)
                 KEY[$1]
                 next   
                }

                {SRCH = toupper($6)   
                 gsub (/-/, "_", SRCH)   
                 n = split (SRCH, T, "_")
                 sub (T[n] "$", sprintf ("%d", T[n]), SRCH)
                }

SRCH in KEY     {print  > (SRCH ".txt")}
' file1 file2
cf *WELL*
GOV_WELL_1.txt:
/drill/t/Cnt-143/Wells/Gov_Well-01/Drilling/Government_Well_Mnemonics.dlis
GOV_WELL_201.txt:
/drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/Government_Well_Mnemonics.dlis
/drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/LL3_0893780002.LIS
TEST_WELL_11.txt:
/drive/t/Asset/13_Wells/Test_Well-011/LAS_and_LIS/LL3_9367000.LIS
TEST_WELL_1.txt:
/drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
/drive/r/Asset/13_Wells/Test_Well-01/LAS_and_LIS/LL3_9367000.LIS
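Note that the script above names its output files after the normalized key (TEST_WELL_1.txt), not after the original list entry. If the files should keep the Well_List spelling (TEST_WELL_01.txt), one possible variation is to store the original name as the array value. This is only a sketch under assumptions: the `norm` function name and the scratch-directory sample files are made up for illustration, and the well name is taken to be the 6th "/"-separated field, as in the script above.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with a small sample of the data.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf '%s\n' TEST_WELL_01 GOV_WELL_1 > Well_List
printf '%s\n' \
  /drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS \
  /drive/r/Asset/13_Wells/Test_Well-01/LAS_and_LIS/LL3_9367000.LIS \
  /drill/t/Cnt-143/Wells/Gov_Well-01/Drilling/Government_Well_Mnemonics.dlis \
  > Comp_File

awk -F/ '
# Upper-case, turn "-" into "_", and drop leading zeros from the last field.
function norm(s,    n, T) {
    s = toupper(s)
    gsub(/-/, "_", s)
    n = split(s, T, "_")
    sub(/_[^_]*$/, "_" (T[n] + 0), s)
    return s
}
FNR == NR   { KEY[norm($1)] = $1; next }   # normalized key -> original name
            { k = norm($6) }               # 6th field is the well directory
k in KEY    { out = KEY[k] ".txt"; print >> out; close(out) }
' Well_List Comp_File

cat TEST_WELL_01.txt
```

Closing each file right after writing keeps the number of open files at one, at the cost of reopening a file for every matching line.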

Scrutinizer · December 12, 2015, 7:07am

Another awk version:

awk '
  {
    i=(NR==FNR)?1:(NF-2)
    split($i,F,/[-_]/)
    v=toupper(F[1] "_" F[2] "_" F[3]+0)
    sub("PUBLIC", "PUB", v)
  } 
  NR==FNR { 
    A[v]=$1
    next
  } 
  v in A {
    close(f)
    f=A[v] ".txt"
    print >> f
  }
' Well_List FS=/ Comp_File

$ grep "" *WELL*
GOV_WELL_1.txt: /drill/t/Cnt-143/Wells/Gov_Well-01/Drilling/Government_Well_Mnemonics.dlis
GOV_WELL_201.txt: /drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/Government_Well_Mnemonics.dlis
GOV_WELL_201.txt: /drill/t/Cnt-143/Wells/Gov_Well-201/Drilling/LL3_0893780002.LIS
PUB_WELL_82.txt: /drive/w/Asset/13_Wells/Public_Well_82/County_82/NEUT&DN.LIS
TEST_WELL_01.txt: /drive/t/Asset/13_Wells/Test_Well_1/LAS_and_LIS/LL3.LIS
TEST_WELL_01.txt: /drive/r/Asset/13_Wells/Test_Well-01/LAS_and_LIS/LL3_9367000.LIS
TEST_WELL_11.txt: /drive/t/Asset/13_Wells/Test_Well-011/LAS_and_LIS/LL3_9367000.LIS
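Two details in the script above may not be obvious to awk newcomers: the `FS=/` operand placed between the two file names, and the `F[3]+0` expression. The small demonstration below is a sketch with made-up sample files (`list` and `paths` are hypothetical names) that isolates both.

```shell
#!/bin/sh
# Sketch only: two tiny sample files in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'TEST_WELL_01' > list
printf '%s\n' '/a/b/Test_Well-011/x/y.LIS' > paths

# A var=value operand takes effect when awk reaches it on the command line,
# so "list" is split on whitespace and "paths" is split on "/".
awk '{ i = (NR == FNR) ? 1 : NF - 2; print FILENAME, NF, $i }' list FS=/ paths
# prints:
#   list 1 TEST_WELL_01
#   paths 6 Test_Well-011

# Adding 0 converts the string to a number, which drops leading zeros.
awk 'BEGIN { print "011" + 0, "01" + 0, "201" + 0 }'
# prints: 11 1 201
```

The field count of 6 for the path includes the empty field before the leading "/", which is why the well directory is addressed as `NF-2` counted from the end.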