Help with searching a text file

Hello all!

I've been working for days on this and it is really bugging me!!

Here's my dilemma:

Say I have a very large text file that logs various records, with the fields delimited by a ':'. Each record is separated by a newline character, so I can search for lines with 'grep' to find the results I want. The problem is, there may be as many as 30 or so fields with associated values per entry... not an issue in itself... but if a certain field is not used for a particular record, that field is not included in the entry at all (not even a '::' for an empty field... I didn't write the software).

So the problem arises that, yes, I can search for lines containing whatever I want, but sometimes I don't want to view 30 or so fields; sometimes I am only interested in one or two. Now some of you may say to use something like 'cut -d: -f22,24' or "awk -F: '{print $1, $2}'", but since not every field is used for every record, a given piece of data does not always appear in the same field! I can guess fields until I hit the correct one, but that is often way too time consuming. Is there a way to somehow use a string or expression to delimit a field (as opposed to a single character, which is all cut or awk seem to take)? Or is there some other way to do this with awk? Thank you so much...

Sorry if that was really long-winded... I'm flustered!!!

Please post your operating system, shell, and some sample data. I bet several of the regulars here could provide a quick solution to your problem.

I'm using Solaris 9 with the bash shell...

Here's a sample of what the data may look like:

# grep 'tid=1234567' $HOME/somedirectory/somefile

:dt=031231130001:ca=50:sq=35:tid=1234567:ul=0542:tk=05:tm=07
:dt=031231130234:ca=23:sq=36:tid=1234567:tk=09:tm=12
:dt=031231130555:ca=99:sq=37:tid=1234567:ul=0425:ty=324:tk=03:tm=03
:dt=031231130925:ca=67:sq=38:tid=1234567:ul=0465:ty=324:uw=0778:tk=02:tm=22

Ideally I'd search for specific results in a particular field (which I can already do), but I'd like to print only a certain field to the screen... for example, I'd like to print just the 'tk' field and its value for each line (and not all the other garbage).

For each entry above, though, it appears in the 6th, 5th, 7th, and then 8th field, and that position can be different for any entry.
If I added a 'cut -d: -f6', I'd only get the 'tk' for the first matching line and all the rest would be other fields...

How can I extract only that 'tk' field if it appears in a different place each time? That's why I was wondering if there is a way to use a string or expression as a delimiter rather than a single character (like use 'tk' instead of ':' and then print the first field, that being 'tk=whatever').
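In other words, I'm picturing something like this sketch, though I have no idea if awk will actually take a whole string like 'tk=' as the separator (I'm guessing at nawk here since I'm on Solaris, and it is completely untested):

grep 'tid=1234567' $HOME/somedirectory/somefile |
  nawk -F'tk=' 'NF > 1 { split($2, a, ":"); print "tk=" a[1] }'   # untested guess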

Thanks!

sed -n '/tk=/s/^.*\(tk=[^:]*\):.*/\1/p' < datafile

or something like that...
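For what it's worth, piped straight from your grep it should boil your four sample lines down to just the tk pairs (it does expect a ':' after the tk value, which your sample lines always have):

grep 'tid=1234567' $HOME/somedirectory/somefile | sed -n '/tk=/s/^.*\(tk=[^:]*\):.*/\1/p'

tk=05
tk=09
tk=03
tk=02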

Ohh, I think I am grasping that... please correct me if I am wrong, but does that basically mean:

Print 'tk=', then substitute, starting at the beginning of the line, any character any number of times up to the value of 'tk', and then the ':' after that value, and any other character any number of times... then replace it all with just the value of 'tk', after the 'tk='?

I am still somewhat confused by the notation of the backreference, \(tk=[^:]*\), but I pretty much get the rest... I'll try it soon!

Thank you Perderabo!

No, you have it wrong. The -n says don't print anything unless explicitly told to. The /tk=/ selects only lines that have a 'tk=' in them. The s/this/that/p does a substitution on the line and prints the result. The stuff between \( and \) gets saved as a group. The \1 refers to that first saved group.
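And since you asked about doing it in awk: a rough equivalent sketch with nawk (which Solaris 9 ships) is to split on ':' as usual, loop over the fields, and print whichever one starts with tk=:

nawk -F: '{ for (i = 1; i <= NF; i++) if ($i ~ /^tk=/) print $i }' datafile   # prints one tk=... per line that has it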


ahh, thanks...