analyzing data from more than one file

shira · January 25, 2009, 7:37am

Hello,

I have two data (.txt) files that I need to operate on at the same time. For example:

file1:
word11 word12 word13
word21 word22 word23
word31 word32 word33

file2:
word11 word12 word13
word21 word22 word23
word31 word32 word33

I need to check whether each word in the first column of the first file exists in the second column of the second file. That is not the problem; the problem is getting both files into the same script. Is it possible?

I thought of two options, but neither of them worked:

first option:
I wrote a main script and piped the first file to a secondary script (let's call it scr1), and piped the other file to another secondary script (let's call it scr2). In scr1 I stored the word I needed in a variable, and in scr2 I did the same with the word I needed from the second file. The problem is that I now can't compare these two variables, because they're in different scripts. Is there a way to do that?

second option:
I wrote a main script and piped the two files to a secondary script, but I can't do
set line1 = ($<)
set line2 = ($<)
(obviously)
Also, piping joins the two files together, so I can't refer to each of them separately, because they're one stream now.

Is there another way to do this? And was either of my approaches on the right track?

I have to use C-Shell and I can't use sed and awk.

Thank you so much if you're able to help me.
Shira.

Franklin52 · January 25, 2009, 7:42am

Why not? Is this a homework assignment?
Homework questions are not allowed, please read our rules.

Regards

shira · January 25, 2009, 7:48am

I'm using C shell because I have a background in ANSI C and it's easier for me to comprehend, that's it.

cfajohnson · January 25, 2009, 8:55am

You'll find scripting much easier with a POSIX shell, and you'll find far more people ready to help, and much more documentation online and in print.

Top Ten Reasons not to use the C shell
Csh problems
Csh Programming Considered Harmful

shira · January 25, 2009, 9:14am

Thank you for your response!
The thing is that I've almost finished my script, I just have this problem which I cannot overcome. If there's anyone who is able to help, I would really appreciate it.

cfajohnson · January 25, 2009, 9:26am

Perhaps you have run up against one of the deficiencies of csh?

If it's not homework, why can't you use sed or awk?

shira · January 25, 2009, 9:37am

When you want to study a language in depth, you need strong basic skills first, and then you can use the shortcuts. When I studied ANSI C, I didn't use the <string.h> functions; I wrote them myself.
And as you can see, I cannot solve this particular problem, so I have to learn this in order to progress.
Also, C shell is a basic scripting language. I will move on to more advanced languages when I feel I have enough knowledge.

cfajohnson · January 25, 2009, 10:19am

In csh, there may be no solution. Progress is moving to a real scripting language.

No, it's not. It is a half-baked scripting language, and it is not used for serious scripting. It is not even required to be present on a UNIX system.

In a POSIX shell, your problem is trivial.

You have all the knowledge you need. Move on.
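For readers who want to see what that might look like, here is a minimal sketch in POSIX sh. It is not something posted in this thread. It assumes whitespace-separated columns and that the two file names are passed as arguments; the function names check_columns and in_second_column are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if WORD appears in the second column of FILE.
# usage: in_second_column WORD FILE
in_second_column() {
    while read -r col1 col2 others; do
        if [ "$col2" = "$1" ]; then
            return 0
        fi
    done < "$2"
    return 1
}

# Hypothetical helper: report, for every word in the first column of FILE1,
# whether it occurs in the second column of FILE2.
# usage: check_columns FILE1 FILE2
check_columns() {
    while read -r first remainder; do
        # Skip blank lines.
        [ -n "$first" ] || continue
        if in_second_column "$first" "$2"; then
            printf '%s: found\n' "$first"
        else
            printf '%s: missing\n' "$first"
        fi
    done < "$1"
}

# Example call (file names are placeholders):
#   check_columns file1 file2
```

Each file is opened by its own redirection, so the two inputs are never merged into a single stream.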

shira · January 25, 2009, 10:41am

Wow, Bill Joy is not very popular in here, heh?
Thanks for believing in me, but you've made me even more anxious to find out the answer (though I will move on, I promise).

cfajohnson · January 25, 2009, 10:57am

No one has said anything against Bill Joy; when it was written, csh was an improvement for interactive use, but not for scripting.

What makes you think there is an answer in csh? Some things simply cannot be done in csh.

Have you read this article? Especially note item no. 5!
Top Ten Reasons not to use the C shell

shira · January 25, 2009, 12:09pm

I've read all the articles that have anything to do with c shell.
And they don't say that there isn't any solution, they say that there is a solution, but it's messy, long and ugly.

But I have the solution to my problem, and I'll write it down for all the future head-breakers. It is so simple, and I can't believe I didn't think about it myself (McCartney says it better - "With a little help from my friends"):

When you want to compare a word from a column in one file to a word from another column in another file, you simply do this (for any i, j you choose):

set list1 = `cut -d" " -fi ${1}`
set list2 = `cut -d" " -fj ${2}`
#We simply "crop" all the words in field i (in the first file) into a list called list1,
#and the same for the second file. Notice the back quotes!

So what was my head-on-the-wall-banging about?
Well, I had thought about doing that, but since (as far as I know) there is no single-character representation of a newline (\n), I didn't think it would work. So what's the trick? The cut command, by definition, goes through every line until it reaches the end of the file, so from each line it takes only the word in your chosen field, and the back quotes split that output on whitespace (newlines included), so the words go into the list in order.
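The same cut idea can be sketched as a complete comparison loop. Note that this is written in POSIX sh rather than csh, so it is an adaptation and not the poster's actual script. The name compare_fields is hypothetical, and the sketch assumes fields are separated by single spaces and that the words contain no wildcard characters. The command substitution splits cut's output on whitespace, newlines included, so each line contributes one word to the list.

```shell
#!/bin/sh
# Hypothetical helper: for every word in field I of FILE1, report whether
# it also appears in field J of FILE2.
# usage: compare_fields FILE1 FILE2 I J
compare_fields() {
    # cut prints one field per input line; the command substitution
    # collects those lines into a whitespace-separated list.
    list1=$(cut -d' ' -f"$3" "$1")
    list2=$(cut -d' ' -f"$4" "$2")

    # Unquoted expansion is intentional: it splits the list into words.
    for w in $list1; do
        status=missing
        for v in $list2; do
            if [ "$w" = "$v" ]; then
                status=found
                break
            fi
        done
        printf '%s %s\n' "$w" "$status"
    done
}

# Example call (file names are placeholders):
#   compare_fields file1 file2 1 2
```

In csh the two inner loops would be foreach loops over $list1 and $list2; the structure is otherwise the same.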

If there's anyone who wants to read the full program, you're welcome to leave me messages with your e-mail.

What's the lesson? Always check your ideas before eliminating them.

Franklin52 · January 25, 2009, 12:28pm

Simply with awk:

awk 'NR==FNR{a[$1];next} $2 in a' file1 file2

Regards
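A small sketch of how the one-liner above might be wrapped and tried out, assuming whitespace-separated columns. The function name rows_with_known_key is hypothetical, and the array is renamed to seen for readability. Note the direction: it prints the lines of the second file whose second field occurs as a first field somewhere in the first file.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner.
# usage: rows_with_known_key FILE1 FILE2
rows_with_known_key() {
    # While reading the first file (NR==FNR), remember each first field.
    # While reading the second file, print lines whose second field was seen.
    awk 'NR==FNR { seen[$1]; next } $2 in seen' "$1" "$2"
}

# Example call (file names are placeholders):
#   rows_with_known_key file1 file2
```

Both file names are passed to a single awk process, which tells them apart by comparing the overall record number NR with the per-file record number FNR.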

vgersh99 · January 25, 2009, 12:53pm

shira:

I've read all the articles that have anything to do with c shell.

<chow-chow>

When you want to compare a word from a column in one file to a word from another column in another file, you simply do this (for any i, j you choose):
set list1 = `cut -d" " -fi ${1}`
set list2 = `cut -d" " -fj ${2}`
#We simply "crop" all the words in field i (in the first file) into a list called list1,
#and the same for the second file. Notice the back quotes!
<chop-chop>

After all - this ain't a csh 'solution' - it's a 'cut' solution which could have been achieved with sed/awk/... in the same manner.

shira · January 25, 2009, 1:21pm

All of you are right.
But I needed a solution which doesn't contain sed or awk.
Thanks for the responses, though.