String pointer does not work

Hello, I am trying to reverse complement one string and reverse another (NO complement!), both with pointer. My code compiled without error, but did not do the job I wanted.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>
#include "kseq.h"
 
// STEP 1: declare the type of file handler and the read() function
KSEQ_INIT(gzFile, gzread)
 
char *rev_comp(char *seq);
char *strrev(char *str);
 
int main(int argc, char *argv[])
{
    gzFile fp;
    kseq_t *seq;
    int l;
 
    char *entry_rc;
    char *qual_rv;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <infile.fastq> ", argv[0]);
        return 1;
    }
 
    fp = gzopen(argv[1], "r");                      // STEP 2: open the file handler
    seq = kseq_init(fp);                              // STEP 3: initialize seq
 
   
    while ((l = kseq_read(seq)) >= 0) {          // STEP 4: read sequence
      if (strlen(seq->seq.s) ==  strlen(seq->qual.s))   
{
//  entry_rc = malloc(sizeof(char)*strlen(seq->seq.s));          //Line 39:  This line is not needed! But Why????
   qual_rv =  malloc(sizeof(char)*strlen(seq->qual.s));          //Line 40: Not sure this is the right way either.
    entry_rc = rev_comp(seq->seq.s);                             //Line 41: Seems fine to have the DNA rev_complemented!
    qual_rv = strrev(seq->qual.s);                               //Line 42: This line did not work!
    printf(">%s\n%s\n%s\n%s\n%s\n", seq->name.s, seq->seq.s, seq->qual.s, entry_rc, qual_rv );
    }
 }
 
    kseq_destroy(seq);                         // STEP 5: destroy seq
    gzclose(fp);                                    // STEP 6: close the file handler
 
    return 0;
}
 
/*************************** Functions ***************************************************/
/*1. reverse and complement the sequence*/
 
char *rev_comp(char *seq) {
    if( seq == NULL || !(*seq) )
            return NULL;
 
    int i, j = strlen(seq)-1;
    char *seq_rc;
 
    seq_rc = malloc(sizeof(char) * (j+1));
 
    for(i=0; j>=0; i++, j--) {
        *(seq_rc+i) = *(seq+j);
        if (*(seq_rc+i) == 'A') *(seq_rc+i) ='T';
        else if (*(seq_rc+i) == 'C') *(seq_rc+i) ='G';
            else if (*(seq_rc+i) == 'G') *(seq_rc+i) ='C';
                else if (*(seq_rc+i) == 'T') *(seq_rc+i) ='A';
                    else
                         *(seq_rc+i) ='N';
    }
 
    return seq_rc;
}
 
 
/*2. reverse the quality string */
 
char *strrev( char *s)
  {
  char  c;
  char* s0 = s - 1;
  char* s1 = s;
 
  /* Find the end of the string */
  while (*s1) ++s1;
 
  /* Reverse it */
  while (s1-- > ++s0)
    {
    c   = *s0;
    *s0 = *s1;
    *s1 =  c;
    }
 
  return s;
  }

The problem seems to be with Line 42.
I believe strrev() function is fine as it was tested to be working well with another program. It may be related to the fact of seq->qual.s, which is a string pointer like seq->seq.s, which function rev_comp() worked well with. From my previous post, I seemed to understand the difference between string pointer and string array, so I tried use malloc to allocate memory for both of them, especially here I want reverse the string seq->qual.s in place with strrev() function.

My questions are:
1) What is wrong with Line 42 and Line 40?
2) Why Line 39 is not needed (tested!) and Line 41 worked well with string pointer as parameter?

infile:
@HWI-EAS121
CGTTACGAGATCGGAAGAGCGGTTCAGCAG
+HWI-EAS121
aaaaa`b_aa`aa`YaX]aZ`aZM^Z]YRa
@HWI-EAS122
GGGTGGGCATTTCCACTCGCAGTATGGGTT
+HWI-EAS122
a``^\__`_```^a``a`^a_^__]a_]\]
@HWI-EAS123
CGTTTATGTTTTTGAATATGTCTTATCTTA
+HWI-EAS123
abaa`^aaaaabbbaababbbbbb`bbbb_

I have attached the kseq.h header file which is needed to handle zipped or unzipped sequence files. What I expected to have 2nd, 6th, 10th row of string reverse complemented (DNA), and the 4th, 8th, 12th row of strings reversed of the infile, but the program did not do the job. Here is the output:

output:
>HWI-EAS121
CGTTACGAGATCGGAAGAGCGGTTCAGCAG
aRY]Z^MZa`Za]XaY`aa`aa_b`aaaaa
CTGCTGAACCGCTCTTCCGATCTCGTAACG
aRY]Z^MZa`Za]XaY`aa`aa_b`aaaaa
>HWI-EAS122
GGGTGGGCATTTCCACTCGCAGTATGGGTT
]\]_a]__^_a^`a``a^```_`__\^``a
AACCCATACTGCGAGTGGAAATGCCCACCC
]\]_a]__^_a^`a``a^```_`__\^``a
>HWI-EAS123
CGTTTATGTTTTTGAATATGTCTTATCTTA
_bbbb`bbbbbbabaabbbaaaaa^`aaba
TAAGATAAGACATATTCAAAAACATAAACG
_bbbb`bbbbbbabaabbbaaaaa^`aaba

I know this post is a little too long, but the idea of code is to call two functions to manipulate the string within the file. Could not figure out the bug by myself while trying to catch string pointer. Thanks a lot in advance!

rev_comp malloc's memory for it's result, so you don't need Line 39 to do that.

strrev has a side-effect: it actually changes the input. then it returns the same pointer you give it! look how the two lines are the same

>HWI-EAS121
CGTTACGAGATCGGAAGAGCGGTTCAGCAG
aRY]Z^MZa`Za]XaY`aa`aa_b`aaaaa # qual.s
CTGCTGAACCGCTCTTCCGATCTCGTAACG
aRY]Z^MZa`Za]XaY`aa`aa_b`aaaaa # qual_rv

easiest solution...

if (strlen(seq->seq.s) ==  strlen(seq->qual.s)) {
    entry_rc = rev_comp(seq->seq.s);
    printf(">%s\n%s\n%s\n%s\n%s\n", seq->name.s, seq->seq.s, seq->qual.s, entry_rc, strrev(seq->qual.s));
    free(entry_rc);
}

seq->qual.s will be reversed now in memory, but you don't seem to use it again ...

Thanks Scott!

Yes, this part is understood now. Ah, always I can't debug until someone points it out!
look how the two lines are the same. That's the problem I was trying to solve with Line 42, and your suggestion as strrev(seq->qual.s) which I have tried, but did not work.
By the way what do you mean

by this sentence? I was trying to replace the string in place, but not sure I am doing it the right way. Thanks again!

strlen(string) is one too short to store a string, since it doesn't count the NULL terminator as part of the length.

If you want the string to inhabit new memory, you have to actually put it in the new memory. For qual_rv, you do:

pointer=(pointer to new memory via malloc);
pointer=(pointer to the same old memory after overwriting it);

...which makes the malloc pretty pointless, no?

In summary:

Pointers do absolutely nothing magic for you. Assigning a pointer does not assign its contents.

Assigning a pointer does not assign its contents. I had thought I understand this, but actually I do not! Here is another example I thought I understood, but I am so unsure even it compiles fine and runs well as I expected.

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

//_____________Version 1_______________
void *strrev1(char* begin){
    char temp;
    char* end;
    end = begin + strlen(begin)-1;

    while(end>begin){
        temp = *end;
        *end = *begin;
        *begin = temp;
        end--;
        begin++;
    } 
}

//_____________Version 2_______________
char *strrev2( char *s)
  {
  char  c;
  char* s0 = s - 1;
  char* s1 = s;
 
  /* Find the end of the string */
  while (*s1) ++s1;
 
  /* Reverse it */
  while (s1-- > ++s0)
    {
    c   = *s0;
    *s0 = *s1;
    *s1 =  c;
    }
 
  return s;
  }

//_____________Version 3_______________
char *strrev3(char* seq){
  if( seq == NULL || !(*seq) ) 
            return NULL;

    int i, j = strlen(seq)-1;
    char *seq_rv;

    seq_rv = malloc(sizeof(char) * (j+1));

   for(i=0; j>=0; i++, j--) 
        *(seq_rv+i) = *(seq+j);

    return seq_rv;
//    free(seq_rv);                        //Line 57: Is this needed?
}

int main(){
    char *string, *copy1, *copy2, *copy3;            //using pointer for strings
    string=malloc(sizeof(char)*100);
    copy1=malloc(sizeof(char)*100);
    copy2=malloc(sizeof(char)*100);
    copy3=malloc(sizeof(char)*100);

 scanf("%s",string);
    strcpy(copy1, string);
    strcpy(copy2, string);
    strcpy(copy3, string);

    strrev1(string);

    printf("\n%s\n", copy1);
    printf("%s\n", string);
    printf("%s\n", strrev2(copy2));
    printf("%s\n", strrev3(copy3));

free(string);
free(copy1);
free(copy2);
free(copy3);

return 0;

}

The first two functions are adapted from stackoverflow, and the third one by myself. Now my questions are:
1)Both strrev1() and strrev2() reverse the string in place, but strrev1() does not have return value as the latter which returns a pointer; I tried to integrate to my original program for seq->qual.s , neither one worked. Why?
2) strrev3() malloc memory and has return value too. Does it take double amount of memory as compared with strrev2()? If I have 100 millions of sequence(Yes, that's normal with my files!) Then replace the original string in place would be a good idea, isn't it?
3) When malloc is used in function with returned value, should I free the memory in the function after the value is returned (Line 57) ? Memory should automatically be released on the stack once the function call is done, is this right?

Just because a function returns a value, doesn't mean you need to use it. The in-place functions return pointers for convenience -- so you can do printf("%s\n", strrev(string)); instead of strrev(string); printf("%s\n", string); But the pointer itself should not change (they'd hardly be "in place" if they did).

Notably, it means you cannot do this:

char buf[]="slartibartfast";
buf=strrev(buf);

But this will work just fine for a function that works in-place:

char buf[]="slartibartfast";
strrev(buf);

That other function on the other hand actually does allocate new memory which should be freed later. Take a good look at line 57. What, exactly would it do? You need a free() somewhere, but that's not really a useful place to put it. the return() causes the function to end before its freed. ...and if you put it first, you'd be returning invalid memory -- memory that'd already been freed. That's no good either. So where does it need to go?

char *newpointer;
printf("%s\n", newpointer=copyreverse(oldpointer));
free(newpointer);

As for why your program didn't work, I'd have to see it to guess. You piled several problems in your example for brevity but they interacted with each other and obscured the point.

1 Like