How to trim the white space around a string in a C program

I am writing a C program that reads a plain text file. Many of the fields are blank or are strings padded with white space. Is there a function like trim() in C that strips the white space around a string, or is there some other efficient way to do this? Thanks.

Writing your own will be as efficient as a provided one :slight_smile:

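char *trim(char *str)
{
    /* note: this removes every space and tab, not only leading/trailing ones */
    int i, j = 0;
    for (i = 0; str[i] != '\0'; i++)
    {
        /* compact in place; j never runs ahead of i */
        if (str[i] != ' ' && str[i] != '\t')
            str[j++] = str[i];
    }
    str[j] = '\0';
    return str;
}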

Though I wrote this C code after a long, long time, and it's not tested either... it left me in a good mood.

happy programming!!! :slight_smile:

Rishi

Hi

this is only a part,

strip leading and trailing spaces; check for specified patterns

#define SUCCESS 1
#define FAILURE 0

// 0-for space and 1-for digit

// you can have allowable patterns in Patterns Array.

char *Patterns[]={"010", "1", "10", "01"};
char *b=" 1 23456";
char *temp;
int i, j=0, len, prev=-1;
len=strlen(b);
temp=(char *)calloc(len+1, sizeof(char));
for ( i=0; i<len; i++ )
{
if( b [i]!= ' ' )
{
if( prev != 1 )
temp[j++]='1';
}
else
{
if( prev != 0 )
temp[j++]='0';
}
prev = ( b [i]== ' ' ) ? 0 : 1;
}
temp[j]='\0';
for ( i=0; i<=3; i++ )
{
if( strcmp(Patterns[i], temp) == 0 )
{
len=strlen(b);
for ( prev=0, j=0; j<len; j++ )
{
if ( b[j] != ' ' )
temp[prev++]=b[j];
}
temp[prev]='\0';
break;
}
}

yeah, this is a years' old thread; here's a generic pattern that can be used:

/*----------trim (char) c from right-side of string *p------------------*/
char *strtrim_right( register char *p, register int c)
{
    register char *end;
    register int len;

    len = strlen( p);
    while ( *p && len)
    {
        end = p + len-1;
        if( c == *end)
            *end = 0;
        else
            break;
        len = strlen( p);
    }
    return( p);
}

strtrim_left() is left as an exercise to the reader, but you can use the principle here quite readily.
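For anyone who wants to check their answer to the exercise, here is one possible sketch of strtrim_left(). The implementation is an assumption, not the poster's own code: it skips the leading run of c and then shifts the rest of the string to the front with memmove().

```c
#include <string.h>

/* Sketch only: remove leading occurrences of c from p, in place. */
char *strtrim_left(char *p, int c)
{
    char *first = p;

    /* find the first character that is not c (stops at the terminator) */
    while (*first != '\0' && *first == c)
        ++first;

    /* shift the remainder, including the terminating zero, to the front */
    if (first != p)
        memmove(p, first, strlen(first) + 1);

    return p;
}
```

memmove() is used rather than strcpy() because the source and destination overlap.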

I should have mentioned that whitespace is considered to be any of the following characters:

0x09 - horizontal tab
0x0a - linefeed
0x0b - vertical tab
0x0c - form feed
0x0d - carriage return
0x20 - space

so keep that in mind when you program for trimming whitespace...
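In the default "C" locale, isspace() from <ctype.h> returns true for exactly these six characters, so there is usually no need to test them one by one. A minimal sketch (is_trim_space is a made-up helper name):

```c
#include <ctype.h>

/* Hypothetical helper: nonzero when ch is whitespace as listed above.
   The cast to unsigned char avoids undefined behaviour when char is
   signed and ch holds a negative value. */
int is_trim_space(char ch)
{
    return isspace((unsigned char)ch) != 0;
}
```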

register char *end;
    register int len;

I don't see a specific reason to use register variables here, and the request is not guaranteed to be honored anyway.

You do realize that the register class hints that the declared objects will be accessed frequently, right?

...then at that point it is up to the programmer to determine if 'frequently' warrants the register class or not.

If you don't like the register class, fine, don't use it, in an instance such as this.

I felt that it might very well improve the performance of the function, hence the inclusion.

Register variables aren't free compared to auto variables, and they should be used sparingly, only when really needed. I felt they weren't needed for the above code. Anyway, it's up to the programmer to decide, and the compiler is not going to complain about this.

It's going to allocate a register if it can find one; otherwise it's going to be an auto variable anyway.

:slight_smile:

This is optimizing in the wrong place. Your algorithm is O(n²), with or without register. Trimming a string can be done in O(n), which is much more efficient. On the other hand, the function doesn't have that many variables, so the compiler will most probably use registers for all of them anyway (if the strlen() call doesn't make that impossible).
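To make the O(n) point concrete, here is a rough sketch of a right trim that calls strlen() only once and then walks backwards. The name strtrim_right_linear is made up for illustration.

```c
#include <string.h>

/* Sketch only: trim trailing occurrences of c from p in a single pass. */
char *strtrim_right_linear(char *p, int c)
{
    size_t len = strlen(p);          /* one scan to find the end */

    /* step back over the trailing run of c */
    while (len > 0 && p[len - 1] == c)
        --len;

    p[len] = '\0';                   /* one write instead of one per character */
    return p;
}
```

The earlier version calls strlen() again after every removed character, which is where the quadratic cost comes from.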

You could also use POSIX's regular expressions (regcomp(), regexec(), regfree()).

First you compile a regex using regcomp(), then you can match it several times to a certain string using regexec(). Finally you should free the compiled regex with regfree().

A very simple example:

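#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int
main ()
{
    regex_t reg;
    char * aaa = "          ";
    
    /* pattern: any single character that is neither whitespace nor '$' */
    if (regcomp (&reg, "[^[:space:]$]", REG_EXTENDED|REG_NOSUB) != 0)
        exit(1);
    /* regexec() returns 0 when the pattern matches */
    if ((regexec (&reg, aaa, 0, (regmatch_t *) NULL, 0)) == 0)
         printf ("match\n");
    else
        printf ("no match\n");
    regfree(&reg);
    return 0;
}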

This is all just overkill. There are three variants to do trimming in C:

  1. Output in another place
void trim_copy(char *input, char *output)
{
  char *end = ouput;
  char c;

  // skip spaces at start
  while(*input && isspace(*input))
    ++input;

  // copy the rest while remembering the last non-whitespace
  while(*input)
  {
    // copy character
    c = *(output++) = *(input++);

    // if its not a whitespace, this *could* be the last character
    if( !isspace(c) )
      end = output;
  }

  // white the terminating zero after last non-whitespace
  *end = 0;
}
  2. In place
void trim_inplace(char *s)
{
  trim_copy(s, s);
 }
  3. In place, but avoid copying by shifting string pointer
char *trim_nocopy(char *s)
{
  char *start = s;

   // skip spaces at start
  while(*start && isspace(*start))
    ++start;

  char *i = start
  // iterate over the rest remebering last non-whitespace
  while(*i)
  {
    if( !isspace(*(i++)) )
      end = i;
  }

  // white the terminating zero after last non-whitespace
  *end = 0;

  return start;
}

All three solutions are O(n). Use the in-place variants if you don't need the original anymore (and it's not constant, like a literal). If you use new or malloc to allocate memory, then delete/free the original pointer in case 3, not the returned value.
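The point about freeing the original pointer can be sketched like this. trimmed_length() is a made-up caller, and the trim_nocopy() here is a compilable take on variant 3 with end declared and initialized, which is an assumption about what was intended.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Variant 3 with `end` declared and initialized (assumed intent). */
static char *trim_nocopy(char *s)
{
    char *start = s;
    char *i, *end;

    while (*start && isspace((unsigned char)*start))
        ++start;

    i = end = start;
    while (*i)
    {
        if (!isspace((unsigned char)*(i++)))
            end = i;
    }
    *end = 0;
    return start;
}

/* Hypothetical caller: trims a heap copy of text and returns its length. */
size_t trimmed_length(const char *text)
{
    char *buf = malloc(strlen(text) + 1);
    char *trimmed;
    size_t n;

    if (buf == NULL)
        return 0;
    strcpy(buf, text);

    trimmed = trim_nocopy(buf);   /* may point into the middle of buf */
    n = strlen(trimmed);

    free(buf);                    /* free the original pointer, never `trimmed` */
    return n;
}
```

Passing trimmed to free() would be undefined behaviour whenever the string had leading whitespace, because it no longer points at the start of the allocation.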

This code doesn't compile (regcomp throws an error)... is there any way to get regcomp/regexec to recognize whitespace (using "_" doesn't work) or a word boundary?

The code compiles under gcc 4.1.2 Linux. As to the other question, yes it's possible. Try searching google for "POSIX.2 regular expressions" for more information.
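On the whitespace question: POSIX bracket expressions accept the character class [[:space:]], as the example above already uses. A small sketch that checks whether a field is blank, assuming a POSIX <regex.h> (is_blank_field is a made-up name):

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is empty or all whitespace,
   0 if it contains anything else, -1 if regcomp() fails. */
int is_blank_field(const char *s)
{
    regex_t reg;
    int rc;

    /* anchored at both ends: the whole string must be whitespace */
    if (regcomp(&reg, "^[[:space:]]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&reg, s, 0, NULL, 0);   /* 0 means the pattern matched */
    regfree(&reg);

    return rc == 0 ? 1 : 0;
}
```

In real code you would compile the pattern once and reuse it rather than recompiling on every call.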

Note to anybody copying code: both of calv's examples will segfault as posted. I have not had time to mess with it. On Monday I can get time to post something.

Segfaults on the condition when there are no spaces in the string, for example.
Plus undeclared variables.... maybe someone else can fix it.

edit: corrected code

void trim_copy(char *input, char *output)
{
  char *end = NULL;   //  was: ouput
  char c;

  // skip spaces at start
  while(*input && isspace(*input))
    ++input;

  // copy the rest while remembering the last non-whitespace
  while(*input)
  {
    // copy character
    c = *(output++) = *(input++);

    // if its not a whitespace, this *could* be the last character
    if( !isspace(c) )
      end = output;
  }

  // white the terminating zero after last non-whitespace
  if(end!=NULL) *end = 0;
}
similar changes are needed for the other example, trim_nocopy();

Why would they crash? Please explain the conditions under which that happens. I tested them, and they don't crash. The change you made means that if the input is empty or contains only spaces, no terminating zero is written to the output (which makes it worse). My version crashes only in the following instances:

  • the input pointer is not valid
  • the input data contains no terminating zero
  • the output pointer is not valid
  • the output pointer points to a read-only area in memory
  • the output pointer doesn't have enough memory to contain the result

All of those conditions apply equally to other string handling functions, like strcpy(). So if strcpy() works, trim_copy() should also work.

EDIT: oh, I see your problem now. I had a Typo in the first line, saying "char *end = ouput;". That was supposed to be "output". You probably just deleted the initialization, so that it crashed.

error in trim_copy() was a typo: "ouput" -> "output" (but no segfault)

error in trim_nocopy() was an undeclared variable (which also needs to be initialized; it segfaults if the new variable is not initialized and the string contains no nonspace characters)

btw. trim_nocopy() has no real practical reason to exist. It only shows that left trimming can be done without writing to the string at all, just by changing the start pointer, and that right trimming can be done by just writing a zero after the last nonspace. In fact, trim_copy() is the function you should always use, both for in-place (input==output) and copying operations. It also makes sense to switch the parameters so they are in sync with other C standard library string functions like strcpy() and strcat().
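The parameter swap suggested above might look roughly like this. The name strtrim_copy and the choice to return dest are assumptions, made only to mirror strcpy().

```c
#include <ctype.h>

/* Sketch only: destination first, like strcpy(). dest may equal src. */
char *strtrim_copy(char *dest, const char *src)
{
    char *out = dest;
    char *end = dest;   /* position just after the last non-whitespace written */

    /* skip whitespace at the start */
    while (*src && isspace((unsigned char)*src))
        ++src;

    /* copy the rest while remembering the last non-whitespace */
    while (*src)
    {
        char c = (*out++ = *src++);
        if (!isspace((unsigned char)c))
            end = out;
    }

    *end = '\0';
    return dest;
}
```

Because out never gets ahead of src, calling strtrim_copy(s, s) trims in place.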

ok, now the fixed version:

#include <ctype.h>

void trim_copy(char *input, char *output)
{
  char *end = output;
  char c;

  // skip spaces at start (cast keeps isspace() defined for negative chars)
  while(*input && isspace((unsigned char)*input))
    ++input;

  // copy the rest while remembering the last non-whitespace
  while(*input)
  {
    // copy character
    c = *(output++) = *(input++);

    // if it's not a whitespace, this *could* be the last character
    if( !isspace((unsigned char)c) )
      end = output;
  }

  // write the terminating zero after last non-whitespace
  *end = 0;
}

void trim_inplace(char *s)
{
  trim_copy(s, s);
}

char *trim_nocopy(char *s)
{
  char *start = s;

  // skip spaces at start
  while(*start && isspace((unsigned char)*start))
    ++start;

  char *i = start;
  char *end = start;
  // iterate over the rest remembering last non-whitespace
  while(*i)
  {
    if( !isspace((unsigned char)*(i++)) )
      end = i;
  }

  // write the terminating zero after last non-whitespace
  *end = 0;

  return start;
}

I cut and pasted your code here - I had to make changes so the code would compile.
All my stuff is in red.

#include <stdio.h>
#include <ctype.h>
#include <string.h>
/*  my changes are in red to get the code to compile without errors
              and to run a sample*/
char *trim_nocopy(char *s)
{
  char *start = s;
  char *end;
   // skip spaces at start
  while(*start && isspace(*start))
    ++start;

  char *i = start;
  // iterate over the rest remebering last non-whitespace
  while(*i)
  {
    if( !isspace(*(i++)) )
      end = i;
  }

  // white the terminating zero after last non-whitespace
  *end = 0;

  return start;
}

char *trim_nocopy1(char *s)
{
  char *start = s;
  char *end = NULL;
   // skip spaces at start
  while(*start && isspace(*start))
    ++start;

  char *i = start;
  // iterate over the rest remebering last non-whitespace
  while(*i)
  {
    if( !isspace(*(i++)) )
      end = i;
  }

  // white the terminating zero after last non-whitespace
  if (end !=NULL) *end = 0; 

  return start;
}


int main()
{
	char tmp[8]={0x0};
	char *test = "  ";
	
	printf("running char *trim_nocopy1(char *s)\n");
	strcpy(tmp, test);
	printf("trimmed: %s\n", trim_nocopy1(tmp));
	printf("running char *trim_nocopy(char *s)\n");
	strcpy(tmp, test);
	printf("trimmed: %s\n", trim_nocopy(tmp));

}
csaprd:/home/jmcnama> uname -a
HP-UX csaprd B.11.23 U 9000/800 52720173 unlimited-user license
csaprd:/home/jmcnama> cc t.c -g -o trim
csaprd:/home/jmcnama> trim
running char *trim_nocopy1(char *s)
trimmed:
running char *trim_nocopy(char *s)
Bus error(coredump)
csaprd:/home/jmcnama> gdb trim core
HP gdb 5.4.0 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.4.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `trim'.
Program terminated with signal 10, Bus error.

#0  0x2c60 in trim_nocopy (s=0x705f11e0 "  ") at t.c:20
20        *end = 0;

It dumps core on a string of all spaces, for example.
It will dump core on any string consisting solely of characters for which isspace returns nonzero. I was in a hurry in the post above, my bad. I incorrectly said a string of all non-spaces, which is clearly wrong.

Yeah, on second thought I realized that if someone adds the variable "end" without initializing it, the result can crash. However, the correct initial value is not NULL but start (after the first loop). See the earlier post about that. I posted the corrected code above. Your corrected version is wrong, however. You can see that when you change your initialization of tmp to

char tmp[8]={'x'};