#StackBounty: #c #strings #io fgets() Alternative

Bounty: 100

size_t readline_tostring(char * restrict dest, size_t size, FILE * restrict stream)

fgets() is OK, yet it has some shortcomings that the following readline_tostring() addresses when reading a line:

  1. When the buffer is too small for the line, readline_tostring() consumes (and discards) the rest of the line and indicates an error. fgets() instead leaves the remainder in the stream.

  2. In C, a line of input is up to and including the '\n' (C11 7.21.2 2). When the stream ends with something other than a new-line, how that is handled is implementation-defined behavior (J.3.12). This code treats a '\n' and end-of-file the same. In both cases, a '\n' is not included in the saved buffer.

  3. If code reads a null character ('\0'), that is not practical to discern with fgets(). This code returns the size of the space used in dest, which includes an appended null character.

  4. Lesser issues include fgets() handling of NULL arguments, small buffer sizes, undefined buffer state on ferror(), and use of int vs. size_t. The code below also clearly (I hope) handles those.

  5. An alternative is to allocate memory. Allocating memory in proportion to external input invites abuse: it allows external forces to overwhelm memory allocation. The following does not allocate memory, unlike "getline substitute that will enforce 'n' as limit of characters read" or getline.
    Another alternative could use limited allocation, but that is not done here.
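
Points 1 and 3 can be seen with a small probe of fgets() itself. The sketch below is not part of the reviewed code: fgets_probe() is a hypothetical helper, named here only for illustration, that writes some bytes to a tmpfile(), reads one line back with fgets(), and reports strlen() of the result along with the next unread character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the reviewed code).
 * Writes `len` bytes of `data` to a temporary file, then reads the first
 * line back with fgets() into `buf` (capacity `size`).
 * Returns strlen(buf), or (size_t)-1 if the file could not be set up or
 * fgets() returned NULL.
 * If `rest` is non-NULL, *rest receives the next unread character, or EOF.
 */
size_t fgets_probe(const char *data, size_t len, char *buf, int size,
    int *rest) {
  assert(data != NULL && buf != NULL && size > 0);

  FILE *f = tmpfile();
  if (f == NULL) {
    return (size_t) -1;
  }
  if (fwrite(data, 1, len, f) != len) {
    fclose(f);
    return (size_t) -1;
  }
  rewind(f);

  size_t used = (size_t) -1;
  if (fgets(buf, size, f) != NULL) {
    used = strlen(buf);  // stops at the first null character, read or appended
  }
  if (rest != NULL) {
    *rest = fgetc(f);    // whatever fgets() left behind in the stream
  }
  fclose(f);
  return used;
}
```

With a 4-byte buffer and the input "1234\nAB\n", fgets() stores "123" and the next unread character is '4', so the tail of the long line is still waiting in the stream. With the four input bytes 'a', '\0', 'b', '\n' and an 8-byte buffer, strlen() reports 1 even though four characters were stored.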

Primary review requests (of the non-test code)

Portability concerns: Might a common or rare case fail on some select systems?

Handling of exceptional/error cases: Any suggested alternates?

Performance concerns are appreciated when they are backed with real measurements.

General comments (on any code).


The code below is listed as one file for code review convenience, yet would usually be in separate .h and .c files.

/////////////////////////////////////////////////////////////////
// Header info, usually in some *.h file

/*
 * Read a _line_ of text. Save characters up to a limit, and form a _string_.
 * The string saved in `dest` never contains a '\n'.
 * A null character is always appended ***1.
 * Reading is only attempted in non-pathological cases.
 * In those cases, the end-of-file flag and error flags are cleared before reading.
 *
 * Normal: The return value is greater than 0 and represents the _size_ of `dest` used.
 *     This includes all non-'\n' characters read and an appended null character. ***2
 *     Reading text "abc\n" forms string "abc" and returns 4.
 *
 * Exceptional cases:
 *   In these cases, the return value is 0 and `dest[0] = '\0'` except as noted.
 *   1: Pathological: Buffer invalid for string.
 *     `dest == NULL` or `size == 0` (No data is written into `dest`)
 *   2: Pathological: Stream invalid.
 *     `stream == NULL`
 *   3: End-of-file occurs and no data read.
 *     Typical end-of-file: `feof(stream)` will return true.
 *   4: Input error.
 *     `ferror(stream)` will return true.
 *     strlen(dest) is number of characters successfully read. ***3
 *   5: Buffer is too small.
 *     First `size-1` non-'\n' characters are saved in `dest[]`.
 *     Additional characters are read up to and including '\n'.  These are not saved.
 *     The end-of-file flag and error flags are cleared again.
 *     strlen(dest) is number of characters successfully saved. ***3
 *
 * ***1 Except when `dest == NULL` or `size == 0`
 * ***2 If code reads a null character, it is treated like any non-'\n' character.
 * ***3 strlen(dest) does not reflect the number of characters in `dest` 
 *       if a null character was read and saved.
 *
 */

#include <stdio.h>
#include <stdlib.h>
size_t readline_tostring(char * restrict dest, size_t size,
    FILE * restrict stream);

/////////////////////////////////////////////////////////////////
// Code, usually in some *.c file

size_t readline_tostring(char * restrict dest, size_t size,
    FILE * restrict stream) {
  // Handle pathological cases
  if (dest == NULL || size == 0) {
    return 0;
  }
  if (stream == NULL) {
    dest[0] = '\0';
    return 0;
  }
  clearerr(stream);

  size_t i = 0;
  int ch;
  while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
    if (i < size) {
      dest[i++] = (char) ch;
    }
  }

  // Add null character termination - always
  // If too many were read
  if (i >= size) {
    dest[size - 1] = '\0';
    clearerr(stream);
    return 0;
  }
  dest[i] = '\0';

  if ((ch == EOF) && (i == 0 || ferror(stream))) { // end-of-file or error
    return 0;
  }

  clearerr(stream);
  return i + 1;
}

/////////////////////////////////////////////////////////////////
// Test code

#include <string.h>
#include <ctype.h>

// Sample usage showing how to discern the results.
void sample(char * restrict dest, size_t size, FILE * restrict stream) {
  size_t sz;
  while ((sz = readline_tostring(dest, size, stream)) > 0) {
    printf("Size:%zu string:\"%s\"\n", sz, dest);
  }

  // Well formed code need not perform this 1st test
  if (dest == NULL || size == 0 || stream == NULL) {
    puts("Pathological case");
  } else if (feof(stream)) {
    puts("End of file");
  } else if (ferror(stream)) {
    puts("Input error");
  } else {
    printf("Line too long: begins with <%s>\n", dest);
  }
  puts("");
}

void test4(const char *s) {
  FILE *stream = fopen("tmp.bin", "wb");
  size_t len = strlen(s);
  fwrite(s, 1, len, stream);
  fclose(stream);
  for (size_t i = 0; i < len; i++) {
    printf(isprint((unsigned char)s[i]) ? "%c" : "<%d>", s[i]);
  }
  puts("");

  stream = fopen("tmp.bin", "r");
  char buf[4];
  sample(buf, sizeof buf, stream);
  fclose(stream);
  fflush(stdout);
}

int main(void) {
  test4("12\nAB\n");
  test4("123\nABC\n");
  test4("1234\nABCD\n");
  test4("");
  test4("1");
  test4("12");
  test4("123");
  test4("1234");
  return 0;
}

Output

12<10>AB<10>
Size:3 string:"12"
Size:3 string:"AB"
End of file

123<10>ABC<10>
Size:4 string:"123"
Size:4 string:"ABC"
End of file

1234<10>ABCD<10>
Line too long: begins with <123>


End of file

1
Size:2 string:"1"
End of file

12
Size:3 string:"12"
End of file

123
Size:4 string:"123"
End of file

1234
Line too long: begins with <123>

