
Fault injection capabilities infrastructure

See also drivers/md/md-faulty.c and the “every_nth” module option for scsi_debug.

Available fault injection capabilities

Configure fault-injection capabilities behavior

debugfs entries

The fault-inject-debugfs kernel module provides some debugfs entries for runtime configuration of fault-injection capabilities.

Boot option

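To inject faults while debugfs is not yet available (early boot time), use the following boot options. Each takes a value of the form <interval>,<probability>,<space>,<times>, as spelled out for mmc_core.fail_request: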

failslab=
fail_page_alloc=
fail_usercopy=
fail_make_request=
fail_futex=
mmc_core.fail_request=<interval>,<probability>,<space>,<times>

proc entries

Error Injectable Functions

This part is for kernel developers considering adding a function to the ALLOW_ERROR_INJECTION() macro.

Requirements for the Error Injectable Functions

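Since function-level error injection forcibly changes the code path and returns an error even if the input and conditions are proper, allowing error injection on a function that is NOT error injectable can cause an unexpected kernel crash. Thus, you (and reviewers) must ensure that:

- The function returns an error code if it fails, and its callers check it correctly (i.e. they can recover from it).

- The function does not execute any code that can change any state before its first error return. The state includes global, local, and input variables. For example: clearing output address storage (e.g. *ret = NULL), incrementing or decrementing a counter, setting a flag, disabling preemption or IRQs, or taking a lock (if such changes are undone before the error is returned, that is OK).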

The first requirement is important, and it means that release (object-freeing) functions are usually harder to inject errors into than allocation functions. If errors from such release functions are not handled correctly, they can easily cause a memory leak (the caller cannot tell whether the object has been released or corrupted).

The second requirement exists for callers that expect the function to always do something. If function-level error injection skips the whole function, that expectation is broken and an unexpected error results.

Type of the Error Injectable Functions

Each error-injectable function has an error type specified by the ALLOW_ERROR_INJECTION() macro. Choose it carefully when you add a new error-injectable function: if the wrong error type is chosen, the kernel may crash because the caller may not be able to handle the injected error. There are 4 types of errors defined in include/asm-generic/error-injection.h:

EI_ETYPE_NULL

This function returns NULL if it fails, e.g. a function that returns the address of an allocated object.

EI_ETYPE_ERRNO

This function returns an -errno error code if it fails, e.g. -EINVAL if the input is wrong. This type also covers functions that return an address encoding -errno via the ERR_PTR() macro.

EI_ETYPE_ERRNO_NULL

This function returns either an -errno or NULL if it fails. If the callers of this function check the return value with the IS_ERR_OR_NULL() macro, this type is appropriate.

EI_ETYPE_TRUE

This function returns true (a non-zero positive value) if it fails.

If you specify the wrong type, for example EI_ETYPE_ERRNO for a function that returns an allocated object, it may cause a problem: the injected return value is an encoded -errno rather than an object address, so a caller that only checks for NULL will try to access an invalid address.

How to add a new fault injection capability

Application Examples



Tool to run command with failslab or fail_page_alloc

To make it easier to accomplish the tasks mentioned above, you can use tools/testing/fault-injection/failcmd.sh. Run “./tools/testing/fault-injection/failcmd.sh --help” for more information, and see the following examples.

Examples:

Run the command “make -C tools/testing/selftests/ run_tests” while injecting slab allocation failures:

# ./tools/testing/fault-injection/failcmd.sh \
    -- make -C tools/testing/selftests/ run_tests

Same as above, but allow at most 100 failures instead of the default of at most one:

# ./tools/testing/fault-injection/failcmd.sh --times=100 \
    -- make -C tools/testing/selftests/ run_tests

Same as above, but inject page allocation failures instead of slab allocation failures:

# env FAILCMD_TYPE=fail_page_alloc \
    ./tools/testing/fault-injection/failcmd.sh --times=100 \
    -- make -C tools/testing/selftests/ run_tests

Systematic faults using fail-nth

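The following code uses the per-thread fail-nth file to fail, in turn, the 1st, 2nd, 3rd, and so on, fault-injection site hit during the socketpair() system call, stopping once the call completes without reaching the armed site: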

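#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
    int i, err, res, fail_nth, fds[2];
    ssize_t n;
    char buf[128];

    /* Let failslab also fail allocations that are allowed to sleep. */
    system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");

    /* Per-thread control file for systematic fault injection. */
    sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
    fail_nth = open(buf, O_RDWR);
    if (fail_nth < 0) {
        perror("open fail-nth");
        return 1;
    }

    for (i = 1;; i++) {
        /* Arm the fault: the i-th fault-injection site hit by this thread fails. */
        sprintf(buf, "%d", i);
        write(fail_nth, buf, strlen(buf));

        res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
        err = errno;

        /* Read back: 0 means the fault was injected, a positive value
           means socketpair() finished before reaching the i-th site. */
        n = pread(fail_nth, buf, sizeof(buf) - 1, 0);
        buf[n > 0 ? n : 0] = '\0';

        if (res == 0) {
            close(fds[0]);
            close(fds[1]);
        }
        printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
               res, err);
        /* No fault was injected this time, so every site has been covered. */
        if (atoi(buf))
            break;
    }
    return 0;
}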

An example output:

1-th fault Y: res=-1/23
2-th fault Y: res=-1/23
3-th fault Y: res=-1/12
4-th fault Y: res=-1/12
5-th fault Y: res=-1/23
6-th fault Y: res=-1/23
7-th fault Y: res=-1/23
8-th fault Y: res=-1/12
9-th fault Y: res=-1/12
10-th fault Y: res=-1/12
11-th fault Y: res=-1/12
12-th fault Y: res=-1/12
13-th fault Y: res=-1/12
14-th fault Y: res=-1/12
15-th fault Y: res=-1/12
16-th fault N: res=0/12