Undefined behavior

From cppreference.com
< cpp‎ | language
 
 
C++ language
General topics
Flow control
Conditional execution statements
if
Iteration statements (loops)
for
range-for (C++11)
Jump statements
Functions
Function declaration
Lambda function declaration
inline specifier
Dynamic exception specifications (until C++20)
noexcept specifier (C++11)
Exceptions
Namespaces
Types
Specifiers
decltype (C++11)
auto (C++11)
alignas (C++11)
Storage duration specifiers
Initialization
Expressions
Alternative representations
Literals
Boolean - Integer - Floating-point
Character - String - nullptr (C++11)
User-defined (C++11)
Utilities
Attributes (C++11)
Types
typedef declaration
Type alias declaration (C++11)
Casts
Implicit conversions - Explicit conversions
static_cast - dynamic_cast
const_cast - reinterpret_cast
Memory allocation
Classes
Class-specific function properties
explicit (C++11)
static
Special member functions
Templates
Miscellaneous
 
 

Renders the entire program meaningless if certain rules of the language are violated.

Explanation

The C++ standard precisely defines the observable behavior of every C++ program that does not fall into one of the following classes:

  • ill-formed - the program has syntax errors or diagnosable semantic errors. A conforming C++ compiler is required to issue a diagnostic, even if it defines a language extension that assigns meaning to such code (such as with variable-length arrays). The text of the standard uses shall, shall not, and ill-formed to indicate these requirements.
  • ill-formed, no diagnostic required - the program has semantic errors which may not be diagnosable in general case (e.g. violations of the ODR or other errors that are only detectable at link time). The behavior is undefined if such program is executed.
  • implementation-defined behavior - the behavior of the program varies between implementations, and the conforming implementation must document the effects of each behavior. For example, the type of std::size_t or the number of bits in a byte, or the text of std::bad_alloc::what. A subset of implementation-defined behavior is locale-specific behavior, which depends on the implementation-supplied locale.
  • unspecified behavior - the behavior of the program varies between implementations, and the conforming implementation is not required to document the effects of each behavior. For example, order of evaluation, whether identical string literals are distinct, the amount of array allocation overhead, etc. Each unspecified behavior results in one of a set of valid results.
  • undefined behavior - there are no restrictions on the behavior of the program. Examples of undefined behavior are data races, memory accesses outside of array bounds, signed integer overflow, null pointer dereference, more than one modifications of the same scalar in an expression without any intermediate sequence point (until C++11)that are unsequenced (since C++11), access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.

UB and optimization

Because correct C++ programs are free of undefined behavior, compilers may produce unexpected results when a program that actually has UB is compiled with optimization enabled:

For example,

Signed overflow

int foo(int x) {
    return x+1 > x; // either true or UB due to signed overflow
}

may be compiled as (demo)

foo(int):
        movl    $1, %eax
        ret


Access out of bounds

int table[4] = {};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

May be compiled as (demo)

exists_in_table(int):
        movl    $1, %eax
        ret

Uninitialized scalar

std::size_t f(int x)
{
    std::size_t a;
    if(x) // either x nonzero or UB
        a = 42;
    return a; 
}

May be compiled as (demo)

f(int):
        mov     eax, 42
        ret

The output shown was observed on an older version of gcc

bool p; // uninitialized local variable
if(p) // UB access to uninitialized scalar
    std::puts("p is true");
if(!p) // UB access to uninitialized scalar
    std::puts("p is false");

Possible output:

p is true
p is false

Invalid scalar

int f() {
  bool b = true;
  unsigned char* p = reinterpret_cast<unsigned char*>(&b);
  *p = 10;
  // reading from b is now UB
  return b == 0;
}

May be compiled as (demo)

f():
        movl    $11, %eax
        ret

Null pointer dereference

int foo(int* p) {
    int x = *p;
    if(!p) return x; // Either UB above or this branch is never taken
    else return 0;
}
int bar() {
    int* p = nullptr;
    return *p;        // Unconditional UB
}

may be compiled as (foo with gcc, bar with clang)

foo(int*):
        xorl    %eax, %eax
        ret
bar():
        retq

Access to pointer passed to realloc

Choose clang to observe the output shown

#include <iostream>
#include <cstdlib>
int main() {
    int *p = (int*)std::malloc(sizeof(int));
    int *q = (int*)std::realloc(p, sizeof(int));
    *p = 1; // UB access to a pointer that was passed to realloc
    *q = 2;
    if (p == q) // UB access to a pointer that was passed to realloc
        std::cout << *p << *q << '\n';
}

Possible output:

12

Infinite loop without side-effects

Choose clang to observe the output shown

#include <iostream>
 
int fermat() {
  const int MAX = 1000;
  int a=1,b=1,c=1;
  // Endless loop with no side effects is UB
  while (1) {
    if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1;
    a++;
    if (a>MAX) { a=1; b++; }
    if (b>MAX) { b=1; c++; }
    if (c>MAX) { c=1;}
  }
  return 0;
}
 
int main() {
  if (fermat())
    std::cout << "Fermat's Last Theorem has been disproved.\n";
  else
    std::cout << "Fermat's Last Theorem has not been disproved.\n";
}

Possible output:

Fermat's Last Theorem has been disproved.

External links

See also