How do I represent infinity as a double in C?

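I learned from Computer Systems: A Programmer's Perspective that the IEEE standard requires double-precision floating-point numbers to be represented in the following 64-bit binary format:

s: 1 bit for the sign
exp: 11 bits for the exponent
frac: 52 bits for the fraction

+infinity is represented as a special value with the following pattern:

s = 0
all exp bits are 1
all frac bits are 0

I think the full 64 bits of a double should be laid out in this order:

(s)(exp)(frac)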

So I wrote the following C code to verify it:

    // Check the infinity
    double x1 = (double)0x7ff0000000000000;  // This should be the +infinity
    double x2 = (double)0x7ff0000000000001;  // Note the extra ending 1, x2 should be NaN
    printf("\nx1 = %f, x2 = %f sizeof(double) = %d", x1, x2, sizeof(x2));
    if (x1 == x2)
        printf("\nx1 == x2");
    else
        printf("\nx1 != x2");

But the result is:

    x1 = 9218868437227405300.000000, x2 = 9218868437227405300.000000 sizeof(double) = 8
    x1 == x2

Why are the numbers valid numbers rather than infinity or some error?

Why is x1 == x2?

(I am using the MinGW GCC compiler.)

ADD 1

I modified the code as shown below and successfully verified infinity and NaN.

    // Check the infinity and NaN
    unsigned long long x1 = 0x7ff0000000000000ULL;  // +infinity as double
    unsigned long long x2 = 0xfff0000000000000ULL;  // -infinity as double
    unsigned long long x3 = 0x7ff0000000000001ULL;  // NaN as double
    double y1 = *((double *)(&x1));
    double y2 = *((double *)(&x2));
    double y3 = *((double *)(&x3));
    printf("\nsizeof(long long) = %d", sizeof(x1));
    printf("\nx1 = %f, x2 = %f, x3 = %f", x1, x2, x3);  // %f is good enough for output
    printf("\ny1 = %f, y2 = %f, y3 = %f", y1, y2, y3);

The result is:

    sizeof(long long) = 8
    x1 = 1.#INF00, x2 = -1.#INF00, x3 = 1.#SNAN0
    y1 = 1.#INF00, y2 = -1.#INF00, y3 = 1.#QNAN0

The detailed output looks a bit strange, but I think the point is clear.

PS: It seems the pointer conversion is not necessary. Just use %f to tell printf to interpret the unsigned long long variable in double format.

ADD 2

Out of curiosity, I checked the bit representation of the variables with the following code.

    typedef unsigned char *byte_pointer;

    void show_bytes(byte_pointer start, int len)
    {
        int i;
        for (i = len - 1; i >= 0; i--) {
            printf("%.2x", start[i]);
        }
        printf("\n");
    }

And I tried the code below:

    // Check the infinity and NaN
    unsigned long long x1 = 0x7ff0000000000000ULL;  // +infinity as double
    unsigned long long x2 = 0xfff0000000000000ULL;  // -infinity as double
    unsigned long long x3 = 0x7ff0000000000001ULL;  // NaN as double
    double y1 = *((double *)(&x1));
    double y2 = *((double *)(&x2));
    double y3 = *((double *)(&x3));

    unsigned long long x4 = x1 + x2;  // I want to check (+infinity)+(-infinity)
    double y4 = y1 + y2;              // I want to check (+infinity)+(-infinity)

    printf("\nx1: "); show_bytes((byte_pointer)&x1, sizeof(x1));
    printf("\nx2: "); show_bytes((byte_pointer)&x2, sizeof(x2));
    printf("\nx3: "); show_bytes((byte_pointer)&x3, sizeof(x3));
    printf("\nx4: "); show_bytes((byte_pointer)&x4, sizeof(x4));
    printf("\ny1: "); show_bytes((byte_pointer)&y1, sizeof(y1));
    printf("\ny2: "); show_bytes((byte_pointer)&y2, sizeof(y2));
    printf("\ny3: "); show_bytes((byte_pointer)&y3, sizeof(y3));
    printf("\ny4: "); show_bytes((byte_pointer)&y4, sizeof(y4));

The output is:

    x1: 7ff0000000000000
    x2: fff0000000000000
    x3: 7ff0000000000001
    x4: 7fe0000000000000
    y1: 7ff0000000000000
    y2: fff0000000000000
    y3: 7ff8000000000001
    y4: fff8000000000000  // <== Different from x4

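The strange part is that although x1 and x2 have the same bit patterns as y1 and y2, the sum x4 differs from y4.

And

    printf("\ny4=%f", y4);

gives this:

    y4=-1.#IND00  // What does it mean???

Why are they different? And how is y4 obtained?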

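First, 0x7ff0000000000000 is indeed the bit representation of a double infinity. But the cast does not set the bit representation. It converts the numeric value of 0x7ff0000000000000, interpreted as a 64-bit integer, to the nearest double. So you need some other way to set the bit pattern.

The straightforward way to set the bit pattern would be

    uint64_t bits = 0x7ff0000000000000;
    double infinity = *(double*)&bits;

However, this is undefined behavior. The C standard forbids reading a value that has been stored as one fundamental type (uint64_t) as another fundamental type (double). This is known as the strict aliasing rule, and it allows the compiler to generate better code because it can assume that the order in which reads of one type and writes of another type occur is irrelevant.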

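The only exception to this rule is the char types: you are explicitly allowed to cast any pointer to a char* and back. So you could try to use this code:

    unsigned char bits[] = {0x7f, 0xf0, 0, 0, 0, 0, 0, 0};
    double infinity = *(double*)bits;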

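Even where this works (strictly, the exception covers inspecting an object's bytes through a char pointer, and the array must also be suitably aligned for a double), the result is still implementation-defined: the order of the bytes in a double depends on your machine. The given code works on big-endian machines (ARM and the Power family can both run in that mode), but not on x86. For x86 you would need this version:

    unsigned char bits[] = {0, 0, 0, 0, 0, 0, 0xf0, 0x7f};
    double infinity = *(double*)bits;

There is really no way around this implementation-defined behavior, since there is no guarantee that a machine stores floating-point values in the same byte order as integer values. There are even machines that use byte orders like this: <1, 0, 3, 2>. I don't even want to know who came up with that brilliant idea, but it exists and we have to live with it.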

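To your last question: floating-point arithmetic is inherently different from integer arithmetic. The bits have special meanings, and the floating-point unit takes that into account. Special values such as infinities, NaNs, and denormalized numbers in particular are treated in a special way. Since +inf + -inf is defined to yield a NaN, your floating-point unit emits the bit pattern of a NaN. The integer unit knows nothing about infinities or NaNs, so it just interprets the bit pattern as a huge integer and happily performs an integer addition (which happens to overflow in this case). The resulting bit pattern is not that of a NaN. It happens to be the bit pattern of a really huge positive floating-point number (2^1023, to be precise), but that does not carry any meaning.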

Actually, there is a way to set the bit patterns of all values except NaNs in a portable way. Given three variables containing the sign, exponent, and mantissa bits, you can do this:

    uint64_t sign = ..., exponent = ..., mantissa = ...;
    double result;
    assert(!(exponent == 0x7ff && mantissa));  // Can't set the bits of a NaN in this way.
    if (exponent) {
        // This code does not work for denormalized numbers. And it won't honor
        // the value of mantissa when the exponent signals NaN or infinity.
        result = mantissa + (1ull << 52);  // Add the implicit bit.
        result /= (1ull << 52);            // Makes the exponent logically zero (equal to the bias), so the next operation works as expected.
        result *= pow(2, (double)((signed)exponent - 0x3ff));  // This sets the exponent.
    } else {
        // This code works for denormalized numbers.
        result = mantissa;         // No implicit bit.
        result /= (1ull << 51);    // This ensures that the next operation works as expected.
        result *= pow(2, -0x3ff);  // Scale down to the denormalized range.
    }
    result *= (sign ? -1.0 : 1.0);  // This sets the sign.

This uses the floating-point unit itself to move the bits into the right place. Since there is no way to interact with the mantissa bits of a NaN using floating-point arithmetic, it is not possible to include the generation of NaNs in this code. Well, you could generate a NaN, but you would have no control over its mantissa bit pattern.

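The initialization

    double x1 = (double)0x7ff0000000000000;

converts an integer number to a double. You probably want to share the bitwise representation instead. This is implementation-specific (perhaps unspecified behavior), but you could use a union:

    union { double x; long long n; } u;
    u.n = 0x7ff0000000000000LL;

and then use u.x. I am assuming that both long long and double are 64 bits on your machine. Endianness and the floating-point representation also matter.

See also http://floating-point-gui.de/

Note that not all processors are x86, and not all floating-point implementations are IEEE 754 (even if in 2014 most of them are). Your code might not work the same way on an ARM processor, for example in a tablet.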

You are casting the value to double, which won't work the way you expect.

    double x1 = (double)0x7ff0000000000000;  // Not setting the value directly

To circumvent this issue, you can interpret the value as a double pointer and dereference it (although this is strongly discouraged and only works when unsigned long long and double have the same size):

    unsigned long long x1n = 0x7ff0000000000000ULL;  // Inf
    double x1 = *((double*)&x1n);
    unsigned long long x2n = 0x7ff0000000000001ULL;  // Signaling NaN
    double x2 = *((double*)&x2n);

    printf("\nx1=%f, x2=%f sizeof(double) = %d", x1, x2, sizeof(x2));
    if (x1 == x2)
        printf("\nx1==x2");
    else
        printf("\nx1!=x2");  // x1 != x2

Example on ideone

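You have converted the constant 0x7ff00... to a double. This is not at all the same as taking the bit representation of that value and interpreting it as a double.

This also explains why x1==x2. When you convert to double you lose precision, so for large integers you sometimes end up with the same double in both cases. This gives you some weird effects where, for a large floating-point value, adding 1 leaves it unchanged.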
