Assembly code fsqrt and fmul instructions

I'm trying to compute 1.34 * sqrt(lght) in this function using assembly code, but I get the following errors:

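    '_asm' undeclared (first use in this function)
    (Each undeclared identifier is reported only once for each function it appears in.)
    expected ';' before '{' token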

I've been researching how to fix this, but I can't find much information. Can anyone suggest a way to make this work?

My code is:

    double hullSpeed(double lgth)
    {
        _asm
        {
            global _start
            fld lght;           //load lght
            fld st(0);          //duplicate lght on Top of stack
            fsqrt;              square root of lght
            fld st(0);          //load square result on top of stack
            fld 1.34;           //load 1.34 on top of stack
            fld st(i);          duplicate 1.34 on TOS
            fmulp st(0), st(i); //multiply them
            fst z;              save result in z
        }
        return z; // return result of [ 1.34 *sqrt(lght) ]
    }

It looks like you are trying to do something like this:

    #include <stdio.h>

    double hullSpeed(double lgth)
    {
        double result;
        __asm__(
            "fldl %1\n\t"   // push lgth: 1.34 moves to st(1), st(0) = lgth. FLDL loads a double
            "fsqrt\n\t"     // st(0) = square root st(0)
            "fmulp\n\t"     // Multiplies st(0) and st(1) (1.34). Result in st(0)
            : "=&t" (result)
            : "m" (lgth), "0" (1.34));
        return result;
    }

    int main()
    {
        printf ("%f\n", hullSpeed(64.0));
        return 0;
    }

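The template I use could be simplified, but it is good enough for demonstration purposes. We use the "=&t" constraint because we return the result in st(0), the top of the floating-point stack, and we use the ampersand to mark it as early clobber (we also use the top of the floating-point stack to pass in 1.34). We pass lgth by memory reference with the constraint "m" (lgth), and the "0"(1.34) constraint says that 1.34 is passed in the same register as operand 0, which in this case is the top of the floating-point stack. This asm needs no clobber list; clobbers are registers (or memory) that the assembly code overwrites but that do not appear as input or output constraints.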
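To see the "t" output constraint and a "0" matching constraint in isolation, here is a minimal sketch. It assumes GCC or Clang targeting x86 or x86-64, and the name x87_abs is hypothetical. The input is tied to operand 0, so it arrives in st(0); the FABS instruction replaces it in place, and the result is read back from st(0).

```c
/* Hypothetical helper: absolute value computed with the x87 FABS instruction.
   "=t" puts the output in st(0); "0" ties the input to operand 0, so x is
   loaded into st(0) before the template runs. The stack depth does not
   change, so no clobber list is needed. */
double x87_abs(double x)
{
    double r;
    __asm__("fabs" : "=t"(r) : "0"(x));
    return r;
}
```

Calling x87_abs(-2.5) should return 2.5. Because nothing is pushed or popped, this is the simplest case of passing a value through the top of the x87 stack.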

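Learning assembly language through inline assembly is a very hard way to learn it. The x86-specific machine constraints are listed in the GCC manual under "x86 family" (Machine Constraints); constraint modifiers and GCC's extended asm templates are documented in the GCC manual as well.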

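I'm only giving you a starting point, because using GCC's inline assembly can be quite complex, and a full treatment would be too broad for a Stack Overflow answer. The fact that you are using inline assembly with x87 floating point makes it even more complicated.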

Once you have a handle on constraints and modifiers, another approach that lets the compiler generate better assembly code is:

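    __asm__(
        "fsqrt\n\t"     // st(0) = square root st(0)
        "fmulp\n\t"     // Multiplies st(0) and st(1) (1.34). Result in st(0)
        : "=t"(result)
        : "0"(lgth), "u" (1.34)
        : "st(1)");     // fmulp pops the input that was in st(1)

Hint: the "u" constraint places a value in x87 floating-point register st(1). Together, the constraints place lgth in st(0) and 1.34 in st(1) before the template runs, so the assembly code itself has less work to do. Because fmulp pops the stack, the input that started in st(1) is gone when the asm finishes; GCC requires an input that is implicitly popped like this to be listed as clobbered, hence the "st(1)" clobber.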

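If you are developing 64-bit applications, I strongly recommend using at least SSE/SSE2 for basic floating-point calculations. The code above should work in both 32-bit and 64-bit code. In 64-bit code the x87 floating-point instructions are generally less efficient than SSE/SSE2, but they will work.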
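For comparison, here is a minimal sketch of the same calculation using scalar SSE2 instructions instead of x87. It assumes GCC or Clang on x86-64 (or 32-bit x86 compiled with -msse2), and the name hullSpeedSSE2 is hypothetical. The "x" constraint requests an SSE register, so there is no register stack to manage.

```c
/* Hypothetical SSE2 variant of hullSpeed: computes 1.34 * sqrt(lgth)
   with scalar double-precision SSE2 instructions (AT&T order: src, dst). */
double hullSpeedSSE2(double lgth)
{
    const double k = 1.34;
    double result;
    __asm__(
        "sqrtsd %1, %0\n\t"   /* result = sqrt(lgth) */
        "mulsd  %2, %0\n\t"   /* result = result * k */
        : "=&x"(result)       /* early clobber: result is written before k is read */
        : "xm"(lgth), "xm"(k));
    return result;
}
```

hullSpeedSSE2(64.0) should give 10.72 (1.34 * 8), matching the x87 version. In ordinary code, writing 1.34 * sqrt(lgth) with <math.h> typically lets the compiler emit sqrtsd itself on x86-64.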

Rounding with inline assembly and x87

If you are trying to round according to one of the 4 rounding modes of the x87, you can use the following code:

    #include <stdio.h>
    #include <stdint.h>

    #define RND_CTL_BIT_SHIFT   10

    typedef enum {
        ROUND_NEAREST_EVEN =    0 << RND_CTL_BIT_SHIFT,
        ROUND_MINUS_INF =       1 << RND_CTL_BIT_SHIFT,
        ROUND_PLUS_INF =        2 << RND_CTL_BIT_SHIFT,
        ROUND_TOWARD_ZERO =     3 << RND_CTL_BIT_SHIFT
    } RoundingMode;

    double roundd (double n, RoundingMode mode)
    {
        uint16_t cw;        /* Storage for the current x87 control register */
        uint16_t newcw;     /* Storage for the new value of the control register */
        uint16_t dummyreg;  /* Temporary dummy register used in the template */
        __asm__(
            "fstcw %w[cw]          \n\t" /* Read current x87 control register into cw */
            "fwait                 \n\t" /* Do an fwait after an fstcw instruction */
            "mov %w[cw],%w[treg]   \n\t" /* treg = value in cw variable */
            "and $0xf3ff,%w[treg]  \n\t" /* Clear rounding-control bits 10 and 11 */
            "or %w[rmode],%w[treg] \n\t" /* Set the rounding mode bits */
            "mov %w[treg],%w[newcw]\n\t" /* newcw = new control register value */
            "fldcw %w[newcw]       \n\t" /* Set control register to newcw */
            "frndint               \n\t" /* st(0) = round(st(0)) */
            "fldcw %w[cw]          \n\t" /* Restore control register to original value in cw */
            : [cw]"=m"(cw),
              [newcw]"=m"(newcw),
              [treg]"=&r"(dummyreg),  /* Register constraint with dummy variable lets the compiler choose an available register */
              [n]"+t"(n)              /* +t passes `n` through the top of the FPU stack (st0) for both input and output */
            : [rmode]"rmi"((uint16_t)mode)); /* "g" constraint same as "rmi" */

        return n;
    }

    double hullSpeed(double lgth)
    {
        double result;
        __asm__(
            "fsqrt\n\t"     // st(0) = square root st(0)
            "fmulp\n\t"     // Multiplies st(0) and st(1) (1.34). Result in st(0)
            : "=t"(result)
            : "0"(lgth), "u" (1.34)
            : "st(1)");     // fmulp pops the input that was in st(1)
        return result;
    }

    int main()
    {
        double dbHullSpeed = hullSpeed(64.0);
        printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_NEAREST_EVEN));
        printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_MINUS_INF));
        printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_PLUS_INF));
        printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_TOWARD_ZERO));
        return 0;
    }

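As you pointed out in the comments, another Stack Overflow answer contains the same code but uses multiple __asm__ statements, and you were curious how to write it as a single __asm__ statement.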

The rounding modes (0, 1, 2, 3) can be found in Intel's architecture documentation:

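Rounding mode, RC field setting, description:

Round to nearest (even), 00B: Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero). Default.

Round down (toward −∞), 01B: Rounded result is closest to but no greater than the infinitely precise result.

Round up (toward +∞), 10B: Rounded result is closest to but no less than the infinitely precise result.

Round toward zero (truncate), 11B: Rounded result is closest to but no greater in absolute value than the infinitely precise result.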

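The control-word fields are described in Section 8.1.5 (the rounding control specifically in Section 8.1.5.3). The 4 rounding modes are defined in Section 4.8.4, Figure 4-8.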
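The and/or sequence in roundd is plain bit manipulation on the 16-bit control word, so it can be checked in ordinary C. The sketch below assumes the layout described above (RC in bits 10 and 11) and uses 0x037F, the value FINIT loads, as a sample control word; the helper names set_rc and get_rc are hypothetical.

```c
#include <stdint.h>

#define RC_SHIFT 10
#define RC_MASK  (3u << RC_SHIFT)   /* bits 10 and 11: 0x0C00 */

/* Hypothetical helper: the same computation as the "and $0xf3ff" and
   "or rmode" steps in roundd. rc is the 2-bit RC value
   (0 = nearest, 1 = down, 2 = up, 3 = toward zero). */
uint16_t set_rc(uint16_t cw, unsigned rc)
{
    return (uint16_t)((cw & 0xF3FFu) | ((rc & 3u) << RC_SHIFT));
}

/* Hypothetical helper: extract the RC field from a control word. */
unsigned get_rc(uint16_t cw)
{
    return (cw & RC_MASK) >> RC_SHIFT;
}
```

For example, get_rc(0x037F) is 0 (round to nearest), set_rc(0x037F, 3) gives 0x0F7F (truncate), and set_rc(0x0F7F, 1) gives 0x077F (round down). Note that 0xF3FF is simply the complement of the 0x0C00 RC mask.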
