How to access C structs/variables from inline asm?

Consider the following code:

    int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
    {
        uint32 q, m; /* Division Result */
        uint32 i; /* Loop Counter */
        uint32 j; /* Loop Counter */

        /* Check Input */
        if (bn1 == NULL) return(EFAULT);
        if (bn1->dat == NULL) return(EFAULT);
        if (bn2 == NULL) return(EFAULT);
        if (bn2->dat == NULL) return(EFAULT);
        if (bnr == NULL) return(EFAULT);
        if (bnr->dat == NULL) return(EFAULT);

    #if defined(__i386__) || defined(__amd64__)
        __asm__ (".intel_syntax noprefix");
        __asm__ ("pushl %eax");
        __asm__ ("pushl %edx");
        __asm__ ("pushf");
        __asm__ ("movl %eax, (bn1->dat[i])");
        __asm__ ("xorl %edx, %edx");
        __asm__ ("divl (bn2->dat[j])");
        __asm__ ("movl (q), %eax");
        __asm__ ("movl (m), %edx");
        __asm__ ("popf");
        __asm__ ("popl %edx");
        __asm__ ("popl %eax");
    #else
        q = bn->dat[i] / bn->dat[j];
        m = bn->dat[i] % bn->dat[j];
    #endif

        /* Return */
        return(0);
    }

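The data type uint32 is basically an unsigned long int, i.e. a uint32_t unsigned 32-bit integer. The type bnint is either unsigned short int (uint16_t) or uint32_t, depending on whether 64-bit data types are available: if 64-bit is available, bnint is uint32, otherwise it is uint16. This is done to capture carry/overflow in other parts of the code. The struct bn_t is defined as follows: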

    typedef struct bn_data_t bn_t;
    struct bn_data_t {
        uint32 sz1;   /* Bit Size */
        uint32 sz8;   /* Byte Size */
        uint32 szw;   /* Word Count */
        bnint *dat;   /* Data Array */
        uint32 flags; /* Operational Flags */
    };

The function starts at line 300 of my source file. When I try to compile/make it, I get the following errors:

    system:/home/user/c/m3/bn 1036 $$$ ->make
    clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal -Winline -Wunknown-pragmas -Wundef -Wendif-labels -c /home/user/c/m3/bn/bn.c
    /home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
        uint32 q, m; /* Division Result */
               ^
    /home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
        uint32 q, m; /* Division Result */
                  ^
    /home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
        uint32 i; /* Loop Counter */
               ^
    /home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
        uint32 j; /* Loop Counter */
               ^
    /home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
        __asm__ ("movl %eax, (bn1->dat[i])");
                 ^
    <inline asm>:1:18: note: instantiated into assembly here
            movl %eax, (bn1->dat[i])
                            ^
    /home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
        __asm__ ("divl (bn2->dat[j])");
                 ^
    <inline asm>:1:12: note: instantiated into assembly here
            divl (bn2->dat[j])
                      ^
    4 warnings and 2 errors generated.
    *** [bn.o] Error code 1

    Stop in /home/user/c/m3/bn.
    system:/home/user/c/m3/bn 1037 $$$ ->

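What I know:

I consider myself fairly proficient in x86 assembler (as you can see from the code I wrote above). However, the last time I mixed a high-level language with assembler was about 15-20 years ago, when I used Borland Pascal to write graphics drivers for games (in the pre-Windows 95 era). I'm familiar with Intel syntax.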

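What I don't know:

How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I access the elements as an array (e.g. bn1->dat[i]).

How do I access local variables declared on the stack?

I'm using push/pop to restore clobbered registers to their previous values so as not to upset the compiler. However, do I also need to put the volatile keyword on the local variables?

Or is there a better way that I'm not aware of? I don't want to put this in a separate function because of the call overhead, since this function is performance-critical.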

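Extra:

I've only just started writing this function, so it isn't complete: the loops and other support/glue code are missing. The main point, though, is accessing local variables and struct members.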

Edit 1:

The syntax I used seems to be the only one clang supports. I tried the following code, and clang gave me all kinds of errors:

    __asm__ ("pushl %%eax",
        "pushl %%edx",
        "pushf",
        "movl (bn1->dat[i]), %%eax",
        "xorl %%edx, %%edx",
        "divl ($0x0c + bn2 + j)",
        "movl %%eax, (q)",
        "movl %%edx, (m)",
        "popf",
        "popl %%edx",
        "popl %%eax"
        );

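It wants me to put a closing parenthesis on the first line in place of the comma. I switched to %% instead of % because I read somewhere that inline assembly needs %% to denote CPU registers, and clang was telling me I was using an invalid escape sequence.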

If you only need 32b / 32b => 32-bit division, let the compiler use both outputs of div. gcc, clang and icc all handle this fine, as you can see on the Godbolt compiler explorer:

    uint32_t q = bn1->dat[i] / bn2->dat[j];
    uint32_t m = bn1->dat[i] % bn2->dat[j];

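Compilers are very good at using common-subexpression elimination (CSE) to turn that into a single div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the inputs of the remainder.

For example, *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.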
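A minimal sketch of the two patterns, using hypothetical helper names (divmod_bad and divmod_good are illustrations, not the code behind the godbolt link). Compiled with -O2 on x86, the first one would be expected to need two div instructions and the second only one; check your own compiler's asm output to confirm.

```c
#include <stdint.h>

/* "Bad" pattern: the quotient is stored through *q before the remainder is
   computed. q may point into dat[], so the compiler must assume the store
   can change dat[i] or dat[j]; it reloads them and issues a second div. */
void divmod_bad(const uint32_t *dat, uint32_t i, uint32_t j,
                uint32_t *q, uint32_t *m)
{
    *q = dat[i] / dat[j];
    *m = dat[i] % dat[j];
}

/* "Good" pattern: compute both results into locals first, then store.
   Nothing can alias the locals, so one div can produce both results. */
void divmod_good(const uint32_t *dat, uint32_t i, uint32_t j,
                 uint32_t *q, uint32_t *m)
{
    uint32_t quot = dat[i] / dat[j];
    uint32_t rem  = dat[i] % dat[j];
    *q = quot;
    *m = rem;
}
```

The aliasing is not just theoretical: if q points at dat[i], divmod_bad computes the remainder from the quotient it has just stored, while divmod_good still uses the original dividend. That difference in meaning is exactly why the compiler can't merge the two divisions in the first version.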

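Using inline asm for a 32-bit / 32-bit = 32-bit div doesn't gain you anything, and with clang it actually makes the code worse (see the godbolt link).

If you need 64-bit / 32-bit = 32-bit, you probably do need asm, if there isn't a compiler builtin for it (GNU C doesn't have one, AFAICT). The obvious way in C (casting the operands to uint64_t) compiles, in 32-bit code, to a call to a 64-bit / 64-bit = 64-bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving that the result fits in 32 bits, which is what would make a single div instruction safe from raising #DE (the divide-error exception).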
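A sketch of what such an asm wrapper could look like, assuming the caller guarantees hi < d (so the quotient fits in 32 bits and divl cannot fault). The name div64by32 is hypothetical; the non-x86 branch is the plain C version with a uint64_t dividend.

```c
#include <stdint.h>

/* Divide the 64-bit value hi:lo by the 32-bit divisor d.
   Returns the 32-bit quotient and stores the 32-bit remainder in *rem.
   Assumed precondition (caller's job): hi < d. Otherwise the quotient
   doesn't fit in 32 bits and divl raises #DE. */
static inline uint32_t div64by32(uint32_t hi, uint32_t lo, uint32_t d,
                                 uint32_t *rem)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl divides EDX:EAX by its operand: quotient -> EAX, remainder -> EDX */
    __asm__ ("divl %[d]"
             : "=a" (q), "=d" (r)
             : "a" (lo), "d" (hi), [d] "rm" (d));
    *rem = r;
    return q;
#else
    /* Portable fallback: the obvious C version with a 64-bit dividend. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

For example, div64by32(1, 0, 3, &r) divides 2^32 by 3, giving 1431655765 with r == 1.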

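For a lot of other instructions, you can avoid writing inline asm much of the time by using builtins, for example for popcount. With -mpopcnt, __builtin_popcount compiles to the popcnt instruction (and accounts for the false dependency on the output operand that Intel CPUs have). Without it, it compiles to a libgcc function call.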
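For example, a thin wrapper around the GNU C builtin (bits_set is a hypothetical name) needs no asm at all; the compiler picks the instruction sequence based on the target flags.

```c
#include <stdint.h>

/* Count set bits with the GNU C builtin instead of inline asm.
   With -mpopcnt (or a -march that includes POPCNT) gcc/clang emit a single
   popcnt; otherwise they fall back to a library call or an inline
   bit-twiddling sequence, depending on the compiler. */
static inline unsigned bits_set(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
```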

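Always prefer builtins, or plain C that compiles to good asm, so the compiler knows what the code does. When inlining makes some arguments known at compile time, plain C can be optimized away or simplified, but code using inline asm will just load the constants into registers and do a div at runtime. Inline asm also defeats CSE between similar computations on the same data, and of course can't be auto-vectorized.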

Using GNU C syntax the right way

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the compiler which variables you want in registers, and what the outputs are.

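If you like, you can use Intel/MASM-like syntax and mnemonics, and register names without %, preferably by compiling with -masm=intel. The AT&T syntax bug (the fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.

Most software projects that use GNU C inline asm use AT&T syntax only.

See the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.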

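An asm statement takes one string argument and 3 sets of constraints (outputs, inputs and clobbers). The easiest way to make it multi-line is to make each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.

Also, you tell the compiler which registers you want things in. Then, if variables are already in registers, the compiler doesn't have to spill them to memory just so your asm can load and store them; forcing that round trip really shoots you in the foot. The tutorial Brett Hale linked in the comments hopefully covers all this.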
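This is also the direct answer to how to reach struct members and stack locals: don't name them in the asm string, pass them as operands and let the compiler substitute a register or an addressing mode. The sketch is illustrative only (struct words and twice_plus are hypothetical, and real code would just write 2 * w->dat[i] + bias in C). It shows a multi-line template built from separate strings, named operands, and an early-clobber output, needed because %[out] is written before the inputs are last read.

```c
#include <stdint.h>

/* Hypothetical stand-in for the question's bn_t: only the field used here. */
struct words {
    uint32_t *dat;
};

/* Returns 2 * w->dat[i] + bias, computed in a multi-line asm statement. */
static inline uint32_t twice_plus(const struct words *w, uint32_t i)
{
    uint32_t bias = 5;   /* a local; the asm never needs its stack slot */
    uint32_t out;
    __asm__ ("movl %[elem], %[out]\n\t"   /* one string per asm line, */
             "addl %[elem], %[out]\n\t"   /* each ending in \n        */
             "addl %[bias], %[out]\n"
             : [out] "=&r" (out)          /* early-clobber output */
             : [elem] "rm" (w->dat[i]),   /* struct element: register or memory */
               [bias] "rm" (bias)         /* local variable: register or memory */
             : "cc");                     /* harmless: x86 flags are assumed clobbered anyway */
    return out;
}
```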

Correct `div` example with GNU C inline asm

You can see the compiler's asm output on godbolt.

    uint32_t q, m;  // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.
    asm ("divl %[bn2dat_j]\n"
         : "=a" (q), "=d" (m)  // results are in eax, edx registers
         : "d" (0),            // zero edx for us, please
           "a" (bn1->dat[i]),  // "a" means EAX / RAX
           [bn2dat_j] "mr" (bn2->dat[j])  // register or memory, compiler chooses which is more efficient
         : // no register clobbers, and we don't read/write "memory" other than operands
        );

"divl %4" would also work, but named operands keep their names when you add more input/output constraints, whereas operand numbers shift.
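Putting the pieces together for the question's data structure, here is a sketch of a self-contained function. It assumes bnint is uint32_t (the 64-bit-available configuration; a uint16_t array would need a different operand size), and bn_sketch_t and bn_word_divmod are hypothetical names that only mimic the bn_t fields used here. As argued above, the plain C #else branch is what you'd normally want for 32-bit / 32-bit division anyway.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for bn_t, assuming bnint == uint32_t. */
typedef struct {
    uint32_t  szw;   /* Word Count */
    uint32_t *dat;   /* Data Array */
} bn_sketch_t;

/* *q = a->dat[i] / b->dat[j], *m = a->dat[i] % b->dat[j].
   Returns -1 for a NULL argument or a zero divisor, 0 on success.
   Index bounds are left to the caller, as in the question's code. */
static inline int bn_word_divmod(const bn_sketch_t *a, const bn_sketch_t *b,
                                 uint32_t i, uint32_t j,
                                 uint32_t *q, uint32_t *m)
{
    if (a == NULL || a->dat == NULL || b == NULL || b->dat == NULL)
        return -1;
    if (b->dat[j] == 0)
        return -1;                    /* divl by zero would raise #DE */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t quot, rem;
    __asm__ ("divl %[divisor]"
             : "=a" (quot), "=d" (rem)              /* EAX = quotient, EDX = remainder */
             : "a" (a->dat[i]), "d" (0),            /* dividend = 0:a->dat[i] */
               [divisor] "rm" (b->dat[j]));         /* struct element as an operand */
    *q = quot;
    *m = rem;
#else
    *q = a->dat[i] / b->dat[j];
    *m = a->dat[i] % b->dat[j];
#endif
    return 0;
}
```

Note that there is no push/pop of EAX/EDX and no volatile: the output constraints already tell the compiler which registers the asm writes, and the struct element reaches the asm as an operand.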
