Deadlock when threads take a kernel semaphore through sysfs

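Stemming from this question (and my solution to it), I have realized that a deadlock is possible, but I can't understand why it happens or how to avoid it.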

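In short, there is a semaphore in kernel space that kernel modules (which are really applications running in kernel space) can take, but user-space applications also need to take the same semaphore to protect a global shared memory.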

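I do this by exposing a sysfs file that, when written the right character, performs a down or an up on the semaphore in kernel space. The user-space application simply keeps this file open and writes the appropriate character to lock or unlock.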
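A minimal user-space sketch of that protocol, assuming the attribute is the `/sys/test/test` file created by the module below. The names `sysfs_sem_open`, `sysfs_sem_down` and `sysfs_sem_up` are hypothetical helpers for illustration, not part of any library; each one just opens the file or writes a single character to it and reports whether the write was accepted.

```c
#include <fcntl.h>   /* open, O_WRONLY */
#include <unistd.h>  /* write */

/* Hypothetical helper: open the attribute once and keep the fd for the
 * lifetime of the process, as described above. */
static int sysfs_sem_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Writing '0' makes the module's store handler call down_interruptible()
 * on the shared semaphore (lock). Returns 0 if the write was accepted. */
static int sysfs_sem_down(int fd)
{
    return write(fd, "0", 1) == 1 ? 0 : -1;
}

/* Writing '1' makes the module's store handler call up() (unlock). */
static int sysfs_sem_up(int fd)
{
    return write(fd, "1", 1) == 1 ? 0 : -1;
}
```

In the test program further down, the bare `write(fid, "0", 1)` and `write(fid, "1", 1)` calls play the roles of these two helpers.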

Here is a sample kernel module for demonstration:

```c
#include <linux/module.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/semaphore.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Shahbaz Youssefi");
MODULE_DESCRIPTION("Test module");

static struct kobject *_kobj = NULL;
static struct semaphore sem;

/* store handler: '0' takes the semaphore (down), '1' releases it (up) */
static ssize_t _lock_op(struct kobject *kobj, struct kobj_attribute *attr,
                        const char *buf, size_t count)
{
    switch (buf[0])
    {
    case '0':
        printk("down (%u)\n", sem.count);
        if (down_interruptible(&sem))
        {
            printk("error: sem wait interrupted\n");
            /* the lock was NOT taken; don't report success to user space */
            return -ERESTARTSYS;
        }
        break;
    case '1':
        printk("up (%u)\n", sem.count);
        up(&sem);
        break;
    default:
        printk("error: invalid request %d\n", buf[0]);
    }
    return count;
}

static struct kobj_attribute _lock_attr = __ATTR(test, 0222, NULL, _lock_op);

static int __init _main_init(void)
{
    sema_init(&sem, 1);
    _kobj = kobject_create_and_add("test", NULL);
    if (!_kobj)
    {
        printk("error: failed to create /sys directory for test\n");
        return -ENOMEM;
    }
    if (sysfs_create_file(_kobj, &_lock_attr.attr))
        printk("error: could not create /sys file\n");
    printk("loaded\n");
    return 0;
}

static void __exit _main_exit(void)
{
    if (_kobj)
        kobject_put(_kobj);
    _kobj = NULL;
    printk("unloaded\n");
}

module_init(_main_init);
module_exit(_main_exit);
```

This generally works well. User-space applications can write '0' or '1' to the sysfs file and achieve mutual exclusion without problems.

However, there is one case that locks up the process: when multiple threads of the same process try to take the lock.

Basically, something like this:

```
Thread 1                          Thread 2
--------                          --------
write '0'
  system call
    _lock_op
      down_interruptible
  return from syscall
                                  write '0'
                                    system call
                                      _lock_op
                                        down_interruptible (blocked)
*go on to release the lock*
                                  *return from syscall*
                                  *go on to release the lock*
```

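The problem is that in this case, when the second down happens while the first one still has not released the lock, it is not just the second thread that blocks: the whole process blocks. That is, the steps marked with * never happen.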
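For contrast, the behavior expected here can be sketched entirely in user space, with a POSIX unnamed semaphore standing in for the kernel one. This is an assumption for illustration only: it bypasses sysfs and the module completely, and `worker`/`run_two_threads` are made-up names. When the second thread blocks in `sem_wait`, only that thread sleeps; the first keeps running, posts the semaphore, and both threads finish.

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t lock_sem;        /* binary semaphore, initial count 1 */
static int holders_done = 0;  /* only modified while holding lock_sem */

static void *worker(void *arg)
{
    (void)arg;
    sem_wait(&lock_sem);      /* like writing '0': may block only this thread */
    holders_done++;           /* critical section */
    usleep(1000);             /* hold the lock briefly so the other thread waits */
    sem_post(&lock_sem);      /* like writing '1': release */
    return NULL;
}

/* Runs two threads through the lock/unlock sequence and returns how many
 * of them completed their critical section. */
static int run_two_threads(void)
{
    pthread_t t1, t2;

    holders_done = 0;
    sem_init(&lock_sem, 0, 1);    /* 0 = shared between threads, not processes */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&lock_sem);
    return holders_done;
}
```

Here `run_two_threads()` returns 2 because a thread blocked on the semaphore never stops the other one from releasing it; through the sysfs file, the starred steps in the trace never get that far.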

Here is a user-space application that triggers it once the kernel module above is inserted:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

static int fid;
static volatile sig_atomic_t interrupted = 0;

static void sig_handler(int signum)
{
    interrupted = 1;
}

/* second thread: lock, unlock, sleep 1000us, repeat */
static void *func(void *arg)
{
    while (!interrupted)
    {
        write(fid, "0", 1);
        write(fid, "1", 1);
        usleep(1000);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    struct sigaction sa = {
        .sa_handler = sig_handler,
    };

    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGUSR1, &sa, NULL);
    sigaction(SIGUSR2, &sa, NULL);

    fid = open("/sys/test/test", O_WRONLY);
    if (fid < 0)
        return EXIT_FAILURE;

    pthread_create(&tid, NULL, func, NULL);

    /* main thread does the same with a different period (793us) */
    while (!interrupted)
    {
        write(fid, "0", 1);
        write(fid, "1", 1);
        usleep(793);
    }

    pthread_join(tid, NULL);
    close(fid);
    return 0;
}
```

Note: run `echo 1 > /sys/test/test` to unlock yourself ;)

My question is: why does Linux block the whole process instead of only the calling thread? And what can I do about it?

Note: tested on x86, kernel 3.8 patched with RTAI. I will try a newer vanilla kernel later, but I doubt it has anything to do with RTAI.

I actually found a workaround for this problem, but I still think there should be a proper explanation and solution.

My workaround is as follows:

Use a pthread mutex in the application. In each thread, instead of:

```c
write(fid, "0", 1);
/* access */
write(fid, "1", 1);
```

do:

```c
pthread_mutex_lock(&mutex);
write(fid, "0", 1);
/* access */
write(fid, "1", 1);
pthread_mutex_unlock(&mutex);
```

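This makes all accesses to the sysfs file from within the process mutually exclusive, while the sysfs file itself keeps access mutually exclusive between processes and the kernel module.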
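The same workaround can be packaged as a pair of helpers, sketched here assuming a single fd shared by all threads as in the test program above. The names `locked_enter`, `locked_leave`, `locked_worker` and `struct worker_arg` are hypothetical, chosen for illustration; the point is only the ordering: take the process-local mutex first, then write '0', and release in reverse.

```c
#include <pthread.h>
#include <fcntl.h>    /* open(), O_WRONLY: used by the caller to obtain fd */
#include <unistd.h>   /* write */

static pthread_mutex_t proc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Serialize the threads of this process first, then take the semaphore
 * through sysfs. Returns 0 with both locks held, -1 with neither held. */
static int locked_enter(int fd)
{
    pthread_mutex_lock(&proc_mutex);
    if (write(fd, "0", 1) != 1) {
        pthread_mutex_unlock(&proc_mutex);
        return -1;
    }
    return 0;
}

/* Release in reverse order: semaphore via sysfs, then the process mutex. */
static int locked_leave(int fd)
{
    int rc = write(fd, "1", 1) == 1 ? 0 : -1;
    pthread_mutex_unlock(&proc_mutex);
    return rc;
}

struct worker_arg {
    int fd;
    int iterations;
};

/* Thread body: repeatedly enter and leave the critical section. */
static void *locked_worker(void *p)
{
    struct worker_arg *a = p;

    for (int i = 0; i < a->iterations; i++) {
        if (locked_enter(a->fd) == 0) {
            /* access the shared memory here */
            locked_leave(a->fd);
        }
    }
    return NULL;
}
```

Because the mutex is held across both writes, no two threads of the process are ever inside a write to the sysfs file at the same time, which is exactly the property the workaround relies on.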
