How can barriers be destroyable as soon as pthread_barrier_wait returns?

This question is based on:

When is it safe to destroy a pthread barrier?

and the recent glibc bug report:

http://sourceware.org/bugzilla/show_bug.cgi?id=12674

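I'm not sure about the semaphore issue reported against glibc, but presumably it is supposed to be valid to destroy a barrier as soon as pthread_barrier_wait has returned, as per the linked question above. (Normally, the thread that got PTHREAD_BARRIER_SERIAL_THREAD, or a "special" thread that already considers itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is a barrier used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning until the new thread has used the data; other barriers probably either have the same lifetime as the whole program or are controlled by some other synchronization object.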
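The stack-data use case can be sketched as follows. This is only an illustration under assumptions: `start_args`, `child_main`, `start_child` and `child_copy` are hypothetical names, and the final `pthread_barrier_destroy()` call is exactly the operation whose safety the question is asking about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical argument block that lives on the creating thread's stack. */
struct start_args {
    pthread_barrier_t barrier;
    int value;              /* data the new thread must read before we return */
};

static int child_copy;      /* the child's private copy of `value` */

static void *child_main(void *p)
{
    struct start_args *args = p;
    assert(args != NULL);
    child_copy = args->value;               /* consume the parent's stack data */
    pthread_barrier_wait(&args->barrier);   /* tell the parent we are done */
    /* `args` must not be touched past this point: the parent may be gone. */
    return NULL;
}

/* Starts a thread, blocks until it has copied `value`, then destroys the
 * barrier and returns, which pops `args` off the stack. */
static int start_child(int value, pthread_t *tid)
{
    struct start_args args;
    int rc;

    args.value = value;
    rc = pthread_barrier_init(&args.barrier, NULL, 2);
    if (rc != 0)
        return rc;

    rc = pthread_create(tid, NULL, child_main, &args);
    if (rc != 0) {
        pthread_barrier_destroy(&args.barrier);
        return rc;
    }

    pthread_barrier_wait(&args.barrier);
    /* The child may still be on its way out of pthread_barrier_wait() here. */
    return pthread_barrier_destroy(&args.barrier);
}
```

A caller would invoke `start_child(42, &tid)` and later `pthread_join(tid, NULL)`; the barrier and the rest of `args` are gone by the time `start_child()` returns.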

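In any case, how can an implementation ensure that destroying the barrier (and possibly even unmapping the memory it resides in) is safe as soon as pthread_barrier_wait returns in any thread? It seems that the other threads which have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much as, in the glibc bug report cited above, sem_post has to examine the waiter count after having adjusted the semaphore value.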

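I'm going to take another crack at this with an example implementation of pthread_barrier_wait() that uses the mutex and condition variable functionality a pthreads implementation might provide. Note that this example doesn't try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized when exiting the wait). I think that using something like Linux futex objects could help with the performance issues, but futexes are still pretty much outside my experience.

Also, I doubt that this example handles signals or errors correctly (if at all, in the case of signals). But I think proper support for those things can be added as an exercise for the reader.

My main fear is that the example may have a race condition or deadlock (the mutex handling is more complex than I'd like). Also note that this example hasn't even been compiled. Treat it as pseudo-code. Keep in mind too that my experience is mainly on Windows; I'm tackling this more as an educational opportunity than anything else. So the quality of the pseudo-code may well be pretty low.

However, disclaimers aside, I think it may give an idea of how the problem asked in the question could be handled (i.e., how the pthread_barrier_wait() function can allow the pthread_barrier_t object it uses to be destroyed by any of the released threads, without the danger of the barrier object still being used by one or more threads on their way out).

Here goes: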

    /*
     * Since this is a part of the implementation of the pthread API, it uses
     * reserved names that start with "__" for internal structures and functions.
     *
     * Functions such as __mutex_lock() and __cond_wait() perform the same function
     * as the corresponding pthread API.
     */

    // struct __barrier_waitdata is intended to hold all the data
    // that `pthread_barrier_wait()` will need after releasing
    // waiting threads. This will allow the function to avoid
    // touching the passed in pthread_barrier_t object after
    // the wait is satisfied (since any of the released threads
    // can destroy it)
    struct __barrier_waitdata {
        struct __mutex cond_mutex;
        struct __cond cond;
        unsigned waiter_count;
        int wait_complete;
    };

    struct __barrier {
        unsigned count;
        struct __mutex waitdata_mutex;
        struct __barrier_waitdata* pwaitdata;
    };

    typedef struct __barrier pthread_barrier_t;

    int __barrier_waitdata_init( struct __barrier_waitdata* pwaitdata)
    {
        int rc;

        pwaitdata->waiter_count = 0;
        pwaitdata->wait_complete = 0;

        rc = __mutex_init( &pwaitdata->cond_mutex, NULL);
        if (rc) {
            return rc;
        }

        rc = __cond_init( &pwaitdata->cond, NULL);
        if (rc) {
            __mutex_destroy( &pwaitdata->cond_mutex);
            return rc;
        }

        return 0;
    }

    int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
    {
        int rc;

        rc = __mutex_init( &barrier->waitdata_mutex, NULL);
        if (rc) return rc;

        barrier->pwaitdata = NULL;
        barrier->count = count;

        //TODO: deal with attr
        return 0;
    }

    int pthread_barrier_wait(pthread_barrier_t *barrier)
    {
        int rc;
        struct __barrier_waitdata* pwaitdata;
        unsigned target_count;

        // potential waitdata block (only one thread's will actually be used)
        struct __barrier_waitdata waitdata;

        // nothing to do if we only need to wait for one thread...
        if (barrier->count == 1) return PTHREAD_BARRIER_SERIAL_THREAD;

        rc = __mutex_lock( &barrier->waitdata_mutex);
        if (rc) return rc;

        if (!barrier->pwaitdata) {
            // no other thread has claimed the waitdata block yet -
            // we'll use this thread's
            rc = __barrier_waitdata_init( &waitdata);
            if (rc) {
                __mutex_unlock( &barrier->waitdata_mutex);
                return rc;
            }

            barrier->pwaitdata = &waitdata;
        }

        pwaitdata = barrier->pwaitdata;
        target_count = barrier->count;

        // all data necessary for handling the return from a wait is pointed to
        // by `pwaitdata`, and `pwaitdata` points to a block of data on the stack of
        // one of the waiting threads. We have to make sure that the thread that owns
        // that block waits until all others have finished with the information
        // pointed to by `pwaitdata` before it returns. However, after the 'big' wait
        // is completed, the `pthread_barrier_t` object that's passed into this
        // function isn't used. The last operation done to `*barrier` is to set
        // `barrier->pwaitdata = NULL` to satisfy the requirement that this function
        // leaves `*barrier` in a state as if `pthread_barrier_init()` had been called - and
        // that operation is done by the thread that signals the wait condition
        // completion before the completion is signaled.

        // note: we're still holding `barrier->waitdata_mutex`
        // TODO: handle a failure to take `cond_mutex`
        rc = __mutex_lock( &pwaitdata->cond_mutex);
        pwaitdata->waiter_count += 1;

        if (pwaitdata->waiter_count < target_count) {
            // need to wait for other threads
            __mutex_unlock( &barrier->waitdata_mutex);
            do {
                // TODO: handle the return code from `__cond_wait()` to break out of this
                // if a signal makes that necessary
                __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
            } while (!pwaitdata->wait_complete);
        }
        else {
            // this thread satisfies the wait - unblock all the other waiters
            pwaitdata->wait_complete = 1;

            // 'release' our use of the passed in pthread_barrier_t object
            barrier->pwaitdata = NULL;

            // unlock the barrier's waitdata_mutex - the barrier is
            // ready for use by another set of threads
            __mutex_unlock( &barrier->waitdata_mutex);

            // finally, unblock the waiting threads
            __cond_broadcast( &pwaitdata->cond);
        }

        // at this point, barrier->waitdata_mutex is unlocked, the
        // barrier->pwaitdata pointer has been cleared, and no further
        // use of `*barrier` is permitted...

        // however, each thread still has a valid `pwaitdata` pointer - the
        // thread that owns that block needs to wait until all others have
        // dropped the pwaitdata->waiter_count

        // also, at this point the `pwaitdata->cond_mutex` is locked, so
        // we're in a critical section

        rc = 0;
        pwaitdata->waiter_count--;

        if (pwaitdata == &waitdata) {
            // this thread owns the waitdata block - it needs to hang around until
            // all other threads are done

            // as a convenience, this thread will be the one that returns
            // PTHREAD_BARRIER_SERIAL_THREAD
            rc = PTHREAD_BARRIER_SERIAL_THREAD;

            while (pwaitdata->waiter_count != 0) {
                __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
            }

            __mutex_unlock( &pwaitdata->cond_mutex);
            __cond_destroy( &pwaitdata->cond);
            __mutex_destroy( &pwaitdata->cond_mutex);
        }
        else {
            // the last of the other threads wakes the owner of the block
            if (pwaitdata->waiter_count == 0) {
                __cond_signal( &pwaitdata->cond);
            }

            // every non-owner must release the mutex on its way out
            __mutex_unlock( &pwaitdata->cond_mutex);
        }

        return rc;
    }

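Update, 17 July 2011: In response to a comment/question about process-shared barriers

I completely forgot about the case of barriers that are shared between processes. And as you mention, the idea I outlined fails in that case. I don't really have experience with POSIX shared memory, so any suggestions I make should be treated with scepticism.

To summarize (for my benefit, if no one else's):

When any thread gets control after pthread_barrier_wait() returns, the barrier object needs to be in the 'init' state (however the most recent pthread_barrier_init() on that object set it). The API also implies that once any thread returns, one or more of the following could happen: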

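another call to pthread_barrier_wait() to start a new round of thread synchronization
pthread_barrier_destroy() on the barrier object
the memory allocated for the barrier object could be freed or unshared, if the barrier object lives in a shared memory region.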

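These things mean that before the pthread_barrier_wait() call allows any thread to return, it pretty much needs to ensure that none of the waiting threads is still using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object, on which all the threads block. These local synchronization objects were allocated on the stack of whichever thread happened to call pthread_barrier_wait() first.

I think something similar would need to be done for process-shared barriers. However, in that case simply allocating those synchronization objects on a thread's stack isn't adequate (since the other processes would have no access to it). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique I listed above could be applied similarly: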

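the waitdata_mutex that controls the 'allocation' of the local synchronization variables (the waitdata block) is already in process-shared memory, by virtue of being in the barrier struct. Of course, when the barrier is set to PTHREAD_PROCESS_SHARED, that attribute also needs to be applied to the waitdata_mutex
when __barrier_waitdata_init() is called to initialize the local mutex and condition variable, it has to allocate those objects in shared memory instead of simply using the stack-based waitdata variable.
when the 'cleanup' thread destroys the mutex and the condition variable in the waitdata block, it also needs to clean up the process-shared memory allocation for the block.
when shared memory is used, there needs to be some mechanism to ensure that the shared memory object is opened at least once in each process, and closed the correct number of times in each process (but not closed entirely before every thread in the process has finished using it). I haven't thought through exactly how that would be done...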
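The attribute handling in the first two points can be sketched roughly as below. Everything here is an assumption made for illustration: `shared_waitdata`, `shared_waitdata_create()` and `shared_waitdata_destroy()` are hypothetical names built on the public pthread types, and the anonymous shared mapping is a simplification that is only visible to processes forked after it is created, so it does not solve the allocation and open/close bookkeeping problem raised in the last point.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wait-data block, mirroring the pseudo-code's
 * __barrier_waitdata but built from public pthread types. */
struct shared_waitdata {
    pthread_mutex_t cond_mutex;
    pthread_cond_t cond;
    unsigned waiter_count;
    int wait_complete;
};

/* Maps a shared region and initialises process-shared synchronisation
 * objects inside it. Returns NULL on any failure. */
static struct shared_waitdata *shared_waitdata_create(void)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    struct shared_waitdata *wd;

    wd = mmap(NULL, sizeof *wd, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (wd == MAP_FAILED)
        return NULL;

    if (pthread_mutexattr_init(&ma) != 0)
        goto fail_map;
    if (pthread_condattr_init(&ca) != 0)
        goto fail_mutexattr;

    /* Without PTHREAD_PROCESS_SHARED, using these objects from another
     * process is undefined, even though the memory itself is shared. */
    if (pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(&wd->cond_mutex, &ma) != 0)
        goto fail_condattr;
    if (pthread_cond_init(&wd->cond, &ca) != 0)
        goto fail_mutex;

    pthread_condattr_destroy(&ca);
    pthread_mutexattr_destroy(&ma);
    wd->waiter_count = 0;
    wd->wait_complete = 0;
    return wd;

fail_mutex:
    pthread_mutex_destroy(&wd->cond_mutex);
fail_condattr:
    pthread_condattr_destroy(&ca);
fail_mutexattr:
    pthread_mutexattr_destroy(&ma);
fail_map:
    munmap(wd, sizeof *wd);
    return NULL;
}

/* Counterpart for the 'cleanup' thread: destroy the objects, then
 * release the shared allocation that held them. */
static void shared_waitdata_destroy(struct shared_waitdata *wd)
{
    assert(wd != NULL);
    pthread_cond_destroy(&wd->cond);
    pthread_mutex_destroy(&wd->cond_mutex);
    munmap(wd, sizeof *wd);
}
```

A real process-shared barrier would more likely need a named object (for example one opened with `shm_open()`), because the block has to be reachable by processes that already exist when the first waiter arrives.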

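I think these changes would allow the scheme to work with process-shared barriers. The last point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared waitdata. There are certain properties you'd want that name to have: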

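you'd want the storage for the name to reside in the pthread_barrier_t structure, so that all processes have access to it; that implies a known limit on the length of the name
you'd want the name to be unique to each 'instance' of a set of calls to pthread_barrier_wait(), because a second round of waiting may start before all threads have made it all the way out of the first round (so the process-shared memory block set up for the waitdata might not have been freed yet). So the name probably has to be based on things like the process ID, the thread ID, the address of the barrier object, and an atomic counter.
I don't know whether there are security implications to the name being 'guessable'. If so, some randomization needs to be added; I have no idea how much. Maybe you'd also need to hash the data mentioned above together with the random bits. Like I said, I really have no idea whether this matters.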
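One possible way to build such a name is sketched below, purely as an assumption-laden illustration: `make_waitdata_name()`, `WAITDATA_NAME_MAX` and `waitdata_round` are hypothetical, the thread ID is left out because the process ID plus a per-process atomic counter already distinguishes rounds started from one process, and no randomization or hashing is attempted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed fixed amount of storage reserved for the name in the barrier. */
#define WAITDATA_NAME_MAX 64

/* Per-process counter, so that two rounds started by this process never
 * produce the same name, even for the same barrier address. */
static atomic_uint waitdata_round;

/* Writes a name of the form "/barrier-wd-<pid>-<address>-<round>" into buf.
 * Returns the length of the name, or -1 if it does not fit in `len` bytes. */
static int make_waitdata_name(char *buf, size_t len, const void *barrier)
{
    unsigned round;
    int n;

    assert(buf != NULL && len > 0);
    round = atomic_fetch_add(&waitdata_round, 1u);
    n = snprintf(buf, len, "/barrier-wd-%ld-%jx-%u",
                 (long)getpid(), (uintmax_t)(uintptr_t)barrier, round);
    return (n > 0 && (size_t)n < len) ? n : -1;
}
```

The leading slash with no further slashes follows the portable form for names passed to `shm_open()`. Note that the barrier's address is only meaningful within the creating process, so here it serves as a uniqueness ingredient rather than something other processes could recompute.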

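As far as I can see, there is no need for pthread_barrier_destroy to be an immediate operation. You could have it wait until all threads that are still in their wake-up phase have been woken.

For example, you could have an atomic counter, awakening, that is initially set to the number of threads being woken. It would then be decremented as the last action before pthread_barrier_wait returns. pthread_barrier_destroy could then simply spin until that counter drops to 0.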
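The counter idea can be sketched on top of an ordinary mutex and condition variable. This is a minimal illustration under assumptions, not how glibc does it: `spin_barrier` and the `sb_*` functions are hypothetical names, and the counter is incremented rather than set so that a second round may begin while threads from the first are still leaving.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

struct spin_barrier {           /* hypothetical type, not glibc's layout */
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;             /* threads per round */
    unsigned waiting;           /* arrivals in the current round */
    unsigned generation;        /* bumped each time a round completes */
    atomic_uint awakening;      /* released threads still inside sb_wait() */
};

static int sb_init(struct spin_barrier *b, unsigned count)
{
    int rc = pthread_mutex_init(&b->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&b->cond, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&b->lock);
        return rc;
    }
    b->count = count;
    b->waiting = 0;
    b->generation = 0;
    atomic_init(&b->awakening, 0);
    return 0;
}

static int sb_wait(struct spin_barrier *b)
{
    int serial = 0;

    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    if (++b->waiting == b->count) {
        /* Last arrival: account for every thread about to leave
         * before any of them can possibly return. */
        b->waiting = 0;
        b->generation++;
        atomic_fetch_add(&b->awakening, b->count);
        pthread_cond_broadcast(&b->cond);
        serial = 1;
    } else {
        unsigned gen = b->generation;
        while (gen == b->generation)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);

    /* Last action: after this decrement the thread never touches *b. */
    atomic_fetch_sub(&b->awakening, 1u);
    return serial ? PTHREAD_BARRIER_SERIAL_THREAD : 0;
}

/* Spins until every released thread has left sb_wait(), then destroys. */
static int sb_destroy(struct spin_barrier *b)
{
    while (atomic_load(&b->awakening) != 0)
        sched_yield();
    pthread_cond_destroy(&b->cond);
    return pthread_mutex_destroy(&b->lock);
}

/* Thread entry point for trying it out: returns sb_wait()'s result. */
static void *sb_waiter(void *p)
{
    return (void *)(intptr_t)sb_wait(p);
}
```

This only protects callers that go through `sb_destroy()`; memory that is freed or unmapped without calling it first would still be exposed to threads on their way out.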