Overview
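On Linux, the pthread library provides the pthread_create() API function for creating a thread. When thread creation fails, it may return ENOMEM or EAGAIN. This article discusses some problems encountered during thread creation and how to solve them.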
Creating a thread
First, here is the example code used in this article, example.c:
| 
 /* example.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

/* Thread entry point; the signature must match what pthread_create() expects. */
void *thread(void *arg)
{
    int i;
    (void)arg; /* unused */
    for(i=0;i<3;i++)
        printf("This is a pthread.\n");

    sleep(30); /* keep the thread alive for a while */
    return NULL;
}

int main(int argc,char **argv)
{
    pthread_t id;
    int i,ret;
    ret=pthread_create(&id,NULL,thread,NULL);
    if(ret!=0){
        printf ("Create pthread error!\n");
        exit (1);
    }
    for(i=0;i<3;i++)
        printf("This is the main process.\n");
    pthread_join(id,NULL);
    return 0;
}
 | 
 
To compile it, run the following command:
| 
 # gcc example.c -lpthread -o example -g 
 | 
 
Tracing the thread creation with the strace tool produces the following output:
| 
 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0  
uname({sys="Linux", node="yjye", ...})  = 0  
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb6d1c000 
brk(0)                                  = 0x90e0000 
brk(0x9101000)                          = 0x9101000 
mprotect(0xb6d1c000, 4096, PROT_NONE)  = 0
 
clone(child_stack=0xb771c494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb771cbd8, {entry_number:6, base_addr:0xb771cb70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb771cbd8) = 17209 
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 
 | 
 
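The output above shows the sequence of calls made while creating a thread:
- getrlimit() is called to obtain the thread stack size (the RLIMIT_STACK argument). In my environment (CentOS 6) the default is 10 MB.
 
- mmap2() is called to allocate memory. The size is 10489856 bytes, i.e. 10244 KB, which is 4 KB more than the stack size. It returns 0xb6d1c000.
 
- mprotect() is called to turn one memory page (4 KB) into a protected guard area, starting at address 0xb6d1c000. This page is used to detect stack overflow: any read or write to it triggers a SIGSEGV signal. It appears as the `---p` mapping in the smaps listing below.
 
- clone() is called to create the thread. Its first argument is an address: the base of the stack (here 0xb771c494). This is the highest stack address, because stack memory is consumed starting from high addresses and growing downward.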
 
The /proc/<pid>/smaps file shows clearly how the stack memory is mapped:
| 
 090e0000-09101000 rw-p 00000000 00:00 0         [heap] 
Size:                132 kB 
Rss:                   4 kB 
Pss:                   4 kB 
Shared_Clean:          0 kB 
Shared_Dirty:          0 kB 
Private_Clean:         0 kB 
Private_Dirty:         4 kB 
Referenced:            4 kB 
Swap:                  0 kB 
KernelPageSize:        4 kB 
MMUPageSize:           4 kB 
b6d1c000-b6d1d000 ---p 00000000 00:00 0    # guard page that detects thread stack overflow 
Size:                  4 kB 
Rss:                   0 kB 
Pss:                   0 kB 
Shared_Clean:          0 kB 
Shared_Dirty:          0 kB 
Private_Clean:         0 kB 
Private_Dirty:         0 kB 
Referenced:            0 kB 
Swap:                  0 kB 
KernelPageSize:        4 kB 
MMUPageSize:           4 kB 
b6d1d000-b771e000 rw-p 00000000 00:00 0    # thread stack 
Size:              10244 kB 
Rss:                   8 kB 
Pss:                   8 kB 
Shared_Clean:          0 kB 
Shared_Dirty:          0 kB 
Private_Clean:         0 kB 
Private_Dirty:         8 kB 
Referenced:            8 kB 
Swap:                  0 kB 
KernelPageSize:        4 kB 
MMUPageSize:           4 kB 
 | 
 
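The thread-stack entry in the mapping above shows that the stack area is 10244 KB in total, spanning b6d1d000 to b771e000. The strace output shows the stack base at 0xb771c494, so what is the memory from 0xb771c494 to b771e000 used for? It holds the thread's TCB (thread control block) and TLS (thread-local storage) area. The resulting layout, from low to high addresses, is: the guard page, the thread stack (which grows downward from 0xb771c494), and the TCB/TLS area at the top of the mapping.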
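GLIBC 2.5 vs. 2.8
    Comparing the pthread_create() code in GLIBC 2.5 and 2.8 reveals a small change in how a failed mmap() call returning ENOMEM is handled: the newer version replaces the error code.
The relevant code in V2.5, .../nptl/allocatestack.c: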
| 
 mem = mmap (NULL, size, prot, 
              MAP_PRIVATE | MAP_ANONYMOUS | ARCH_MAP_FLAGS, -1, 0); 
 
      if (__builtin_expect (mem == MAP_FAILED, 0)) 
        { 
#ifdef ARCH_RETRY_MMAP 
          mem = ARCH_RETRY_MMAP (size); 
          if (__builtin_expect (mem == MAP_FAILED, 0)) 
#endif 
        return errno; 
        } 
 | 
 
The same code in V2.8, .../nptl/allocatestack.c:
| 
 mem = mmap (NULL, size, prot, 
              MAP_PRIVATE | MAP_ANONYMOUS | ARCH_MAP_FLAGS, -1, 0); 
 
      if (__builtin_expect (mem == MAP_FAILED, 0)) 
        { 
#ifdef ARCH_RETRY_MMAP 
          mem = ARCH_RETRY_MMAP (size, prot); 
          if (__builtin_expect (mem == MAP_FAILED, 0)) 
#endif 
            { 
              if (errno == ENOMEM) 
                errno = EAGAIN; 
 
              return errno; 
            } 
        } 
 | 
 
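As the snippets above show, V2.5 simply passes the mmap() error code back to the caller, whereas in V2.8, if mmap() fails with ENOMEM, GLIBC changes the error code to EAGAIN before returning it.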
 
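Why does pthread_create() fail?
As the number of running threads grows, pthread_create() becomes more likely to fail. The memory reserved for threads (thread stacks, for example) accumulates until the mmap() system call can no longer find room and fails.
For example, suppose /proc/<pid>/smaps contains the following memory mapping fragment: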
| 
 [...] 
7eb3d000-7f33c000 rw-p 7eb3d000 00:00 0 
Size:               8188 kB 
Rss:                  12 kB 
Pss:                  12 kB 
Shared_Clean:          0 kB 
Shared_Dirty:          0 kB 
Private_Clean:         0 kB 
Private_Dirty:        12 kB 
Referenced:           12 kB 
Swap:                  0 kB 
7f8f5000-7f90a000 rw-p 7ffeb000 00:00 0          [stack] 
Size:                 84 kB 
Rss:                  16 kB 
Pss:                  16 kB 
Shared_Clean:          0 kB 
Shared_Dirty:          0 kB 
Private_Clean:         0 kB 
Private_Dirty:        16 kB 
Referenced:           16 kB 
Swap:                  0 kB 
 | 
 
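The available address space is the gap between the last memory segment and the [stack] segment: 0x7F8F5000 - 0x7F33C000 = 0x5B9000 = 6000640 bytes (about 6 MB). With the default configuration this is smaller than a single thread stack (10 MB), so creating another thread at this point fails.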
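Solution
   In most cases the default 10 MB thread stack is far larger than necessary, so it is advisable to call the pthread_attr_setstacksize() API to change the thread stack size, as in the following snippet: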
| 
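 #include <errno.h>   // errno, EINVAL
#include <pthread.h> // pthread_attr_*(), pthread_create()
#include <stdio.h>   // fprintf()
#include <string.h>  // memset()

//------------------------------------------------------- 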
// Name   : create_thd 
// Usage  : Create a thread 
// Return : 0, if OK 
//          -1, if error (errno is set) 
//------------------------------------------------------- 
static int create_thd( 
                    void       *thd_par,  // Thread parameters 
                    size_t      stack_sz, 
                    void       *(*entry)(void *), 
                    pthread_t  *pThreadId // Thread identifier 
                     ) 
{ 
pthread_attr_t      attr; 
int                 rc = 0; 
int                 err_sav; 
 
  // Check the parameters 
  if (!pThreadId) 
  { 
    fprintf(stderr, "NULL thread id\n"); 
    errno = EINVAL; 
    return -1; 
  } 
 
  memset(&attr, 0, sizeof(attr)); 
 
  errno = pthread_attr_init(&attr); 
  if (0 != errno) 
  { 
    err_sav = errno; 
    fprintf(stderr, "pthread_attr_init() failed (errno = %d)\n", errno); 
    errno = err_sav; 
    return -1; 
  } 
 
  errno = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); 
  if (0 != errno) 
  { 
    err_sav = errno; 
    fprintf(stderr, "pthread_attr_setscope() failed (errno = %d)\n", errno); 
    errno = err_sav; 
    rc = -1; 
    goto err; 
  } 
 
  errno = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); 
  if (0 != errno) 
  { 
    err_sav = errno; 
    fprintf(stderr, "pthread_attr_setdetachstate() failed (errno = %d)\n", errno); 
    errno = err_sav; 
    rc = -1; 
    goto err; 
  } 
 
  // Set the stack size 
  errno = pthread_attr_setstacksize(&attr, stack_sz); 
  if (0 != errno) 
  { 
    err_sav = errno; 
    fprintf(stderr, "Error %d on pthread_attr_setstacksize()\n", errno); 
    errno = err_sav; 
    rc = -1; 
    goto err; 
  } 
 
  // Thread creation 
  errno = pthread_create(pThreadId, 
                         &attr, 
                         entry, 
                         thd_par); 
  if (0 != errno) 
  { 
    err_sav = errno; 
    fprintf(stderr, "pthread_create() failed (errno = %m - %d)\n", errno); 
    errno = err_sav; 
    rc = -1; 
    goto err; 
  } 
 
  goto ok; 
 
err: 
 
ok: 
 
  // The following calls will alter errno 
  err_sav = errno; 
 
  errno = pthread_attr_destroy(&attr); 
  if (0 != errno) 
  { 
    fprintf(stderr, "pthread_attr_destroy() failed (errno = %d)\n", errno); 
    rc = -1; 
  } 
 
  errno = err_sav; 
 
  return rc; 
} // create_thd 
 | 
 
Symbol version linking problem
     Let us return to the earlier example code, in which the main process calls pthread_create() directly, and look at how it is linked:
| 
 [root@yjye yeyj]# nm example | grep pthread 
         U pthread_create@@GLIBC_2.1 
         U pthread_join@@GLIBC_2.0 
 | 
 
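However, while debugging FreeSWITCH some time ago, I found that the configured stack size had no effect: every child thread inherited the parent thread's size. What was going on? FreeSWITCH calls the interface wrapped by APR, so let us look at APR's linked symbols: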
| 
 [root@yjye .libs]# nm libapr-1.a | grep pthread 
         U pthread_rwlock_destroy 
         U pthread_rwlock_init 
         U pthread_rwlock_rdlock 
         U pthread_rwlock_tryrdlock 
         U pthread_rwlock_trywrlock 
         U pthread_rwlock_unlock 
         U pthread_rwlock_wrlock 
         U pthread_mutex_destroy 
         U pthread_mutex_init 
         U pthread_mutex_lock 
         U pthread_mutex_trylock 
         U pthread_mutex_unlock 
         U pthread_mutexattr_destroy 
         U pthread_mutexattr_init 
         U pthread_mutexattr_settype 
         U pthread_cond_broadcast 
         U pthread_cond_destroy 
         U pthread_cond_init 
         U pthread_cond_signal 
         U pthread_cond_timedwait 
         U pthread_cond_wait 
00000080 d mutex_proc_pthread_methods 
00000a10 t proc_mutex_proc_pthread_acquire 
00000990 t proc_mutex_proc_pthread_cleanup 
00000a50 t proc_mutex_proc_pthread_create 
00000960 t proc_mutex_proc_pthread_release 
         U pthread_mutex_destroy 
         U pthread_mutex_init 
         U pthread_mutex_lock 
         U pthread_mutex_unlock 
         U pthread_mutexattr_destroy 
         U pthread_mutexattr_init 
         U pthread_mutexattr_setprotocol 
         U pthread_mutexattr_setpshared 
         U pthread_mutexattr_setrobust_np 
         U pthread_attr_destroy 
         U pthread_attr_getdetachstate 
         U pthread_attr_init 
         U pthread_attr_setdetachstate 
         U pthread_attr_setguardsize 
         U pthread_attr_setstacksize 
         U pthread_create 
         U pthread_detach 
         U pthread_exit 
         U pthread_join 
         U pthread_once 
         U pthread_self 
         U pthread_sigmask 
         U pthread_getspecific 
         U pthread_key_create 
         U pthread_key_delete 
         U pthread_setspecific 
 | 
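   Compared with the earlier output, the symbols here lack the @@GLIBC_2.1 or @@GLIBC_2.0 version suffix. Tracing with GDB showed that the call finally made was the GLIBC_2.0 version of pthread_create(), so two versions of the symbol are in play. This often happens when the pthread library is called through a third-party library.
Let us look at the 2.0 code in …/nptl/pthread_create.c: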
| 
 int 
__pthread_create_2_0 (newthread, attr, start_routine, arg) 
     pthread_t *newthread; 
     const pthread_attr_t *attr; 
     void *(*start_routine) (void *); 
     void *arg; 
{ 
  /* The ATTR attribute is not really of type `pthread_attr_t *'.  It has 
     the old size and access to the new members might crash the program. 
     We convert the struct now.  */ 
  struct pthread_attr new_attr; 
 
  if (attr != NULL) 
    { 
      struct pthread_attr *iattr = (struct pthread_attr *) attr; 
      size_t ps = __getpagesize (); 
 
      /* Copy values from the user-provided attributes.  */ 
      new_attr.schedparam = iattr->schedparam; 
      new_attr.schedpolicy = iattr->schedpolicy; 
      new_attr.flags = iattr->flags; 
 
      /* Fill in default values for the fields not present in the old 
     implementation.  */ 
      new_attr.guardsize = ps; 
      new_attr.stackaddr = NULL; 
      new_attr.stacksize = 0; 
      new_attr.cpuset = NULL; 
 
      /* We will pass this value on to the real implementation.  */ 
      attr = (pthread_attr_t *) &new_attr; 
    } 
 
  return __pthread_create_2_1 (newthread, attr, start_routine, arg); 
} 
 | 
 
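Clearly, if the program binds to the old version, the stack size attribute is ignored completely.
How can this be fixed? Explicitly select the symbol version so that the GLIBC_2.1 implementation is called. glibc provides dlvsym() (a GNU extension to the dynamic linking API, not a system call), which does exactly this: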
| 
 #define _GNU_SOURCE /* must precede all includes so that <dlfcn.h> declares dlvsym() */
#include <dlfcn.h> 
……… 
 
typedef int (*lxb_pcreate_t)(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg); 
 
static lxb_pcreate_t lxb_pthread_create; 
[...] 
   void *pSym; 
 
  // Get the version GLIBC_2.1 of pthread_create() symbol 
  pSym = dlvsym(RTLD_DEFAULT, "pthread_create", "GLIBC_2.1"); 
  if (NULL == pSym) 
  { 
    lxb_pthread_create = pthread_create; 
  } 
  else 
  { 
    lxb_pthread_create = (lxb_pcreate_t)pSym; 
    if (pSym != (void *)pthread_create) 
    { 
      LXB_PRINTF("Unexpected version of pthread_create() symbol ==> Forced to GLIBC_2.1\n"); 
    } 
  } 
 |