Learning the Python GIL Through Case Studies

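Summary: Understanding the GIL matters when working with Python. The GIL does not constrain most compute-intensive AI/ML and scientific computing workloads, because the cores of popular frameworks such as NumPy, TensorFlow, and PyTorch are implemented in C/C++ and expose only a Python API. This article runs the same counting experiment three ways: in pure Python, where the interpreter preemptively releases the GIL so that other threads get a chance to acquire it and run; in numpy, which releases the GIL on its own; and in a custom C++ extension, which has to release the GIL explicitly.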

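Hi everyone. Python is notorious for its Global Interpreter Lock (GIL), which restricts the Python interpreter to executing only one thread at a time. On modern multi-core CPUs this is a problem, because a multithreaded Python program cannot run Python code on more than one core at once. Despite this limitation, Python has become a top language in domains ranging from backend web applications to AI/ML and scientific computing.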

1. Why the GIL is not a constraint for most workloads

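For most backend web applications the GIL is not a real constraint, because they are typically I/O bound. Such applications spend most of their time waiting for input from users, databases, or downstream services. The system needs concurrency, not necessarily parallelism. The Python interpreter releases the GIL while performing I/O operations, so while one thread waits for I/O to complete, another thread gets the chance to acquire the GIL and run.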
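
A minimal sketch of this point, using time.sleep as a stand-in for a blocking I/O call. The helper names fake_io and run_concurrently are illustrative and do not come from the original experiments. Four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds overall instead of 0.8, because a waiting thread does not hold the GIL.

```python
import threading
import time

def fake_io(delay, results, idx):
    # time.sleep stands in for a blocking I/O call; the GIL is released while waiting.
    time.sleep(delay)
    results[idx] = delay

def run_concurrently(n_threads=4, delay=0.2):
    # Start n_threads simulated I/O waits at once and measure the total wall time.
    results = [None] * n_threads
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results

elapsed, results = run_concurrently()
print(f"4 waits of 0.2s took {elapsed:.2f}s in total")
```

The waits overlap, which is all an I/O-bound server needs: concurrency without parallelism.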

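The GIL also does not constrain most compute-intensive AI/ML and scientific computing workloads, because the cores of popular frameworks such as NumPy, TensorFlow, and PyTorch are implemented in C/C++ and expose only a Python API. Most of the computation can proceed without holding the GIL. The underlying C/C++ kernel libraries these frameworks use, such as OpenBLAS or Intel MKL, can use multiple cores without being limited by the GIL.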
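
The same behavior can be seen with the standard library alone. The sketch below is an analogy and does not use any of the frameworks named above: hashlib is implemented in C and, according to its documentation, releases the GIL while hashing buffers larger than 2047 bytes. The buffer size and thread count are arbitrary choices for illustration.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def digest(buf):
    # The hashing runs in C; hashlib releases the GIL for large inputs.
    return hashlib.sha256(buf).hexdigest()

# Four distinct 16 MiB inputs (size chosen arbitrarily for the demo).
buffers = [bytes([i]) * (16 * 1024 * 1024) for i in range(4)]

start = time.perf_counter()
sequential = [digest(b) for b in buffers]
t_seq = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))
t_thr = time.perf_counter() - start

print(f"sequential: {t_seq:.3f}s, threaded: {t_thr:.3f}s")
```

On a multi-core machine the threaded run is usually faster, although the actual speedup depends on the hardware and the Python build.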

2. Compute tasks in pure Python

Concretely, consider the following two simple tasks.

import time

def io_task():
    # Simulated I/O-bound task: sleep for a second, report the actual gap, repeat.
    start = time.time()
    while True:
        time.sleep(1)
        wake = time.time()
        print(f"woke after: {wake - start}")
        start = wake

def count_py(n):
    # Compute-bound task: count to n in a pure Python loop.
    compute_start = time.time()
    s = 0
    for i in range(n):
        s += 1
    compute_end = time.time()
    print(f"compute time: {compute_end - compute_start}")
    return s

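Here, io_task simulates an I/O-bound task: it sleeps for one second, wakes up, prints how long it slept, and goes back to sleep. count_py is a compute-intensive task that simply counts up to n. What happens if the two tasks run at the same time?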

import threading

io_thread = threading.Thread(target=io_task, daemon=True)
io_thread.start()
count_py(100000000)

woke after: 1.0063529014587402
woke after: 1.009704828262329
woke after: 1.0069530010223389
woke after: 1.0066332817077637
compute time: 4.311860084533691

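count_py takes about 4.3 seconds to count to one hundred million, yet io_task runs unaffected in the meantime, waking after about 1 second each time, as expected. Although the compute task needs 4.3 seconds, the Python interpreter preemptively releases the GIL from the main thread running the computation, giving io_thread the chance to acquire the GIL and run.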
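
CPython exposes the interval at which a running thread is asked to give up the GIL through sys.getswitchinterval(), which defaults to 5 milliseconds. The sketch below is a scaled-down variant of the experiment above. The names heartbeat and busy_count are illustrative, and the loop count is kept small so it finishes quickly.

```python
import sys
import threading
import time

# How often the interpreter asks the running thread to release the GIL.
print(f"switch interval: {sys.getswitchinterval()}s")

ticks = 0
stop = threading.Event()

def heartbeat():
    # Wakes every 10 ms; each wake-up needs the GIL to increment the counter.
    global ticks
    while not stop.is_set():
        time.sleep(0.01)
        ticks += 1

def busy_count(n):
    # Pure Python loop: holds the GIL except when the interpreter preempts it.
    s = 0
    for _ in range(n):
        s += 1
    return s

t = threading.Thread(target=heartbeat, daemon=True)
t.start()
total = busy_count(10_000_000)
stop.set()
t.join()
print(f"counted to {total}; heartbeat ticked {ticks} times meanwhile")
```

The heartbeat keeps ticking throughout the loop, which is only possible because the interpreter periodically takes the GIL away from the counting thread.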

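Next, the counting function is implemented in numpy and the same experiment is repeated, this time counting to one billion because the numpy implementation is more efficient.

import numpy as np

def count_np(n):
    compute_start = time.time()
    # np.ones(n) allocates n float64 values (about 8 GB for n = one billion).
    s = np.ones(n).sum()
    compute_end = time.time()
    print(f"compute time: {compute_end - compute_start}")
    return s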
  
io_thread = threading.Thread(target=io_task, daemon=True)
io_thread.start()
count_np(1000000000)

woke after: 1.0001161098480225
woke after: 1.0008511543273926
woke after: 1.0004539489746094
woke after: 1.1320469379425049
compute time: 4.1334803104400635

The result is similar to the previous experiment. In this case, however, it is not the Python interpreter that preemptively released the GIL; numpy released the GIL on its own.

Now consider a custom C++ extension, count.cpp, that implements the same counting function:

// Python C API header; it should be included before any standard headers.
#include <Python.h>
#include <vector>

// count(n): sum a vector of n ones. The GIL is held for the entire call.
static PyObject *count(PyObject *self, PyObject *args) {
  long num;

  // Parse the single integer argument into a C long.
  if (!PyArg_ParseTuple(args, "l", &num))
    return NULL;

  long result = 0L;
  std::vector<long> v(num, 1L);
  for (long i = 0L; i < num; i++) {
    result += v[i];
  }

  return Py_BuildValue("l", result);
}


// Method table: {name, C function, calling convention flag, docstring}.
static PyMethodDef functions[] = {
  {"count", count, METH_VARARGS, "Count."},
  {NULL, NULL, 0, NULL}  // sentinel marking the end of the table
};

// Module definition.
static struct PyModuleDef countModule = {
  PyModuleDef_HEAD_INIT,  // must be the first member of the struct
  "count",    // module name
  NULL,       // module docstring (none)
  -1,         // module state size; -1 means state is kept in globals
  functions   // method table
};

// Called when the module is imported; creates and returns the module object.
PyMODINIT_FUNC PyInit_count(void) {
  return PyModule_Create(&countModule);
}

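The extension can be built by running python setup.py build with the following setup.py:

# distutils was removed in Python 3.12; setuptools provides the same setup/Extension API.
from setuptools import setup, Extension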

count_module = Extension('count', sources=['count.cpp'])

setup(name='python_count_extension',
      version='0.1',
      description='An Example For Python C Extensions',
      ext_modules=[count_module],
      )

With the built module on the import path, the same experiment can be run against the extension:

import count

def count_custom(n):
    compute_start = time.time()
    s = count.count(n)
    compute_end = time.time()
    print(f"compute time: {compute_end - compute_start}")
    return s



io_thread = threading.Thread(target=io_task, daemon=True)
io_thread.start()
count_custom(1000000000)

woke after: 4.414866924285889
compute time: 4.414893865585327

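This time io_task woke only once, after about 4.4 seconds, which is when the computation finished. The extension function holds the GIL for its entire run, and the interpreter cannot preempt native code, so io_thread never got a chance to run. Because the computation here is trivial and does not touch any Python objects, the GIL can be released inside the C++ count function with the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS: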

static PyObject *count(PyObject *self, PyObject *args) {
  long num;

  if (!PyArg_ParseTuple(args, "l", &num))
    return NULL;

  long result = 0L;
  // Release the GIL: nothing between the two macros touches Python objects.
  Py_BEGIN_ALLOW_THREADS
  std::vector<long> v(num, 1L);
  for (long i = 0L; i < num; i++) {
    result += v[i];
  }
  // Reacquire the GIL before building the Python return value.
  Py_END_ALLOW_THREADS

  return Py_BuildValue("l", result);
}

woke after: 1.0026037693023682
woke after: 1.003467082977295
woke after: 1.0028629302978516
woke after: 1.1772480010986328
compute time: 4.186192035675049
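
The contrast between a native call that holds the GIL and one that releases it can also be reproduced without compiling anything, as a rough analogy to the two versions of the extension. The sketch below relies on documented ctypes behavior: functions loaded through ctypes.CDLL release the GIL for the duration of the call, while functions loaded through ctypes.PyDLL keep it. It assumes a POSIX system where loading with None exposes the C library's usleep. The helper ticks_during is illustrative.

```python
import ctypes
import threading
import time

# Assumption: POSIX system; CDLL(None)/PyDLL(None) expose libc's usleep.
releasing = ctypes.CDLL(None)   # GIL is released around each foreign call
holding = ctypes.PyDLL(None)    # GIL stays held during each foreign call

def ticks_during(lib, micros=300_000):
    """Count heartbeat ticks that happen while lib.usleep(micros) runs."""
    count = 0
    stop = threading.Event()

    def heartbeat():
        nonlocal count
        while not stop.is_set():
            time.sleep(0.01)
            count += 1  # needs the GIL, so it stalls if the main thread holds it

    t = threading.Thread(target=heartbeat)
    t.start()
    lib.usleep(micros)  # native call standing in for count.count(n)
    stop.set()
    t.join()
    return count

with_release = ticks_during(releasing)
without_release = ticks_during(holding)
print(f"ticks with GIL released: {with_release}, with GIL held: {without_release}")
```

With the GIL released the heartbeat ticks many times during the native call; with the GIL held it is stalled until the call returns, which mirrors the single late wake-up seen with the first version of the extension.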