Python Multiprocessing: A Beginner's Guide with Examples
Multiprocessing in Python is the technique of running multiple processes in parallel to improve the performance and efficiency of applications that demand significant computational resources.
Python provides a built-in module called `multiprocessing`, which allows us to create and manage processes in a simple and efficient manner. The `multiprocessing` module provides both high-level and low-level interfaces for creating and managing processes.
Here's a basic example of how to use multiprocessing to run a function in parallel:
```python
import multiprocessing

def my_func(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        result = pool.map(my_func, [1, 2, 3, 4, 5])
        print(result)
```
In this example, we define a function `my_func` that takes an argument `x` and returns its square. We then use the `multiprocessing.Pool` object to create a pool of four worker processes and apply `my_func` to a list of numbers using the `pool.map` method. The `pool.map` method distributes the work across the worker processes and returns the results as a list. Note that we use the `if __name__ == '__main__':` guard to ensure that the pool-creation code only runs when the script is executed directly and not when it is imported by another module; on platforms that spawn a fresh interpreter for each child (such as Windows), child processes re-import the main module, and without the guard each child would try to spawn its own pool.
This is just a basic example, and there are many other features and techniques for using multiprocessing in Python, such as using shared memory, interprocess communication, and synchronization primitives. However, the basic idea is to split up a computation into smaller parts that can be run in parallel, and then use multiprocessing to distribute the work across multiple processes to speed up the overall computation.
Other features and techniques
1. Using the `Process` class: The `multiprocessing.Process` class provides a low-level interface for creating and managing individual processes. You can subclass `Process` and define your own `run` method to perform the work in the new process.
```python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, x):
        super().__init__()
        self.x = x

    def run(self):
        print(self.x * self.x)

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = MyProcess(i)
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
```
In this example, we define a custom process class `MyProcess` that takes an argument `x` and defines a `run` method that prints the square of `x`. We create a list of five `MyProcess` objects, start them, and then wait for them to finish using the `join` method.
2. Using shared memory: The `multiprocessing.Value` and `multiprocessing.Array` classes allow you to share data between processes. This can be useful when you want to avoid the overhead of copying large amounts of data between processes.
```python
import multiprocessing

def worker(n, counter):
    for i in range(n):
        with counter.get_lock():
            counter.value += 1

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, args=(1000000, counter))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(counter.value)
```
In this example, we define a `worker` function that takes a number `n` and a shared counter, and increments the counter `n` times while holding the counter's built-in lock. We create a shared counter using `multiprocessing.Value`, start four worker processes, and wait for them to finish. We then print the final value of the counter, which should be exactly 4000000 because the lock prevents lost updates.
3. Using interprocess communication: The `multiprocessing.Queue` class allows you to pass messages between processes. This can be useful when you want to coordinate the work of multiple processes.
```python
import multiprocessing

def producer(queue):
    for i in range(5):
        queue.put(i)

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    queue.put(None)  # sentinel telling the consumer to exit
    p2.join()
```
In this example, we define a `producer` function that puts five items into a queue, and a `consumer` function that repeatedly gets items from the queue and prints them. We create a queue using `multiprocessing.Queue`, start a producer process and a consumer process, and wait for the producer to finish. We then signal the consumer to exit by putting a `None` sentinel into the queue and wait for it to finish as well.
4. Using synchronization primitives: The `multiprocessing.Lock`, `multiprocessing.RLock`, and `multiprocessing.Semaphore` classes allow you to synchronize access to shared resources between processes. This can be useful when you want to avoid race conditions and ensure that multiple processes don't access a shared resource at the same time.
```python
import multiprocessing

def worker(n, lock):
    for i in range(n):
        with lock:
            print(f'Worker {multiprocessing.current_process().name} acquired lock')
            print(f'Worker {multiprocessing.current_process().name} is working')
        print(f'Worker {multiprocessing.current_process().name} released lock')

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, args=(5, lock))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
```
In this example, we define a `worker` function that takes a number `n` and a lock, acquires the lock, does some work, and then releases the lock (the `with` statement releases it automatically). We create a lock using `multiprocessing.Lock`, start four worker processes, and wait for them to finish.
These are just a few examples of the many features and techniques available in the `multiprocessing` module. Other features include using `multiprocessing.Manager` to create shared objects, using `multiprocessing.Pipe` to create bidirectional communication channels between processes, and using `multiprocessing.Event` to signal between processes.
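As a taste of `multiprocessing.Manager`, here is a minimal sketch of sharing a dictionary between processes (the `worker` function and the squares it stores are illustrative, not from the examples above):

```python
import multiprocessing

def worker(shared_dict, key, value):
    # Each process records its result in the managed dictionary
    shared_dict[key] = value

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()  # a dict proxy shared across processes
        processes = [
            multiprocessing.Process(target=worker, args=(shared_dict, i, i * i))
            for i in range(4)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(dict(shared_dict))  # {0: 0, 1: 1, 2: 4, 3: 9}
```

A manager runs a server process that holds the real objects; the proxies forward operations to it, which is slower than `Value`/`Array` but works for richer data types such as lists and dictionaries.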
Definition and example of a pool
A `multiprocessing.Pool` is a convenient way to parallelize the execution of a function across multiple input values. A pool object maintains a pool of worker processes that can be used to execute calls asynchronously. Here's an example:
```python
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result = pool.map(square, [1, 2, 3, 4, 5])
    print(result)
```
In this example, we define a `square` function that takes a single argument `x` and returns the square of `x`. We create a `multiprocessing.Pool` object with no arguments, which creates a pool whose default size is the number of available CPUs. We then call the `map` method of the pool object with the `square` function and a list of input values `[1, 2, 3, 4, 5]`. The `map` method returns a list of the results of applying the function to each input value, which we then print.
The `Pool` object provides several other methods for executing functions asynchronously across multiple input values, including:
· `apply`: Applies a function to a single input value and blocks until the result is available (see the sketch after this list).
· `imap`: Like `map`, but returns an iterator over the results as they become available.
· `imap_unordered`: Like `imap`, but does not guarantee the order of the results.
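As a quick illustration of `apply` (a minimal sketch; the `cube` function is ours, not from the article), note that each call blocks until its single result comes back, so the calls below run one after another:

```python
import multiprocessing

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        # apply blocks until the single result is available
        print(pool.apply(cube, (3,)))  # 27
        print(pool.apply(cube, (4,)))  # 64
```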
Here's an example using `imap`:
```python
import multiprocessing
import time

def slow_square(x):
    time.sleep(1)  # simulate a slow computation
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result_iterator = pool.imap(slow_square, [1, 2, 3, 4, 5])
    for result in result_iterator:
        print(result)
```
In this example, we define a `slow_square` function that takes a single argument `x`, sleeps for one second to simulate a slow computation, and then returns the square of `x`. We create a `multiprocessing.Pool` object as before, and then call the `imap` method with the `slow_square` function and a list of input values. This returns an iterator over the results as they become available, which we loop over and print. Note that the results are printed in input order even though the computations are slow, because `imap` preserves the order of the results.
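If you don't care about ordering, `imap_unordered` yields each result as soon as its worker finishes, which can cut waiting time when inputs take uneven amounts of work. A minimal sketch, using a variant of `slow_square` with uneven delays (our tweak, for illustration):

```python
import multiprocessing
import time

def slow_square(x):
    time.sleep(x * 0.2)  # make the work uneven so completion order differs
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # Results arrive in completion order, not input order
        for result in pool.imap_unordered(slow_square, [5, 4, 3, 2, 1]):
            print(result)
```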
multiprocessing.Array
`multiprocessing.Array` is a class in the `multiprocessing` module that can be used to create shared arrays between multiple processes. Shared arrays allow multiple processes to access and modify the same array data without copying data between processes.
Here's an example of using `multiprocessing.Array` to create a shared array:
```python
import multiprocessing

def worker(arr):
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    arr = multiprocessing.Array('i', [1, 2, 3, 4, 5])
    print(f'Before: {arr[:]}')
    p = multiprocessing.Process(target=worker, args=(arr,))
    p.start()
    p.join()
    print(f'After: {arr[:]}')
```
In this example, we create a shared array of type `'i'` (signed integer) with the initial values `[1, 2, 3, 4, 5]`. We define a `worker` function that takes an array `arr` and multiplies each element of the array by 2. We start a new process running the `worker` function with the shared array as an argument. After the process has finished, we print the contents of the shared array, which should now be `[2, 4, 6, 8, 10]`.
The first argument to `multiprocessing.Array` is a string that specifies the type of the array elements. The following type codes are supported:
· 'b': signed byte
· 'B': unsigned byte
· 'h': signed short
· 'H': unsigned short
· 'i': signed integer
· 'I': unsigned integer
· 'l': signed long
· 'L': unsigned long
· 'f': float
· 'd': double
The second argument is either an integer size, in which case the array is zero-initialized, or a sequence used to initialize the array, as in the example above. You can access and modify elements of the shared array using standard Python indexing and slicing notation. Note that because the array is shared, you should use a lock or other synchronization mechanism to avoid race conditions when modifying it from multiple processes, as sketched below.
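By default the array is created with an internal lock, which you can take via its `get_lock()` method. Here is a minimal sketch of guarding concurrent updates (the `double_all` worker is ours, for illustration):

```python
import multiprocessing

def double_all(arr):
    # Hold the array's built-in lock while modifying it so that
    # concurrent workers don't interleave their updates
    with arr.get_lock():
        for i in range(len(arr)):
            arr[i] *= 2

if __name__ == '__main__':
    arr = multiprocessing.Array('i', [1, 2, 3, 4, 5])
    processes = [multiprocessing.Process(target=double_all, args=(arr,)) for _ in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(arr[:])  # each element doubled twice: [4, 8, 12, 16, 20]
```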
multiprocessing.Pipe
A `multiprocessing.Pipe` creates a two-way communication channel between two processes in Python's `multiprocessing` module; each end is a connection object that can send and receive picklable Python objects. (A pipe connects processes on the same machine; for communication across machines you would use sockets or `multiprocessing.connection`.)
Here's an example of using `multiprocessing.Pipe` to communicate between two processes:
```python
import multiprocessing

def sender(conn):
    conn.send('hello')
    conn.close()

def receiver(conn):
    message = conn.recv()
    conn.close()
    print(message)

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=sender, args=(child_conn,))
    p2 = multiprocessing.Process(target=receiver, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
In this example, we define a `sender` function that takes a `conn` object representing one end of a pipe. The sender process sends the string `'hello'` through the pipe and then closes the connection. We also define a `receiver` function that takes a `conn` object representing the other end of the pipe. The receiver process waits for a message to arrive, reads it using the `recv` method, closes the connection, and prints the message.
In the main block, `multiprocessing.Pipe()` returns a pair of connection objects, one for each end of the pipe. We create two new processes running the `sender` and `receiver` functions, passing the appropriate connection object to each. We then start the two processes and wait for them to finish using the `join` method.
When the sender process sends the message through the pipe, it is received by the receiver process, which prints 'hello'. Note that the operating system schedules the two processes independently; here only the receiver prints, but in programs where several processes print, the output order can vary from run to run.
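Since the pipe is two-way by default, each end can both send and receive. Here is a minimal request/reply sketch (the `echo_worker` function is ours, for illustration):

```python
import multiprocessing

def echo_worker(conn):
    # Receive a request and send a reply back on the same connection
    request = conn.recv()
    conn.send(f'echo: {request}')
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    p = multiprocessing.Process(target=echo_worker, args=(child_conn,))
    p.start()
    parent_conn.send('ping')
    print(parent_conn.recv())  # prints: echo: ping
    p.join()
```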
What is multiprocessing in Python? Why is it useful?
Multiprocessing is a Python module that enables the creation of processes that can run in parallel on a computer's CPU. By allowing multiple processes to run concurrently, multiprocessing can improve the performance of CPU-bound tasks in Python programs. Multiprocessing is useful for speeding up calculations, processing large amounts of data, and performing other computationally intensive tasks.
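To make the benefit concrete, here is a rough sketch comparing a CPU-bound task run serially and through a pool (the `busy` function and the timings are illustrative; the actual speedup depends on your machine's core count):

```python
import multiprocessing
import time

def busy(n):
    # A CPU-bound task: sum of squares up to n
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    inputs = [10_000_000] * 4

    start = time.perf_counter()
    serial = [busy(n) for n in inputs]
    print(f'serial:   {time.perf_counter() - start:.2f}s')

    start = time.perf_counter()
    with multiprocessing.Pool() as pool:
        parallel = pool.map(busy, inputs)
    print(f'parallel: {time.perf_counter() - start:.2f}s')
```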
What are the differences between multiprocessing and multithreading?
Comparing multiprocessing and multithreading in Python:
| | Multiprocessing | Multithreading |
| --- | --- | --- |
| Definition | Multiple processes running concurrently | Multiple threads running concurrently |
| Memory | Each process has its own memory space | All threads share the same memory space |
| Communication | Processes communicate via IPC mechanisms | Threads communicate via shared memory and locks |
| Overhead | Higher overhead due to process creation | Lower overhead due to thread creation |
| Scalability | Good scalability on multi-core CPUs | Limited scalability on multi-core CPUs |
| Resource use | Processes use more system resources (CPU, memory) | Threads use fewer system resources |
| Error handling | Processes are isolated, so one process crashing does not affect others | Threads can affect each other if not properly synchronized |
Overall, multiprocessing is better suited for CPU-bound tasks, while multithreading is better suited for I/O-bound tasks. The reason threads scale poorly on CPU-bound work is Python's Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time; separate processes each have their own interpreter and GIL, so they can run truly in parallel on multiple cores. Multiprocessing therefore scales better on multi-core CPUs, but comes with higher overhead from process creation and greater system resource usage. Multithreading has lower overhead and uses fewer resources, but is limited on CPU-bound work and can be more error-prone due to shared memory and lock synchronization issues.
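To make the contrast concrete, here is a hedged sketch using the standard `concurrent.futures` wrappers, which expose both models behind the same interface (the two workload functions are ours, for illustration):

```python
import concurrent.futures
import time

def cpu_bound(n):
    # Burns CPU; threads gain little here because of the GIL
    return sum(i * i for i in range(n))

def io_bound(delay):
    # Spends its time waiting, so threads can overlap many of these
    time.sleep(delay)
    return delay

if __name__ == '__main__':
    # Processes suit CPU-bound work
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(list(executor.map(cpu_bound, [5_000_000] * 4)))

    # Threads suit I/O-bound work
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print(list(executor.map(io_bound, [0.5] * 4)))
```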
Write Python code to create a process using the multiprocessing module.
Here is Python code that creates a process using the `multiprocessing` module:
```python
import multiprocessing

def my_function():
    print("Hello from a child process!")

if __name__ == '__main__':
    p = multiprocessing.Process(target=my_function)
    p.start()
    p.join()
    print("The child process has completed.")
```
In this example, we first define a function `my_function` that will be run by the child process. The function simply prints a message to the console.
We then use the `multiprocessing.Process` class to create a new process, passing the `target` argument to specify the function to run in the child process. We start the child process using the `start()` method and wait for it to complete using the `join()` method. Finally, we print a message to indicate that the child process has completed.
When you run this code, you should see the message "Hello from a child process!" printed to the console, followed by the message "The child process has completed." The order here is deterministic: because the parent calls `join()` before its final print, the child's message always appears first.
Here's another example that demonstrates passing arguments to a child process using the `multiprocessing` module:
```python
import multiprocessing

def my_function(name):
    print(f"Hello from {name}!")

if __name__ == '__main__':
    p = multiprocessing.Process(target=my_function, args=('Alice',))
    p.start()
    p.join()
    print("The child process has completed.")
```
In this example, we define a function `my_function` that takes a single argument `name`. The function simply prints a message to the console using the provided name.
We then create a new process using the `multiprocessing.Process` class, passing the `target` argument to specify the function to run in the child process and the `args` argument to specify the arguments to pass to it. In this case, we pass the argument `'Alice'` to the child process.
We start the child process using the `start()` method and wait for it to complete using the `join()` method. Finally, we print a message to indicate that the child process has completed.
When you run this code, you should see the message "Hello from Alice!" printed to the console followed by the message "The child process has completed."
What is a multiprocessing pool in Python? Why is it used?
In Python, a multiprocessing pool is a way to execute a function or other callable across many inputs using a pool of worker processes. The pool manages a set of worker processes and hands tasks to workers as they become free.
The `multiprocessing` module provides the `Pool` class for creating a pool of worker processes. The `Pool` class has a method called `map` that allows you to apply a function to a sequence of arguments in parallel.
Here's an example that demonstrates the use of a multiprocessing pool:
```python
import multiprocessing

def square(x):
    return x ** 2

if __name__ == '__main__':
    # Create a pool with 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Apply the square function to the numbers 0 to 9 in parallel
        results = pool.map(square, range(10))
    print(results)
```
In this example, we define a function called `square` that returns the square of its argument. We then create a `Pool` object with 4 worker processes using a context manager. We use the `map` method of the `Pool` object to apply the `square` function to the numbers 0 to 9 in parallel. The `map` method returns a list of results, which we print to the console.
The output of this code should be:
```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
The multiprocessing pool is useful because it allows you to parallelize the execution of a function, which can lead to significant performance improvements on multi-core systems. By distributing the work across multiple processes, you can take advantage of all available CPU resources and reduce the overall processing time.
How can we create a pool of worker processes in Python using the multiprocessing module?
In Python, you can create a pool of worker processes using the `multiprocessing.Pool` class. Here's an example:
```python
import multiprocessing

def my_function(x):
    # Do some work here
    return x * x

if __name__ == '__main__':
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Apply the function to a sequence of inputs in parallel
        results = pool.map(my_function, range(10))
    print(results)
```
In this example, we define a function called `my_function` that takes a single argument `x` and returns its square. We then create a pool of 4 worker processes using the `multiprocessing.Pool` class. We use the `map` method of the `Pool` object to apply `my_function` to a sequence of inputs (the numbers 0 to 9) in parallel. The `map` method returns a list of results, which we print to the console.
When you run this code, you should see the following output:
```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
The `Pool` class automatically manages the worker processes, distributing the work across them as needed. Once all the work has been completed, the pool closes and the worker processes are terminated. Using a pool of worker processes can be an effective way to parallelize computations and take advantage of multi-core systems.
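If you manage the pool yourself instead of using a context manager, you call `close()` and `join()` explicitly. `apply_async` is handy when you want to submit tasks one at a time without blocking; here is a minimal sketch (reusing a `square` function like the one above):

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    # Submit tasks without blocking; each call returns an AsyncResult
    async_results = [pool.apply_async(square, (i,)) for i in range(10)]
    pool.close()  # no more tasks may be submitted
    pool.join()   # wait for the workers to finish
    print([r.get() for r in async_results])  # [0, 1, 4, ..., 81]
```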
Write a Python program to create 4 processes, each of which prints a different number, using the multiprocessing module.
```python
import multiprocessing

def print_number(num):
    print(f"Process {num}: {num}")

if __name__ == '__main__':
    # Create a list of numbers
    numbers = [1, 2, 3, 4]

    # Create a process for each number in the list
    processes = [multiprocessing.Process(target=print_number, args=(num,)) for num in numbers]

    # Start each process
    for process in processes:
        process.start()

    # Wait for each process to finish
    for process in processes:
        process.join()
```
In this example, we define a function called `print_number` that takes a single argument `num` and prints it along with the process number. We then create a list of numbers and a list of processes, where each process is created with the `multiprocessing.Process` class and targets the `print_number` function with a different number as its argument. We start each process using the `start` method, and then wait for each process to finish using the `join` method.
When you run this code, you should see output similar to the following:
```
Process 1: 1
Process 2: 2
Process 3: 3
Process 4: 4
```
Each process should print a different number, and the order in which they are printed may vary depending on the timing and scheduling of the operating system.