Python Multiprocessing: A Beginner's Guide with Examples

Multiprocessing in Python is the technique of running multiple processes in parallel to improve the performance of applications that need a lot of computation.

Python provides a built-in module called multiprocessing which allows us to create and manage processes in a simple and efficient manner. The multiprocessing module provides both high-level and low-level interfaces for creating and managing processes.

Here's a basic example of how to use multiprocessing to run a function in parallel: 

python code

import multiprocessing
def my_func(x):
    return x * x
if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        result = pool.map(my_func, [1, 2, 3, 4, 5])
    print(result)

In this example, we define a function my_func that takes an argument x and returns its square. We then use the multiprocessing.Pool object to create a pool of four worker processes and apply the my_func function to a list of numbers using the pool.map method. The pool.map method distributes the work across the worker processes, and returns the results as a list.

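Note that we use the if __name__ == '__main__': guard so that the pool is created only when the script is run directly, not when the module is imported. This matters for multiprocessing in particular: with the spawn start method (the default on Windows and macOS), each worker process imports the main module, and without the guard that import would try to create a pool all over again.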

This is just a basic example, and there are many other features and techniques for using multiprocessing in Python, such as using shared memory, interprocess communication, and synchronization primitives. However, the basic idea is to split up a computation into smaller parts that can be run in parallel, and then use multiprocessing to distribute the work across multiple processes to speed up the overall computation.
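As a rough sketch of the "split the computation into smaller parts" idea, the following divides a sum of squares into contiguous ranges and gives each range to a worker. The helper names make_chunks, sum_squares, and parallel_sum_squares are made up for this illustration, and four parts is an arbitrary choice.

```python
import multiprocessing


def make_chunks(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    step, extra = divmod(n, parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_squares(bounds):
    """Sum x*x for x in [start, stop); runs inside a worker process."""
    start, stop = bounds
    return sum(x * x for x in range(start, stop))


def parallel_sum_squares(n, parts=4):
    """Combine per-chunk partial sums computed by a pool of workers."""
    with multiprocessing.Pool(processes=parts) as pool:
        partial_sums = pool.map(sum_squares, make_chunks(n, parts))
    return sum(partial_sums)


if __name__ == '__main__':
    print(parallel_sum_squares(1_000_000))
```

Each worker returns one small number rather than a large list, which keeps the cost of sending results back to the parent low.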

Other features and techniques

 

1.     Using the Process class: The multiprocessing.Process class provides a low-level interface for creating and managing individual processes. You can subclass the Process class and define your own run method to perform the work in the new process.

python code

 

import multiprocessing
 
class MyProcess(multiprocessing.Process):
    def __init__(self, x):
        super().__init__()
        self.x = x
 
    def run(self):
        print(self.x * self.x)
 
if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = MyProcess(i)
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

In this example, we define a custom process class MyProcess that takes an argument x and defines a run method that prints the square of x. We create a list of five MyProcess objects, start them, and then wait for them to finish using the join method.

2.     Using shared memory: The multiprocessing.Value and multiprocessing.Array classes allow you to share data between processes. This can be useful when you want to avoid the overhead of copying large amounts of data between processes.

python code

import multiprocessing
def worker(n, counter):
    for i in range(n):
        with counter.get_lock():
            counter.value += 1
if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, args=(1000000, counter))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(counter.value)

In this example, we define a worker function that takes a number n and a shared counter, and increments the counter n times. We create a shared counter using multiprocessing.Value, start four worker processes, and wait for them to finish. We then print the final value of the counter, which is 4000000 because every increment happens while the counter's lock is held.

3.     Using interprocess communication: The multiprocessing.Queue class allows you to pass messages between processes. This can be useful when you want to coordinate the work of multiple processes.

python code

import multiprocessing
 
def producer(queue):
    for i in range(5):
        queue.put(i)
 
def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(item)
 
if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    queue.put(None)
    p2.join()

In this example, we define a producer function that puts five items into a queue, and a consumer function that repeatedly gets items from the queue and prints them. We create a queue using multiprocessing.Queue and start a producer process and a consumer process. Once the producer has finished, we put None into the queue to tell the consumer to exit, and then wait for the consumer to finish.
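The same sentinel idea can be extended to several consumers. The sketch below is one possible arrangement, not the only one: it assumes three consumers, a second queue for results, and a hypothetical process_item function standing in for real work. Each consumer needs its own sentinel, because a sentinel is removed from the queue by whichever consumer reads it.

```python
import multiprocessing

SENTINEL = None  # marker meaning "no more work"


def process_item(item):
    """Stand-in for real work: double the item."""
    return item * 2


def consumer(tasks, results):
    """Pull items until the sentinel arrives, then stop."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(process_item(item))


if __name__ == '__main__':
    num_consumers = 3
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()

    workers = [
        multiprocessing.Process(target=consumer, args=(tasks, results))
        for _ in range(num_consumers)
    ]
    for w in workers:
        w.start()

    items = list(range(10))
    for item in items:
        tasks.put(item)
    # One sentinel per consumer, so every consumer sees exactly one.
    for _ in workers:
        tasks.put(SENTINEL)

    # Collect results before joining: a process that has put data on a
    # queue may not exit until that data has been consumed.
    collected = [results.get() for _ in items]
    for w in workers:
        w.join()

    # Arrival order depends on scheduling, so sort before printing.
    print(sorted(collected))
```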

4.     Using synchronization primitives: The multiprocessing.Lock, multiprocessing.RLock, and multiprocessing.Semaphore classes allow you to synchronize access to shared resources between processes. This can be useful when you want to avoid race conditions and ensure that multiple processes don't access a shared resource at the same time.

python code

import multiprocessing
 
def worker(n, lock):
    for i in range(n):
        with lock:
            print(f'Worker {multiprocessing.current_process().name} acquired lock')
            print(f'Worker {multiprocessing.current_process().name} is working')
            print(f'Worker {multiprocessing.current_process().name} released lock')
 
if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, args=(5, lock))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

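In this example, we define a worker function that takes a number n and a lock. The function repeats n times: it acquires the lock, prints three messages, and releases the lock. Because the three messages are printed while the lock is held, the groups from different workers are never interleaved. We create a lock using multiprocessing.Lock, start four worker processes, and wait for them to finish.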
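A Lock admits one holder at a time; a Semaphore admits up to a fixed number. As a small sketch, assuming a limit of two and a short sleep standing in for use of a limited resource (the function name limited_worker is invented here):

```python
import multiprocessing
import time


def limited_worker(sem, worker_id):
    """At most as many workers as the semaphore allows run this at once."""
    with sem:
        print(f'worker {worker_id} entered')
        time.sleep(0.1)  # stand-in for using a limited resource
        print(f'worker {worker_id} leaving')


if __name__ == '__main__':
    sem = multiprocessing.Semaphore(2)  # at most 2 holders at a time
    processes = [
        multiprocessing.Process(target=limited_worker, args=(sem, i))
        for i in range(5)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

With five workers and a limit of two, no more than two "entered" messages should appear without a matching "leaving" message.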

These are just a few examples of the many features and techniques available in the multiprocessing module. Other features include using multiprocessing.Manager to create shared objects, using multiprocessing.Pipe to create bidirectional communication channels between processes, and using multiprocessing.Event to signal between processes.
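For instance, a Manager can hand out a dictionary that several processes update. This is a minimal sketch; record_square is a hypothetical helper, and the manager dictionary is a proxy to a real dict held in a separate manager process, so every access involves interprocess communication.

```python
import multiprocessing


def record_square(shared, x):
    """Store x*x under key x; `shared` can be a dict or a Manager dict proxy."""
    shared[x] = x * x


if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        squares = manager.dict()  # proxy to a dict living in the manager process
        processes = [
            multiprocessing.Process(target=record_square, args=(squares, i))
            for i in range(5)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # Copy into a regular dict while the manager is still running.
        print(dict(sorted(squares.items())))
```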

Definition and Example of Pool

multiprocessing.Pool is a convenient way to parallelize the execution of a function across multiple input values. A pool object maintains a pool of worker processes that can be used to execute calls asynchronously. Here's an example:

python code

import multiprocessing
 
def square(x):
    return x * x
 
if __name__ == '__main__':
    # The with block shuts the pool down once the results are collected.
    with multiprocessing.Pool() as pool:
        result = pool.map(square, [1, 2, 3, 4, 5])
    print(result)

In this example, we define a square function that takes a single argument x and returns the square of x. We create a multiprocessing.Pool object with no arguments, which creates a pool of worker processes with a default size of the number of available CPUs. We then call the map method of the pool object with the square function and a list of input values [1, 2, 3, 4, 5]. The map method returns a list of the results of applying the function to each input value, which we then print.

The Pool object provides several other methods for running a function on the worker processes, including:

·       apply: Applies a function to a single input value and blocks until the result is available.

·       imap: Like map, but returns an iterator over the results as they become available.

·       imap_unordered: Like imap, but does not guarantee the order of the results.

Here's an example using imap:

python code

import multiprocessing
import time
 
def slow_square(x):
    time.sleep(1)
    return x * x
 
if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # Consume the iterator inside the with block, while the pool is alive.
        result_iterator = pool.imap(slow_square, [1, 2, 3, 4, 5])
        for result in result_iterator:
            print(result)

In this example, we define a slow_square function that takes a single argument x, sleeps for one second to simulate a slow computation, and then returns the square of x. We create a multiprocessing.Pool object as before, and then call the imap method with the slow_square function and a list of input values. This returns an iterator over the results as they become available, which we then loop over and print. Note that the results are printed in order, even though the computations are slow, because imap guarantees the order of the results.
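To see how imap_unordered differs, the sketch below gives each task a different delay. The function slow_identity and the delay values are made up for the illustration. The first line printed always follows the input order; the second usually lists the shortest delays first, although the exact order depends on scheduling.

```python
import multiprocessing
import time


def slow_identity(delay):
    """Sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


if __name__ == '__main__':
    delays = [0.4, 0.1, 0.3, 0.2]
    with multiprocessing.Pool(processes=4) as pool:
        # imap yields results in input order, waiting on slow items if needed.
        print(list(pool.imap(slow_identity, delays)))
        # imap_unordered yields each result as soon as its worker finishes.
        print(list(pool.imap_unordered(slow_identity, delays)))
```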

multiprocessing.Array

multiprocessing.Array is a class in the multiprocessing module that can be used to create shared arrays between multiple processes. Shared arrays allow multiple processes to access and modify the same array data without the need for copying data between processes.

Here's an example of using multiprocessing.Array to create a shared array:

python code

import multiprocessing
 
def worker(arr):
    for i in range(len(arr)):
        arr[i] *= 2
 
if __name__ == '__main__':
    arr = multiprocessing.Array('i', [1, 2, 3, 4, 5])
    print(f'Before: {arr[:]}')
    p = multiprocessing.Process(target=worker, args=(arr,))
    p.start()
    p.join()
    print(f'After: {arr[:]}')

In this example, we create a shared array of type 'i' (integer) with the initial values [1, 2, 3, 4, 5]. We define a worker function that takes an array arr, and multiplies each element of the array by 2. We start a new process running the worker function with the shared array as an argument. After the process has finished, we print the contents of the shared array, which should now be [2, 4, 6, 8, 10].

The first argument to multiprocessing.Array specifies the type of the array elements. It can be a ctypes type or a one-character type code of the kind used by the array module. Commonly used type codes include:

·       'b': signed byte

·       'B': unsigned byte

·       'h': signed short

·       'H': unsigned short

·       'i': signed integer

·       'I': unsigned integer

·       'l': signed long

·       'L': unsigned long

·       'f': float

·       'd': double

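The second argument is required and is either an integer or a sequence. An integer sets the length of the array, and the elements are initialized to zero; a sequence, as in the example above, sets both the length and the initial values. You can access and modify elements of the shared array using standard Python indexing and slicing notation. By default the array is created with an associated lock, available through get_lock(). Individual reads and writes are synchronized with that lock, but a compound operation such as arr[i] *= 2 is a read followed by a write, so hold the lock around it whenever several processes modify the array at the same time.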
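Here is a small sketch of both points: creating a zero-filled array from an integer size, and holding the array's lock around a read-modify-write. The function name add_one_to_all and the repeat count are chosen only for this example.

```python
import multiprocessing


def add_one_to_all(arr, repeats):
    """Increment every element `repeats` times, holding the array's lock."""
    for _ in range(repeats):
        # arr[i] += 1 is a read followed by a write, so guard the pair.
        with arr.get_lock():
            for i in range(len(arr)):
                arr[i] += 1


if __name__ == '__main__':
    # An integer second argument gives a zero-filled array of that length.
    arr = multiprocessing.Array('i', 3)
    processes = [
        multiprocessing.Process(target=add_one_to_all, args=(arr, 1000))
        for _ in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(arr[:])  # each element was incremented 4 * 1000 times
```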

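multiprocessing.Pipe

multiprocessing.Pipe() creates a communication channel between two processes and returns a pair of connection objects, one for each end. By default the pipe is duplex (two-way), so either end can both send and receive. A pipe connects processes on the same machine; for communication between machines, the multiprocessing.connection module provides Listener and Client.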

Here's an example of using multiprocessing.Pipe to communicate between two processes:

python code

import multiprocessing
 
def sender(conn):
    conn.send('hello')
    conn.close()
 
def receiver(conn):
    message = conn.recv()
    conn.close()
    print(message)
 
if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=sender, args=(child_conn,))
    p2 = multiprocessing.Process(target=receiver, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

In this example, we define a sender function that takes a conn object representing one end of a pipe. The sender process sends the string 'hello' through the pipe and then closes the connection. We also define a receiver function that takes a conn object representing the other end of the pipe. The receiver process waits for a message to arrive on the pipe, reads the message using the recv method, and then prints the message. The connection is then closed.

In the main block, we call multiprocessing.Pipe(), which returns a pair of connection objects, one for each end of the pipe. We create two new processes running the sender and receiver functions, passing in the appropriate conn object for each process. We then start the two processes and wait for them to finish using the join method.

When the sender process sends the message through the pipe, it is received by the receiver process, which then prints 'hello'. The order in which the two processes are scheduled is not guaranteed, but this does not change the output: recv blocks until a message arrives, so the receiver simply waits if it happens to run first.
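Because a pipe created with Pipe() is duplex by default, one connection can carry both requests and replies. The following is a sketch under a few assumptions: the child acts as a tiny echo server, shout is a made-up transformation, and None is used as a stop signal.

```python
import multiprocessing


def shout(text):
    """Toy transformation applied by the child process."""
    return text.upper() + '!'


def echo_server(conn):
    """Reply to each message until the None sentinel arrives."""
    while True:
        message = conn.recv()
        if message is None:
            break
        conn.send(shout(message))
    conn.close()


if __name__ == '__main__':
    # Pipe() is duplex by default: both ends can send and receive.
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=echo_server, args=(child_conn,))
    p.start()

    for word in ['hello', 'world']:
        parent_conn.send(word)
        print(parent_conn.recv())  # blocks until the child replies

    parent_conn.send(None)  # ask the child to exit
    p.join()
    parent_conn.close()
```

Run as a script, this prints HELLO! and then WORLD!, since the parent waits for each reply before sending the next word.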

 

What is multiprocessing in Python? Why is it useful?

multiprocessing is a Python standard-library module for creating processes that can run in parallel on a computer's CPU cores. By running several processes at once, it can improve the performance of CPU-bound tasks in Python programs. It is useful for speeding up calculations, processing large amounts of data, and performing other computationally intensive tasks.

 

 

What are the differences between multiprocessing and multithreading?

Comparing multiprocessing and multithreading in Python:

| Aspect | Multiprocessing | Multithreading |
| --- | --- | --- |
| Definition | Multiple processes running concurrently | Multiple threads running concurrently within one process |
| Memory | Each process has its own memory space | All threads share the same memory space |
| Communication | Processes communicate via IPC mechanisms | Threads communicate via shared memory and locks |
| Overhead | Higher overhead, because processes are expensive to create | Lower overhead, because threads are cheap to create |
| Scalability | Good scalability on multi-core CPUs | Limited scalability on multi-core CPUs for CPU-bound work |
| Resource use | Processes use more system resources (CPU, memory) | Threads use fewer system resources |
| Error handling | Processes are isolated, so one process crashing does not affect others | Threads can affect each other if not properly synchronized |

Overall, multiprocessing is better suited to CPU-bound tasks, where true parallelism is needed, while multithreading is better suited to I/O-bound tasks, where threads spend most of their time waiting. In the default build of CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, which is why threads scale poorly for CPU-bound work on multi-core CPUs. Multiprocessing avoids this limit because each process has its own interpreter, but it comes with higher overhead from process creation and uses more system resources. Multithreading has lower overhead and uses fewer system resources, but it can be more error-prone because of shared memory and lock synchronization issues.
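The memory difference can be seen directly. In this sketch (the names items, append_item, run_in_thread, and run_in_process are invented for the illustration), a thread and a child process each append to a module-level list. Only the thread's change is visible to the parent, because the child process works on its own copy.

```python
import multiprocessing
import threading

items = []  # module-level state


def append_item():
    items.append('added')


def run_in_thread():
    """Run append_item in a thread; threads share the parent's memory."""
    t = threading.Thread(target=append_item)
    t.start()
    t.join()
    return len(items)


def run_in_process():
    """Run append_item in a child process, which works on its own copy."""
    p = multiprocessing.Process(target=append_item)
    p.start()
    p.join()
    return len(items)


if __name__ == '__main__':
    print(run_in_thread())   # the thread's append is visible here
    print(run_in_process())  # the child's append is not
```

Run as a script, this prints 1 twice: the list grows when the thread runs and stays the same size after the child process runs.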

 

Write Python code to create a process using the multiprocessing module.

Python code that creates a process using the multiprocessing module:

python code

import multiprocessing
 
def my_function():
    print("Hello from a child process!")
 
if __name__ == '__main__':
    p = multiprocessing.Process(target=my_function)
    p.start()
    p.join()
    print("The child process has completed.")

In this example, we first define a function my_function that will be run by the child process. The function simply prints a message to the console.

We then use the multiprocessing.Process class to create a new process, passing in the target argument to specify the function to run in the child process. We start the child process using the start() method and wait for it to complete using the join() method. Finally, we print a message to indicate that the child process has completed.

When you run this code, you should see the message "Hello from a child process!" printed to the console followed by the message "The child process has completed." The order is always the same, because join() makes the parent wait until the child has finished before it prints its own message.

 

Here's another example that demonstrates passing arguments to a child process using the multiprocessing module:

python code
import multiprocessing
 
def my_function(name):
    print(f"Hello from {name}!")
 
if __name__ == '__main__':
    p = multiprocessing.Process(target=my_function, args=('Alice',))
    p.start()
    p.join()
    print("The child process has completed.")

In this example, we define a function my_function that takes a single argument name. The function simply prints a message to the console using the provided name.

We then create a new process using the multiprocessing.Process class, passing in the target argument to specify the function to run in the child process and the args argument to specify the arguments to pass to the function. In this case, we pass the argument 'Alice' to the child process.

We start the child process using the start() method and wait for it to complete using the join() method. Finally, we print a message to indicate that the child process has completed.

When you run this code, you should see the message "Hello from Alice!" printed to the console followed by the message "The child process has completed."
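Process also accepts keyword arguments for the target and an optional process name, and it records the child's exit code once the child has finished. A brief sketch, with the greet function, the 'Hi' greeting, and the 'greeter' name chosen only for illustration:

```python
import multiprocessing


def greet(name, greeting='Hello'):
    message = f'{greeting} from {name}!'
    print(message)
    return message  # the return value is not sent back to the parent


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=greet,
        args=('Alice',),
        kwargs={'greeting': 'Hi'},
        name='greeter',
    )
    p.start()
    p.join()
    print(p.name, p.exitcode)  # exitcode is 0 when the child exits normally
```

Note that the return value of the target function is discarded; to get data back from a child, use a Queue, a Pipe, shared memory, or a Pool.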

 

What is a multiprocessing pool in Python? Why is it used?

In Python, a multiprocessing pool is a way to execute a function or other callable using a fixed set of worker processes. The pool manages the workers and hands each task to a worker as soon as one is free.

The multiprocessing module provides the Pool class that can be used to create a pool of worker processes. The Pool class has a method called map that allows you to apply a function to a sequence of arguments in parallel.

Here's an example that demonstrates the use of a multiprocessing pool:

python code

import multiprocessing
 
def square(x):
    return x**2
 
if __name__ == '__main__':
    # Create a pool with 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Apply the square function to the numbers 0 to 9 in parallel
        results = pool.map(square, range(10))
        print(results)

In this example, we define a function called square that returns the square of its argument. We then create a Pool object with 4 worker processes using a context manager. We use the map method of the Pool object to apply the square function to the numbers 0 to 9 in parallel. The map method returns a list of results, which we print to the console.

The output of this code should be:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The multiprocessing pool is useful because it allows you to parallelize the execution of a function, which can lead to significant performance improvements on multi-core systems. By distributing the work across multiple processes, you can take advantage of all available CPU resources and reduce the overall processing time.
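The map method passes one argument to each call. When a function takes several arguments, Pool.starmap can be used instead. A short sketch, with the power function and the input pairs made up for the example:

```python
import multiprocessing


def power(base, exponent):
    return base ** exponent


if __name__ == '__main__':
    pairs = [(2, 3), (3, 2), (10, 0)]
    with multiprocessing.Pool(processes=2) as pool:
        # starmap unpacks each tuple into positional arguments.
        print(pool.starmap(power, pairs))  # [8, 9, 1]
```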

 

How can we create a pool of worker processes in Python using the multiprocessing module?

In Python, you can create a pool of worker processes using the multiprocessing.Pool class. Here's an example of how to create a pool of worker processes:

python code

import multiprocessing
 
def my_function(x):
    # Do some work here
    return x*x
 
if __name__ == '__main__':
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Apply the function to a sequence of inputs in parallel
        results = pool.map(my_function, range(10))
        print(results)

In this example, we define a function called my_function that takes a single argument x and returns its square. We then create a pool of 4 worker processes using the multiprocessing.Pool class. We use the map method of the Pool object to apply the my_function function to a sequence of inputs (the numbers 0 to 9 in this case) in parallel. The map method returns a list of results, which we print to the console.

When you run this code, you should see the following output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The Pool class automatically manages the worker processes, distributing the work across them as needed. When the with block exits, the pool is shut down and the worker processes are terminated, which is why the results are collected inside the block. Using a pool of worker processes can be an effective way to parallelize computations and take advantage of multi-core systems.
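The pool's lifecycle can also be managed by hand, which is sometimes convenient when tasks are submitted one at a time. The sketch below assumes a hypothetical cube function and uses apply_async together with explicit close() and join() calls.

```python
import multiprocessing


def cube(x):
    return x ** 3


if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)
    try:
        # apply_async returns immediately with an AsyncResult handle.
        handles = [pool.apply_async(cube, (n,)) for n in range(5)]
        # get() blocks until that task's result is ready.
        print([h.get(timeout=10) for h in handles])  # [0, 1, 8, 27, 64]
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit
```

Calling close() followed by join() lets outstanding tasks finish, whereas leaving a with block terminates the workers immediately.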

 

Write a Python program that uses the multiprocessing module to create 4 processes, each of which prints a different number.

python code

import multiprocessing
 
def print_number(num):
    print(f"Process {num}: {num}")
 
if __name__ == '__main__':
    # Create a list of numbers
    numbers = [1, 2, 3, 4]
    
    # Create a process for each number in the list
    processes = [multiprocessing.Process(target=print_number, args=(num,)) for num in numbers]
    
    # Start each process
    for process in processes:
        process.start()
    
    # Wait for each process to finish
    for process in processes:
        process.join()

In this example, we define a function called print_number that takes a single argument num and prints it, using the same number as a label for the process. We then create a list of numbers and a list of processes, where each process is created using the multiprocessing.Process class and targets the print_number function with a different number as its argument. We start each process using the start method, and then wait for each process to finish using the join method.

When you run this code, you should see output similar to the following:

Process 1: 1
Process 2: 2
Process 3: 3
Process 4: 4

Each process should print a different number, and the order in which they are printed may vary depending on the timing and scheduling of the operating system.

 
