
1.3 - Using `Process`, `Queue`, `Pipe`, and `Pool`

September 2025

Key Concept

This section explores various tools in Python's `multiprocessing` module that enable concurrent execution of code. These tools allow you to leverage multiple CPU cores to speed up computationally intensive tasks, improve responsiveness, and parallelize operations. We'll delve into the purpose and usage of `Process`, `Queue`, `Pipe`, and `Pool`, highlighting their strengths and weaknesses.

Topics

Process: Create and manage independent processes for parallel tasks.

The `Process` class is the fundamental building block for creating independent processes within a Python program. Each process has its own memory space, isolated from the others; this isolation prevents a crash in one process from bringing down the rest. To use `Process`, you define a function to run in the new process, create a `Process` object with that function and any necessary arguments, and launch it with the `start()` method. Call `join()` to wait for it to finish. Note that a child process cannot hand its return value back to the parent directly; results must travel through an IPC mechanism such as a `Queue` or `Pipe`, both covered below. `Process` is ideal for computationally intensive tasks that benefit from running on multiple cores without needing to share data directly. However, inter-process communication (IPC) is more involved than with the higher-level tools described later in this section.
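
A minimal sketch of this lifecycle, with an illustrative `work` function:

    from multiprocessing import Process
    import os

    def work(n):
        # Runs in a separate process with its own memory space
        print(f"pid {os.getpid()}: {n} squared is {n * n}")

    if __name__ == "__main__":
        procs = [Process(target=work, args=(i,)) for i in range(3)]
        for p in procs:
            p.start()  # launch each child process
        for p in procs:
            p.join()   # wait for all children to finish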

Queue: Facilitate inter-process communication by providing a buffer for data exchange.

The `Queue` class provides a thread-safe (and process-safe) data structure for exchanging information between processes. It acts as a buffer, allowing one process to put data into the queue and another process to retrieve it. This is a common pattern for passing results or intermediate data between processes. You can create a `Queue` object and then use the `put()` method to add data to the queue and the `get()` method to retrieve data. The `Queue` class handles synchronization, ensuring that data is safely transferred between processes. `Queue` is particularly useful when you need to collect results from multiple processes or pass data between them without worrying about race conditions. It's a robust and reliable mechanism for inter-process communication.
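
A minimal producer sketch, assuming a single consumer in the parent process and a `None` sentinel to signal completion (both are illustrative choices, not requirements of the API):

    from multiprocessing import Process, Queue

    def producer(q):
        for i in range(5):
            q.put(i * i)   # safe to call from any process
        q.put(None)        # sentinel: tell the consumer we're done

    if __name__ == "__main__":
        q = Queue()
        p = Process(target=producer, args=(q,))
        p.start()
        while (item := q.get()) is not None:  # get() blocks until data arrives
            print("got", item)
        p.join()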

Pipe: Establish a direct two-way communication channel between two processes.

The `Pipe` class offers a simple and efficient way to connect exactly two processes. Calling `Pipe()` returns a pair of `Connection` objects, one for each end. By default the pipe is duplex (bidirectional): both ends can `send()` and `recv()`. Passing `duplex=False` creates a one-way pipe where the first connection can only receive and the second can only send. `Pipe` is often faster than a `Queue` for point-to-point communication, as it avoids the overhead of the queue's synchronization machinery. However, it is limited to exactly two endpoints, and unlike `Queue` it is not safe for multiple processes to write to the same end simultaneously. It's a good choice when you need a direct, high-speed connection between two processes.
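
A minimal sketch of two processes exchanging messages over the default duplex pipe (the messages are illustrative):

    from multiprocessing import Process, Pipe

    def child(conn):
        conn.send("hello from the child")       # write to this end
        print("child received:", conn.recv())   # duplex: can also read
        conn.close()

    if __name__ == "__main__":
        parent_conn, child_conn = Pipe()  # duplex=True by default
        p = Process(target=child, args=(child_conn,))
        p.start()
        print("parent received:", parent_conn.recv())
        parent_conn.send("hello back")
        p.join()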

Pool: Manage a collection of worker processes, simplifying task distribution and result collection.

The `Pool` class provides a convenient way to distribute tasks across a pool of worker processes. You create a `Pool` object, specifying the number of workers, then use the `map()` method to apply a function to a sequence of inputs, spreading the work across the pool. `Pool` automatically handles task distribution, result collection, and worker lifecycle management, which simplifies parallelization and can significantly improve performance for tasks that break down into independent subtasks. It is a high-level abstraction that hides the complexities of process management. `Pool` also offers other methods, such as `apply_async()` for asynchronous task submission, and `close()` and `join()` (or context-manager usage) for proper resource cleanup.
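
A minimal sketch, assuming four workers and a trivial `square` function for illustration:

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        with Pool(processes=4) as pool:  # the pool is terminated when the block exits
            results = pool.map(square, range(10))      # blocks until all results are in
            pending = pool.apply_async(square, (42,))  # non-blocking submission
            print(results)        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
            print(pending.get())  # 1764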

Exercise

Consider a scenario where you need to perform a computationally intensive task on multiple datasets. How would you leverage `Process` and `Queue` to parallelize the processing? (5 min)

Answer: Suppose you have 10 CSV datasets, each with millions of rows of sensor readings. You want to calculate the average temperature per file.

  • Create one Process per dataset. Each process:
    • reads its file,
    • computes the average,
    • and puts the result into a shared Queue.
  • The main process collects results from the Queue and merges them.

This way, CPU cores work in parallel, and the Queue ensures safe communication.

Parallel CSV Dataset Example

Here each dataset is “processed” in parallel, with results collected safely in the Queue.

    from multiprocessing import Process, Queue
    import numpy as np

    def avg_file(data, q, name):
        avg = np.mean(data)  # pretend this is a huge CSV
        q.put((name, avg))   # send the result back to the parent

    if __name__ == "__main__":
        datasets = {
            "file1": np.random.rand(1_000_000),
            "file2": np.random.rand(1_000_000),
            "file3": np.random.rand(1_000_000),
        }
        q = Queue()
        procs = []

        for name, data in datasets.items():
            p = Process(target=avg_file, args=(data, q, name))
            p.start()
            procs.append(p)

        # Drain the queue *before* joining: a child can block on q.put()
        # if the queue's buffer fills up, and joining first may deadlock.
        results = [q.get() for _ in range(len(datasets))]
        for p in procs:
            p.join()

        print("Averages:", dict(results))

💡 Common Pitfalls

* Incorrectly managing shared resources (e.g., race conditions) when using `Process`; see the sketch below.
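
As a taste of what section 1.7 covers in depth, here is a minimal sketch of guarding a shared counter with a `Lock` (the counter type and iteration counts are illustrative):

    from multiprocessing import Process, Value, Lock

    def increment(counter, lock):
        for _ in range(10_000):
            with lock:  # guard the read-modify-write sequence
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)  # integer living in shared memory
        lock = Lock()
        procs = [Process(target=increment, args=(counter, lock)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)  # always 40000; without the lock, often less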

💡 Best Practices

* Use `Pool` for managing a large number of worker processes, to avoid the overhead of creating each process individually.
