In Python, process-based parallelism means running different parts of your program at the same time in separate processes. Each process runs independently with its own memory, like workers in different rooms handling tasks without interfering with one another.
Python’s built-in multiprocessing module makes it simple to create and manage these separate processes, letting you run tasks side by side instead of one after the other.
What Is a Process?
A process is an independent program running on your computer. Each process has its own memory space and does not share data directly with others. Think of it as several workers in separate rooms, each focused on their own job. This separation keeps processes safe and stable.
Why Use Processes in Python?
Running tasks one after another can take a long time. Using multiple processes lets your program perform many tasks at once, known as process-based parallelism. Because each process has its own memory and its own Python interpreter, processes are not limited by the Global Interpreter Lock (GIL) and can run truly in parallel on multiple CPU cores.
When Does Process-Based Parallelism Come in Handy?
Processes are useful when your program needs to handle heavy or multiple independent tasks, like processing images, doing calculations, or managing network requests. Instead of waiting for one task to finish before starting another, processes let you run many tasks simultaneously, improving speed and efficiency.
The multiprocessing module provides easy tools to create and run processes. In this article, we’ll focus on how to run functions in parallel using multiprocessing so you can make the most of your CPU cores.
Importing the multiprocessing Module
Before you can run code in parallel, you need to bring in Python’s built-in multiprocessing module. It gives you all the tools to create and manage processes easily.
Here’s how to import it:
import multiprocessing
With that one line, you’re ready to start building parallel programs.
Creating a Simple Target Function
To run code in a separate process, you need to give it a function — this is called the target function. It’s the task the new process will do.
Here’s a simple and friendly example:
def say_hello(name):
    print(f"Hello from {name}!")
This function just prints a greeting with a name. In the next step, we’ll run it in a separate process.
Starting a Single Process
Now that we’ve written a simple function to run, it’s time to start a new process that will execute it. To do this in Python, we use the multiprocessing.Process() class. You pass in the function you want to run (called the “target”) along with any arguments it needs.
Here’s how it works in action:
if __name__ == "__main__":
    p = multiprocessing.Process(target=say_hello, args=("Edward",))
    p.start()
    p.join()
This code first creates a new process p, telling it to run the say_hello function with "Edward" as the argument. The start() method begins the process, launching it separately from the main program. The join() method is called right after to pause the main program until that process finishes. Without join(), the main program might exit before the child process gets a chance to finish its task. Using both together ensures your program runs cleanly and in order.
Why use if __name__ == "__main__"?
When Python uses the spawn start method (the default on Windows and macOS), it creates each new process by starting a fresh interpreter that re-imports your script. If you don’t put the process creation code inside if __name__ == "__main__":, each child will try to create more processes as it imports the script, causing an endless loop. This check makes sure that the process-starting code only runs when you start the script directly, preventing new processes from spawning more processes. This keeps your program safe and working correctly.
Running Multiple Processes
Running just one process is useful, but the real power of parallelism comes when you run many processes at the same time. You can easily do this by creating a loop that starts several processes, each with its own input.
In this example, we have a list of names. For each name, we create a process that runs the say_hello function. After starting each one, we keep track of it in a list. Once all the processes are started, we go back and call join() on each one to wait for them to finish.
import multiprocessing

def say_hello(name):
    print(f"Hello from {name}!")

if __name__ == "__main__":
    names = ["Lucia", "Cherish", "Stephen", "Mary"]
    processes = []
    for name in names:
        p = multiprocessing.Process(target=say_hello, args=(name,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
All of these processes run independently and at the same time. This means each say_hello call runs in parallel, making your program more dynamic and efficient when doing many tasks at once.
Passing More Data to Processes
Functions become more interesting when they work with real data. Instead of just printing a name, let’s write a function that takes a number, squares it, and prints the result. To make it feel more like real work, we’ll add a small delay using time.sleep().
Here’s the updated function:
import multiprocessing
import time

def square_number(num):
    print(f"Squaring {num}")
    time.sleep(1)
    print(f"{num} squared is {num * num}")
Now let’s run this function in multiple processes, each handling a different number from a list. Each process will work on squaring a number at the same time.
if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    processes = []
    for num in numbers:
        p = multiprocessing.Process(target=square_number, args=(num,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
Each process calls the square_number function with a different number. Because they run in parallel, you’ll see the messages interleave a bit depending on which process gets scheduled first. This is a simple and fun way to see parallelism in action while working with different kinds of input.
Getting Return Values from Processes
When you run a function in a separate process using multiprocessing.Process, it doesn’t return a value the normal way. That’s because each process runs in its own memory space, separate from the main program. But if you want to get a result back, you can pass in a special object like a Queue.
A multiprocessing.Queue lets the child process send data back to the main process. Here’s a simple example where a number is doubled and sent back through the queue:
import multiprocessing

def double_number(n, q):
    q.put(n * 2)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=double_number, args=(10, q))
    p.start()
    p.join()
    result = q.get()
    print(f"Doubled number: {result}")
In this case, the function double_number calculates n * 2 and puts it into the queue. The main program then reads the result using q.get(). This is a clean and easy way to collect output from parallel processes.
Spawning vs Forking
When you create a new process in Python, the way it’s started depends on the operating system. By default, Linux uses forking, while Windows and macOS use spawning. Forking copies the parent process, while spawning starts a fresh interpreter and imports your module again.
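You can check which method your platform uses by default with multiprocessing.get_start_method():

```python
import multiprocessing

if __name__ == "__main__":
    # typically "fork" on Linux, "spawn" on Windows and macOS
    print(multiprocessing.get_start_method())
```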
To control this yourself, Python’s multiprocessing module provides a way to choose the method using get_context().
Here’s how to explicitly use the “spawn” method:
import multiprocessing

def say_hello(name):
    print(f"Hello from {name}!")

if __name__ == '__main__':
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=say_hello, args=("Samantha",))
    p.start()
    p.join()
This code gets a context that uses spawning, then creates and runs a process from that context. It works the same way as before, but now you’ve clearly set how the process should start. This is especially helpful when writing cross-platform code.
Organizing Code with main()
When working with multiprocessing, it’s important to keep your code clean and well-organized. A common and simple way to do this is by putting your main logic inside a main() function. This helps avoid mistakes, like accidentally running code multiple times when new processes start.
Here’s an example of how to do this:
import multiprocessing

def greet(name):
    print(f"Greetings from {name}!")

def main():
    p = multiprocessing.Process(target=greet, args=("George",))
    p.start()
    p.join()

if __name__ == "__main__":
    main()
By wrapping the process code inside main() and calling it only when the script runs directly, you make sure that child processes don’t re-run the whole script. This keeps your multiprocessing code safe and easy to understand.
Conclusion
In this article, you learned how to run tasks in parallel using Python’s multiprocessing.Process. You now know how to create and start processes, pass data to them, collect results, and organize your code cleanly. This is the foundation of process-based parallelism in Python.
Try using these ideas with your own functions to split work across multiple CPU cores. It’s a simple way to make your programs do more at the same time and explore the power of parallel processing.