In this section we will see how easy it is to run a Renoir computation in parallel and distribute it across multiple machines.
The information about the environment in which the computation will run is stored in a StreamContext. The context can contain multiple streams and operators, and it is the object that governs the execution of all the streams within it.
Local Deployment
To run a computation on a single machine, you can create a StreamContext with the new_local() method. This method creates a context that will run the stream using all the available resources of the machine.
let ctx = StreamContext::new_local();
// ... Streams and operators
If you want to specify the number of threads to use, you can create a custom RuntimeConfig with the local(..) method.
let config = RuntimeConfig::local(4).unwrap();
let ctx = StreamContext::new(config);
// ... Streams and operators
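If the thread count should follow the machine rather than be hard-coded, the Rust standard library can report the available parallelism. The following is a minimal sketch that does not depend on Renoir: pick_thread_count is a hypothetical helper, and passing its result to RuntimeConfig::local(..) is an assumption about how you might wire it into your own program.

```rust
use std::thread;

/// Hypothetical helper (not part of Renoir): returns the number of worker
/// threads to request, capped at `max` and never lower than 1.
fn pick_thread_count(max: usize) -> usize {
    // `available_parallelism` can fail on some platforms; fall back to 1.
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    available.min(max).max(1)
}
```

For example, pick_thread_count(4) yields at most 4 threads on a large machine and fewer on a machine with fewer logical cores.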
Distributed Deployment
To run a computation on multiple machines, you can create a RuntimeConfig with the remote(..) method and build the StreamContext from it. This method takes as its argument the path to a TOML configuration file that contains the information about the cluster.
let config = RuntimeConfig::remote("path/to/config.toml").unwrap();
config.spawn_remote_workers();
let ctx = StreamContext::new(config);
// ... Streams and operators
If you want to use a distributed environment, you have to spawn the remote workers using spawn_remote_workers before requesting any stream.
The configuration file should contain the information necessary to connect to the various machines in the cluster, for example:
# config.toml
[[host]]
address = "host1.lan"
base_port = 9500
num_cores = 16
[[host]]
address = "host2.lan"
base_port = 9500
num_cores = 24
ssh = { username = "renoir", key_file = "/home/renoir/.ssh/id_ed25519" }
And just like that your pipeline will be automatically distributed across both machines.
Available options for the RuntimeConfig are:
address: a string with the address of the machine
base_port: starting port for the communication between operators on different machines
num_cores: number of cores available on the machine
ssh: object to store the SSH connection information
  ssh_port: port to connect to the machine
  username: username to connect to the machine
  password: password to connect to the machine
  key_file: path to the private key file
  key_passphrase: passphrase for the private key file
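To make the shape of these options concrete, the sketch below models them as plain Rust data. The HostConfig and SshConfig types are hypothetical mirrors of the documented fields, not Renoir's actual types, and treating the ssh fields as optional is an assumption based on the example file, where the first host omits ssh entirely.

```rust
/// Hypothetical mirror of the `ssh` object; not Renoir's actual type.
/// Every field is assumed optional for this illustration.
#[derive(Debug, Clone, Default)]
struct SshConfig {
    ssh_port: Option<u16>,
    username: Option<String>,
    password: Option<String>,
    key_file: Option<String>,
    key_passphrase: Option<String>,
}

/// Hypothetical mirror of one `[[host]]` entry.
#[derive(Debug, Clone)]
struct HostConfig {
    address: String,
    base_port: u16,
    num_cores: usize,
    ssh: Option<SshConfig>,
}

/// Builds the two-host cluster described in the example config.toml.
fn example_cluster() -> Vec<HostConfig> {
    vec![
        HostConfig {
            address: "host1.lan".to_string(),
            base_port: 9500,
            num_cores: 16,
            ssh: None,
        },
        HostConfig {
            address: "host2.lan".to_string(),
            base_port: 9500,
            num_cores: 24,
            ssh: Some(SshConfig {
                username: Some("renoir".to_string()),
                key_file: Some("/home/renoir/.ssh/id_ed25519".to_string()),
                ..SshConfig::default()
            }),
        },
    ]
}

/// Sums the cores declared across all hosts in the cluster.
fn total_cores(hosts: &[HostConfig]) -> usize {
    hosts.iter().map(|h| h.num_cores).sum()
}
```

With the example file, total_cores reports 40 cores spread over the two hosts.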
Context from Arguments
To choose the environment in which the computation will run each time you launch it, you can let the program build its configuration from the command-line arguments.
let (config, args) = RuntimeConfig::from_args();
let ctx = StreamContext::new(config);
// ... Streams and operators
When you run the program, you can pass arguments specifying whether to run the computation locally or remotely.
cargo run -- --local num_of_thread
cargo run -- --remote path/to/config.toml
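The idea of selecting the deployment from the command line can be sketched with the standard library alone. The code below is not Renoir's implementation of from_args: the Deployment enum and parse_deployment function are hypothetical, and the sketch only assumes the two flags shown above plus the fact that the remaining arguments are handed back to the caller.

```rust
/// Hypothetical stand-in for the two deployment modes; not Renoir's type.
#[derive(Debug, PartialEq)]
enum Deployment {
    Local { threads: usize },
    Remote { config_path: String },
}

/// Parses a leading `--local <threads>` or `--remote <path>` pair and
/// returns the chosen mode together with the remaining arguments.
fn parse_deployment(args: &[String]) -> Result<(Deployment, Vec<String>), String> {
    match args {
        [flag, value, rest @ ..] if flag == "--local" => {
            let threads = value
                .parse::<usize>()
                .map_err(|e| format!("invalid thread count {:?}: {}", value, e))?;
            Ok((Deployment::Local { threads }, rest.to_vec()))
        }
        [flag, value, rest @ ..] if flag == "--remote" => Ok((
            Deployment::Remote { config_path: value.clone() },
            rest.to_vec(),
        )),
        _ => Err("expected --local <threads> or --remote <path>".to_string()),
    }
}
```

In a real program the input would come from std::env::args().skip(1), and the returned mode would decide which configuration to build.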