Recently I have been trying to build an efficient distributed data processing framework for deep learning, and I think the PUSH/PULL messaging pattern provided by ZMQ could be a very useful tool for it.
But I could find few details on how the messages are actually dispatched. All I know is that PUSH distributes messages in a round-robin way. But what happens if some of the connections to the PUSH socket are blocked? And when does a PUSH socket block?
So I needed to run some experiments.
Here is my experimental code, as a GitHub gist: https://gist.github.com/chkap/ef52145aa9e2c35aed4428862c2fb3fb
This is all the experiment does:
And the result is like this:
8 -> 10: [8, 18, 28, 38, 48, 58, 68, 78, 88, 98]
6 -> 10: [4, 14, 24, 32, 42, 52, 62, 72, 82, 92]
3 -> 10: [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]
1 -> 10: [2, 12, 22, 34, 44, 54, 64, 74, 84, 94]
7 -> 10: [7, 17, 27, 37, 47, 57, 67, 77, 87, 97]
2 -> 10: [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
9 -> 10: [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]
5 -> 10: [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
4 -> 10: [6, 16, 26, 36, 46, 56, 66, 76, 86, 96]
0 -> 10: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
total received: 100
PUSH does indeed work in a round-robin way, even though worker 0 slept for 10 s. This is not what I expected.
The HWM is set to 1, yet messages were still sent to worker 0 while it slept. I guess the reason is the system-level buffer of the socket.
To test this, I increased the message payload from a string of length 1 to a string of length 1e7, so that the socket buffer fills up easily.
Guess the result:
8 -> 10: [8, 18, 28, 38, 48, 57, 66, 75, 84, 94]
9 -> 10: [7, 17, 27, 37, 47, 56, 65, 74, 83, 93]
2 -> 10: [9, 19, 29, 39, 44, 53, 62, 71, 80, 90]
1 -> 11: [1, 11, 21, 31, 41, 50, 59, 68, 77, 87, 97]
6 -> 11: [2, 12, 22, 32, 42, 51, 60, 69, 78, 88, 98]
4 -> 10: [6, 16, 26, 36, 46, 55, 64, 73, 82, 92]
7 -> 10: [5, 15, 25, 35, 45, 54, 63, 72, 81, 91]
0 -> 6: [4, 14, 24, 34, 85, 95]
3 -> 11: [0, 10, 20, 30, 40, 49, 58, 67, 76, 86, 96]
5 -> 11: [3, 13, 23, 33, 43, 52, 61, 70, 79, 89, 99]
total received: 100
The result supports this guess: worker 0 received only 6 messages, and the other workers picked up the rest.
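To make the difference between the two runs easier to reason about, here is a toy model of the dispatch in plain Python. It is only a sketch under assumptions, not how libzmq is implemented: `simulate_push`, `capacity` and `wake_at` are hypothetical names, `capacity` lumps the HWM and the system-level buffers into a single count of messages that can be in flight to a worker that is not reading, and the fast workers are assumed to consume instantly. The values used below are illustrative, not measured.

```python
def simulate_push(num_workers, num_messages, capacity, slow_worker, wake_at):
    """Toy model: round-robin dispatch that skips a peer with no buffer room.

    capacity    -- hypothetical number of messages that fit "in flight" to a
                   worker that is not reading (HWM plus system buffers, lumped)
    slow_worker -- index of the worker that sleeps at the start
    wake_at     -- message index at which the slow worker starts reading
    """
    in_flight = [0] * num_workers            # unread messages per worker
    received = [[] for _ in range(num_workers)]
    cursor = 0                               # next peer in round-robin order

    for msg in range(num_messages):
        if msg >= wake_at:
            in_flight[slow_worker] = 0       # slow worker is awake and drains

        # Starting at the cursor, take the first peer that still has room.
        for offset in range(num_workers):
            peer = (cursor + offset) % num_workers
            if in_flight[peer] < capacity:
                break
        else:
            raise RuntimeError("all peers are full; a real PUSH socket would block")

        received[peer].append(msg)
        if peer == slow_worker and msg < wake_at:
            in_flight[peer] += 1             # nobody is reading this one yet
        cursor = (peer + 1) % num_workers

    return received


if __name__ == "__main__":
    # Plenty of buffer room (small messages): strict round robin.
    even = simulate_push(10, 100, capacity=100, slow_worker=0, wake_at=100)
    # Little buffer room (large messages): the sleeping worker gets skipped.
    skew = simulate_push(10, 100, capacity=4, slow_worker=0, wake_at=80)
    for name, result in (("roomy", even), ("tight", skew)):
        print(name)
        for worker, msgs in enumerate(result):
            print(f"{worker} -> {len(msgs)}: {msgs}")
```

With plenty of room, every worker gets exactly 10 messages in strict rotation, like the first run. With little room, the sleeping worker gets its first few messages, is skipped while the other nine rotate among themselves, and rejoins the rotation once it wakes up, which is the same shape as the second run.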
Finally, the behaviour of the PUSH socket in ZMQ can be summarized as follows: