
Running Dask

Hello, I'm trying to run a Python/Dask application (with a Skein driver) on the cluster using Dask-Yarn, but every deployment hangs indefinitely in the constructor. The logs show that the scheduler and workers are created without throwing exceptions (so it isn't a problem with deploying the Conda environment), the application is reported as RUNNING, and I called kinit before submitting, yet I never get a monitoring URL (which Skein should provide) and nothing further happens. Has anyone tried using Dask on the cluster? Are you aware of any configuration issues when using it, for example around specific ports?


Comments

Hi,

If you look at the YARN logs, you'll see this message:

21/03/22 17:15:41 WARN skein.ApplicationMaster: Application master shutdown by external signal. This usually means that the application was killed by a user/administrator, or that the application master memory limit was exceeded. See the diagnostics for more information.

Checking the diagnostics can be done using the YARN UI. There you see: "Application application_1611572280718_137915 was killed by user mep_jobcontrol at 192.168.113.74", so it looks like this application was killed using the jobcontrol UI?

I also checked application_1611572280718_138433: this job has a Dask scheduler running, but no Dask workers.

We also use Skein to start Dask workers on YARN (e.g. to run Airflow workers, etc ...) so this should work. In our setup the Dask scheduler runs on a dedicated host, while in your setup it is also running on one of the Hadoop worker nodes.
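For reference, a Skein application spec for starting Dask workers that connect to an external scheduler might look roughly like the sketch below. The service name, resource sizes, and scheduler address are illustrative assumptions, not our actual configuration:

```yaml
# Hypothetical skein spec: Dask workers only, joining an external
# scheduler (the tcp:// address is a placeholder).
name: dask-workers
queue: default
services:
  dask.worker:
    instances: 2
    resources:
      memory: 4 GiB
      vcores: 2
    files:
      environment: test.tar.gz   # packed conda environment
    script: |
      source environment/bin/activate
      dask-worker tcp://scheduler-host:8786
```

A spec like this can be submitted with `skein application submit spec.yaml`; since the scheduler runs elsewhere, the workers simply dial out to it.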

Regards, Dirk

Hi, thank you for your reply. My problem is that my code hangs in the YarnCluster constructor, and after a while I am forced to kill it (sometimes via SSH, other times through the UI).

I tried a simple example and it still hangs:

from dask_yarn import YarnCluster
from dask.distributed import Client

print("Connecting to the cluster")

# Hangs here; test.tar.gz contains only dask_yarn and its dependencies
with YarnCluster(environment="test.tar.gz") as cluster:

    client = Client(cluster)

    print("Connection to the cluster established")

May I ask if you have set anything in particular compared to the default parameters?
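For context, I'm relying on the defaults, which dask-yarn reads from a `yarn.yaml` configuration file (typically under `~/.config/dask/`). As far as I understand, the relevant section looks roughly like this; all values below are illustrative, not our cluster's settings:

```yaml
# Sketch of a dask-yarn configuration file (~/.config/dask/yarn.yaml).
# Values are examples only.
yarn:
  environment: test.tar.gz   # packed conda environment
  queue: default
  deploy-mode: remote        # scheduler runs in its own YARN container
  worker:
    count: 2
    memory: 2GiB
    vcores: 1
  scheduler:
    memory: 2GiB
    vcores: 1
```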

Hi, would it be possible to share your script with us on your PROBA-V MEP public folder? This way it's easier for us to help out.

Thanks, Dirk

Sure, thank you. I have moved my files to my Public folder; the test script is called test.py.

I also tried running a test with Skein's echo-server example, but I have problems communicating with my running server (I get the exception "ConnectionError: Unable to connect to application" when I call client.py). I suspect a networking issue; did you have to configure any particular network settings?
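To rule out a firewall or routing problem before digging into Skein itself, a quick TCP reachability probe against the host and port that YARN reports for the application can help (the host/port values below are placeholders, not the real cluster addresses):

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace with the address the YARN UI reports for the running
    # application (e.g. the echo-server or Dask scheduler container).
    print(can_connect("some-worker-node", 8786))
```

If this returns False from the submitting host but the service is listening on the container, the ports are likely blocked between the two, which would also explain the ConnectionError.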

Hi, I copied a snippet of how we successfully use Skein to start Airflow workers using Dask in my public folder, see /data/users/Public/daemsd/skein-snippet/snippet.py. Maybe you can try to convert your script to align with this sample?

Regards, Dirk

Thank you very much for your snippet, I will try to use it to see what the problem with my configuration is. Do you have any special network settings?
