Local Queue Initialization Failed
Issue Description
When creating a Notebook, training task, or inference service, if the queue is being used for the first time in that namespace, you will be prompted to initialize the queue with a single click. However, the initialization may fail.
Issue Analysis
In intelligent computing, queue management capabilities are provided by Kueue. Kueue offers two types of queue management resources: ClusterQueue and LocalQueue.
- ClusterQueue: This is a cluster-level queue mainly used to manage resource quotas within the queue, including CPU, memory, GPU, etc.
- LocalQueue: This is a namespace-level queue that needs to point to a ClusterQueue for resource allocation within the queue.
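To illustrate how a LocalQueue points to a ClusterQueue, here is a minimal sketch that only prints a manifest. The namespace, queue names, and the `kueue.x-k8s.io/v1beta1` API version are assumptions for illustration and may differ in your cluster.

```shell
# Hypothetical names, for illustration only; replace with your own.
NAMESPACE="${NAMESPACE:-my-namespace}"
CLUSTER_QUEUE="${CLUSTER_QUEUE:-my-cluster-queue}"

# A LocalQueue lives in one namespace and references a ClusterQueue
# through spec.clusterQueue.
# Assumption: the cluster serves the kueue.x-k8s.io/v1beta1 API version.
MANIFEST=$(cat <<EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: my-local-queue
  namespace: ${NAMESPACE}
spec:
  clusterQueue: ${CLUSTER_QUEUE}
EOF
)

# Print the manifest for review; nothing is applied to the cluster here.
printf '%s\n' "$MANIFEST"
```

Piping the printed manifest to `kubectl apply -f -` would create the queue by hand, but the one-click initialization described above remains the intended path.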
In intelligent computing, if the specified namespace does not have a LocalQueue when you create a service, you will be prompted to initialize the queue.
In rare cases, the LocalQueue initialization might fail due to special circumstances.
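To see whether initialization actually produced a LocalQueue, you can list the Kueue resources directly. This is a sketch that assumes `kubectl` access to the cluster. The namespace name is a placeholder.

```shell
# Hypothetical namespace; replace with the namespace where the service is created.
NAMESPACE="${NAMESPACE:-my-namespace}"

# ClusterQueues are cluster-scoped, so no namespace flag is needed.
kubectl get clusterqueues.kueue.x-k8s.io \
  || echo "Could not list ClusterQueues; check that the Kueue CRDs are installed."

# LocalQueues are namespaced; an empty result means none has been initialized.
kubectl get localqueues.kueue.x-k8s.io -n "$NAMESPACE" \
  || echo "Could not list LocalQueues in namespace $NAMESPACE."
```

If the second command reports no resources, the namespace has no LocalQueue yet.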
Solution
Check whether Kueue is running properly by inspecting the status of the kueue-controller-manager, for example with kubectl.
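A status check along these lines should work. It is a sketch, not the exact command from the platform documentation. The namespace is an assumption: upstream Kueue manifests install into `kueue-system`, but your platform may deploy Kueue elsewhere, so set `KUEUE_NAMESPACE` accordingly.

```shell
# Assumption: the namespace where Kueue is installed. Upstream manifests use
# kueue-system; override this if your platform uses a different namespace.
KUEUE_NAMESPACE="${KUEUE_NAMESPACE:-kueue-system}"

# READY should show all replicas available (for example 1/1).
kubectl get deployment kueue-controller-manager -n "$KUEUE_NAMESPACE" \
  || echo "Deployment not found in $KUEUE_NAMESPACE; check the namespace."

# Look for pods that are not Running or that restart repeatedly.
kubectl get pods -n "$KUEUE_NAMESPACE" \
  || echo "Could not list pods in $KUEUE_NAMESPACE."

# Recent controller logs often show why the controller is unhealthy.
kubectl logs deployment/kueue-controller-manager -n "$KUEUE_NAMESPACE" \
  --all-containers=true --tail=50 \
  || echo "Could not fetch logs for kueue-controller-manager."
```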
If the kueue-controller-manager is not running properly, please fix Kueue first.