Problems caused by launching multiple pods at the same time #25

@rv64m

Why do I get an error when I start multiple pods that request GPU resources at the same time (concurrently) with vcuda?

In vcuda's loader.c, I added an ferror call to print the errno-related error message, and this is what I got:

[image: screenshot of the error message]
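For anyone trying to reproduce this, here is a minimal sketch of how the errno text could be captured at the point of failure. Note that `ferror` only tests a stream's error indicator, so the message itself usually comes from `strerror` or `perror`. The helper name `report_open_error` and the path in the usage note are hypothetical and are not taken from vcuda's loader.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of vcuda: try to open `path` for reading and,
 * on failure, log the errno text to stderr.
 * Returns 0 on success, or the errno value set by fopen on failure. */
int report_open_error(const char *path) {
    errno = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        int saved = errno; /* save it before another call can overwrite it */
        fprintf(stderr, "open %s failed: %s (errno=%d)\n",
                path, strerror(saved), saved);
        return saved;
    }
    fclose(fp);
    return 0;
}
```

Calling `report_open_error("/some/path/libcuda.so")` (an assumed path) from the failing code path would show whether the error is a missing file (`ENOENT`) or something else, such as a permission problem.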

But when I start the pods sequentially, the problem does not occur. So I suspect it is caused by a gap between the kubelet starting the container and gpu-manager placing the libcuda.so file.
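One way to test this guess would be to poll for the library before loading it. The sketch below assumes the race is purely about the file not being present yet; `wait_for_file`, the attempt count, and the delay are all hypothetical, and this is not how vcuda or gpu-manager currently behave.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll until `path` is readable or the attempts run out.
 * Returns 0 once the file is readable, -1 otherwise (errno is left as set by
 * the last access() call). */
int wait_for_file(const char *path, int max_attempts, long delay_ms) {
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    for (int i = 0; i < max_attempts; i++) {
        if (access(path, R_OK) == 0) {
            return 0;
        }
        if (i + 1 < max_attempts) {
            nanosleep(&delay, NULL); /* back off before the next check */
        }
    }
    return -1;
}
```

If a call such as `wait_for_file(path_to_libcuda, 50, 100)` placed before the load makes the concurrent case succeed, that would support the timing explanation. It would only be a diagnostic, though, since a proper fix would presumably need to order the file placement before container start.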
