You can find the remote code in the huggingface repo.
Ahh, interesting.
I mean, it’s published by a fairly reputable organization, so the chances of a problem are fairly low, but I’m not sure there’s any guarantee that the compiled Python in the pickle matches the source files there. I wrote my own pickle interpreter a while back, and it’s an insane file format; I think it would be nearly impossible to verify something like that. Loading a pickle file with the safety checks disabled is basically the same as running a .pyc file: it can do anything a Python script can.
So I think my caution still applies.
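To make that concrete, here’s a tiny sketch (class name made up, not from the Qwen repo) of why loading an untrusted pickle is equivalent to running code:

```python
import os
import pickle

# Any class can define __reduce__ to return a callable plus arguments;
# pickle.loads() will call it during deserialization, before you ever
# see the "data".
class Payload:  # hypothetical example
    def __reduce__(self):
        return (os.system, ("echo this ran at unpickle time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # executes the shell command above
```

That’s also why safetensors checkpoints (or `torch.load(..., weights_only=True)` on newer PyTorch versions) are generally preferred for weights you don’t fully trust.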
It could also be PyTorch or one of the huggingface libraries, since MPS support is still very beta.
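For what it’s worth, if you want to poke at the MPS backend directly, something like this (standard PyTorch calls on recent 2.x builds) can help tell a real leak from the allocator just holding on to cached blocks:

```python
import torch

print(torch.__version__)
print("MPS built:", torch.backends.mps.is_built(),
      "available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    # Releases cached blocks held by the MPS allocator; if memory drops
    # sharply after this, it's caching rather than a true leak.
    torch.mps.empty_cache()
    print("allocated:", torch.mps.current_allocated_memory())
```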
From their description here: https://github.com/QwenLM/Qwen-7B/blob/main/tech_memo.md#model
It doesn’t seem like anything super crazy is going on. I doubt the issue would be in Transformers or PyTorch.
I’m not completely sure what you mean by “MPS”.
Ah, I see. Wouldn’t it be pretty easy to determine whether MPS is actually the issue by running the model on CPU instead of the MPS device? Since it’s a 7B model, CPU inference should be reasonably fast. If you still get the memory leak, then you’ll know MPS isn’t at fault.
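Something along these lines might do it (the model id and the memory readout are my assumptions, adjust as needed):

```python
import resource
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # assumed Hugging Face id for the model in question
device = "cpu"                  # flip to "mps" to compare against the leaky run

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float32
).to(device)

inputs = tok("Hello", return_tensors="pt").to(device)
for step in range(20):
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=32)
    # ru_maxrss is KB on Linux, bytes on macOS; what matters is whether it keeps growing.
    print(step, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```

If resident memory still climbs run after run on CPU, the MPS backend probably isn’t the culprit.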