You can find the remote code in the huggingface repo.
Ahh, interesting.
I mean, it’s published by a fairly reputable organization, so the chances of a problem are fairly low, but I’m not sure there’s any guarantee that the compiled Python in the pickle matches the source files there. I wrote my own pickle interpreter a while back, and it’s an insane file format; I think it would be nearly impossible to verify something like that. Loading a pickle file with the safety checks disabled is basically the same as running a .pyc file: it can do anything a Python script can.
So I think my caution still applies.
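To make that concrete, here’s a tiny sketch (class name made up, not from the Qwen repo) of why loading an untrusted pickle is equivalent to running code:

```python
import os
import pickle

# Any class can define __reduce__ to return a callable plus arguments;
# pickle.loads() will call it during deserialization, before you ever
# see the "data".
class Payload:  # hypothetical example
    def __reduce__(self):
        return (os.system, ("echo this ran at unpickle time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # executes the shell command above
```

That’s also why safetensors checkpoints (or `torch.load(..., weights_only=True)` on newer PyTorch versions) are generally preferred for weights you don’t fully trust.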
It could also be PyTorch or one of the huggingface libraries, since MPS support is still very beta.
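For what it’s worth, if you want to poke at the MPS backend directly, something like this (standard PyTorch calls on recent 2.x builds) can help tell a real leak from the allocator just holding on to cached blocks:

```python
import torch

print(torch.__version__)
print("MPS built:", torch.backends.mps.is_built(),
      "available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    # Releases cached blocks held by the MPS allocator; if memory drops
    # sharply after this, it's caching rather than a true leak.
    torch.mps.empty_cache()
    print("allocated:", torch.mps.current_allocated_memory())
```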
From their description here: https://github.com/QwenLM/Qwen-7B/blob/main/tech_memo.md#model
It doesn’t seem like anything super crazy is going on. I doubt the issue would be in Transformers or PyTorch.
I’m not completely sure what you mean by “MPS”.
Ah, I see. Wouldn’t it be pretty easy to determine whether MPS is actually the issue by running the model on CPU instead of the MPS device? Since it’s a 7B model, CPU inference should be reasonably fast. If you still get the memory leak, then you’ll know MPS isn’t at fault.
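Something along these lines might do it (the model id and the memory readout are my assumptions, adjust as needed):

```python
import resource
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # assumed Hugging Face id for the model in question
device = "cpu"                  # flip to "mps" to compare against the leaky run

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float32
).to(device)

inputs = tok("Hello", return_tensors="pt").to(device)
for step in range(20):
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=32)
    # ru_maxrss is KB on Linux, bytes on macOS; what matters is whether it keeps growing.
    print(step, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```

If resident memory still climbs run after run on CPU, the MPS backend probably isn’t the culprit.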