Seattle-based Puget Systems has entered the AI market and is demonstrating its new specialized AI Training and Inference server at SIGGRAPH 2023 in Los Angeles this week. At booth #630 in the LA Convention Center, the Puget Systems team is showing the server configured with four NVIDIA RTX 6000 Ada graphics cards. The configuration is designed to handle intensive generative AI and machine learning workloads, as well as real-time rendering, graphics, AR/MR/VR/XR, compute, and deep learning processing.
Here is some more information shared by the company regarding the new solution:
The Puget Systems AI Training and Inference server is a rackmount workstation that can host a web-based chat server running state-of-the-art (SOTA) models, such as Meta's Llama-2-70b large language model (LLM), for multiple simultaneous users. Puget Systems Labs tested this configuration extensively with Llama-2-70b and Falcon-40b. (Falcon-40b requires less memory and can run on only two RTX 6000 Ada GPUs.) In addition to running a chat interface, the hardware is suitable for fine-tuning base models within the available GPU memory limits.
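One rough way to see why Falcon-40b fits on two cards while Llama-2-70b needs all four is to estimate weight memory from the parameter count. The sketch below is a back-of-envelope estimate, not Puget's sizing method. It assumes 16-bit weights (2 bytes per parameter), nominal parameter counts of 70 and 40 billion, 48GB per RTX 6000 Ada card, and a hypothetical 10% headroom for KV cache and activations. Real usage depends on the serving stack, sequence lengths, and batch size.

```python
import math

# Assumption: each NVIDIA RTX 6000 Ada card provides 48 GB of VRAM.
GPU_VRAM_GB = 48


def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the model weights alone, in decimal GB (1e9 bytes).

    bytes_per_param=2.0 assumes FP16/BF16 weights.
    """
    return params_billion * 1e9 * bytes_per_param / 1e9


def min_gpus(params_billion: float,
             bytes_per_param: float = 2.0,
             headroom: float = 0.10,  # hypothetical margin for KV cache/activations
             vram_gb: float = GPU_VRAM_GB) -> int:
    """Smallest number of GPUs whose combined VRAM covers weights plus headroom."""
    needed_gb = weight_memory_gb(params_billion, bytes_per_param) * (1 + headroom)
    return math.ceil(needed_gb / vram_gb)


if __name__ == "__main__":
    for name, params in [("Llama-2-70b", 70), ("Falcon-40b", 40)]:
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
              f"{min_gpus(params)} x 48 GB GPUs")
```

Under these assumptions, the estimate gives four GPUs for Llama-2-70b and two for Falcon-40b, which matches the configurations Puget describes.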
Puget Labs Testing Processes and Results
The Puget Systems Labs team tested the new AI Training and Inference server with a full set of four NVIDIA RTX 6000 Ada graphics cards. Labs ran Meta's Llama-2-70b-chat-hf using the HuggingFace Text-Generation-Inference (TGI) server and HuggingFace ChatUI. The model used approximately 130GB of video memory (VRAM). Labs expects the system to work well with other LLMs that fit within the available GPU memory (192GB with four 48GB cards installed).
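Readers who want to collect similar measurements can use the following sketch. It sends one prompt to a TGI server's `/generate` endpoint with the Python standard library and reads the per-request timing headers. The host and port are hypothetical. The header names (`x-validation-time`, `x-queue-time`, `x-time-per-token`, reported in milliseconds) reflect recent TGI releases, but they are an assumption to verify against your own deployment. Nothing here indicates how Puget Labs actually collected its numbers.

```python
import json
import urllib.request

# Hypothetical address of a locally running TGI server.
TGI_URL = "http://localhost:8080/generate"

# Timing headers assumed to be returned by TGI's /generate endpoint (ms).
TIMING_HEADERS = ("x-validation-time", "x-queue-time", "x-time-per-token")


def build_request(prompt: str, max_new_tokens: int = 128,
                  url: str = TGI_URL) -> urllib.request.Request:
    """Build a POST request with TGI's JSON body: inputs plus generation parameters."""
    body = json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_timings(headers) -> dict:
    """Extract timing headers (case-insensitively) as floats; missing ones become None."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: float(lowered[name]) if name in lowered else None
            for name in TIMING_HEADERS}


def generate(prompt: str, max_new_tokens: int = 128, url: str = TGI_URL):
    """Send one prompt and return (generated_text, timings)."""
    req = build_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
        timings = parse_timings(dict(resp.headers.items()))
    return payload.get("generated_text", ""), timings
```

Note that a chat model such as Llama-2-70b-chat-hf expects a specific prompt template. A front end like ChatUI normally applies that template, while this raw call does not.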
Notable performance statistics from the testing include:
Measured response under typical usage:
- Validation Time = 0.59673 ms
- Queue Time = 0.17409 ms
- Time per Token = 54.558 ms
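For any response of realistic length, the per-token figure dominates end-to-end latency. The following rough estimate adds validation time, queue time, and the number of generated tokens multiplied by time per token. It assumes a constant time per token and a hypothetical 200-token reply. Neither assumption comes from Puget's report.

```python
def tokens_per_second(time_per_token_ms: float) -> float:
    """Convert a per-token latency in milliseconds into a streaming rate."""
    return 1000.0 / time_per_token_ms


def estimated_latency_ms(n_tokens: int,
                         time_per_token_ms: float,
                         queue_ms: float = 0.0,
                         validation_ms: float = 0.0) -> float:
    """Rough end-to-end latency, assuming a constant time per generated token."""
    return validation_ms + queue_ms + n_tokens * time_per_token_ms


# Typical-usage figures reported by Puget Labs (milliseconds).
VALIDATION_MS = 0.59673
QUEUE_MS = 0.17409
TIME_PER_TOKEN_MS = 54.558

if __name__ == "__main__":
    n = 200  # hypothetical response length in tokens
    total = estimated_latency_ms(n, TIME_PER_TOKEN_MS, QUEUE_MS, VALIDATION_MS)
    print(f"~{total / 1000:.1f} s for {n} tokens, "
          f"~{tokens_per_second(TIME_PER_TOKEN_MS):.1f} tokens/s")
```

With the typical-usage numbers, this works out to roughly 10.9 seconds for a 200-token reply, or about 18.3 tokens per second for a single stream.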
Stress test with multiple concurrent users:
- Data below is from a session of 114 prompts from 20-30 users over 5 minutes
Average prompt response under multi-user load:
- Validation Time = 3.0312 ms
- Queue Time = 4687.9 ms
- Time per Token = 68.076 ms
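The two sets of measurements show where the extra load goes. Per-token time rises by about 25%, while queue time grows from a fraction of a millisecond to nearly 4.7 seconds. The sketch below computes these ratios, along with the share of a hypothetical 200-token response spent waiting in the queue. The 200-token length is an assumption, and the code treats the reported averages as if they described a single request.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Average per-request timings in milliseconds, as reported by Puget Labs."""
    validation_ms: float
    queue_ms: float
    time_per_token_ms: float

    def latency_ms(self, n_tokens: int) -> float:
        # Assumes a constant time per generated token.
        return self.validation_ms + self.queue_ms + n_tokens * self.time_per_token_ms


TYPICAL = RequestStats(validation_ms=0.59673, queue_ms=0.17409, time_per_token_ms=54.558)
LOADED = RequestStats(validation_ms=3.0312, queue_ms=4687.9, time_per_token_ms=68.076)


def arrival_rate(prompts: int, window_s: float) -> float:
    """Average prompts per second over the measurement window."""
    return prompts / window_s


def per_token_slowdown(base: RequestStats, load: RequestStats) -> float:
    """Ratio of per-token time under load to per-token time in typical usage."""
    return load.time_per_token_ms / base.time_per_token_ms


def queue_share(stats: RequestStats, n_tokens: int) -> float:
    """Fraction of estimated end-to-end latency spent waiting in the queue."""
    return stats.queue_ms / stats.latency_ms(n_tokens)


if __name__ == "__main__":
    print(f"arrival rate: {arrival_rate(114, 5 * 60):.2f} prompts/s")
    print(f"per-token slowdown: {per_token_slowdown(TYPICAL, LOADED):.2f}x")
    print(f"queue share (200 tokens, loaded): {queue_share(LOADED, 200):.0%}")
```

Under load, the session averaged about 0.38 prompts per second. By this estimate, roughly a quarter of a 200-token response's latency would be time spent in the queue, compared with a negligible share in typical usage.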
Puget Systems custom AI Training and Inference servers will be available to configure for a wide range of generative AI applications in the coming weeks. Interested customers can learn more or join the waitlist through Puget Systems. The company also has consulting and sales operations in Canada.