The onsite system diagram outlines the resources Vultron will deploy within the customer's infrastructure and how those resources will utilize the customer's existing software and hardware. All hardware will be purchased and owned by the customer.
(Onsite system diagram)
Customer Intranet
- A private DNS resolver lets laptops on the VPN or office Wi-Fi resolve internal.vultron.ai to the API server's private IP address, which serves the web application.
- A tunnel makes Okta reachable from the API server.
- The API and GPU machines are on the same LAN/VLAN, with bidirectional network bandwidth exceeding 1 GB/s.
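As a quick sanity check of the split-horizon DNS setup above, a short script can confirm that internal.vultron.ai resolves to a private address when on the VPN or office Wi-Fi. This is a sketch, not part of the deployment: the hostname comes from this document, while the helper names are illustrative.

```python
import ipaddress
import socket

def is_private_ip(ip_str: str) -> bool:
    # True for RFC 1918 and other non-public ranges
    # (10/8, 172.16/12, 192.168/16, ...).
    return ipaddress.ip_address(ip_str).is_private

def verify_internal_dns(hostname: str = "internal.vultron.ai") -> str:
    # Resolve through the machine's configured resolver -- the private DNS
    # server when connected via VPN or office Wi-Fi -- and confirm the
    # answer is a private address, i.e. the API server's internal IP.
    ip_str = socket.gethostbyname(hostname)
    if not is_private_ip(ip_str):
        raise RuntimeError(f"{hostname} resolved to a public address: {ip_str}")
    return ip_str
```

Run from a laptop on the VPN or office Wi-Fi; off-network, resolution should fail rather than return a public address.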
API Machine
- Vultron serves the web application via TLS on port 443 of the API machine.
- All Vultron micro-services run in Docker containers and persist data to the local file system.
- The system specifications include 32 cores, 64GB RAM, and 1TB NVMe storage.
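The HTTPS endpoint on port 443 can be probed with a small Python check using the standard library's ssl module. A sketch under assumptions: the host defaults to the internal hostname from this document, and the function name is ours.

```python
import socket
import ssl

def check_tls(host: str = "internal.vultron.ai", port: int = 443) -> str:
    # Connect to the API machine's HTTPS port and return the negotiated
    # TLS protocol version (e.g. "TLSv1.3").
    ctx = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Because the default context verifies certificates, this also confirms the web application presents a certificate the laptop trusts.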
GPU Machine
- Dockerized Llama 3.1 70B Instruct and E5 embedding models are served by vLLM, which uses tensor parallelism while interfacing with the GPUs via CUDA.
- Two H100 GPUs are NVLinked and accessible by the OS (GPU passthrough is used if Vultron is run within a virtual machine on the GPU machine).
- The system is equipped with 2x AMD EPYC 9334 processors, 128GB DDR5 RAM, and 1TB NVMe storage.
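vLLM shards a model's weights across both GPUs via its tensor-parallel setting. A minimal launcher sketch follows; the `--tensor-parallel-size` flag is vLLM's real CLI option, while the model ID and helper name are assumptions for illustration.

```python
def vllm_serve_command(model: str = "meta-llama/Llama-3.1-70B-Instruct",
                       num_gpus: int = 2) -> list[str]:
    # Build the CLI invocation for vLLM's OpenAI-compatible server,
    # sharding the model's weights across both NVLinked H100s.
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(num_gpus),
    ]

# e.g. subprocess.run(vllm_serve_command()) inside the model container
```

With NVLink between the two H100s, the inter-GPU traffic that tensor parallelism generates avoids the slower PCIe path.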