Major Home Improvement Retailer
Enabling LLMs With an Effective Fine-Tuning System
What if... our client could confidently invest in the best infrastructure for fine-tuning large language models – and build the internal expertise to drive long-term GenAI success?
Our client is a Fortune 50 home improvement retailer with a nationwide presence and a commitment to technology innovation. As part of its enterprise GenAI strategy, the client sought to evaluate the performance of multiple on-premises hardware platforms for large-scale LLM fine-tuning, while also upskilling its technical teams to support ongoing AI initiatives.
Laying the Foundation for Scalable GenAI Success by Fine-Tuning LLM Infrastructure and Talent
Our client wanted to evaluate three leading on-premises vendor hardware platforms for large-scale GenAI hosting. Specifically, it wanted to compare the platforms’ abilities to optimize the fine-tuning process for large language models (LLMs). The client also needed to upskill its infrastructure and machine learning (ML) teams in fine-tuning LLMs.
Ultimately, the retailer sought to make an informed decision about its infrastructure investment and resource allocation and to enhance its ML capabilities. Achieving these goals would ensure it could deploy and maintain LLMs effectively, driving business value and competitiveness in line with its overall GenAI strategy.
Sizing Up LLM Fine-Tuning Hardware Vendors
To identify the hardware solution that would most effectively help achieve its goals, our client determined several criteria for assessing the three vendor platforms’ performance: time to converge, resource utilization (GPU, power, and network usage), and overall efficiency in fine-tuning LLMs.
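As an illustrative sketch only (not the client's actual tooling), the assessment criteria above could be aggregated from per-run telemetry roughly as follows; the sample readings, field names, and function are hypothetical:

```python
from statistics import mean

def summarize_run(samples, converge_seconds):
    """Aggregate hypothetical telemetry samples from one fine-tuning run.

    samples: list of dicts with per-second readings:
      gpu_util - GPU utilization (percent)
      power_w  - power draw (watts)
      net_gbps - network throughput (Gb/s)
    converge_seconds: wall-clock time for the run to converge
    A one-second sampling interval is assumed for the energy estimate.
    """
    return {
        "time_to_converge_s": converge_seconds,
        "avg_gpu_util_pct": mean(s["gpu_util"] for s in samples),
        "avg_net_gbps": mean(s["net_gbps"] for s in samples),
        # energy = sum of (power * interval); here interval = 1 s
        "energy_kj": sum(s["power_w"] for s in samples) / 1000.0,
    }

# Hypothetical readings from a short run on one platform
samples = [
    {"gpu_util": 92, "power_w": 350, "net_gbps": 18.5},
    {"gpu_util": 95, "power_w": 365, "net_gbps": 19.1},
    {"gpu_util": 90, "power_w": 340, "net_gbps": 17.8},
]
print(summarize_run(samples, converge_seconds=3))
```

Summaries like this, produced per run and per platform, are one simple way to put time to converge, utilization, and energy side by side for a vendor comparison.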
We created fine-tuning training scripts to assess the vendors, iteratively executing them and tuning their performance on each vendor platform. The time required for the fine-tuning runs, GPU utilization, power consumption, and other statistics gave the major home improvement retailer the data it needed to determine which vendor platform would best support its needs.
Our scripts are a comprehensive end-to-end system for fine-tuning LLMs, designed to optimize performance in a multi-node, multi-GPU environment. This scalable architecture supports the training of LLMs with substantial memory requirements, while also accommodating a range of fine-tuning tasks.
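As one illustrative way a multi-node, multi-GPU fine-tuning run might be launched (PyTorch's torchrun is shown purely as an example; the case study does not specify the framework, and the script name, model, node count, and endpoint here are hypothetical):

```shell
# Hypothetical launch of one fine-tuning run across 4 nodes x 8 GPUs each.
torchrun \
  --nnodes=4 \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=head-node:29500 \
  finetune.py --model example-7b --dataset train.jsonl
```

In a setup like this, the launcher starts one worker per GPU on every node, which is what allows training of models whose memory requirements exceed a single machine.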
Results
Laying a Strong Foundation for Future GenAI Projects
The vendor assessment allowed the client's technical leadership to make an informed decision about infrastructure investment and resource allocation, and the client confidently selected a vendor to host its GenAI workloads.
During the assessment, we collaborated with our client’s technical resources to help build their knowledge, demonstrating how we set up, performed, monitored, measured, and modified the fine-tuning training runs.
The retailer also gained a comprehensive end-to-end fine-tuning pipeline for future LLM projects. This pipeline, together with its upskilled teams, will have a lasting impact on the company's ability to fine-tune LLMs effectively, supporting its GenAI strategy and driving business value and competitiveness.