NEWS & EVENT
お知らせ・イベント情報
Challenges to consider when incorporating Large Language Models (LLMs) into a system
28/12/2023
Large language models (LLMs), represented by ChatGPT, are becoming increasingly prevalent. Given their API availability, many may consider integrating them into their systems for use. However, there are several challenges to be aware of when incorporating LLMs into a system.
This article outlines these challenges and summarizes points to consider when integrating LLMs into a system.
Hallucination
Hallucination refers to the generation of hallucinatory responses or misinformation. LLMs themselves do not understand context but merely predict the next word based on the preceding and succeeding flow. Therefore, LLMs may generate responses that do not align with the context.
Moreover, in many cases, they do not respond with "I don't know." Instead, they generate some plausible response, which may contain incorrect information, potentially providing misinformation to service users.
Data Privacy and Security
LLMs are trained on large amounts of data, which may include personal or confidential information. Therefore, when using LLMs, attention must be paid to data privacy and security.
Data used by platforms like ChatGPT is considered to be publicly available online, but caution is necessary when additional training is conducted, such as through tuning. When training with internal data, precautions must be taken to ensure that the data is not output externally or beyond permissions.
Data Ownership
A frequently discussed issue with image-generating AI is copyright. If an individual's property is included in the training data, using the generated data could lead to copyright infringement.
Similarly, there are licensing issues with programming code. If open-source code is included in the training data, using the generated code could lead to license violations. Care must be taken when training with data containing licenses such as GPL.
Integration Complexity
Platforms like ChatGPT offer APIs, but there are several challenges when integrating them into a system. For example, JSON output is common in system integration, and providing accurate prompts is necessary for proper JSON output. With ChatGPT, JSON output can be enforced through function calling.
Additionally, desired results may not always be returned, responses may take time, or only partial responses may be returned. System design must account for these various cases.
Reference: Function calling – OpenAI API
Latency and Performance
LLMs generally take time to process. Especially when referencing past messages, the amount of data being sent and received gradually increases. Delayed responses can lead to user stress from a UX perspective.
Therefore, when integrating LLMs into a system, UX considerations to alleviate user stress are necessary.
Tuning
When integrating LLMs into a system, additional training with proprietary data is often conducted. This data must be properly structured and recognized.
Of course, as the amount of training data increases, so does the cost. Balancing this is also necessary.
Cost
The biggest issue is cost. Building LLMs independently incurs significant costs, but even when using APIs, training them with large amounts of data can be costly. Allowing users unrestricted access can lead to high costs.
Mechanisms such as caching responses or limiting usage per user must be considered.
Prompt Injection Issues
Prompt injection is an attack that generates unintended responses from LLMs by providing them with special prompts. LLMs only predict the following words based on the prompt. Therefore, certain prompts can generate responses unintended by developers.
Various countermeasures are currently being implemented, but by providing malicious prompts related to generating license keys, involvement in crime, or discrimination, unintended responses can be generated.
Conclusion
We've summarized the concerns to be addressed when integrating LLMs into a system. While LLMs are convenient, they also have many vulnerabilities when it comes to users.
However, this field is constantly evolving, with challenges being addressed one by one. Consider checking these latest developments and considering their use within your system.
Contact Us
Click here for more information about Hexabase, including how to use it, costs, and partner inquiries.