Security Implications of Coding with AI Assistants

Introduction

The potential for developer efficiency gains through GenAI tools is massive, and we’re only just getting started. However, realizing these gains is difficult for a number of reasons. First, the organization must have a thorough understanding of who exactly is using GenAI coding tools and how they are being used. Second, a mindset shift has to occur so that GenAI tools are woven into the developer experience and the software development lifecycle rather than dropped in on a whim. Finally, and perhaps most pressing, there is security, which raises the question: is AI-generated code secure? Security is where we’ll focus our attention this time around.

The Top Security Risks of Coding with AI Assistants

IP and privacy

One of the biggest concerns when using GenAI tools to write code is the risk of exposing proprietary or sensitive data, because many tools learn from the code and data they are given. If a company feeds proprietary code that isn’t open source into an AI coding tool that trains on user input, that data could become part of the underlying LLM, making it effectively impossible to get back. Worse, fragments of that proprietary data could then surface in responses to other people who use the tool.

A recent report by Cyberhaven identified that 10% of employees have input sensitive and confidential data into ChatGPT. When adopting any new GenAI tool, it’s important to address the human side of security before it becomes a big problem. This issue can affect any company. Not too long ago it was reported that employees at Samsung were using ChatGPT to debug proprietary source code and summarize meeting transcripts. Because they entered the data directly into ChatGPT, they can’t get that information back, and if it is used for training it could surface in responses to other users.

Coding with AI Assistants

Because of the challenges above, it’s important to understand which tools are in use at your organization, who is using them, and how those tools handle the data they receive. This requires vetting GenAI tools and coding assistants. Free, public tools like ChatGPT are not recommended for this work because your inputs may be used as training data. The distinction matters: ChatGPT also has a paid version, and it can be configured on OpenAI’s website so that your data is not used for training. OpenAI also offers APIs that let companies connect to its LLMs and power their own AI tools. When used this way, OpenAI doesn’t use the inputs coming through the APIs as training data, making it a more secure option. Other ways to avoid this risk include offline models or other paid models with similar conditions to OpenAI’s, like those from Google.
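Whichever tool is approved, one complementary safeguard is to strip obvious secrets from a snippet before it leaves the developer’s machine. The sketch below is only an illustration of that idea: the `redact` function and the three patterns are hypothetical and far from exhaustive, and a real setup would more likely rely on a vetted secret scanner.

```python
import re

# Illustrative patterns only (assumptions, not a complete secret scanner).
SECRET_PATTERNS = [
    # Assignments such as password = "..." or API_KEY=...
    (
        re.compile(
            r"\b(api[_-]?key|secret|password|token)\b(\s*[:=]\s*)(['\"]?)[^\s'\"]+\3",
            re.IGNORECASE,
        ),
        r"\1\2\3[REDACTED]\3",
    ),
    # Strings shaped like AWS access key IDs
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Return text with anything matching the patterns above masked."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    snippet = 'password = "hunter2"\nowner: alice@example.com'
    print(redact(snippet))
```

A filter like this reduces accidental leaks but cannot recognize proprietary business logic, so it supplements tool vetting and usage guidelines rather than replacing them.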

Having AI Tools Write Large Amounts of Code

While AI tools can certainly write lots of code quickly, that doesn’t mean it will be quality code. And because there’s so much of it, it’s harder for developers to check it for security vulnerabilities and errors. The sheer volume complicates review, since going through every line of AI-generated code by hand becomes very tedious. This challenge is particularly pronounced in large projects or on tight timelines, when developers want to get things done quickly. Without strict usage guidelines and robust automated testing frameworks in place, errors and vulnerabilities can slip through the cracks very quickly when developers use AI coding tools to produce large amounts of code. Leveraging AI for software development brings great benefits but also great responsibility. The best approach is to treat AI as an assistant, not a replacement, and to use it responsibly.

Corrupt or Vulnerable Training Data

If you put vulnerable or inaccurate data into the AI tool, you’ll most likely get code that amplifies the risky data rather than improving on it. For example, if you give it existing code with vulnerabilities that have not been patched, the AI will keep reproducing those older patterns because it may not know better. This happens because AI tools are not actually analyzing and creating robust code; they are using the code given to them to imitate and build new code. If the code is insecure, the AI tool will tend to deliver insecure code. Snyk, a security-driven developer platform, estimates that the average software development project has about 40 vulnerabilities in first-party code, one-third of which are high severity. That is not very encouraging if you’re going to feed that code to an AI coding assistant to learn from and help you produce more code.

This is particularly problematic when inexperienced developers are using AI coding assistants, because coding assistants often don’t point out problems or vulnerabilities. Rather, the coding assistant treats the incorrect code as a signal of what quality to deliver, compounding the errors. Unfortunately, this means the developer keeps missing opportunities to improve and the code base remains vulnerable.
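To make this concrete, here is a minimal sketch of the kind of pattern an assistant can pick up from an existing code base and repeat. The `users` table and the function names are hypothetical; the example uses Python’s built-in `sqlite3` module to contrast a query built by string concatenation with a parameterized one.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Safer: the driver binds the value, so it is never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has that literal name
```

If a code base is full of functions like `find_user_unsafe`, an assistant imitating it is likely to produce more of the same, which is why cleaning up known vulnerabilities first pays off.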

Best Practices for Ensuring Secure Code When Using AI Assistants

Familiarize AI tools with your coding standards – Ensuring that AI tools align with internal coding standards and best practices is crucial for maintaining code quality and security. Organizations must proactively configure their AI coding assistants to understand and follow the specific programming conventions, security guidelines, and architectural patterns that have been established internally. This alignment helps prevent the introduction of non-compliant code, which can create inconsistencies, reduce maintainability, and expose security vulnerabilities. To achieve this, developers and IT leaders should regularly update the training datasets and rules used by AI tools to reflect the latest coding practices and security protocols. Additionally, integrating code linters and style checkers into the development pipeline can automatically enforce these standards when AI-generated code is committed.
Use an AI assistant that has added vulnerability filtering features – As AI coding tools become more advanced and address security more actively, it is getting easier to find ones with built-in filters or vulnerability scanning. While you shouldn’t rely solely on these features, they can go a long way in reducing the time spent testing and checking AI-generated code.
Include precise documentation – Maintaining detailed documentation of AI-generated code’s origin, purpose, and functionality is crucial. This practice aids in future audits and quality assurance processes, ensuring that the code’s integration is both transparent and justifiable.
Reinforce security testing – Strengthen the use of security practices such as SAST, DAST, and penetration testing to identify obsolete libraries with reported CVEs, bad coding practices, or vulnerabilities that may have been introduced into the code through the AI process.
Human expertise is required for QC – This is non-negotiable. AI tools are just that: tools. No matter how far they have come in the past year, and however much they continue to advance, human expertise and oversight remain 100% necessary. Developers must rigorously check and validate AI-generated code, ensuring it meets security standards and integrates seamlessly with existing systems. Ultimately, when you deploy, the humans are the ones responsible for any errors or vulnerabilities discovered, and for the resulting consequences.
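As a rough illustration of the automated checks mentioned in the practices above, the sketch below uses Python’s standard `ast` module to flag a few risky constructs before code is committed. The `find_risky_calls` function and its short rule list are hypothetical and deliberately tiny; it is a toy stand-in for a real linter or SAST tool, not a substitute for one.

```python
import ast

# Hypothetical rule list for illustration; real SAST tools check far more.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> list:
    """Return sorted (line_number, message) pairs for risky constructs."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as eval(...) or exec(...)
        if isinstance(node.func, ast.Name) and node.func.id in RISKY_CALLS:
            findings.append((node.lineno, f"call to {node.func.id}()"))
        # Any call passing shell=True, e.g. subprocess.run(cmd, shell=True)
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, "shell=True in call"))
    return sorted(findings)


if __name__ == "__main__":
    generated = (
        "import subprocess\n"
        "result = eval(user_input)\n"
        "subprocess.run(cmd, shell=True)\n"
    )
    for line, message in find_risky_calls(generated):
        print(f"line {line}: {message}")
```

A check like this could run in a pre-commit hook or CI job so that flagged lines get a human look before merging, which keeps the final judgment with the developer.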

AI coding assistants offer promising enhancements to developer productivity and software capabilities, and most developers are already using them. However, the integration of these tools into the software development lifecycle must be managed with a keen eye for security to protect intellectual property, ensure data privacy, and maintain high-quality, secure coding practices.

