Using AI to Write Tests for React Components
The article explores the efficiency of UI test writing using tools like GitHub Copilot and ChatGPT. While these AI aids can generate code snippets, human guidance is crucial for refining and validating the suggested tests. It discusses the evolution of AI tools, including GitHub's TestPilot and Copilot X, and raises privacy concerns related to cloud-based solutions. Emphasizing the importance of treating these tools as aids rather than autonomous generators, the article suggests exercising caution, especially with sensitive code.
Writing tests for UI can be a tedious task that requires a lot of attention to detail. It's easy to get bogged down in the details and lose sight of the big picture. However, it is a critical aspect of software development that ensures the quality and reliability of your code.
It's an often overlooked aspect of software development, especially in fast-paced environments like startups, which are focused on delivering products to market as quickly as possible. Lack of proper testing can result in unexpected bugs and problems that could easily have been avoided. Focusing on new features without proper testing pays off only in the short term: as the product and the team grow, fixing bugs can become more time-consuming than writing proper tests in the first place.
We will explore ways to make this process more efficient and effective, including the use of AI tools such as GitHub Copilot and ChatGPT. For testing, we will use code from the Expensify app as it’s open source.
What is GitHub Copilot?
GitHub Copilot is a powerful AI tool created by GitHub that can suggest code to you. It is based on OpenAI's Codex, which is a descendant of GPT-3. According to GitHub, its context is currently limited to the contents of the open file, related files, and possibly other open tabs.
It often suggests code based on the snippets it was trained on, which are not specific to your project. Therefore, it's important to use GitHub Copilot with caution rather than rely on it alone: you still need a good understanding of testing principles, and you should be prepared to write tests manually to ensure that your React components work as expected.
To use GitHub Copilot, you need a GitHub account with access to Copilot, which at the time of writing costs $10/month (free for verified students and maintainers of popular open-source projects!). To use it in your IDE, just install the Copilot extension, and you're good to go!
Using GitHub Copilot for writing tests
Let’s see how Copilot can help with writing tests for React Native components. For that purpose, we will use a React Native component from Expensify's app repository called EmojiPickerButton. We want the generated tests to use react-native-testing-library.
Let’s create a new file called <rte-code>EmojiPickerButton.test.js<rte-code> and write a comment saying:
Got the following results:
At first glance, it looks pretty good! It created two tests: one that checks if our component renders correctly (although this is a low-quality snapshot test of the kind that should be avoided), and a second that checks if the <rte-code>onPress<rte-code> property is called when the button is pressed, which is a basic button test.
But wait... <rte-code>EmojiPickerButton<rte-code> doesn't accept the <rte-code>onPress<rte-code> property! It also doesn't have an <rte-code>emoji-picker-button<rte-code> test ID specified anywhere. Copilot tried to create tests for this button based on a typical custom button component, which is not applicable in this case.
Let's try to write a test at the end of <rte-code>EmojiPickerButton.js<rte-code>, below the <rte-code>EmojiPickerButton<rte-code> component, so that Copilot may have a better understanding of its implementation.
The result we got is similar to the previous example, but this time the second test case is different. Copilot wants to check if <rte-code>EmojiPicker<rte-code> is rendered when <rte-code>EmojiPickerButton<rte-code> is pressed by checking if an element with the <rte-code>emoji-picker<rte-code> test ID exists. Unfortunately, it again relies on test IDs that don't exist: <rte-code>emoji-picker-button<rte-code> and <rte-code>emoji-picker<rte-code>.
This case confirms that even though Copilot is a powerful tool for generating code snippets, it still needs human guidance because it may generate code that is not correct or may require some additional work like adding missing test IDs. However, it is a tool that can improve a developer's work and make it much more efficient.
Using Copilot with human guidance
Since the tests generated by Copilot on its own were inaccurate, let's try writing tests again, but this time with a bit of guidance. I will mark Copilot suggestions as comments. Let's create the <rte-code>EmojiPickerButton.test.js<rte-code> file again and start with the following code:
We want to add a test that checks if the component renders correctly.
Looks good, but note that it is still necessary to add <rte-code>emoji-picker-button<rte-code> test ID to our <rte-code>EmojiPickerButton<rte-code> component manually - let’s add it.
Now let's add a test that checks that the button is disabled when <rte-code>isDisabled<rte-code> is set to <rte-code>true<rte-code>:
The provided suggestion was satisfactory.
Let’s add one more test that checks if the button shows the emoji picker when pressed:
Notice how we shifted our focus from figuring out the implementation, to finding a better test description and defining our expectations clearly. Pretty neat, huh?
With human guidance, the quality of the tests also improved significantly. Copilot's suggestions were more accurate. It recognized props like <rte-code>onModalHide<rte-code> or <rte-code>onEmojiSelected<rte-code> and passed mocked functions to them.
But if we try to run those tests they will fail. It’s because <rte-code>withLocalize<rte-code> HOC hasn't been mocked. Let’s see if Copilot will be able to mock it:
Unfortunately, the tests still fail with this mock, because it doesn't pass the <rte-code>translate<rte-code> property into <rte-code><Component /><rte-code>.
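One way to repair such a mock is to have the stand-in HOC inject the props the wrapped component expects. The sketch below models that in plain JavaScript, with components as simple functions of props. `mockWithLocalize`, `Label`, the `emojiPicker.title` key, and the key-echoing `translate` are all illustrative assumptions, not Expensify's actual code. In a Jest suite, a factory like this would typically be registered through `jest.mock` for the module that exports `withLocalize`.

```javascript
// Hypothetical stand-in for the real withLocalize HOC (an assumption for
// illustration): it injects a `translate` prop that echoes the key back.
const mockWithLocalize = (Component) => (props) =>
  Component({ ...props, translate: (key) => key });

// Hypothetical wrapped component, modeled as a plain function of props.
const Label = ({ translate, textKey }) => translate(textKey);

const LocalizedLabel = mockWithLocalize(Label);

// Without the injected prop, calling Label directly would throw a TypeError,
// because `translate` would be undefined.
console.log(LocalizedLabel({ textKey: "emojiPicker.title" }));
```

Echoing the key back keeps assertions independent of real translation files, at the cost of not checking the translated text itself.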
Also, in the test <rte-code>shows emoji picker when pressing emoji picker button<rte-code> Copilot mocked functions and expects them to be called, which looks fine at first glance but is not. If we take a look at <rte-code>EmojiPickerButton<rte-code> implementation, we can see that both onModalHide and onEmojiSelected are passed as arguments into other functions and are not called directly by the component. What is more, this test doesn't check if emoji picker has been shown. This example perfectly shows that you should carefully verify the test and code suggestions given.
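Why expecting the mocks to be called is wrong can be shown without any test framework. In this sketch, `createMock`, `showEmojiPicker`, and `pressEmojiPickerButton` are hypothetical stand-ins (not the Expensify implementation). Pressing the button only forwards the callbacks to another function, so neither callback runs at press time.

```javascript
// Minimal hand-rolled mock that records its calls (a stand-in for jest.fn()).
function createMock() {
  const mock = (...args) => {
    mock.calls.push(args);
  };
  mock.calls = [];
  return mock;
}

// Hypothetical collaborator: it receives the callbacks but does not run them
// until the user later picks an emoji or closes the picker.
const showEmojiPicker = createMock();

// Hypothetical press handler modeled on the behavior described above: the
// callbacks are handed over as arguments, never called directly.
function pressEmojiPickerButton({ onModalHide, onEmojiSelected }) {
  showEmojiPicker(onModalHide, onEmojiSelected);
}

const onModalHide = createMock();
const onEmojiSelected = createMock();
pressEmojiPickerButton({ onModalHide, onEmojiSelected });

console.log(onModalHide.calls.length); // 0: the callback was never invoked
console.log(showEmojiPicker.calls.length); // 1: the picker was asked to open
console.log(showEmojiPicker.calls[0][0] === onModalHide); // true
```

A more meaningful test would therefore assert on what is observable after the press, such as the picker being shown, rather than on the forwarded callbacks.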
The <rte-code>withLocalize<rte-code> mocking case shows that as a project grows more complex, adequate knowledge of both testing principles and the project structure becomes essential.
Copilot is a great tool for improving productivity, not just when testing but when coding in general. Keep in mind that it will make smaller or larger mistakes due to lack of context, and that it needs supervision to make correct suggestions.
Generative Pre-Trained Transformers (known as GPT)
What are Generative Pre-Trained Transformers?
Generative Pre-trained Transformers (GPTs) are a family of neural language models that use the transformer architecture to generate coherent text in a human-like manner. GPTs are trained on large amounts of text data using unsupervised learning techniques, such as self-supervised learning, which allows them to learn patterns and relationships in language without explicit supervision.
The easiest way to use one is ChatGPT by OpenAI, a web application powered by GPT-3.5, which is an improved version of the GPT-3 model with additional training (tools that integrate GPT directly into the IDE are also being created). Although this model is not designed to work directly on code, it does a good job of explaining given snippets, such as React components, and describing or creating regular expressions. It can also be used to create simple code from scratch by following simple prompts, such as <rte-code>Create Button component in React Native<rte-code>:
And then you can simply ask it to generate tests for the above component:
Without being told which library to use, it chose <rte-code>react-native-testing-library<rte-code>, covered rendering of the component, and checked if the <rte-code>onPress<rte-code> prop was called on button press.
Using ChatGPT for writing tests
Let’s use ChatGPT for the same example we used for GitHub Copilot: <rte-code>EmojiPickerButton<rte-code>. To do that, we need to copy the whole implementation of <rte-code>EmojiPickerButton<rte-code>, paste it into ChatGPT, and add a prompt at the end, e.g. <rte-code>Write tests for above component using react-native-testing-library<rte-code>.
This is the result:
It is immediately noticeable that ChatGPT generated better tests for our component than Copilot did without any additional guidance. It correctly recognized that <rte-code>EmojiPickerButton<rte-code> accepts <rte-code>onModalHide<rte-code>, <rte-code>onEmojiSelected<rte-code> and <rte-code>isDisabled<rte-code> props. But it also made the same mistake as Copilot and tried to check if mocked functions passed into these props had been called. It’s also missing the <rte-code>withLocalize<rte-code> HOC mock. Let’s ask ChatGPT to add it by using the prompt <rte-code>Add withLocalize mock<rte-code>. Here’s the result:
You may notice that it wants to use the <rte-code>emoji-picker-button<rte-code> test ID (same as Copilot), which may seem incorrect because it’s missing from the original component implementation, but ChatGPT added a note that it’s necessary to add a testID prop to the <rte-code>Pressable<rte-code> component:
<p-bg-col>Note that you will need to add a testID prop to the Pressable component in the EmojiPickerButton component in order to use getByTestId in the tests.<p-bg-col>
It even provided an updated version of <rte-code>EmojiPickerButton<rte-code> with testID added.
ChatGPT was able to generate more accurate tests for the <rte-code>EmojiPickerButton<rte-code> component without any additional guidance (except for missing <rte-code>withLocalize<rte-code> HOC mock). It correctly identified the props that the component accepts and mocked up the required functions. It also provided an updated version of the component with the <rte-code>testID<rte-code> property implemented. Unfortunately, it created one test that incorrectly checks if mocked functions are called.
Keep in mind that results obtained for a given prompt from ChatGPT may be different every time they are generated.
Let's test some more components. We will present results we got both from GitHub’s Copilot and OpenAI’s ChatGPT.
The tested component will be Switch.js.
To make that test work, it was necessary to add the <rte-code>switch<rte-code> test ID to the <rte-code>Switch<rte-code> component. Besides that, Copilot did a good job: it mocked a function, passed it in as a property, and checked if it had been called. The tests work as expected.
The tests suggested by ChatGPT look very similar to the ones made by Copilot, except that they also check how many times the <rte-code>onToggleMock<rte-code> function has been called. They also store the element in a <rte-code>switchComponent<rte-code> constant instead of passing the <rte-code>getByTestId<rte-code> call directly into the <rte-code>fireEvent<rte-code> or <rte-code>expect<rte-code> function. But again, the tests work correctly.
Let’s also test Header.js
Copilot required a bit of help with the first test as it wanted to use a snapshot, so we gave it a little hint. The rest of the suggested tests worked as expected after adding the <rte-code>environment-badge<rte-code> test ID.
ChatGPT suggested 5 tests, 4 of which overlap with the Copilot example. It added one more test that checks that the subtitle isn’t rendered when it’s not provided, which could be checked in the first test as well. It also requires adding the <rte-code>environment-badge<rte-code> test ID.
It is worth mentioning that in each of the above cases, instead of destructuring <rte-code>getBy*<rte-code> queries from the render result, we can use the <rte-code>screen<rte-code> API from <rte-code>react-native-testing-library<rte-code>, which simplifies the test implementation. Let's try to use ChatGPT to refactor the above example to use the <rte-code>screen<rte-code> API. In order to do this, we will use the <rte-code>refactor to use screen API<rte-code> prompt. Here’s the result:
Copilot struggled a bit with this task:
What will the future hold?
Tools such as Copilot or ChatGPT can significantly improve a developer's performance when writing tests. But remember that this is a rapidly evolving technology: as artificial intelligence advances, we can expect more tools like these to help us write tests for React components.
TestPilot by GitHub Next
TestPilot is a tool from GitHub Next, powered by GitHub Copilot, that will allow the generation of "readable tests with meaningful assertions". It will scan the repository and generate, e.g., unit tests based on information from the code and documentation. The main difference from Copilot is that TestPilot has much more context, which will allow it to better understand the structure of the project's code.
GitHub Copilot X
GitHub is working on the next iteration of Copilot, called Copilot X. Currently Copilot is based on OpenAI's Codex, a descendant of GPT-3. Copilot X will use GPT-4, a newer and considerably more capable model than GPT-3. This should allow for much better code understanding and context-sensitive suggestions. It will be a bit like ChatGPT but integrated into the IDE with knowledge about the project.
Local LLMs
In the future, it may be possible to run an LLM (large language model) such as GPT-3 or even GPT-4 on a personal machine or private server. Currently, it is possible to run a pre-trained model locally, but this requires enormous computing power (which increases with the number of parameters in the model). The GPT-2 model (1.5 billion parameters) requires only about 4 GB of RAM, but the GPT-3 model, which has about 175 billion parameters, is estimated to require more than 300 GB of RAM.
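These figures follow roughly from multiplying the parameter count by the bytes stored per parameter. The sketch below assumes 16-bit weights (2 bytes each) and counts only the weights, ignoring activations and runtime overhead, so it gives a lower bound rather than a measurement. `estimateWeightMemoryGB` is a hypothetical helper.

```javascript
// Rough lower bound on the memory needed just to hold a model's weights.
// Assumption: 2 bytes per parameter (16-bit floats); overhead is ignored.
function estimateWeightMemoryGB(parameterCount, bytesPerParameter = 2) {
  return (parameterCount * bytesPerParameter) / 1e9;
}

console.log(estimateWeightMemoryGB(1.5e9)); // 3 (GPT-2, 1.5 billion parameters)
console.log(estimateWeightMemoryGB(175e9)); // 350 (GPT-3, about 175 billion parameters)
```

With 32-bit weights the same estimate doubles, which is one reason quoted requirements vary between sources.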
Tabnine Test Generation
Tabnine is an AI-powered code completion tool similar to GitHub Copilot, but it allows training on private code repositories and also self-hosting for enterprises. It was recently announced that the ability to write unit tests will be added soon. It will be able to learn from the user's code to generate tests that match their coding style and patterns.
Data protection is an important issue for many organizations. The source code of an application or product often contains confidential information that, if leaked, could cause damage to the company's reputation and financial loss. It is, therefore, important that the tools you use provide adequate security.
Tools such as GitHub Copilot or ChatGPT are cloud-based and cannot be used offline. This may also raise some privacy concerns. The code that we want to write tests for, for example, has to be sent to third-party servers, where it is processed and may also be used to train models. It may then happen that a code snippet containing, e.g., a secret API key or other confidential information is suggested to another user.
The perfect solution for this would be a local LLM fine-tuned to the selected repository, but as described in the Local LLMs section, this isn't easily possible yet due to hardware limitations. However, it's worth noting that smaller self-hosted models have recently begun to appear, like Tabby or turbopilot.
ChatGPT's General FAQ mentions that OpenAI uses the submitted data (and prompts) to improve its product and recommends not submitting sensitive information in conversations.
The situation is different with Copilot. According to GitHub, users can configure whether or not they want their code snippets to be used for further model training. They can also choose to check if their code matches public code on GitHub.
It's important to note that no matter what the user chooses, the code snippets are still sent to GitHub's servers to be processed, as GitHub Copilot is a cloud-based service.
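Because snippets leave the machine regardless of these settings, one precaution is to strip obvious secrets from code before pasting it into a cloud-based tool. The following is only a minimal sketch of that idea: `redactSecrets` and its single pattern are hypothetical, and it is no substitute for a dedicated secret scanner.

```javascript
// Hypothetical pattern for illustration only; dedicated secret scanners use
// far larger rule sets and will catch cases this sketch misses.
const SECRET_ASSIGNMENT =
  /(api[_-]?key|secret|token|password)(\s*[:=]\s*)(["'])[^"']+\3/gi;

// Replace the quoted value of suspicious-looking assignments.
function redactSecrets(source) {
  return source.replace(SECRET_ASSIGNMENT, "$1$2$3<REDACTED>$3");
}

console.log(redactSecrets('const API_KEY = "abc123";'));
// const API_KEY = "<REDACTED>";
```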
Cloud-based AI tools such as GitHub Copilot and ChatGPT offer many benefits, including being helpful when writing tests for our React components. All tested models produced significantly more accurate code with more guidance and context. At the same time, they allowed us to take the declarative way of writing software even further: I could state my expectations about the test in plain English, and the tools would figure out the “implementation details” for me.
Sometimes these tools can even suggest a test case that we might have missed. The more accurate my description was, the better the results were, although I’m pretty sure they would be even better with more context of the project, which is what GitHub, for example, is currently working on with the upcoming TestPilot. For now, however, they should be treated as a "helping hand" rather than a test generator, because as the complexity of the project structure increases, so does the need for supervision.
These tools also raise concerns about source code privacy for code that’s not publicly available. In the case of the Expensify app we used for testing, the problem doesn’t exist, which is a kinda unexpected benefit of keeping your code open-source. Most developers, however, need to be aware that their code snippets may be sent to third-party servers for processing, potentially exposing confidential information. It is important to exercise caution when using these tools and to choose the settings that best suit your needs and privacy concerns. In addition, while local LLMs are not currently viable due to hardware costs, they will likely be optimized for performance over time, making it possible to self-host without access to a supercomputer.