Hi Patrick,
Because we haven’t released it yet, the following is not a promise/commitment, but it’s our current plan. If you have questions or want to suggest we do something else, we’d love to know!
That said:
No code sharing
Can you explicitly confirm whether any source code, prompts, or project context sent to CodeBot is shared with third parties (including LLM providers), beyond what is strictly required to generate a response?
None. The only third parties currently involved are the LLM providers and the hosting provider. LLM providers receive, via API calls, only the source code and other content that is required for CodeBot to function as designed. That includes project content, prompts, source, and sometimes metadata such as paths (to locate files or know what libraries are installed), but all of it is necessary for CodeBot to function as designed.
Data Processing Agreement (DPA)
Do you offer a Data Processing Agreement suitable for EU/enterprise customers, and if so, under which legal role does RemObjects act (data processor vs. controller)?
Not yet, but I believe we will likely need to.
Currently, we host our servers in an EU data center. We are considering having multiple geolocations both for speed and to locate data in specific regions for specific customers.
Exact data flow
Please describe the precise data flow when CodeBot is used:
What data is sent from the IDE?
Where is it processed (geographic region)?
Which external services or model providers are involved?
Because CodeBot is in beta, the data sent from the IDE may change in future and has changed in the past. As an overview, however, it can send information about the currently loaded project group (including any files, and their contents if required), plus file listings and file contents for a constrained set of folders and their subfolders, derived from the projects. (That is, it finds the root folders for the files the project makes use of, and can read and write within those root folders.) These locations are always listed in the chat window at runtime, so you can see what CodeBot is able to access. In addition, you can optionally make other folders available for CodeBot to read (at your discretion; this is intended for, e.g., locations of third-party libraries). There are also CodeBot configuration files and their contents, plus miscellaneous data sent with each request, for example the IDE version and the source and library paths.
CodeBot does not currently have unconstrained access to the file system, and if we add it in future, it would be gated so that you can turn it on or off and retain control of which locations it can access.
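For illustration only (this is not CodeBot’s actual code), the kind of root-folder constraint described above could be sketched in Python as follows. The function name and the example paths are hypothetical; the point is simply that a path is accepted only if it resolves to a location inside one of the listed root folders.

```python
from pathlib import Path


def is_allowed(path, allowed_roots):
    """Return True if `path` resolves to a location inside one of `allowed_roots`.

    Hypothetical helper: paths are resolved first, so that ".." segments
    cannot be used to step outside an allowed root folder.
    """
    target = Path(path).resolve()
    for root in allowed_roots:
        resolved_root = Path(root).resolve()
        # Accept the root itself or anything nested below it.
        if target == resolved_root or resolved_root in target.parents:
            return True
    return False


# Example roots (assumed for illustration): a project folder and a library folder.
roots = ["/work/projA", "/libs/thirdparty"]
inside = is_allowed("/work/projA/src/main.pas", roots)
escaped = is_allowed("/work/projA/../secret.txt", roots)
sibling = is_allowed("/work/projAB/main.pas", roots)
```

Comparing resolved path components, rather than raw string prefixes, is what keeps a sibling folder such as /work/projAB from being mistaken for /work/projA.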
Currently, the server is located in a data center in west EU (Amsterdam).
We currently use OpenAI models, but may use others in future. We only use providers that do not retain data or train on it.
Logging, storage, and retention
Is any code, prompt, or response content:
logged,
stored temporarily or permanently, or
retained for debugging, analytics, or support purposes?
Yes. The server may log some of the data communicated back and forth (it is in beta, and this is useful for debugging and support). This is retained only in the transient VM that the server runs in, and is deleted either via a rolling window or when we deploy an update and the previous server VM is destroyed. Exactly what is logged depends on what we are currently tracking or debugging; for example, more data is logged for newer features we’re actively working on.
We are considering offering an enterprise option that specifically does not log, and/or that offers a dedicated server for a specific company. We’d like to know whether that would be of interest to you.
Full logs of all data are written to the user’s computer. For complex issues, we ask the user to send us those logs; we do not hold them ourselves.
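As a rough illustration of what deletion via a rolling window means (a sketch under assumptions, not our server code), the Python below keeps log entries in memory only and drops anything older than a fixed window. The class name and the one-hour window are hypothetical; the actual window length is not stated here.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingLog:
    """In-memory log that discards entries older than `window` (hypothetical sketch)."""

    def __init__(self, window):
        self.window = window
        self._entries = deque()  # (timestamp, message) pairs, oldest first

    def add(self, message, now):
        self._entries.append((now, message))
        self.prune(now)

    def prune(self, now):
        # Drop entries from the front while they fall outside the window.
        cutoff = now - self.window
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def messages(self):
        return [message for _, message in self._entries]


# Assumed one-hour window, chosen purely for the example.
start = datetime(2024, 1, 1, 12, 0)
log = RollingLog(window=timedelta(hours=1))
log.add("request A", start)
log.add("request B", start + timedelta(minutes=30))
log.add("request C", start + timedelta(minutes=90))
remaining = log.messages()
```

Because nothing is written to persistent storage in this sketch, destroying the process (or the VM it runs in) also removes whatever is still inside the window.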
We retain data on usage: that is, user accounts, network requests, costs, number of tokens, etc. This is permanently retained and is required for billing. This data does not contain any code, prompt, or response content.
Training usage
Can you guarantee that customer code and prompts are not used for model training or fine‑tuning, now or in the future?
Yes. We don’t do that. If that ever changed, I believe we would always provide a SKU where it does not occur; it is essential to offer a corporate/enterprise SKU where data is private. In other words, you would be able to make sure it’s not happening for you.
Enterprise controls
Are there (or will there be) options for:
enterprise/contractual guarantees,
private or dedicated model backends, or
configuration that disables any data retention?
Yes, very likely. This would probably take the form of:
- A SKU where we guarantee non-training, non-fine-tuning etc, as above
- An option for a dedicated server
- And for that server, an option to disable all logging and any other retention of source code and related data
I noted above that complete logs, even today, exist only on the user’s local machine. If that too needs to be disabled, it would be straightforward for us to offer an option to do so.
I hope this helps. Because we aren’t selling it yet, as noted, these are just our plans and not a commitment, but of course we know it’s important to be able to meet requirements like this. It would be great to know whether the above sounds suitable for you or whether we need to look into other options. As a general principle, our goal is to meet needs like this so that CodeBot can be used reliably and safely.
Regards,
David