Need clear answers on data protection and code confidentiality

Hi RemObjects team,

We are currently evaluating CodeBot for Delphi for use in an enterprise environment and need clear answers on data protection and code confidentiality before proceeding.

Could you please clarify the following points in concrete terms:

  1. No code sharing
    Can you explicitly confirm whether any source code, prompts, or project context sent to CodeBot is shared with third parties (including LLM providers), beyond what is strictly required to generate a response?

  2. Data Processing Agreement (DPA)
    Do you offer a Data Processing Agreement suitable for EU/enterprise customers, and if so, under which legal role does RemObjects act (data processor vs. controller)?

  3. Exact data flow
    Please describe the precise data flow when CodeBot is used:

    • What data is sent from the IDE?

    • Where is it processed (geographic region)?

    • Which external services or model providers are involved?

  4. Logging, storage, and retention
    Is any code, prompt, or response content:

    • logged,

    • stored temporarily or permanently, or

    • retained for debugging, analytics, or support purposes?

  5. Training usage
    Can you guarantee that customer code and prompts are not used for model training or fine‑tuning, now or in the future?

  6. Enterprise controls
    Are there (or will there be) options for:

    • enterprise/contractual guarantees,

    • private or dedicated model backends, or

    • configuration that disables any data retention?

We are not looking for high‑level assurances, but for explicit technical and contractual statements, as this determines whether CodeBot can be used on proprietary and regulated codebases.

Thanks in advance for a clear and detailed response.

Hi Patrick,

Thank you! We’ll get back to you with answers soon. Because CodeBot is in beta, we don’t have pre-prepared answers for this – until I give you the exact replies, though, I can assure you that as a general principle we treat your data and source code with respect. I or another staff member will reply with full answers soon.

Regards,

David

Hi Patrick,

Because we haven’t released it yet, the following is not a promise/commitment, but it’s our current plan. If you have questions or want to suggest we do something else, we’d love to know!

That said:

No code sharing
Can you explicitly confirm whether any source code, prompts, or project context sent to CodeBot is shared with third parties (including LLM providers), beyond what is strictly required to generate a response?

None. The only third parties currently are LLM providers plus the hosting provider. Via API calls, LLMs receive only the source code and other content required for CodeBot to function as designed. That includes project content, prompts, source, sometimes metadata like paths (to locate files or know what libraries are installed), etc – but it’s all necessary for CodeBot to function as designed.

Data Processing Agreement (DPA)
Do you offer a Data Processing Agreement suitable for EU/enterprise customers, and if so, under which legal role does RemObjects act (data processor vs. controller)?

Not yet, but I believe we will likely need to.

Currently, we host our servers in an EU data center. We are considering having multiple geolocations both for speed and to locate data in specific regions for specific customers.

Exact data flow
Please describe the precise data flow when CodeBot is used:

What data is sent from the IDE?

Where is it processed (geographic region)?

Which external services or model providers are involved?

Because CodeBot is in beta, the data sent from the IDE can change in future and has changed in the past. However, as an overview, it can send info about the currently loaded project group (including any files, and their contents if required), plus file listings and file contents for a constrained folder location and its subfolders, based on the projects and folders. (I.e., it finds the root folders for the files the project makes use of and can read and write within those root folders.) These locations are always listed in the chat window at runtime; you can see what CodeBot is able to access. In addition, you can optionally make other folders available for CodeBot to read (at your discretion; it’s intended for, e.g., locations of third-party libraries). There are also CodeBot configuration files and their contents, plus miscellaneous data with each request, for example the IDE version and the source and library paths.

CodeBot does not have unconstrained access to the file system at the current time, and if we add it in future, it would be gated such that you can turn this on or add and retain control of what locations it can access.

Currently, the server is located in a data center in west EU (Amsterdam).

We currently use OpenAI models, but may use others in future. We only use providers that do not retain data or train on it.

Logging, storage, and retention
Is any code, prompt, or response content:

logged,

stored temporarily or permanently, or

retained for debugging, analytics, or support purposes?

Yes. The server may log some of the data communicated back and forth (it is in beta and this is useful for debugging and support.) This is retained only in the transient VM that the server runs in and is deleted either via a rolling window, or when we deploy an update and the previous server VM is destroyed. The exact data that is logged currently depends on what we are tracking or debugging, such as more data being logged for newer features we’re actively working on.

We are considering offering an enterprise option that specifically does not log, and/or that offers a dedicated server for a specific company. We’d be interested if that would be of interest to you.

Full logs of all data are placed on the user’s computer. For complex issues, we ask the user to send us those logs. We do not have them ourselves.

We retain data on usage: that is, user accounts, network requests, costs, number of tokens, etc. This is permanently retained and is required for billing. This data does not contain any code, prompt, or response content.

Training usage
Can you guarantee that customer code and prompts are not used for model training or fine‑tuning, now or in the future?

Yes. We don’t do that. Even if we did in future, I believe we would always provide a SKU where this would not occur – it is essential to offer a corporate/enterprise SKU where data is private. In other words, even if we do in future, you would be able to make sure it’s not happening for you.

Enterprise controls
Are there (or will there be) options for:

enterprise/contractual guarantees,

private or dedicated model backends, or

configuration that disables any data retention?

Yes, very likely. This would probably take the form of:

  • A SKU where we guarantee non-training, non-fine-tuning etc, as above
  • An option for a dedicated server
  • And for that server, an option for disabling all logging and other source code / related data

I noted above that complete logs, even today, only exist on the user’s local machine. If that too is something that needs to be disabled it would be straightforward for us to offer an option to do so.

I hope this helps. Because we aren’t selling it yet, as noted these are just our plans, not a commitment, but of course we know it’s important to be able to meet requirements like this. It would be great to know whether the above sounds suitable for you or if we need to look into other options. As a general principle, our goal is to meet needs like this so that CodeBot can be reliably and safely used.

Regards,

David

To clarify, this refers to the CodeBot functional back-end server(s) only. Other servers operating RemObjects Software infrastructure – including e.g. the remobjects.com account system used to authenticate with CodeBot – may and do operate outside the EU (currently mostly in AWS US East, but we reserve the right to change that at any time and without notice). These servers do not handle or receive any CodeBot content.

Also, to clarify: this applies to our own data handling and training only. As mentioned elsewhere, under the hood, CodeBot currently uses OpenAI’s LLM model(s) under their standard, generally available terms and conditions (i.e., the same you get if you go to ai.openai.com). So OpenAI’s policies apply to how the data is processed by the LLM itself, and you need to check with (and trust) OpenAI on what their policies on training and data retention are.

In the future, we may change what models or providers CodeBot uses and/or possibly give an option. If we change model providers or plans internally, that will be noted.

Yes, it is worth mentioning that for the beta program currently ongoing, we make no claims whatsoever on the privacy of data. We’re never doing anything malicious, obviously, but while you are using the beta version of CodeBot, the data you give to it, including prompts and full source code, may in theory be seen by us, both in the course of general debugging and support for the beta and because beta software will have bugs. :)

Do not use the CodeBot Private Beta with any confidential code or data.

Thanx :folded_hands:
