This is absolutely possible, but at the time of writing our VCL framework doesn't have a full implementation of TBitmap. I have done something similar myself: I use WASM to draw into a pixel buffer, then quickly move that over to the DOM side of things for canvas work.
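To make the "draw in WASM, show on canvas" idea a bit more concrete, here is a minimal sketch in TypeScript. It is not our framework's code: `viewFrame` is a hypothetical helper, and it assumes the module has drawn a frame as RGBA bytes (4 bytes per pixel) at a known byte offset in its memory. The names `memory`, `framePtr` and `ctx` in the comments are assumptions too.

```typescript
// Hypothetical helper: returns a zero-copy RGBA view over a frame that the
// wasm module has drawn into its linear memory, starting at byte offset `ptr`.
function viewFrame(
  buffer: ArrayBuffer,
  ptr: number,
  width: number,
  height: number,
): Uint8ClampedArray {
  const byteLength = width * height * 4; // 4 bytes per pixel: R, G, B, A
  if (ptr < 0 || ptr + byteLength > buffer.byteLength) {
    throw new RangeError("frame does not fit inside the given memory");
  }
  // No pixels are copied here; the view shares storage with `buffer`.
  return new Uint8ClampedArray(buffer, ptr, byteLength);
}

// In a browser (assumed names: `memory`, `framePtr`, `ctx`):
//   const frame = viewFrame(memory.buffer, framePtr, 320, 200);
//   ctx.putImageData(new ImageData(frame, 320, 200), 0, 0);
```

One thing to keep in mind: if the module's memory grows, the old buffer is detached, so create the view again for each frame rather than caching it.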
In my case I have implemented a fairly low-level pixel buffer with the standard primitives (line, fillrect, rect, ellipse and so on) and use that, but it's not ready for public consumption just yet. The VCL framework is being worked on, so TBitmap will no doubt appear soon.
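For readers wondering what such a pixel buffer looks like, here is a rough illustration in TypeScript. It is not my actual implementation (which is not public), just a sketch of the general idea with a single primitive. The class name `PixelBuffer`, the `RGBA` type and the RGBA byte layout are all assumptions made for the example.

```typescript
// Illustrative only: a tiny RGBA pixel buffer with one drawing primitive.
type RGBA = [number, number, number, number];

class PixelBuffer {
  readonly width: number;
  readonly height: number;
  readonly data: Uint8ClampedArray;

  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
    this.data = new Uint8ClampedArray(width * height * 4); // starts fully transparent
  }

  // Writes one pixel; coordinates outside the buffer are ignored.
  setPixel(x: number, y: number, color: RGBA): void {
    if (x < 0 || y < 0 || x >= this.width || y >= this.height) return;
    this.data.set(color, (y * this.width + x) * 4);
  }

  // Reads one pixel back as [r, g, b, a].
  getPixel(x: number, y: number): RGBA {
    const i = (y * this.width + x) * 4;
    return [this.data[i], this.data[i + 1], this.data[i + 2], this.data[i + 3]];
  }

  // Fills a rectangle, clipped to the buffer bounds.
  fillRect(x: number, y: number, w: number, h: number, color: RGBA): void {
    const x0 = Math.max(0, x);
    const y0 = Math.max(0, y);
    const x1 = Math.min(this.width, x + w);
    const y1 = Math.min(this.height, y + h);
    for (let py = y0; py < y1; py++) {
      for (let px = x0; px < x1; px++) {
        this.setPixel(px, py, color);
      }
    }
  }
}
```

Line, rect and ellipse would follow the same pattern: compute which pixels are covered, clip, and write them through `setPixel`.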
For your second question: yes. The canvas 2D context has drawing methods that take either a pixel buffer (putImageData) or an image reference (drawImage). So you can reference the canvas element from WASM, hook into its events, and then change the picture or perform effects there. For smooth transitions you would normally use CSS.
As for loading pictures, that is already a standard part of JS/DOM, so it is common functionality. Even though you use WASM, you want to take advantage of the browser's support for image formats and just load the images through the ordinary channels there (you can then push the data back into the WASM module).
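As a sketch of the "push the data back into the module" step, in TypeScript: the browser decodes the image, and the raw RGBA pixels are copied into the module's memory. `copyPixelsIntoMemory` is a hypothetical helper, and the names `img`, `ctx`, `memory` and `imagePtr` in the comments are assumptions. How the module reserves space at `imagePtr` (for example through an exported allocator) depends on your toolchain and is not shown.

```typescript
// Hypothetical helper: copies decoded RGBA pixels into the module's linear
// memory at byte offset `ptr` and returns the number of bytes written.
function copyPixelsIntoMemory(
  buffer: ArrayBuffer,
  ptr: number,
  pixels: Uint8ClampedArray,
): number {
  if (ptr < 0 || ptr + pixels.length > buffer.byteLength) {
    throw new RangeError("pixels do not fit inside the given memory");
  }
  new Uint8Array(buffer, ptr, pixels.length).set(pixels);
  return pixels.length;
}

// In a browser the pixels would come from the browser's own decoder, e.g.
// (assumed names: `img`, `ctx`, `memory`, `imagePtr`):
//   ctx.drawImage(img, 0, 0);
//   const { data } = ctx.getImageData(0, 0, img.width, img.height);
//   copyPixelsIntoMemory(memory.buffer, imagePtr, data);
```

This way the module never needs to know whether the picture was a PNG, JPEG or anything else; it only ever sees raw pixels.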
A WASM module is binary and can contain data, yes, but you don't want to embed data in the module; that's not how the JS/browser world works (you want to be compatible with the infrastructure JS devs use, especially in a commercial product). WASM code can quickly become big, so offloading common tasks like this to the JS side is the best approach (imho). Implementing raw image codecs as part of the WASM module, when they already exist in the browser, would be time-consuming and unproductive.