Z3 : An LLVM backed runtime for OCaml

ulber · on Nov 17, 2012

Bad name: Z3 is an SMT solver from Microsoft and a very good one at that. I'm assuming this LLVM OCaml thing is newer.

Raphael_Amiard · on Nov 17, 2012

I wasn't aware of that. The Z3 name was imposed as part of the university project this started with.

It was supposed to be part of the VMKit suite of compilers, J3 being the Java bytecode compiler and N3 being the .Net compiler.

I'd be happy to change the name of this project if it ever gains enough traction to cause confusion in any way :)

chwahoo · on Nov 17, 2012

To backup the original poster, one of OCaml's stronger communities focuses on program analysis for C programs (using the CIL library http://www.cs.berkeley.edu/~necula/cil/) and SMT solvers like Z3 are also widely used in that community, so there's definitely some potential for confusion if your OCaml backend gains some traction.

Still, an LLVM backend for OCaml seems like a great idea. Keep up the good work.

saurik · on Nov 17, 2012

> I'd be happy to change the name of this project if it ever gains enough traction to cause confusion in any way :)

By that time it is largely too late and the damage/bother is already difficult to fix.

agumonkey · on Nov 17, 2012

Ha I didn't recognize the VMKit name, but since I googled some things, and found this project, in case someone is interested : http://vmkit2.gforge.inria.fr/index.php

tikhonj · on Nov 17, 2012

If Java gets J3 and .NET gets N3, why doesn't OCaml get O3? It's also a cute name because it looks just like an optimization level :).

agumonkey · on Nov 17, 2012

Is it in collaboration with the core ocaml team ? or at least are they aware of it ?

cpeterso · on Nov 17, 2012

If you're following J3 and N3, then why not O3?

Or OCamLLVM? ;)

fhars · on Nov 17, 2012

Not only that, but Microsoft's Z3 supports ocaml and is still the first google hit for "z3 ocaml" in my search bubble: http://www.google.com/search?q=z3+ocaml

This project should change its name as soon as possible.

Raphael_Amiard · on Nov 17, 2012

Ok, duly noted

pjmlp · on Nov 17, 2012

C++ instead of OCaml?!

Specially taking into consideration how much better OCaml is suited for compiler development and the existence of LLVM bindings for OCaml.

Raphael_Amiard · on Nov 17, 2012

Well the use of C++ instead of OCaml allowed us a certain number of things :

- Most notably reading OCaml's header files, and hence using the correct, macro declared data types in LLVM code generation. This could have been done with an hybrid of C and OCaml, but was much simpler in OCaml. - Also using OCaml runtime to read the bytecode. This could have been done in OCaml, but would have been more tedious.

Also you have to consider this project isn't treating an AST per se (the thing at which ML languages excels, by the use of sum types) but a bytecode array, which is flat in structure. We put back some structure in it, but it is predictable and not as complex as an AST.

pjmlp · on Nov 17, 2012

Don't take it personally.

I am all for using the right tool for the job, and I like C++ (except for a few things like lack of modules. :) ).

For an outsider which loves ML languages, OCaml still looks it could have been a better solution.

This is like doing bootstrap compilers, sometimes the efforts of such tasks lead to improvements on the language eco-system.

Then again, this is about using what you thought was the best tool at hand, so don't take it badly for my negative comment and good luck for the project.

Raphael_Amiard · on Nov 17, 2012

I'm not :) I'm just providing some context.

This project has more fundamental issues in my opinion if you want to consider it as a real replacement for ocamlopt, as i explained here http://news.ycombinator.com/item?id=4798320. It is not, just a replacement for OCaml's interpreter, and as such it is a very simple project and will probably remains so. I don't plan to make it evolve beyond bug-fixing and maintaining it.

As such, if i had to re-code it in another language, it would be in C, to make it integrable into OCaml's runtime easily. Probably not OCaml, because its expressive power is not needed.

If i had to start from scratch, i would do an LLVM backend for the ocamlopt compiler. And i would do it in OCaml of course :)

munin · on Nov 17, 2012

also, the caml bindings for llvm kinda suck

pascal_cuoq · on Nov 17, 2012

Just checking that you are aware of OCamlCC. If you aren't, good news! You will have plenty of notes you can exchange with Benoît.

http://oud.ocaml.org/2012/abstracts/oud2012-final10.pdf

andrewcooke · on Nov 17, 2012

how close to ocamlopt do you think you can get? (ie do you have any handle on what the returns would be for further work on this? what are the main limiting factors?)

Raphael_Amiard · on Nov 17, 2012

Ok brace yourself, this is gonna be a long answer :) Go down for the TLDR if you don't want to read everything.

It really depends on your approach. This project isn't well suited for performance, because it treats OCaml's bytecode, which is very hard to optimize for performance, because it has been designed to be interpreted by an Abstract Machine, not compiled.

For example, it is stack based, arguments are passed on the stack explicitly when a function call happens. On the other hand LLVM is register based, with function arguments. You then have two choices :

- Either you translate the bytecode very directly, by using an explicit stack (eg. an array of memory). This is the easiest approach, but it produces code that is hard/impossible to properly optimize. - Either you try to make a model translation, from stack based to register based, and translate every semantic to the LLVM model (for example, pass function arguments as LLVM function arguments instead of putting them in a stack manually). This approach is much more difficult, but promises a lot more potential for optimization.

Another problem is that OCaml's bytecode is untyped. You loose all type information.

We tried both approaches in the Z3 project. The main branch is based on the direct-translation/explicit-stack approach. It is not very fast, and hasn't a lot of promises for going faster.

There are experimental branches based on the translation model. We were able to get much better performance with them on some code. There is a lot of potential for optimization, because, this way, you write LLVM idiomatic code, and you are able to reconstruct some type information, that enables you to do further optimizations. But there are drawbacks :

- It is much more complicated. The ZAM isn't formally specified so you pretty much have to read the code of the VM to understand what's going on. Debugging is horrible.

- Once you're there you have to provide your own Garbage Collector. In the direct-translation approach, we were able to use OCaml's garbage collector directly, because we re-used the interpreter stack. But in this approach, you have at least to provide a way to scan the roots. And once you write LLVM idiomatic bytecode, you realize how ill suited it is for relocating garbage collection.

TLDR : In the end it is just not worth it to optimize this project for performance. A better approach would be to start from scratch and do a real OCaml -> LLVM compiler for ocamlopt, that would be able to use the full AST with type information.

But even if you did that you'd still have to tackle the Garbage Collection issue, that is not an easy one :)

EDIT : To provide more context, this is a very good post by Xavier Leroy on the do-ability of an OCaml->LLVM compiler, and on its interrest (keep in mind Leroy is conservative about that, but it's probably a good thing).

http://caml.inria.fr/pub/ml-archives/caml-list/2009/03/3a77b...

andrewcooke · on Nov 17, 2012

thanks. stuff like this is what makes hn still worthwhile.