Oh, oh, someone is interested in XJC internals. I might be of some help since I've probably developed more JAXB plugins than anyone else (see JAXB2 Basics for instance).
Ok, let's start. In XJC the schema compiler does approximately the following:
- Parses the schema
- Creates the model of the schema (CClassInfo, CPropertyInfo etc.)
- Creates the outline (ClassOutline, FieldOutline etc.)
- Renders the code model (JClass, JDefinedClass, JMethod etc.)
- Writes the physical code (e.g. Java files on the disk)
Let's start with the last two.
Java files don't need explanation, I hope.
Code model is also a relatively easy thing. It is an API which can be used to construct Java code programmatically. You could just use string concatenation instead, but that's much more error-prone. With CodeModel you're almost guaranteed to get at least grammatically correct Java code. So I hope this part is also clear. (By the way, I like CodeModel very much. I recently wrote JavaScript Code Model based on ideas from the CodeModel.)
Let's now look at the "model" and the "outline".
Model is the result of parsing the incoming schema. It models the constructs of the incoming schema, mainly in terms of "classes", which correspond to complex types, and "properties", which correspond to elements, attributes and values (e.g. when you have a complex type with simple content).
The model should be understood as a logical modelling construct close to XML and schema. As such, it just describes types and the properties that they have. It's surely much more complex than how I'm describing it, there are all sorts of exceptions and caveats - starting from wildcard types (xsd:any), substitution groups, enums, built-in types and so on.
Quite interestingly, a sibling of Model is RuntimeTypeInfoSetImpl, which is used by JAXB at runtime. So it's also a type of model, which is however not parsed from the XML Schema but rather built from JAXB annotations in classes. The concept is the same. Both Model and RuntimeTypeInfoSetImpl implement the TypeInfoSet interface, which is a super-construct. Check interfaces like ClassInfo and PropertyInfo: they have implementations both for compile-time (CClassInfo and CPropertyInfo in XJC) and run-time (RuntimeClassInfoImpl etc. for JAXB RI).
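To make the "same interfaces, two sources" idea a bit more concrete, here is a minimal sketch in plain Java. It does not use the real JAXB types: SketchClassInfo, SchemaSketchClassInfo, ReflectiveSketchClassInfo, SamplePerson and InfoSketch are all hypothetical names made up for illustration, and the run-time flavour simply reads fields by reflection where the real implementation reads JAXB annotations.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical shared view of a class; plays the role ClassInfo plays in JAXB. */
interface SketchClassInfo {
    String name();
    List<String> propertyNames();
}

/** Compile-time flavour: filled in by whatever parsed the schema. */
final class SchemaSketchClassInfo implements SketchClassInfo {
    private final String name;
    private final List<String> properties;

    SchemaSketchClassInfo(String name, List<String> properties) {
        this.name = name;
        this.properties = new ArrayList<>(properties);
    }

    @Override public String name() { return name; }

    @Override public List<String> propertyNames() {
        return Collections.unmodifiableList(properties);
    }
}

/** Run-time flavour: derived from an existing class by reflection. */
final class ReflectiveSketchClassInfo implements SketchClassInfo {
    private final Class<?> type;

    ReflectiveSketchClassInfo(Class<?> type) { this.type = type; }

    @Override public String name() { return type.getSimpleName(); }

    @Override public List<String> propertyNames() {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not properties.
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        Collections.sort(names); // reflection does not guarantee declaration order
        return names;
    }
}

/** Sample bean used only to feed the reflective implementation. */
final class SamplePerson {
    String name;
    int age;
}

final class InfoSketch {
    /** Works on either flavour because it only sees the shared interface. */
    static String describe(SketchClassInfo info) {
        return info.name() + info.propertyNames();
    }
}
```

Code written against the shared interface, like describe here, does not care whether the information came from a schema or from compiled classes. That is roughly the benefit of having TypeInfoSet sit above both models.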
Ok, so once XJC has parsed and analysed the schema, you've got the Model. This Model can't produce the code yet. There are, indeed, different strategies for producing the code. You can generate just annotated classes, or you can generate interface/implementing class pairs like in JAXB 1. The whole code generation thing isn't actually the task of the model. Moreover, there are a number of aspects that are relevant to the physical nature of the Java code but aren't really relevant for the model. For instance, you have to group classes into packages. This is driven by the package system of Java, not by the properties of the model itself.
And this is where outlines come into play. You can see outlines as a step between the schema model and the code model. You can view outlines as factories for code model elements, responsible for the organization of the code and the generation of JDefinedClasses from CClassInfos.
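The division of labour between model and outline can be sketched with a few toy classes. This is only an illustration under loud assumptions: ToyClassInfo, ToyClassOutline and ToyOutline are hypothetical stand-ins for CClassInfo, ClassOutline and Outline, the namespace-to-package map is invented, and the "code model" is reduced to a plain string where XJC would build a JDefinedClass.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Model level (stand-in for CClassInfo): logical and schema-oriented, knows nothing about packages. */
final class ToyClassInfo {
    final String namespace;
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>(); // property name -> Java type

    ToyClassInfo(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    ToyClassInfo property(String propertyName, String javaType) {
        properties.put(propertyName, javaType);
        return this;
    }
}

/** Outline level (stand-in for ClassOutline): binds a model class to a package and renders it. */
final class ToyClassOutline {
    final ToyClassInfo target;
    final String packageName;

    ToyClassOutline(ToyClassInfo target, String packageName) {
        this.target = target;
        this.packageName = packageName;
    }

    String qualifiedName() {
        return packageName + "." + target.name;
    }

    /** Real XJC would populate a JDefinedClass here; a string keeps the sketch short. */
    String render() {
        StringBuilder sb = new StringBuilder();
        sb.append("package ").append(packageName).append(";\n\n");
        sb.append("public class ").append(target.name).append(" {\n");
        for (Map.Entry<String, String> p : target.properties.entrySet()) {
            sb.append("    protected ").append(p.getValue())
              .append(' ').append(p.getKey()).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}

/** Stand-in for Outline: a factory that applies one code generation strategy to the model. */
final class ToyOutline {
    private final Map<String, String> packageByNamespace;

    ToyOutline(Map<String, String> packageByNamespace) {
        this.packageByNamespace = packageByNamespace;
    }

    ToyClassOutline outlineFor(ToyClassInfo info) {
        // Packaging is a Java concern, so it is decided here and not in the model.
        String pkg = packageByNamespace.getOrDefault(info.namespace, "generated");
        return new ToyClassOutline(info, pkg);
    }
}
```

Swapping ToyOutline for a different factory would change how the code is organized without touching the model, which is the point of keeping the two apart.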
So you're right, it is indeed very complicated. I am not a Sun/Oracle employee and I did not design it (I know the person who did, though, and respect him very much).
I can guess a couple of reasons for certain design decisions, for instance:
- Use the same interfaces for compile-time and run-time models
- Allow different strategies of code generation
- Allow plugins to manipulate the created model
I agree that this design is very complicated, but it has its reasons. One proof of that is that it was actually possible to build a mapping generator for XML-to-JavaScript mappings, basically on the same models. I just had to replace the code generation, leaving the schema analysis intact. (See Jsonix for that.)
Ok, hopefully I've shed some light on why things in XJC are how they are. Good luck with these APIs, they're not straightforward. Feel free to check existing open-source code, there are a lot of examples available.
P.S. I've always wanted to write this. :)