[XAR2] New implementation of Blocklayout compiler
- From: "Jason" <judgej (at) xaraya.com>
- Date: Fri, 3 Nov 2006 20:00:14 -0000
"Marcel van der Boom" <marcel (at) hsdev.com> wrote in message
news:eig3mf$v8v$1 (at) newton.xaraya.com...
> Jason wrote:
>> The available entities (the ones people will use) should be defined in
>> the DTD for XHTML. Could we just pull those in?
>>
> We can, but that would be a bit wasteful (because there are LOTS for xhtml
> for example) We have to see how this one develops. To get going at least,
> i've been using ( i think two in all of core ) numeric entities.
> (&#160;) being the 99% one, the other i already forgot.
I guess I would need to look at the code for this one, as I am not sure how
it is handled. It would be wasteful to load the whole lot into memory, just
to access a couple of entities, but on the other hand, leaving a few out
arbitrarily could introduce problems of its own for developers. I have a
poster in front of my desk with the most-used entities (a couple of hundred)
and find it very useful when I need something special in a web page. I also
don't like to have to think about whether the browser will support it or
not. I could use numeric entities, but the names are a little more
self-documenting.
If I have any suggestions on ways of handling these, I'll certainly let you
know.
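
One way to keep the readable names without pulling in the whole XHTML DTD
might be to rewrite named entities to numeric references before the
template is parsed. A minimal sketch in Python, purely as an illustration:
this is not how the Blocklayout compiler works, `to_numeric` is a made-up
helper, and the name table comes from Python's standard `html.entities`
module rather than from any DTD we ship.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def to_numeric(text):
    """Rewrite known named entities as numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave predefined and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

print(to_numeric("&pound;5 &amp; up&nbsp;&blah;"))
# &#163;5 &amp; up&#160;&blah;
```

The template author still writes the self-documenting name, and the parser
only ever sees numbers.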
>> If someone used &pound; in a template say, and the output page was XHTML
>> UTF-8, then what would (or should) appear in the output page? Would it be
>> '?' or £? My guess would be the former.
>>
> Oh my, there are not many people who can answer this correctly and
> precisely. There is no single correct answer either; there are so many
> factors involved.
>
> Let me say what i know, lots of gaps probably.
>
> If the input xml contains &pound; and assuming it is defined, either
> explicitly (in the xsl later on) or by a reference to a DOCTYPE somewhere.
> Let's say it is defined as &#1234; (meaning unicode character 1234 )
> that's it, input wise. XSLT doesn't need to know more (nor that it is
> actually a pound sign). For the input, for example: &euro;, &#8364; and €
> are *exactly* the same. (let's hope that all comes through ok :-) )
>
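
To illustrate the point that the parser only ever hands over a character
number, here is a small Python sketch using the standard-library
ElementTree parser. The document and its internal DOCTYPE subset are
invented for the example, and &pound; is declared with its real code point
(163) rather than the hypothetical 1234 used in the explanation.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the same character written four ways.
DOC = (
    '<!DOCTYPE r [<!ENTITY pound "&#163;">]>'
    "<r>"
    "<a>&pound;</a>"   # named entity, declared in the DOCTYPE above
    "<b>&#163;</b>"    # decimal character reference
    "<c>&#xA3;</c>"    # hexadecimal character reference
    "<d>\u00a3</d>"    # the literal character
    "</r>"
)

root = ET.fromstring(DOC)
texts = [child.text for child in root]

# After parsing, all four are the same one-character string.
print(texts == ["\u00a3"] * 4)  # True
print(ord(texts[0]))            # 163
```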
> XSLT does its work based on its XSL transformation and gets character 1234
> in. What it does with that from now on, becomes dependent on a number of
> things and the tools used. One thing it needs at least is the definition of
> &pound;, either delivered by the doctype of the input or defined in the
> transformation itself.
>
> The transform as such just puts character number 1234 in the result
> document, done. (assuming the template matches go through and all that)
>
> The last bit of the transform is (most likely) serialising the result
> document into a stream of bytes, like an XML document in some encoding.
> The output document may or may not contain a doctype, over which you have
> some control in XSL. This does not change what bytes are put into the
> output, the encoding decides that.
>
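
The serialising step described here can be sketched in Python as well.
Assuming the result document holds the single character U+00A3, the output
encoding alone decides which bytes come out. `serialise` is a hypothetical
helper around Python's built-in `str.encode`, not anything an XSLT
processor exposes.

```python
def serialise(text, encoding):
    # xmlcharrefreplace falls back to a numeric character reference
    # when the target encoding cannot represent a character.
    return text.encode(encoding, errors="xmlcharrefreplace")

pound = "\u00a3"  # one character, whatever label it had on input

print(serialise(pound, "utf-8"))       # b'\xc2\xa3'
print(serialise(pound, "iso-8859-1"))  # b'\xa3'
print(serialise(pound, "ascii"))       # b'&#163;'
```

Same character, three different byte streams, and only the last one still
looks like an entity.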
> That doc in turn is then sent off to its destination (say, a browser)
> which renders the bytes into something we can look at. We all know the
> last bit is very different for different browsers even if given the exact
> same set of bytes.
>
> Now the fun begins:
> - if character 1234 isn't available in the character table the browser
> uses -> boom!
> - if the character is available, but your font doesn't occupy it -> boom!
> - if the encoding of the xml document is out of reach of your browser ->
> boom!
> - if character 1234 is not in the doctype or there is no doctype, it can't
> replace &#1234; with anything possibly more comfortable.
>
> Let's say all of the above is taken care of, then what is actually
> displayed when you look at a textual representation of the document (which
> in itself is a new transformation) depends again on your tool.
>
> - it can be &#1234; (browser leaves it alone)
> - it can be &pound; (browser looked it up in the document's doctype)
> - it can be &blah; (browser looked it up, result doctype defined it
> differently)
>
> The best way to cope with entities is not to pay attention too _long_, in
> my experience. They don't exist, everything is a character number and
> entities are just labels "at that given time". Bit of a "Schrödinger's cat"
> situation. :-) As soon as you look, it's something else :-)
>
> In practice, i think most browsers would show you &pound; or &#1234; in
> their source, more likely if the doctype is one of the w3 doctypes (forgot
> the link)
>
> I've been up to my neck in xml for a lot of years and i often get it wrong,
> still. :-)
>
> hope this helps.
Yes it does - thanks! The long answer to what appears to be a simple
question reveals an awful lot about how the whole thing works.
-- JJ
_______________________________________________
Xaraya_devel mailing list
Xaraya_devel (at) xaraya.com
http://xaraya.com/mailman/listinfo/xaraya_devel