
[XAR2] New implementation of Blocklayout compiler



"Marcel van der Boom" <marcel (at) hsdev.com> wrote in message 
news:eig3mf$v8v$1 (at) newton.xaraya.com...
> Jason wrote:
>> The available entities (the ones people will use) should be defined in 
>> the DTD for XHTML. Could we just pull those in?
>>
> We can, but that would be a bit wasteful (there are LOTS for XHTML, for 
> example). We have to see how this one develops. To get going at least, 
> I've been using numeric entities (I think two in all of core): &nbsp; 
> (&#160;) is the 99% one; the other I've already forgotten.

I guess I would need to look at the code for this one, as I am not sure how 
it is handled. It would be wasteful to load the whole lot into memory, just 
to access a couple of entities, but on the other hand, leaving a few out 
arbitrarily could introduce problems of its own for developers. I have a 
poster in front of my desk with the most-used entities (a couple of hundred) 
and find it very useful when I need something special in a web page. I also 
don't like to have to think about whether the browser will support it or 
not. I could use numeric entities, but the names are a little more 
self-documenting.

If I have any suggestions on ways of handling these, I'll certainly let you 
know.

>> If someone used &pound; in a template, say, and the output page was XHTML 
>> UTF-8, then what would (or should) appear in the output page? Would it be 
>> '£' or &pound;? My guess would be the former.
>>
> Oh my, there are not many people who can answer this correctly and 
> precisely. There is no single correct answer either; there are so many 
> factors involved.
>
> Let me say what I know; there are probably lots of gaps.
>
> Suppose the input XML contains &pound; and that it is defined, either 
> explicitly (in the XSL later on) or by a reference to a DOCTYPE somewhere. 
> Let's say it is defined as &#1234; (meaning Unicode character 1234); 
> that's it, input-wise. XSLT doesn't need to know more (not even that it is 
> actually a pound sign). For the input, for example, &#x20AC;, &euro; and € 
> are *exactly* the same. (Let's hope that all comes through OK :-) )
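
A minimal sketch of that equivalence, using Python's standard-library 
xml.etree.ElementTree as a stand-in for whatever parser sits in front of the 
XSLT processor (an assumption; it is not the Blocklayout compiler). The tiny 
document and its internal DTD subset, which declares &euro;, are made up for 
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the internal DTD subset declares &euro; as U+20AC.
# The three child elements spell the same character three different ways:
# a numeric reference, the named entity, and the literal character.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE doc [<!ENTITY euro "&#x20AC;">]>\n'
    '<doc><a>&#x20AC;</a><b>&euro;</b><c>\u20ac</c></doc>'
)


def parsed_texts(xml_text):
    """Parse the document and return the text of each child element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [child.text for child in root]


texts = parsed_texts(DOC)
# After parsing, all three spellings are the same single character.
print(texts, len(set(texts)) == 1)
```

Once parsed, nothing in the tree records which spelling the input used, 
which is the point made above.
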
>
> XSLT does its work based on its XSL transformation and receives character 
> 1234 as input. What it does with that from now on depends on a number of 
> things and on the tools used. One thing it needs at least is the definition 
> of &pound;, either delivered by the doctype of the input or defined in the 
> transformation itself.
>
> The transform as such just puts character number 1234 in the result 
> document, done. (assuming the template matches go through and all that)
>
> The last bit of the transform is (most likely) serialising the result 
> document into a stream of bytes, like an XML document in some encoding. 
> The output document may or may not contain a doctype, over which you have 
> some control in XSL. This does not change what bytes are put into the 
> output; the encoding decides that.
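
A rough sketch of that serialisation step, again with Python's 
xml.etree.ElementTree standing in for the XSLT serialiser (an assumption; a 
real XSLT processor controls this through its output settings). The element 
name "p" and the three encodings are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The result tree holds one character, the pound sign (U+00A3).
root = ET.Element("p")
root.text = "\u00a3"

# Same tree, three output encodings: the bytes differ each time.
utf8 = ET.tostring(root, encoding="utf-8")         # two bytes: C2 A3
latin1 = ET.tostring(root, encoding="iso-8859-1")  # one byte: A3
ascii_ = ET.tostring(root, encoding="us-ascii")    # falls back to &#163;

for label, data in (("utf-8", utf8), ("iso-8859-1", latin1), ("us-ascii", ascii_)):
    print(label, data)

# Reading any of the three back yields the same character again.
round_trip = [ET.fromstring(data).text for data in (utf8, latin1, ascii_)]
```

When the target encoding cannot represent the character, this serialiser 
writes a numeric character reference instead, so whether a reference or the 
raw character appears in the page source follows from the encoding and not 
from how the template spelled it.
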
>
> That doc in turn is then sent off to its destination (say, a browser) 
> which renders the bytes into something we can look at. We all know the 
> last bit is very different for different browsers even if given the exact 
> same set of bytes.
>
> Now the fun begins:
> - if character 1234 isn't available in the character table the browser 
> uses -> boom!
> - if the character is available, but your font has no glyph for it -> boom!
> - if the encoding of the XML document is one your browser can't handle -> 
> boom!
> - if character 1234 is not in the doctype, or there is no doctype, it can't 
> replace &#1234; with anything possibly more comfortable.
>
> Let's say all of the above is taken care of. What is actually displayed 
> when you look at a textual representation of the document (which in itself 
> is a new transformation) then depends, again, on your tool.
>
> - it can be &#1234; (browser leaves it alone)
> - it can be &pound; (browser looked it up in the document's doctype)
> - it can be &blah; (browser looked it up, result doctype defined it 
> differently)
>
> In my experience, the best way to cope with entities is not to pay 
> attention to them for too _long_. They don't exist; everything is a 
> character number and entities are just labels "at that given time". Bit of 
> a "Schrödinger's cat" situation. :-) As soon as you look, it's something 
> else :-)
>
> In practice, I think most browsers would show you &pound; or &#1234; in 
> their source, more likely so if the doctype is one of the W3C doctypes 
> (forgot the link).
>
> I've been up to my neck in XML for a lot of years and I still often get it 
> wrong. :-)
>
> Hope this helps.

Yes it does - thanks! The long answer to what appears to be a simple 
question reveals an awful lot about how the whole thing works.

-- JJ


_______________________________________________
Xaraya_devel mailing list
Xaraya_devel (at) xaraya.com
http://xaraya.com/mailman/listinfo/xaraya_devel