I am generating (X)HTML that contains some JavaScript, and the scripts often contain characters that are not allowed in XML. Is there a way to still use scala.xml for this stuff?
Yes, there is. This task is so common nowadays that we decided to support it with a special node, Unparsed. We even tweaked the XML syntax for it, with the xml:unparsed tag:
object scriptgen {
  def main(args: Array[String]) = {
    // syntax error in Scala: the bare < is read as the start of a tag
    //val wrong = <script> if(1 < 2) alert("scala.xml rocks"); </script>

    // syntax error in JavaScript: the < will be escaped to &lt;
    //val wrong = <script><![CDATA[ if(1 < 2) alert("scala.xml rocks"); ]]></script>

    // 1. solution: wrap in a comment
    val x0 = <script><!-- if(1 < 2) alert("scala.xml rocks"); --></script>
    // toString(true) preserves comments
    Console.println(x0.toString(true))

    // 2. solution: use xml.Unparsed to include verbatim
    val x1 = <script> { scala.xml.Unparsed("if(1 < 2) alert(\"scala.xml rocks\");") } </script>
    Console.println(x1.toString)

    // convenience for 2nd solution: use xml:unparsed
    val x2 = <script> <xml:unparsed> if(1 < 2) alert("scala.xml rocks"); </xml:unparsed> </script>
    Console.println(x2)
  }
}
I am parsing existing XHTML pages. Whenever the parser encounters:
<script>
// <![CDATA[
foo && bar
// ]]>
</script>
it comes out as:
<script>
//
foo && bar
//
</script>
Is there a way to preserve the CDATA?
No, there is not. Either construct an appropriate Unparsed node while parsing, or transform the document afterwards so that <script>{node}</script> becomes <script>{Unparsed(node.data)}</script>.
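The second suggestion could look roughly like the sketch below. It assumes the scala.xml.transform package (RewriteRule and RuleTransformer) is available in your version of the library. The names ScriptToUnparsed and PreserveScript are made up for this example, and the input tree is built by hand rather than parsed, so it only stands in for what a parser would give you.

```scala
import scala.xml.{Elem, Node, Text, Unparsed}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Hypothetical rule: inside every <script> element, replace Text children
// with Unparsed nodes so that their content is printed verbatim.
object ScriptToUnparsed extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "script" =>
      e.copy(child = e.child.map {
        case t: Text => Unparsed(t.data) // same characters, no escaping on output
        case other   => other            // leave comments, elements, etc. alone
      })
    case other => other
  }
}

// Hypothetical helper that applies the rule to a whole document.
object PreserveScript {
  def unescapeScripts(doc: Node): Node =
    new RuleTransformer(ScriptToUnparsed)(doc)

  def main(args: Array[String]): Unit = {
    // Stand-in for a parsed page: the script body is an ordinary Text node.
    val page = <html><script>{Text("foo && bar")}</script></html>
    Console.println(page)                  // <html><script>foo &amp;&amp; bar</script></html>
    Console.println(unescapeScripts(page)) // <html><script>foo && bar</script></html>
  }
}
```

Note that this does not bring back the CDATA markers themselves. It only stops the script text from being escaped when the document is printed again.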