- Translation of information from one XML schema to another
- Querying on XML data
- The above two are closely related, and handled by the same tools
- Standard XML querying/translation languages
  - XPath
    - Simple language consisting of path expressions
  - XSLT
    - Simple language designed for translation from XML to XML and XML to HTML
  - XQuery
    - An XML query language with a rich set of features
- A wide variety of other languages have been proposed, and some served as the basis for the XQuery standard
  - XML-QL, Quilt, XQL, …
Tree Model of XML Data
- Query and transformation languages are based on a tree model of XML data
- An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  - Element nodes have child nodes, which can be attributes or subelements
  - Text in an element is modeled as a text node child of the element
  - Children of a node are ordered according to their order in the XML document
  - Element and attribute nodes (except for the root node) have a single parent, which is an element node
  - The root node has a single child, which is the root element of the document
- We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
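The tree model above can be made concrete with a small Python sketch using the standard library's xml.etree.ElementTree. The sample document, its element names, and the helper `describe` are assumptions made for illustration. ElementTree is only an approximation of the model: it keeps attributes in a dictionary on the element rather than as separate nodes, and it does not store parent pointers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are assumptions.
DOC = """<bank-2>
  <account account-number="A-401">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
</bank-2>"""


def describe(elem, depth=0):
    """Return one line per node, in document order, indented by depth."""
    lines = ["  " * depth + "element " + elem.tag]
    for name, value in elem.attrib.items():
        lines.append("  " * (depth + 1) + "attribute " + name + "=" + value)
    text = (elem.text or "").strip()
    if text:
        # Text inside an element is modeled as a text child of that element.
        lines.append("  " * (depth + 1) + "text " + repr(text))
    for child in elem:  # children are visited in document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)  # the root element of the document
tree_lines = describe(root)
print("\n".join(tree_lines))
```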
XPath
- XPath is used to address (select) parts of documents using path expressions
- A path expression is a sequence of steps separated by “/”
  - Think of file names in a directory hierarchy
- Result of path expression: set of values that along with their containing elements/attributes match the specified path
- E.g. /bank-2/customer/customer-name evaluated on the bank-2 data we saw earlier returns
      <customer-name>Joe</customer-name>
      <customer-name>Mary</customer-name>
- E.g. /bank-2/customer/customer-name/text( ) returns the same names, but without the enclosing tags
- The initial “/” denotes the root of the document (above the top-level tag)
- Path expressions are evaluated left to right
  - Each step operates on the set of instances produced by the previous step
- Selection predicates may follow any step in a path, in [ ]
  - E.g. /bank-2/account[balance > 400]
    - returns account elements with a balance value greater than 400
    - /bank-2/account[balance] returns account elements containing a balance subelement
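The path expressions above can be tried out with Python's xml.etree.ElementTree, which implements only a subset of XPath. This is a sketch under assumptions: the sample data stands in for the bank-2 document (only the element names and the names Joe and Mary come from the text), paths are written relative to the root element, and the numeric predicate is applied in Python because ElementTree predicates do not support comparisons such as `balance > 400`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bank-2 data; account values are made up.
BANK2 = """<bank-2>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"></account>
</bank-2>"""

root = ET.fromstring(BANK2)  # root is the <bank-2> element itself

# /bank-2/customer/customer-name, written relative to the root element
name_elems = root.findall("./customer/customer-name")

# .../text( ): the same names without the enclosing tags
names = [e.text for e in name_elems]

# /bank-2/account[balance]: accounts that contain a balance subelement
with_balance = root.findall("./account[balance]")

# /bank-2/account[balance > 400]: the comparison is done in Python
over_400 = [a for a in with_balance if float(a.findtext("balance")) > 400]

print(names, [a.get("account-number") for a in over_400])
```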
XQuery
- XQuery is a general purpose query language for XML data
- Currently being standardized by the World Wide Web Consortium (W3C)
  - The textbook description is based on a March 2001 draft of the standard. The final version may differ, but major features are likely to stay unchanged.
- Alpha version of XQuery engine available free from Microsoft
- XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL
- XQuery uses a for … let … where … return … syntax
      for ⇔ SQL from
      where ⇔ SQL where
      return ⇔ SQL select
      let allows temporary variables, and has no equivalent in SQL
FLWR Syntax in XQuery
- The for clause uses XPath expressions, and the variable in the for clause ranges over values in the set returned by XPath
- Simple FLWR expression in XQuery
  - find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag
        for $x in /bank-2/account
        let $acctno := $x/@account-number
        where $x/balance > 400
        return <account-number> $acctno </account-number>
- The let clause is not really needed in this query, and the selection can be done in XPath. The query can be written as:
        for $x in /bank-2/account[balance>400]
        return <account-number> $x/@account-number </account-number>
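To see how the four clauses fit together, the FLWR query above can be mirrored step by step in Python. This is only an analogy, not an XQuery engine: the sample accounts are made up, and each Python line is annotated with the clause it stands in for.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 fragment; account numbers and balances are made up.
BANK2 = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>"""

root = ET.fromstring(BANK2)

results = []
for x in root.findall("./account"):            # for $x in /bank-2/account
    acctno = x.get("account-number")           # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:     # where $x/balance > 400
        out = ET.Element("account-number")     # return <account-number> ...
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
print(serialized)
```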
Path Expressions and Functions
- Path expressions are used to bind variables in the for clause, but can also be used in other places
  - E.g. path expressions can be used in the let clause, to bind variables to results of path expressions
- The function distinct( ) can be used to remove duplicates in path expression results
- The function document(name) returns the root of the named document
  - E.g. document(“bank-2.xml”)/bank-2/account
- Aggregate functions such as sum( ) and count( ) can be applied to path expression results
- XQuery does not support group by, but the same effect can be obtained by nested queries, with nested FLWR expressions within a return clause
  - More on nested queries later
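The combination of duplicate removal, aggregates, and nesting can be sketched in Python as follows. The flat bank data, branch names, and balances are assumptions for illustration; the point is that an outer loop over distinct values with a nested aggregate gives the effect of a group by.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank data; branch names and balances are made up.
BANK = """<bank>
  <account><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account><branch-name>Perryridge</branch-name><balance>400</balance></account>
  <account><branch-name>Downtown</branch-name><balance>900</balance></account>
</bank>"""

root = ET.fromstring(BANK)
accounts = root.findall("./account")

# distinct( ): remove duplicates while keeping first-seen order
branches = list(dict.fromkeys(a.findtext("branch-name") for a in accounts))

# count( ) and sum( ) applied to a path expression result
n_accounts = len(accounts)
total = sum(float(a.findtext("balance")) for a in accounts)

# Group-by effect: outer loop over distinct branches, nested aggregate
# over the accounts that match each branch
per_branch = {
    b: sum(float(a.findtext("balance"))
           for a in accounts if a.findtext("branch-name") == b)
    for b in branches
}
print(per_branch)
```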
Joins
- Joins are specified in a manner very similar to SQL
      for $a in /bank/account,
          $c in /bank/customer,
          $d in /bank/depositor
      where $a/account-number = $d/account-number
        and $c/customer-name = $d/customer-name
      return <cust-acct> $c $a </cust-acct>
- The same query can be expressed with the selections specified as XPath selections:
      for $a in /bank/account,
          $c in /bank/customer,
          $d in /bank/depositor[account-number = $a/account-number
                                and customer-name = $c/customer-name]
      return <cust-acct> $c $a </cust-acct>
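The semantics of the join can be read as three nested loops with a filter, which the following Python sketch spells out. The sample bank document is an assumption, and the nested-loop evaluation is only the naive reading of the query, not a claim about how an XQuery engine would execute it.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank data; all values are made up.
BANK = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Joe</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Mary</customer-name></depositor>
  <depositor><account-number>A-101</account-number><customer-name>Mary</customer-name></depositor>
</bank>"""

root = ET.fromstring(BANK)

pairs = []
for a in root.findall("./account"):            # $a in /bank/account
    for c in root.findall("./customer"):       # $c in /bank/customer
        for d in root.findall("./depositor"):  # $d in /bank/depositor
            # where clause: both join conditions must hold
            if (a.findtext("account-number") == d.findtext("account-number")
                    and c.findtext("customer-name") == d.findtext("customer-name")):
                pairs.append((c.findtext("customer-name"),
                              a.findtext("account-number")))

print(pairs)
```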
Changing Nesting Structure
- The following query converts data from the flat structure for bank information into the nested structure used in bank-1
      <bank-1>
        for $c in /bank/customer
        return
          <customer>
            $c/*
            for $d in /bank/depositor[customer-name = $c/customer-name],
                $a in /bank/account[account-number=$d/account-number]
            return $a
          </customer>
      </bank-1>
- $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
- Exercise for reader: write a nested query to find sum of account balances, grouped by branch.
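The restructuring query can be imitated in Python to show what the nested output looks like. This is a sketch on made-up data: the copy of the customer's children plays the role of $c/*, and the inner loops play the role of the nested for clause.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank data; all values are made up.
BANK = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Joe</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Mary</customer-name></depositor>
  <depositor><account-number>A-101</account-number><customer-name>Mary</customer-name></depositor>
</bank>"""

flat = ET.fromstring(BANK)
nested = ET.Element("bank-1")

for c in flat.findall("./customer"):
    cust = ET.SubElement(nested, "customer")
    # $c/*: the children of the customer, without the enclosing tag
    cust.extend([copy.deepcopy(child) for child in c])
    name = c.findtext("customer-name")
    for d in flat.findall("./depositor"):
        if d.findtext("customer-name") != name:
            continue
        for a in flat.findall("./account"):
            if a.findtext("account-number") == d.findtext("account-number"):
                cust.append(copy.deepcopy(a))  # return $a

accounts_per_customer = [
    [a.findtext("account-number") for a in cust.findall("account")]
    for cust in nested
]
print(accounts_per_customer)
```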
XQuery Path Expressions
- $c/text() gives the text content of an element without any subelements/tags
- XQuery path expressions support the “->” operator for dereferencing IDREFs
  - Equivalent to the id( ) function of XPath, but simpler to use
  - Can be applied to a set of IDREFs to get a set of results
  - June 2001 version of standard has changed “->” to “=>”
Sorting in XQuery
- Sortby clause can be used at the end of any expression. E.g. to return customers sorted by name
      for $c in /bank/customer
      return <customer> $c/* </customer> sortby(customer-name)
- Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)
      <bank-1>
        for $c in /bank/customer
        return
          <customer>
            $c/*
            for $d in /bank/depositor[customer-name=$c/customer-name],
                $a in /bank/account[account-number=$d/account-number]
            return <account> $a/* </account> sortby(account-number)
          </customer> sortby(customer-name)
      </bank-1>
Functions and Other XQuery Features
- User defined functions with the type system of XML Schema
      function balances(xsd:string $c) returns list(xsd:numeric) {
        for $d in /bank/depositor[customer-name = $c],
            $a in /bank/account[account-number=$d/account-number]
        return $a/balance
      }
- Types are optional for function parameters and return values
- Universal and existential quantification in where clause predicates
  - some $e in path satisfies P
  - every $e in path satisfies P
- XQuery also supports if-then-else clauses
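A rough Python counterpart of these features may help in reading them. In the sketch below, the function `balances` mirrors the user-defined function above, `any` and `all` stand in for the some and every quantifiers, and a conditional expression stands in for if-then-else. The data, the threshold of 400, and the labels are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical flat bank data; all values are made up.
BANK = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>300</balance></account>
  <depositor><account-number>A-101</account-number><customer-name>Joe</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Joe</customer-name></depositor>
</bank>"""

root = ET.fromstring(BANK)


def balances(customer_name: str) -> List[float]:
    """Balances of all accounts held by the named customer."""
    result = []
    for d in root.findall("./depositor"):
        if d.findtext("customer-name") != customer_name:
            continue
        for a in root.findall("./account"):
            if a.findtext("account-number") == d.findtext("account-number"):
                result.append(float(a.findtext("balance")))
    return result


joe = balances("Joe")
some_over_400 = any(b > 400 for b in joe)    # some $b in ... satisfies $b > 400
every_over_400 = all(b > 400 for b in joe)   # every $b in ... satisfies $b > 400
label = "premium" if every_over_400 else "standard"  # if-then-else
print(joe, some_over_400, every_over_400, label)
```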
Application Program Interface
- There are two standard application program interfaces to XML data:
  - SAX (Simple API for XML)
    - Based on parser model, user provides event handlers for parsing events
      - E.g. start of element, end of element
      - Not suitable for database applications
  - DOM (Document Object Model)
    - XML data is parsed into a tree representation
    - Variety of functions provided for traversing the DOM tree
    - E.g.: Java DOM API provides a Node interface with methods getParentNode( ), getFirstChild( ), getNextSibling( ); the element and text node interfaces add getAttribute( ), getElementsByTagName( ), getData( ) (for text node), …
    - Also provides functions for updating DOM tree
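The difference between the two interfaces shows up clearly in a short example. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules rather than the Java API mentioned above; the document and the handler class name `EventLog` are assumptions for illustration. The SAX part only receives events, while the DOM part builds a tree that can be navigated and updated.

```python
import xml.sax
from xml.dom import minidom

DOC = "<bank><account><balance>500</balance></account></bank>"


class EventLog(xml.sax.ContentHandler):
    """SAX handler that records start and end of element events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


# SAX: the parser calls the handlers as it reads; no tree is built
handler = EventLog()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# DOM: the whole document is parsed into a tree first
dom = minidom.parseString(DOC)
balance = dom.getElementsByTagName("balance")[0]
parent_tag = balance.parentNode.tagName   # like getParentNode( )
text = balance.firstChild.data            # like getFirstChild( ) and getData( )
balance.firstChild.data = "600"           # the DOM tree can also be updated
updated = balance.toxml()
print(handler.events, parent_tag, text, updated)
```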
Storage of XML Data
- XML data can be stored in
  - Non-relational data stores
    - Flat files
      - Natural for storing XML
      - But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)
    - XML database
      - Database built specifically for storing XML data, supporting DOM model and declarative querying
      - Currently no commercial-grade systems
  - Relational databases
    - Data must be translated into relational form
    - Advantage: mature database systems
    - Disadvantages: overhead of translating data and queries
Storage of XML in Relational Databases
- Alternatives:
  - String Representation
  - Tree Representation
  - Map to relations
String Representation
- Store each top level element as a string field of a tuple in a relational database
  - Use a single relation to store all elements, or
  - Use a separate relation for each top-level element type
    - E.g. account, customer, depositor relations
      - Each with a string-valued attribute to store the element
- Indexing:
  - Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
    - E.g. customer-name or account-number
  - Oracle 9 supports function indices which use the result of a function as the key value.
    - The function should return the value of the required subelement/attribute
- Benefits:
  - Can store any XML data even without DTD
  - As long as there are many top-level elements in a document, strings are small compared to full document
    - Allows fast access to individual elements.
- Drawback: Need to parse strings to access values inside the elements
  - Parsing is slow.
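A minimal sketch of the string representation, using Python's sqlite3 module as a stand-in relational database. The table and column names and the sample elements are assumptions. The account number is copied into an extra indexed column, so one element can be found without parsing every stored string, but reading the balance still requires parsing the string that was retrieved.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One relation per top-level element type: the element is stored as a
# string, plus an extra field holding the subelement value to be indexed.
conn.execute("CREATE TABLE account (account_number TEXT, element TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account (account_number)")

# Hypothetical top-level elements; values are made up.
elements = [
    "<account><account-number>A-101</account-number><balance>500</balance></account>",
    "<account><account-number>A-102</account-number><balance>300</balance></account>",
]
for s in elements:
    key = ET.fromstring(s).findtext("account-number")
    conn.execute("INSERT INTO account VALUES (?, ?)", (key, s))

# Fast access to an individual element through the indexed field
rows = conn.execute(
    "SELECT element FROM account WHERE account_number = ?", ("A-102",)
).fetchall()

# Drawback: the string must be parsed to reach values inside the element
balance = float(ET.fromstring(rows[0][0]).findtext("balance"))
print(balance)
```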
Tree Representation
- Tree representation: model XML data as tree and store using relations
      nodes(id, type, label, value)
      child(child-id, parent-id)
- Each element/attribute is given a unique identifier
- Type indicates element/attribute
- Label specifies the tag name of the element/name of attribute
- Value is the text value of the element/attribute
- The relation child notes the parent-child relationships in the tree
  - Can add an extra attribute to child to record ordering of children
- Benefit: Can store any XML data, even without DTD
- Drawbacks:
  - Data is broken up into too many pieces, increasing space overheads
  - Even simple queries require a large number of joins, which can be slow
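The two relations and the join cost can be illustrated with a small sqlite3 sketch. The helper `store`, the underscore column names, the `position` column (the optional ordering attribute), and the sample element are assumptions. Note how even looking up the balance of one account needs a chain of joins through nodes and child.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)")
ids = count(1)  # source of unique identifiers


def store(elem, parent_id=None, position=0):
    """Store an element, its attributes, and its subelements as rows."""
    node_id = next(ids)
    conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)",
                 (node_id, elem.tag, (elem.text or "").strip() or None))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?, ?)",
                     (node_id, parent_id, position))
    pos = 0
    for name, value in elem.attrib.items():
        attr_id = next(ids)
        conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)",
                     (attr_id, name, value))
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (attr_id, node_id, pos))
        pos += 1
    for sub in elem:
        store(sub, node_id, pos)
        pos += 1
    return node_id


# Hypothetical element; values are made up.
store(ET.fromstring(
    '<account account-number="A-101"><balance>500</balance></account>'))

# "Balance of account A-101" already needs four joins
QUERY = """
SELECT b.value
FROM nodes acct
JOIN child ca ON ca.parent_id = acct.id
JOIN nodes attr ON attr.id = ca.child_id
JOIN child cb ON cb.parent_id = acct.id
JOIN nodes b ON b.id = cb.child_id
WHERE acct.label = 'account'
  AND attr.type = 'attribute' AND attr.label = 'account-number'
  AND attr.value = 'A-101'
  AND b.type = 'element' AND b.label = 'balance'
"""
balance_rows = conn.execute(QUERY).fetchall()
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(balance_rows, node_count)
```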
Mapping XML Data to Relations
- Map to relations
  - If DTD of document is known, can map data to relations
  - A relation is created for each element type
    - Elements (of type #PCDATA), and attributes are mapped to attributes of relations
    - More details below
- Benefits:
  - Efficient storage
  - Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML
- Drawbacks: need to know DTD, translation overheads still present
- Relation created for each element type contains
  - An id attribute to store a unique id for each element
  - A relation attribute corresponding to each element attribute
  - A parent-id attribute to keep track of parent element
    - As in the tree representation
    - Position information (ith child) can be stored too
- All subelements that occur only once can become relation attributes
  - For text-valued subelements, store the text as attribute value
  - For complex subelements, can store the id of the subelement
- Subelements that can occur multiple times represented in a separate table
  - Similar to handling of multivalued attributes when converting ER diagrams to tables
- E.g. For bank-1 DTD with account elements nested within customer elements, create relations
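The mapping rules above can be sketched with sqlite3 on a small bank-1 style document. The schema here is an assumption made for illustration and is not taken from the text: the subelement names, the column names, and the sample values are all hypothetical. Customer-name occurs once per customer and becomes a column; account can occur many times and gets its own table with a parent id and a position.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical bank-1 style document; names and values are assumptions.
BANK1 = """<bank-1>
  <customer>
    <customer-name>Joe</customer-name>
    <account><account-number>A-101</account-number><balance>500</balance></account>
    <account><account-number>A-102</account-number><balance>300</balance></account>
  </customer>
</bank-1>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer ("
             "id INTEGER PRIMARY KEY, parent_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE account ("
             "id INTEGER PRIMARY KEY, parent_id INTEGER, position INTEGER, "
             "account_number TEXT, balance REAL)")

ids = count(1)  # unique id for each stored element
for cust in ET.fromstring(BANK1).findall("./customer"):
    cust_id = next(ids)
    # The root element is not stored in this sketch, so parent_id is NULL
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)",
                 (cust_id, None, cust.findtext("customer-name")))
    for pos, acct in enumerate(cust.findall("./account")):
        conn.execute("INSERT INTO account VALUES (?, ?, ?, ?, ?)",
                     (next(ids), cust_id, pos,
                      acct.findtext("account-number"),
                      float(acct.findtext("balance"))))

# An XML query over nested accounts becomes an ordinary SQL join
totals = conn.execute(
    "SELECT c.customer_name, SUM(a.balance) "
    "FROM customer c JOIN account a ON a.parent_id = c.id "
    "GROUP BY c.customer_name"
).fetchall()
print(totals)
```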