Tuesday, December 4, 2012

Database Management System Chapter 10: XML

 

 

Chapter 10: XML

 

Introduction

 

n XML:  Extensible Markup Language

n Defined by the WWW Consortium (W3C)

n Originally intended as a document markup language, not a database language

H Documents have tags giving extra information about sections of the document

4 E.g.  <title> XML </title>  <slide> Introduction …</slide>

H Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML

H Extensible, unlike HTML

4 Users can add new tags, and separately specify how the tag should be handled for display

H Goal was (is?) to replace HTML as the language for publishing documents on the Web

 

n The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents.

H Much of the use of XML has been in data exchange applications, not as a replacement for HTML

n Tags make data (relatively) self-documenting

H E.g.
    
<bank>
    <account>
        <account-number> A-101 </account-number>
        <branch-name> Downtown </branch-name>
        <balance> 500 </balance>
    </account>
    <depositor>
        <account-number> A-101 </account-number>
        <customer-name> Johnson </customer-name>
    </depositor>
</bank>

 

 

 

XML: Motivation

 

n Data interchange is critical in today’s networked world

H Examples:

4 Banking:  funds transfer

4 Order processing (especially inter-company orders)

4 Scientific data

  Chemistry:  ChemML, …
  Genetics:    BSML (Bio-Sequence Markup Language), …

H Paper flow of information between organizations is being replaced by electronic flow of information

n Each application area has its own set of standards for representing information

n XML has become the basis for all new generation data interchange formats

 

n Earlier generation formats were based on plain text with line headers indicating the meaning of fields

H Similar in concept to email headers

H Does not allow for nested structures, no standard “type” language

H Tied too closely to low level document structure (lines, spaces, etc)

n Each XML based standard defines what are valid elements, using

H  XML type specification languages to specify the syntax

4 DTD (Document Type Definition)

4 XML Schema

H Plus textual descriptions of the semantics

n XML allows new tags to be defined as required

H However, this may be constrained by DTDs

n A wide variety of tools is available for parsing, browsing and querying XML documents/data

 

 

Structure of XML Data

 

n Tag:  label for a section of data

n Element: section of data beginning with <tagname> and ending with matching </tagname>

n Elements must be properly nested

H Proper nesting

4  <account> … <balance>  …. </balance> </account>

H Improper nesting

4  <account> … <balance>  …. </account> </balance>

H Formally:  every start tag must have a unique matching end tag that is in the context of the same parent element.

n Every document must have a single top-level element
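The nesting and single-root rules can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser (the slides do not prescribe any particular tool); the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

proper = "<account><balance>400</balance></account>"
improper = "<account><balance>400</account></balance>"
two_roots = "<account/><account/>"

print(is_well_formed(proper))     # properly nested
print(is_well_formed(improper))   # end tags cross each other
print(is_well_formed(two_roots))  # more than one top-level element
```

Only the first string is accepted; the parser rejects the other two before any application code sees the data.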

 

 

Example of Nested Elements

 

<bank-1>
    <customer>
        <customer-name> Hayes </customer-name>
        <customer-street> Main </customer-street>
        <customer-city> Harrison </customer-city>
        <account>
            <account-number> A-102 </account-number>
            <branch-name> Perryridge </branch-name>
            <balance> 400 </balance>
        </account>
        <account>
            …
        </account>
    </customer>
    .
    .
</bank-1>

 

Motivation for Nesting

 

n Nesting of data is useful in data transfer

H Example:  elements representing customer-id, customer name, and address nested within an order element

n Nesting is not supported, or discouraged, in relational databases

H With multiple orders, customer name and address are stored redundantly

H Normalization replaces the nested structure in each order by a foreign key into a table storing customer name and address information

H Nesting is supported in object-relational databases

n But nesting is appropriate when transferring data

H External application does not have direct access to data referenced by a foreign key

 

 

n Mixture of text with sub-elements is legal in XML.

H Example:

<account>
    This account is seldom used any more.
    <account-number> A-102 </account-number>
    <branch-name> Perryridge </branch-name>
    <balance> 400 </balance>
</account>

H Useful for document markup, but discouraged for data representation

 

Attributes

 

n Elements can have attributes

H <account acct-type = "checking">
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
  </account>

n Attributes are specified by  name=value pairs inside the starting tag of an element

n An element may have several attributes, but each attribute name can only occur once

4 <account  acct-type = "checking"  monthly-fee = "5">
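As an illustration of how attributes differ from subelements when the data is read back, here is a small sketch assuming Python's standard xml.etree.ElementTree module. It also shows that repeating an attribute name inside one start tag is rejected by the parser.

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking" monthly-fee="5">'
    "<account-number>A-102</account-number>"
    "</account>"
)

# Attributes are name=value pairs on the element; subelements are children.
acct_type = account.get("acct-type")
monthly_fee = account.get("monthly-fee")
number = account.findtext("account-number")
print(acct_type, monthly_fee, number)

# An attribute name may occur only once per start tag.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```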

 

Attributes Vs. Subelements

 

n Distinction between subelement and attribute

H In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents

H In the context of data representation, the difference is unclear and may be confusing

4 Same information can be represented in two ways

  <account  account-number = "A-101">  …. </account>
  <account>
    <account-number>A-101</account-number> …
  </account>

H Suggestion: use attributes for identifiers of elements, and use subelements for contents

 

 

More on XML Syntax

   

n Elements without subelements or text content can be abbreviated by ending the start tag with a  />  and deleting the end tag

H <account  number="A-101" branch="Perryridge"  balance="200" />

n To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below

H <![CDATA[<account> … </account>]]>

4 Here, <account> and </account> are treated as just strings
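A short sketch, assuming Python's standard xml.etree.ElementTree module, shows both points: an abbreviated empty element has no children and no text, and markup inside a CDATA section arrives as plain character data. The <note> wrapper element is hypothetical and only there to give the CDATA section a parent.

```python
import xml.etree.ElementTree as ET

# An empty element written with the "/>" abbreviation.
empty = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200"/>')

# The CDATA section keeps the tags inside it as ordinary characters.
note = ET.fromstring("<note><![CDATA[<account> A-101 </account>]]></note>")

print(len(empty), empty.text)   # number of children, text content
print(len(note), note.text)     # no child elements; text holds the literal tags
```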

 

 

 

Namespaces

 

n XML data has to be exchanged between organizations

n Same tag name may have different meaning in different organizations, causing confusion on exchanged documents

n Specifying a unique string as an element name avoids confusion

n Better solution: use  unique-name:element-name

n Avoid using long unique names all over document by using XML Namespaces

<bank xmlns:FB="http://www.FirstBank.com">
    …
    <FB:branch>
        <FB:branchname> Downtown </FB:branchname>
        <FB:branchcity> Brooklyn </FB:branchcity>
    </FB:branch>
</bank>
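How a namespace-aware parser treats the FB prefix can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The prefix "first" used in the query is an arbitrary choice for this example; only the namespace URI has to match.

```python
import xml.etree.ElementTree as ET

doc = """
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>
"""
root = ET.fromstring(doc)

# ElementTree expands each prefixed name to "{namespace-URI}local-name".
branch = root[0]
print(branch.tag)

# The prefix in a query is local to the query; it maps to the same URI.
ns = {"first": "http://www.FirstBank.com"}
name = root.findtext("first:branch/first:branchname", namespaces=ns)
print(name)
```

The short prefix in the document is therefore only an abbreviation: the element is identified by the full unique name plus the local element name.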

 

XML Document Schema

 

n Database schemas constrain what information can be stored, and the data types of stored values

n XML documents are not required to have an associated schema

n However, schemas are very important for XML data exchange

H Otherwise, a site cannot automatically interpret data received from another site

n Two mechanisms for specifying XML schema

H Document Type Definition (DTD)

4 Widely used

H XML Schema

4 Newer, increasing use

 

Document Type Definition (DTD)

                                                                                     

n The type of an XML document can be specified using a DTD

n A DTD constrains the structure of XML data

H What elements can occur

H What attributes can/must an element have

H What subelements can/must occur inside each element, and how many times.

n DTD does not constrain data types

H All values represented as strings in XML

n DTD syntax

H <!ELEMENT element (subelements-specification) >

H <!ATTLIST   element (attributes)  >

 

Element Specification in DTD

 

n Subelements can be specified as

H names of elements, or

H #PCDATA (parsed character data), i.e., character strings

H EMPTY (no subelements) or ANY (anything can be a subelement)

n Example

   <!ELEMENT depositor (customer-name, account-number)>

   <!ELEMENT customer-name (#PCDATA)>

   <!ELEMENT account-number (#PCDATA)>

n Subelement specification may have regular expressions

  <!ELEMENT bank ( ( account | customer | depositor)+)>

4 Notation:

   “|”   -  alternatives
   “+”  -  1 or more occurrences
   “*”   -  0 or more occurrences

 

 

Bank DTD

 

<!DOCTYPE bank [
    <!ELEMENT bank ( ( account | customer | depositor)+)>
    <!ELEMENT account (account-number, branch-name, balance)>
    <!ELEMENT customer (customer-name, customer-street, customer-city)>
    <!ELEMENT depositor (customer-name, account-number)>
    <!ELEMENT account-number (#PCDATA)>
    <!ELEMENT branch-name (#PCDATA)>
    <!ELEMENT balance (#PCDATA)>
    <!ELEMENT customer-name (#PCDATA)>
    <!ELEMENT customer-street (#PCDATA)>
    <!ELEMENT customer-city (#PCDATA)>
]>
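Because subelement specifications are regular expressions over element names, a content model can be checked with an ordinary regex engine. The sketch below is not a DTD validator: it assumes Python's standard library (which does not validate against DTDs), handles only element-only content models written with the standard comma-separated sequence syntax (no #PCDATA, EMPTY or ANY), and uses the hypothetical helper names model_to_regex and children_match.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Turn an element-only DTD content model into a regex over 'name;' tokens."""
    # Each element name becomes a group matching that name plus a ';' terminator.
    pattern = re.sub(r"[A-Za-z][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    # Sequence commas and whitespace are dropped; '|', '+', '*', '?' and
    # parentheses mean the same thing in the regex as in the DTD.
    return re.sub(r"[\s,]", "", pattern)

def children_match(element, model):
    """Check the element's direct children against the content model."""
    children = "".join(child.tag + ";" for child in element)
    return re.fullmatch(model_to_regex(model), children) is not None

depositor = ET.fromstring(
    "<depositor><customer-name>Johnson</customer-name>"
    "<account-number>A-101</account-number></depositor>")
swapped = ET.fromstring(
    "<depositor><account-number>A-101</account-number>"
    "<customer-name>Johnson</customer-name></depositor>")
empty_bank = ET.fromstring("<bank/>")

print(children_match(depositor, "(customer-name, account-number)"))
print(children_match(swapped, "(customer-name, account-number)"))
print(children_match(empty_bank, "((account | customer | depositor)+)"))
```

The swapped depositor fails because a sequence fixes the order of subelements, and the empty bank fails because "+" demands at least one occurrence.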

Attribute Specification in DTD

 

n Attribute specification : for each attribute 

H Name

H Type of attribute

4 CDATA

4 ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)

    more on this later

H Whether 

4 mandatory (#REQUIRED)

4 has a default value (value),

4 or neither (#IMPLIED)

n Examples

H <!ATTLIST account  acct-type CDATA "checking">

H <!ATTLIST customer
      customer-id   ID      #REQUIRED
      accounts      IDREFS  #REQUIRED>

 

IDs and IDREFs

 

n An element can have at most one attribute of type ID

n The ID attribute value of each element in an XML document must be distinct

H Thus the ID attribute value is an object identifier

n An attribute of type IDREF must contain the ID value of an element in the same document

n An attribute of type IDREFS contains a set of (0 or more) ID values.  Each value must be the ID value of an element in the same document

 

Bank DTD with Attributes

 

n Bank DTD with ID and IDREF attribute types.

<!DOCTYPE bank-2 [
    <!ELEMENT account (branch-name, balance)>
    <!ATTLIST account
        account-number  ID      #REQUIRED
        owners          IDREFS  #REQUIRED>
    <!ELEMENT customer (customer-name, customer-street, customer-city)>
    <!ATTLIST customer
        customer-id     ID      #REQUIRED
        accounts        IDREFS  #REQUIRED>
    … declarations for branch-name, balance, customer-name,
      customer-street and customer-city
]>

 

XML data with ID and IDREF attributes

<bank-2>
    <account account-number="A-401" owners="C100 C102">
        <branch-name> Downtown </branch-name>
        <balance> 500 </balance>
    </account>
    <customer customer-id="C100" accounts="A-401">
        <customer-name> Joe </customer-name>
        <customer-street> Monroe </customer-street>
        <customer-city> Madison </customer-city>
    </customer>
    <customer customer-id="C102" accounts="A-401 A-402">
        <customer-name> Mary </customer-name>
        <customer-street> Erin </customer-street>
        <customer-city> Newark </customer-city>
    </customer>
</bank-2>
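The ID and IDREFS rules can be checked by hand on this data. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module, which does not read the DTD; which attributes are of type ID or IDREFS is therefore hard-coded in two hypothetical dictionaries, and subelements are left out for brevity. In the fragment as shown, account A-402 is referenced but never defined (presumably it was left out of the slide), so the check reports it.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <customer customer-id="C100" accounts="A-401"/>
  <customer customer-id="C102" accounts="A-401 A-402"/>
</bank-2>
"""
# Attribute types normally come from the DTD; here they are hard-coded.
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

root = ET.fromstring(doc)

# ID values must be distinct across the whole document.
ids = [el.get(ID_ATTRS[el.tag]) for el in root]
duplicates = {i for i in ids if ids.count(i) > 1}

# Every blank-separated token of an IDREFS value must be some element's ID.
dangling = {ref
            for el in root
            for ref in el.get(IDREFS_ATTRS[el.tag], "").split()
            if ref not in ids}
print(duplicates, dangling)
```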

Limitations of DTDs

 

n No typing of text elements and attributes

H All values are strings, no integers, reals, etc.

n Difficult to specify unordered sets of subelements

H Order is usually irrelevant in databases

H (A | B)* allows specification of an unordered set, but

4 Cannot ensure that each of A and B occurs only once

n IDs and IDREFs are untyped

H The owners attribute of an account may contain a reference to another account, which is meaningless

4 owners attribute should ideally be constrained to refer to customer elements

XML Schema

 

n XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs.  Supports

H Typing of values

4 E.g. integer, string, etc

4 Also, constraints on min/max values

H User defined types

H Is itself specified in XML syntax, unlike DTDs

4 More standard representation, but verbose

H Is integrated with namespaces

H Many more features

4 List types, uniqueness and foreign key constraints, inheritance ..

n BUT:  significantly more complicated than DTDs, not yet widely used.

 

XML Schema Version of Bank DTD

 

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="bank" type="BankType"/>
    <xsd:element name="account">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="account-number" type="xsd:string"/>
                <xsd:element name="branch-name"    type="xsd:string"/>
                <xsd:element name="balance"        type="xsd:decimal"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    ….. definitions of customer and depositor ….

    <xsd:complexType name="BankType">
        <xsd:sequence>
            <xsd:element ref="account"   minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="customer"  minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

 

Querying and Transforming XML Data

 

n Translation of information from one XML schema to another

n Querying on XML data

n Above two are closely related, and handled by the same tools

n Standard XML querying/translation languages

H XPath

4 Simple language consisting of path expressions

H XSLT

4 Simple language designed for translation from XML to XML and XML to HTML

H XQuery

4 An XML query language with a rich set of features

n A wide variety of other languages have been proposed, and some served as the basis for the XQuery standard

H XML-QL, Quilt, XQL, …

 

Tree Model of XML Data

 

n Query and transformation languages are based on a tree model of XML data

n An XML document is modeled as a tree, with nodes corresponding to elements and attributes

H Element nodes have children nodes, which can be attributes or subelements

H Text in an element is modeled as a text node child of the element

H Children of a node are ordered according to their order in the XML document

H Element and attribute nodes (except for the root node) have a single parent, which is an element node

H The root node has a single child, which is the root element of the document

n We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
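The tree model can be made concrete with a small traversal. This is a sketch assuming Python's standard xml.etree.ElementTree module, which departs from the model above in two ways: it keeps an element's text as a field of the element rather than as a separate text node, and it stores no parent pointers, so a parent map is built explicitly. The function name walk is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<account acct-type="checking">'
    "<account-number>A-102</account-number>"
    "<balance>400</balance>"
    "</account>"
)

def walk(element, depth=0):
    """Yield (depth, kind, label, value) for element and attribute nodes."""
    yield depth, "element", element.tag, (element.text or "").strip()
    for name, value in element.attrib.items():
        yield depth + 1, "attribute", name, value
    for child in element:          # children come back in document order
        yield from walk(child, depth + 1)

nodes = list(walk(root))
for depth, kind, label, value in nodes:
    print("  " * depth + f"{kind} {label}: {value!r}")

# Every non-root element has exactly one parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}
print(parent_of[root.find("balance")].tag)
```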

XPath

 

n XPath is used to address (select) parts of documents using path expressions

n A path expression is a sequence of steps separated by “/”

H Think of file names in a directory hierarchy

n Result of path expression:  set of values that along with their containing elements/attributes match the specified path

n E.g.       /bank-2/customer/customer-name   evaluated on the bank-2 data we saw earlier returns

<customer-name>Joe</customer-name>

<customer-name>Mary</customer-name>

n E.g.       /bank-2/customer/customer-name/text( )

        returns the same names, but without the enclosing tags

 

 

 

n The initial “/” denotes root of the document (above the top-level tag)

n Path expressions are evaluated left to right

H Each step operates on the set of instances produced by the previous step

n Selection predicates may follow any step in a path, in [ ]

H E.g.    /bank-2/account[balance > 400]

4 returns account elements with a balance value greater than 400

4 /bank-2/account[balance]  returns account elements containing a balance subelement

n Attributes are accessed using “@”

H E.g.  /bank-2/account[balance > 400]/@account-number

4 returns the account numbers of those accounts with balance > 400

H IDREF attributes are not dereferenced automatically (more on this later)
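The path expressions above can be tried out with a sketch like the following. It assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath: paths are written relative to the root element rather than starting with "/", and numeric predicates such as [balance > 400] are not supported, so that comparison is done in Python. The street and city subelements are omitted for brevity.

```python
import xml.etree.ElementTree as ET

BANK_2 = """
<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401 A-402">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>
"""
root = ET.fromstring(BANK_2)   # root is the <bank-2> element itself

# /bank-2/customer/customer-name/text()
names = [n.text for n in root.findall("customer/customer-name")]

# /bank-2/account[balance]  (accounts that contain a balance subelement)
with_balance = root.findall("account[balance]")

# /bank-2/account[balance > 400]/@account-number
rich = [a.get("account-number")
        for a in root.findall("account")
        if float(a.findtext("balance")) > 400]

print(names, len(with_balance), rich)
```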

Functions in XPath

 

n XPath provides several functions

H The function count() counts the number of nodes in the set generated by a path

4 E.g. /bank-2/account[count(./customer) > 2]

  Returns accounts with > 2 customers

H Also function for testing position (1, 2, ..) of node w.r.t. siblings

n The Boolean connectives “and” and “or”, and the function not(), can be used in predicates

n IDREFs can be referenced using function id()  

H id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks

H E.g.  /bank-2/account/id(@owners)

4 returns all customers referred to from the owners attribute of account elements.
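What id() does with the owners attribute can be imitated in a few lines. This sketch assumes Python's standard xml.etree.ElementTree module, which has no id() function and does not read the DTD, so the lookup table from ID values to elements is built by hand. The names by_id and deref are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <customer customer-id="C100"><customer-name>Joe</customer-name></customer>
  <customer customer-id="C102"><customer-name>Mary</customer-name></customer>
</bank-2>
""")

# Index customer elements by their ID attribute.
by_id = {c.get("customer-id"): c for c in root.findall("customer")}

def deref(idrefs):
    """Map a blank-separated IDREFS string to the elements it refers to."""
    return [by_id[ref] for ref in idrefs.split() if ref in by_id]

# Counterpart of /bank-2/account/id(@owners)
owners = [c for a in root.findall("account") for c in deref(a.get("owners"))]
owner_names = [c.findtext("customer-name") for c in owners]
print(owner_names)
```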

 

More XPath Features

 

n Operator “|” used to implement union

H E.g.  /bank-2/account/id(@owners)  |  /bank-2/loan/id(@borrower)

4 gives customers with either accounts or loans

4 However, “|” cannot be nested inside other operators.

n “//” can be used to skip multiple levels of nodes

H E.g.  /bank-2//customer-name

4 finds any customer-name element anywhere  under the /bank-2 element, regardless of the element in which it is contained.

n A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children

H “//”, described above, is a short form for specifying “all descendants”

H “..” specifies the parent.

H We omit further details.
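The “//” and “..” steps can be tried with the following sketch, assuming Python's standard xml.etree.ElementTree module. Its XPath subset supports both steps in relative form (".//" and ".."), but not the “|” operator, so the union is formed by concatenating two result lists. The second customer in the sample data is made up for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <account><account-number>A-102</account-number></account>
  </customer>
  <customer>
    <customer-name>Johnson</customer-name>
    <account><account-number>A-101</account-number></account>
  </customer>
</bank-1>
""")

# Counterpart of /bank-1//account-number: descendants at any depth.
numbers = [n.text for n in root.findall(".//account-number")]

# ".." steps from each account-number back up to its parent element.
parents = root.findall(".//account-number/..")

# Union of two path results, done in Python instead of with "|".
both = root.findall(".//customer-name") + root.findall(".//account-number")

print(numbers, [p.tag for p in parents], len(both))
```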

 

XSLT

 

n A stylesheet stores formatting options for a document, usually separately from document

H E.g. HTML style sheet may specify font colors and sizes for headings, etc.

n The Extensible Stylesheet Language (XSL) was originally designed for generating HTML from XML

n XSLT is a general-purpose transformation language

H Can translate XML to XML, and XML to HTML

n XSLT transformations are expressed using rules called templates

H Templates combine selection using XPath with construction of results

 

XSLT Templates

 

n Example of XSLT template with   match  and  select  part

        <xsl:template match="/bank-2/customer">
            <xsl:value-of select="customer-name"/>
        </xsl:template>

        <xsl:template match="*"/>

n The match attribute of xsl:template specifies a pattern in XPath

n Elements in the XML document matching the pattern are processed by the actions within the xsl:template element

H xsl:value-of selects (outputs) specified values (here, customer-name)

n For elements that do not match any template

H Attributes and text contents are output as is

H Templates are recursively applied on subelements

n The  <xsl:template match="*"/>  template matches all elements that do not match any other template

H Used to ensure that their contents do not get output.

 

n If an element matches several templates, only one is used

H Which one depends on a complex priority scheme/user-defined priorities

H We assume only one template matches any element

 

 

Creating XML Output

 

n Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is

n E.g. to wrap results in new XML elements.

          <xsl:template match="/bank-2/customer">
              <customer>
                  <xsl:value-of select="customer-name"/>
              </customer>
          </xsl:template>

          <xsl:template match="*"/>

H Example output:
      
       <customer> Joe </customer>
       <customer> Mary </customer>
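The effect of the two templates can be imitated without an XSLT processor. The sketch below assumes Python's standard xml.etree.ElementTree module, which does not run XSLT; it only mirrors the rule "emit a <customer> element holding the customer-name for every customer, and emit nothing for any other element". The <result> wrapper is a hypothetical element added so that the output has a single root.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring("""
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <customer customer-id="C100"><customer-name>Joe</customer-name></customer>
  <customer customer-id="C102"><customer-name>Mary</customer-name></customer>
</bank-2>
""")

result = ET.Element("result")

for element in source:
    if element.tag == "customer":                      # match="/bank-2/customer"
        out = ET.SubElement(result, "customer")        # literal <customer> tag
        out.text = element.findtext("customer-name")   # xsl:value-of
    # any other element plays the role of match="*": nothing is output

output = ET.tostring(result, encoding="unicode")
print(output)
```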

 

 

n Note: Cannot directly insert an xsl:value-of tag inside another tag

H E.g. cannot create an attribute for <customer> in the previous example by directly using xsl:value-of

H XSLT provides a construct  xsl:attribute to handle this situation

4 xsl:attribute adds an attribute to the enclosing element

4 E.g.  <customer>
            <xsl:attribute name="customer-id">
                <xsl:value-of select="@customer-id"/>
            </xsl:attribute>
        </customer>

    results in output of the form

        <customer  customer-id="….."> ….

n xsl:element is used to create output elements with computed names

 

Structural Recursion

Joins in XSLT

Sorting in XSLT

 

n Using an xsl:sort directive inside a template causes all elements matching the template to be sorted

H Sorting is done before applying other templates   

n E.g. 
<xsl:template match="/bank">
    <xsl:apply-templates select="customer">
        <xsl:sort select="customer-name"/>
    </xsl:apply-templates>
</xsl:template>
<xsl:template match="customer">
    <customer>
        <xsl:value-of select="customer-name"/>
        <xsl:value-of select="customer-street"/>
        <xsl:value-of select="customer-city"/>
    </customer>
</xsl:template>
<xsl:template match="*"/>

 

XQuery

 

n  XQuery is a general purpose query language for XML data

n  Currently being standardized by the World Wide Web Consortium (W3C)

H The textbook description is based on a March 2001 draft of the standard.  The final version may differ, but major features likely to stay unchanged.

n  Alpha version of XQuery engine available free from Microsoft

n  XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL

n  XQuery uses a  for … let … where … return  syntax
     for     ⇔  SQL from
     where   ⇔  SQL where
     return  ⇔  SQL select
     let allows temporary variables, and has no equivalent in SQL

 

FLWR Syntax in XQuery

 

n For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath

n Simple FLWR expression in XQuery

H find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag
   
     for    $x in /bank-2/account
     let    $acctno := $x/@account-number
     where  $x/balance > 400
     return <account-number> $acctno </account-number>

n The let clause is not really needed in this query, and the selection can be done in XPath.  The query can be written as:

     for $x in /bank-2/account[balance>400]
     return <account-number> $x/@account-number </account-number>
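To make the role of each clause concrete, the first FLWR query can be mirrored clause by clause in a general-purpose language. This sketch assumes Python's standard xml.etree.ElementTree module rather than an XQuery engine; accounts A-402 and A-403 and their balances are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>
""")

results = []
for x in root.findall("account"):              # for $x in /bank-2/account
    acctno = x.get("account-number")           # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:     # where $x/balance > 400
        out = ET.Element("account-number")     # return <account-number> ...
        out.text = acctno
        results.append(out)

print([ET.tostring(r, encoding="unicode") for r in results])
```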

 

 

 

 

Path Expressions and Functions

 

n Path expressions are used to bind variables in the for clause, but can also be used in other places

H E.g. path expressions can be used in let clause, to bind variables to results of path expressions

n The function distinct( ) can be used to remove duplicates in path expression results

n The function document(name) returns root of named document

H E.g.   document(“bank-2.xml”)/bank-2/account

n Aggregate functions such as sum( ) and count( ) can be applied to path expression results

n XQuery does not support group by, but the same effect can be obtained by nested queries, with nested FLWR expressions within a return clause

H More on nested queries later

 

Joins

 

n Joins are specified in a manner very similar to SQL

     for $a  in  /bank/account,
         $c  in  /bank/customer,
         $d  in  /bank/depositor
     where   $a/account-number = $d/account-number
         and $c/customer-name = $d/customer-name
     return <cust-acct> $c $a </cust-acct>

n The same query can be expressed with the selections specified as XPath selections:

     for $a in /bank/account,
         $c in /bank/customer,
         $d in /bank/depositor[
                   account-number = $a/account-number and
                   customer-name  = $c/customer-name
               ]
     return <cust-acct> $c $a </cust-acct>
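The join semantics (one binding per combination of account, customer and depositor, filtered by the two equalities) can be spelled out as nested loops. This is a sketch assuming Python's standard xml.etree.ElementTree module, not an XQuery engine, over a small flat bank document assembled for the example.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
""")

pairs = []
for a in bank.findall("account"):              # $a in /bank/account
    for c in bank.findall("customer"):         # $c in /bank/customer
        for d in bank.findall("depositor"):    # $d in /bank/depositor
            if (a.findtext("account-number") == d.findtext("account-number")
                    and c.findtext("customer-name") == d.findtext("customer-name")):
                pairs.append((c.findtext("customer-name"),
                              a.findtext("account-number")))

print(pairs)
```

A real query processor would avoid the full triple loop, for instance by indexing depositor on account-number; the loops only show which combinations the query asks for.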

 

 

 

 

 

Changing Nesting Structure

 

n The following query converts data from the flat structure for bank  information into the nested structure used in bank-1

     <bank-1>
         for $c in /bank/customer
         return
             <customer>
                 $c/*
                 for $d in /bank/depositor[customer-name = $c/customer-name],
                     $a in /bank/account[account-number=$d/account-number]
                 return $a
             </customer>
     </bank-1>

n $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag

n Exercise for reader: write a nested query to find sum of account
balances, grouped by branch.
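The restructuring query (flat bank to nested bank-1) can be mirrored step by step as follows. This is a sketch assuming Python's standard xml.etree.ElementTree module rather than an XQuery engine, over a small flat document assembled for the example; elements are deep-copied so that the source tree is left unchanged.

```python
import copy
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
""")

bank_1 = ET.Element("bank-1")
for c in bank.findall("customer"):                       # for $c in /bank/customer
    customer = ET.SubElement(bank_1, "customer")
    customer.extend([copy.deepcopy(ch) for ch in c])     # $c/*
    name = c.findtext("customer-name")
    for d in bank.findall("depositor"):
        if d.findtext("customer-name") != name:
            continue
        for a in bank.findall("account"):
            if a.findtext("account-number") == d.findtext("account-number"):
                customer.append(copy.deepcopy(a))        # return $a

print(ET.tostring(bank_1, encoding="unicode"))
```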

 

XQuery Path Expressions

 

n $c/text() gives text content of an element without any subelements/tags

n XQuery path expressions support the “->” operator for dereferencing IDREFs

H Equivalent to the id( ) function of XPath, but simpler to use

H Can be applied to a set of IDREFs to get a set of results

H June 2001 version of standard has changed   “->” to “=>”

 

Sorting in XQuery

 

n A sortby clause can be used at the end of any expression.  E.g. to return customers sorted by name

     for $c in /bank/customer
     return <customer> $c/* </customer> sortby(customer-name)

n Can sort at multiple levels of nesting (sort  by customer-name, and by account-number within each customer)

     <bank-1>
         for $c in /bank/customer
         return
             <customer>
                 $c/*
                 for $d in /bank/depositor[customer-name=$c/customer-name],
                     $a in /bank/account[account-number=$d/account-number]
                 return <account> $a/* </account> sortby(account-number)
             </customer> sortby(customer-name)
     </bank-1>

 

Functions and Other XQuery Features

 

n User defined functions with the type system of XML Schema

     function balances(xsd:string $c) returns list(xsd:numeric) {
         for $d in /bank/depositor[customer-name = $c],
             $a in /bank/account[account-number=$d/account-number]
         return $a/balance
     }

n Types are optional for function parameters and return values

n Universal and existential quantification in where clause predicates

H some $e in path satisfies P    

H every $e in path satisfies P

n XQuery also supports if-then-else clauses
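For comparison, the balances function and the two quantifiers can be written in a general-purpose language. This sketch assumes Python's standard xml.etree.ElementTree and decimal modules, not an XQuery engine; "some … satisfies" corresponds to any() and "every … satisfies" to all(). The sample document and the threshold 450 are made up for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

bank = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
""")

def balances(customer_name):
    """Balances of all accounts held by the named customer."""
    return [Decimal(a.findtext("balance"))
            for d in bank.findall("depositor")
            if d.findtext("customer-name") == customer_name
            for a in bank.findall("account")
            if a.findtext("account-number") == d.findtext("account-number")]

# some $b in balances("Hayes") satisfies $b > 450
some_large = any(b > 450 for b in balances("Hayes"))

# every $b in balances("Johnson") satisfies $b > 450
every_large = all(b > 450 for b in balances("Johnson"))

print(balances("Hayes"), some_large, every_large)
```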

 

Application Program Interface

 

n There are two standard application program interfaces to XML data:

H SAX (Simple API for XML)

4 Based on parser model, user provides event handlers for parsing events

  E.g. start of element, end of element
  Not suitable for database applications

H DOM (Document Object Model)

4 XML data is parsed into a tree representation

4 Variety of functions provided for traversing the DOM tree

4 E.g.:  Java DOM API provides a Node interface with methods
          getParentNode( ), getFirstChild( ), getNextSibling( ),
          getAttribute( ), getData( ) (for text node),
          getElementsByTagName( ), …

4 Also provides functions for updating DOM tree
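The contrast between the two interfaces can be seen in a short sketch. It uses Python's standard xml.sax and xml.dom.minidom modules instead of the Java API named above; the Python DOM exposes the same operations as attributes and methods (parentNode, firstChild, nextSibling, getAttribute, data, getElementsByTagName). The handler class name TagLogger is hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOC = (b'<account acct-type="checking">'
       b"<account-number>A-102</account-number>"
       b"<balance>400</balance></account>")

class TagLogger(xml.sax.ContentHandler):
    """SAX: the parser calls these handlers as it meets each parsing event."""
    def __init__(self):
        super().__init__()
        self.events = []
    def startElement(self, name, attrs):
        self.events.append(("start", name))
    def endElement(self, name):
        self.events.append(("end", name))

handler = TagLogger()
xml.sax.parseString(DOC, handler)
print(handler.events)

# DOM: the whole document is parsed into a tree that can then be traversed.
dom = minidom.parseString(DOC)
account = dom.documentElement
first = account.firstChild                  # <account-number>
second = first.nextSibling                  # <balance>
balance_text = dom.getElementsByTagName("balance")[0].firstChild.data
print(account.getAttribute("acct-type"), first.tagName, second.tagName, balance_text)
```

With SAX nothing is kept unless a handler stores it, whereas DOM holds the whole tree in memory and allows navigation in any direction.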

 

 

Storage of XML Data

 

n XML data can be stored in

H Non-relational data stores

4 Flat files

  Natural for storing XML
  But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)

4 XML database

  Database built specifically for storing XML data, supporting DOM model and declarative querying
  Currently no commercial-grade systems

H Relational databases

4 Data must be translated into relational form

4 Advantage:  mature database systems

4 Disadvantages: overhead of translating data and queries

 

Storage of XML in Relational Databases

 

n Alternatives:

H String Representation

H Tree Representation

H Map to relations

 

String Representation

 

n Store each top level element as a string field of a tuple in a relational database

H Use a single relation to store all elements, or

H Use a separate relation for each top-level element type

4 E.g.  account, customer, depositor relations

  Each with a string-valued attribute to store the element

n Indexing:

H Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields

4 E.g. customer-name or account-number

H Oracle 9 supports function indices which use the result of a function as the key value.

4 The function should return the value of the required subelement/attribute

 

n Benefits:

H Can store any XML data even without DTD

H As long as there are many top-level elements in a document, strings are small compared to full document

4 Allows fast access to individual elements.

n Drawback: Need to parse strings to access values inside the elements

H Parsing is slow.
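The string representation with an extra indexed field might look as follows. This is a sketch assuming Python's standard sqlite3 and xml.etree.ElementTree modules; the table and column names are hypothetical, and the second account is made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

elements = [
    "<account><account-number>A-101</account-number><balance>500</balance></account>",
    "<account><account-number>A-102</account-number><balance>400</balance></account>",
]

db = sqlite3.connect(":memory:")
# Each element is kept as a string; account_number is the extra indexed field.
db.execute("CREATE TABLE account (account_number TEXT, xml TEXT)")
db.execute("CREATE INDEX account_number_idx ON account (account_number)")
for text in elements:
    number = ET.fromstring(text).findtext("account-number")
    db.execute("INSERT INTO account VALUES (?, ?)", (number, text))

# The index locates the row, but the string must be parsed to look inside it.
(stored,) = db.execute(
    "SELECT xml FROM account WHERE account_number = ?", ("A-102",)).fetchone()
balance = ET.fromstring(stored).findtext("balance")
print(balance)
```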

 

Tree Representation

 

n Tree representation:  model XML data as tree and store using relations
       
nodes(id, type, label, value)
        child  (child-id, parent-id)

 

 

 

 

 

 

 

 

 

 

n Each element/attribute is given a unique identifier

n Type indicates element/attribute

n Label specifies the tag name of the element/name of attribute

n Value is the text value of the element/attribute

n The relation child  notes the parent-child relationships in the tree

H Can add an extra attribute to child  to record ordering of children

 

n Benefit: Can store any XML data, even without DTD

n Drawbacks:

H Data is broken up into too many pieces, increasing space overheads

H Even simple queries require a large number of joins, which can be slow
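The cost in joins becomes visible once data is actually loaded into the nodes and child relations. The following sketch assumes Python's standard sqlite3 and xml.etree.ElementTree modules; column names use underscores instead of hyphens to be valid SQL identifiers, and the helper name store is hypothetical.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
db.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")
next_id = itertools.count(1)

def store(element, parent_id=None):
    """Insert one element, its attributes and, recursively, its subelements."""
    node_id = next(next_id)
    db.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)",
               (node_id, element.tag, (element.text or "").strip()))
    if parent_id is not None:
        db.execute("INSERT INTO child VALUES (?, ?)", (node_id, parent_id))
    for name, value in element.attrib.items():
        attr_id = next(next_id)
        db.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)",
                   (attr_id, name, value))
        db.execute("INSERT INTO child VALUES (?, ?)", (attr_id, node_id))
    for sub in element:
        store(sub, node_id)

store(ET.fromstring(
    '<account acct-type="checking"><account-number>A-102</account-number>'
    "<balance>400</balance></account>"))

# Even "balance of account A-102" needs four joins over nodes and child.
row = db.execute("""
    SELECT bal.value
    FROM nodes acct
    JOIN child c1 ON c1.parent_id = acct.id
    JOIN nodes num ON num.id = c1.child_id
    JOIN child c2 ON c2.parent_id = acct.id
    JOIN nodes bal ON bal.id = c2.child_id
    WHERE acct.label = 'account' AND num.label = 'account-number'
      AND num.value = 'A-102' AND bal.label = 'balance'
""").fetchone()
print(row)
```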

   

 

 

 

Mapping XML Data to Relations

 

n Map to relations

H If DTD of document is known, can map data to relations

H A relation is created for each element type

4 Elements (of type #PCDATA), and attributes are mapped to attributes of relations

4 More details on next slide …

n Benefits:

H Efficient storage

H Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML

n Drawbacks: need to know DTD, translation overheads still present

 

n Relation created for each element type contains

H An id attribute to store a unique id for each element

H A relation attribute corresponding to each element attribute

H A parent-id attribute to keep track of parent element

4 As in the tree representation

4 Position information (ith  child) can be stored too

n All subelements that occur only once can become relation attributes

H For text-valued subelements, store the text as attribute value

H For complex subelements, can store the id of the subelement

n Subelements that can occur multiple times represented in a separate table

H Similar to handling of multivalued attributes when converting ER diagrams to tables

 

n E.g. For bank-1 DTD with account elements nested within customer elements, create relations

H customer(id, parent-id, customer-name, customer-street, customer-city)

4 parent-id can be dropped here since parent is the sole root element

4 All other attributes were subelements of type #PCDATA, and occur only once

H account (id, parent-id, account-number, branch-name, balance)

4 parent-id keeps track of which customer an account occurs under

4 Same account may be represented many times with different parents
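The customer and account relations can be populated from a bank-1 document along these lines. This is a sketch assuming Python's standard sqlite3 and xml.etree.ElementTree modules; column names use underscores instead of hyphens, parent_id is dropped from customer as suggested above, and storing balance as a REAL column is a choice made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

bank_1 = ET.fromstring("""
<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <customer-street>Main</customer-street>
    <customer-city>Harrison</customer-city>
    <account>
      <account-number>A-102</account-number>
      <branch-name>Perryridge</branch-name>
      <balance>400</balance>
    </account>
  </customer>
</bank-1>
""")

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE customer (id INTEGER PRIMARY KEY,
              customer_name TEXT, customer_street TEXT, customer_city TEXT)""")
db.execute("""CREATE TABLE account (id INTEGER PRIMARY KEY, parent_id INTEGER,
              account_number TEXT, branch_name TEXT, balance REAL)""")

for c in bank_1.findall("customer"):
    cur = db.execute(
        "INSERT INTO customer (customer_name, customer_street, customer_city) VALUES (?, ?, ?)",
        (c.findtext("customer-name"), c.findtext("customer-street"),
         c.findtext("customer-city")))
    for a in c.findall("account"):
        # parent_id records which customer this account occurs under.
        db.execute(
            "INSERT INTO account (parent_id, account_number, branch_name, balance) VALUES (?, ?, ?, ?)",
            (cur.lastrowid, a.findtext("account-number"),
             a.findtext("branch-name"), float(a.findtext("balance"))))

rows = db.execute("""SELECT c.customer_name, a.account_number, a.balance
                     FROM customer c JOIN account a ON a.parent_id = c.id""").fetchall()
print(rows)
```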

 

 
