Tuesday, December 4, 2012

Database Management System Chapter-8

 

 

Chapter 8:  Object-Oriented Databases

 

n Need for Complex Data Types

n The Object-Oriented Data Model

n Object-Oriented Languages

n Persistent Programming Languages

n Persistent C++ Systems

 

Need for Complex Data Types

 

n Traditional database applications in data processing had conceptually simple data types

é Relatively few data types, first normal form holds

n Complex data types have grown more important in recent years

é E.g.  Addresses can be viewed as a

Ø Single string, or

Ø Separate attributes for each part, or

Ø Composite attributes (which are not in first normal form)

é E.g. it is often convenient to store multivalued attributes as-is, without creating a separate relation to store the values in first normal form

n Applications

é computer-aided design, computer-aided software engineering

é multimedia and image databases, and document/hypertext databases.

 

Object-Oriented Data Model

 

n Loosely speaking, an object corresponds to an entity in the E-R model.

n The object-oriented paradigm is based on encapsulating code and data related to an object into a single unit.

n The object-oriented data model is a logical data model (like the E-R model).

n Adaptation of the object-oriented programming paradigm (e.g., Smalltalk, C++) to database systems.

 

Object Structure

 

n An object has associated with it:

é A set of variables that contain the data for the object.  The value of each variable is itself an object.

é A set of messages to which the object responds; each message may have zero, one, or more parameters.

é A set of methods, each of which is a body of code to implement a message; a method returns a value as the response  to the message

n The physical representation of data is visible only to the implementor of the object

n Messages and responses provide the only external interface to an object.

n The term message does not necessarily imply physical message passing.  Messages can be implemented as procedure invocations.

 

Messages and Methods

 

n Methods are programs written in a general-purpose language with the following features

é only variables in the object itself may be referenced directly

é data in other objects are referenced only by sending messages.

n Methods can be read-only or update methods

é Read-only methods do not change the value of the object

n Strictly speaking, every attribute of an entity must be represented by a variable and two methods, one to read and the other to update the attribute

é e.g., the attribute address is represented by a variable address and two messages get-address and set-address.

é For convenience, many object-oriented data models permit direct access to variables of other objects.
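The get-address/set-address idea can be sketched in C++ as follows. This is only an illustration under assumptions: the class name Person, the underscore spellings, and the helper move_person are hypothetical and do not come from any particular database system.

```cpp
#include <string>
#include <utility>

// Hypothetical C++ rendering of the address example; names are illustrative.
class Person {
public:
    explicit Person(std::string address) : address_(std::move(address)) {}

    // Read-only method: declared const, so it cannot change the object's value.
    const std::string& get_address() const { return address_; }

    // Update method: the only way code outside the class can change the variable.
    void set_address(std::string new_address) { address_ = std::move(new_address); }

private:
    std::string address_;  // representation visible only to the implementor
};

// Other code "sends messages" by calling the public methods.
inline std::string move_person(Person& p, const std::string& to) {
    p.set_address(to);
    return p.get_address();
}
```

Making address_ private corresponds to strict encapsulation; a model that permits direct access to variables of other objects would correspond to making it public.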

 

Object Classes

 

n Similar objects are grouped into a class; each such object is called an instance of its class

n All objects in a class have the same

é Variables, with the same types

é message interface

é methods

They may differ in the values assigned to variables

n Example:  Group objects for people into a person class

n Classes are analogous to entity sets in the E-R model

 

 

 

 

Class Definition Example

    class employee {
        /* Variables */
        string    name;
        string    address;
        date      start-date;
        int       salary;
        /* Messages */
        int       annual-salary();
        string    get-name();
        string    get-address();
        int       set-address(string new-address);
        int       employment-length();
    };

n Methods to read and set the other variables are also needed with strict encapsulation

n Methods are defined separately

é E.g.  int employment-length() { return today() - start-date; }
        int set-address(string new-address) { address = new-address; }

 

Inheritance

 

n E.g., class of bank customers is similar to class of bank employees, although there are differences 

é both share some variables and messages, e.g., name and address. 

é But there are variables and messages specific to each class, e.g., salary for employees and credit-rating for customers.

n Every employee is a person; thus employee is a specialization of person

n Similarly, customer  is a specialization of person.

n Create classes person, employee and customer

é variables/messages applicable to all persons associated with class person.

é variables/messages specific to employees associated with class employee; similarly for customer

 

n Place classes into a specialization/IS-A hierarchy

é variables/messages belonging to class person are inherited by class employee as well as customer

n Result is a class hierarchy

 

 

 

 

 

 

 

 

 

 

 

 

 


Note analogy with ISA Hierarchy in the E-R model

 

 

Class Hierarchy Definition

 

    class person {
        string    name;
        string    address;
    };
    class customer isa person {
        int       credit-rating;
    };
    class employee isa person {
        date      start-date;
        int       salary;
    };
    class officer isa employee {
        int       office-number;
        int       expense-account-number;
    };

 

n  Full variable list for objects in the class officer:

é office-number, expense-account-number:  defined locally

é start-date, salary:  inherited from employee

é name, address: inherited from person

n  Methods inherited similar to variables.

n  Substitutability: any method of a class, say person, can be invoked equally well with any object belonging to any subclass, such as subclass officer of person.

n  Class extent:  set of all objects in the class. Two options:

1. Class extent of employee includes all officer, teller and secretary objects.

2. Class extent of employee includes only employee objects that are not in a subclass such as officer, teller, or secretary

Ø This is the usual choice in OO systems

Ø Can access extents of subclasses to find all objects of subtypes of employee
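A minimal C++ sketch of the hierarchy, of substitutability, and of class extents that hold only direct instances. The Extents bookkeeping struct and the function name_of are assumptions made for illustration; a real object database maintains extents itself.

```cpp
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::string address;
    virtual ~Person() = default;
};
struct Employee : Person {
    int salary = 0;
};
struct Officer : Employee {
    int office_number = 0;
};

// Substitutability: written for Person, works for any subclass object.
inline std::string name_of(const Person& p) { return p.name; }

// Assumed bookkeeping: each extent keeps only the direct instances of its class.
struct Extents {
    std::vector<Employee*> employees;  // employees that are not officers
    std::vector<Officer*> officers;

    // All objects of type Employee or a subtype: combine the subclass extents.
    std::vector<Employee*> all_employees() const {
        std::vector<Employee*> all(employees);
        all.insert(all.end(), officers.begin(), officers.end());
        return all;
    }
};
```

An Officer object carries name and address from Person, salary from Employee, and its own office_number, which mirrors the full variable list given for class officer.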

 

Example of Multiple Inheritance

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Class DAG for banking example.

 

Multiple Inheritance

 

n  With multiple inheritance a class may have more than one superclass.

é The class/subclass relationship is represented by a directed acyclic graph (DAG)

é Particularly useful when objects can be classified in more than one way, which are independent of each other

Ø E.g. temporary/permanent is independent of  Officer/secretary/teller

Ø Create a subclass for each combination of subclasses

  Need not create subclasses for combinations that are not possible in the database being modeled

n  A class inherits variables and methods from all its superclasses

n  There is potential for ambiguity when a variable/message N with the same name is inherited from two superclasses A and B

é No problem if the variable/message is defined in a shared superclass

é Otherwise, do one of the following

Ø  flag as an error,

Ø  rename variables (A.N and B.N)

Ø choose one.
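C++ happens to show all three cases, so a small sketch may help. The member rank is a hypothetical variable invented for the illustration; it plays the role of N, with Temporary and Teller in the roles of A and B.

```cpp
#include <string>

struct Person { std::string name; };

// Shared superclass: virtual inheritance keeps a single copy of Person,
// so `name` stays unambiguous in the combined classes below.
struct Temporary : virtual Person { int rank = 1; };
struct Teller : virtual Person { int rank = 2; };

// `rank` is defined separately in both superclasses, so it is ambiguous here.
struct TemporaryTeller : Temporary, Teller {
    // A plain `rank` in this class would be rejected by the compiler
    // (the "flag as an error" option).
    int temporary_rank() const { return Temporary::rank; }  // qualified, like A.N
    int teller_rank() const { return Teller::rank; }        // qualified, like B.N
};

// The "choose one" option: a using-declaration picks Teller's variable.
struct PreferTeller : Temporary, Teller {
    using Teller::rank;
};
```

Qualifying the name corresponds to the renaming option, and the using-declaration to choosing one definition; other languages resolve the conflict differently.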

 

More Examples of Multiple Inheritance

 

n Conceptually, an object can belong to each of several subclasses

é A person can play the roles of student, teacher, or footballPlayer, or any combination of the three

Ø  E.g., a student teaching assistant who also plays football

n Can use multiple inheritance to model “roles” of an object

é That is, allow an object to take on any one or more of a set of types

n But many systems insist an object should have a most-specific class

é That is, there must be one class that an object belongs to which is a subclass of all other classes that the object belongs to

é Create subclasses such as student-teacher and student-teacher-footballPlayer for each combination

é When many combinations are possible, creating subclasses for each combination can become cumbersome

 

Object Identity

 

n An object retains its identity even if some or all of the values of variables or definitions of methods change over time.

n Object identity is a stronger notion of identity than in programming languages or data models not based on object orientation.

é Value – data value; e.g. primary key value used in relational systems.

é Name – supplied by user; used for variables in procedures.

é Built-in – identity built into data model or programming language.

Ø no user-supplied identifier is required.

Ø Is the form of identity used in object-oriented systems.

 

 

 

Object Identifiers

 

n Object identifiers used to uniquely identify objects

é Object identifiers are unique:

Ø  no two objects have the same identifier

Ø each object has only one object identifier

é E.g., the spouse field of a person object may be an identifier of another person object.

é can be stored as a field of an object, to refer to another object.

é Can be

Ø system generated (created by database) or

Ø external (such as social-security number)

é System generated identifiers:

Ø Are easier to use, but cannot be used across database systems

Ø May be redundant if unique identifier already exists
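The difference between identity and value can be sketched as below. The Oid type, the OidGenerator class, and the convention that 0 means "no object" are assumptions for the example, not part of any standard.

```cpp
#include <cstdint>
#include <string>

using Oid = std::uint64_t;  // assumed representation of an object identifier

struct Person {
    Oid id = 0;          // system generated
    std::string name;
    Oid spouse = 0;      // identifier of another Person; 0 means "none" (assumption)
};

// Hands out identifiers that are never repeated within one generator.
class OidGenerator {
public:
    Oid next() { return ++last_; }
private:
    Oid last_ = 0;
};

// Identity comparison: same object, regardless of current variable values.
inline bool same_object(const Person& a, const Person& b) { return a.id == b.id; }

// Value comparison: same data, possibly two different objects.
inline bool same_value(const Person& a, const Person& b) { return a.name == b.name; }
```

Two Person objects with equal names remain distinct objects, and an object keeps its identifier when its name changes.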

 

Object Containment

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


n Each component in a design may contain other components

n Can be modeled as containment of objects.  Objects containing other objects are called composite objects.

n Multiple levels of containment create a containment hierarchy

é  links interpreted as is-part-of, not is-a.

n Allows data to be viewed at different granularities by different users.
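A containment hierarchy can be sketched with a recursive structure. The Component type and its part names are hypothetical; the point is only that the links mean is-part-of rather than is-a.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Component {
    std::string name;
    std::vector<std::unique_ptr<Component>> parts;  // is-part-of links

    explicit Component(std::string n) : name(std::move(n)) {}

    // Adds a contained component and returns a reference to it.
    Component& add_part(std::string part_name) {
        parts.push_back(std::make_unique<Component>(std::move(part_name)));
        return *parts.back();
    }

    // Fine-grained view: this component plus everything contained in it.
    std::size_t total_components() const {
        std::size_t n = 1;
        for (const auto& p : parts) n += p->total_components();
        return n;
    }
};
```

A user interested only in the top level looks at parts of the root; a user who needs detail descends further, which is the sense in which data is viewed at different granularities.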

 

 

Object-Oriented Languages

 

n  Object-oriented concepts can be used in different ways

é Object-orientation can be used as a design tool, and be encoded into, for example, a relational database

Ø (analogous to modeling data with an E-R diagram and then converting it to a set of relations)

é The concepts of object orientation can be incorporated into a programming language that is used to manipulate the database.

Ø  Object-relational systems – add complex types and object-orientation to relational language.

Ø  Persistent programming languages – extend object-oriented programming language to deal with databases by adding concepts such as persistence and collections.

 

Persistent Programming Languages

 

n Persistent Programming languages allow objects to be created and stored in a database, and used directly from a programming language

é allow data to be manipulated directly from the programming language

Ø No need to go through SQL.

é No need for explicit format (type) changes

Ø format changes are carried out transparently by system

Ø Without a persistent programming language, format changes become a burden on the programmer

  More code to be written
  More chance of bugs

é allow objects to be manipulated in-memory

Ø  no need to explicitly load from or store to the database

  Saved code, and saved overhead of loading/storing large amounts of data

 

n Drawbacks of persistent programming languages

é Due to power of most programming languages, it is easy to make programming errors that damage the database.

é Complexity of languages makes automatic high-level optimization more difficult.

é Do not support declarative querying as well as relational databases

 

 

Persistence of Objects

 

n Approaches to make transient objects persistent include establishing

é Persistence by Class – declare all objects of a class to be persistent; simple but inflexible.

é Persistence by Creation – extend the syntax for creating objects to specify that an object is persistent.

é Persistence by Marking – an object that is to persist beyond program execution is marked as persistent before program termination.

é Persistence by Reachability - declare (root) persistent objects; objects are persistent if they are referred to (directly or indirectly) from a root object.

Ø Easier for programmer, but more overhead for database system

Ø Similar to garbage collection used e.g. in Java, which also performs reachability tests
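Persistence by reachability amounts to a graph traversal from the root objects. The sketch below assumes a toy Obj type with integer ids; it is not how any particular system stores objects.

```cpp
#include <set>
#include <vector>

struct Obj {
    int id;
    std::vector<const Obj*> refs;  // references to other objects
};

// Collects the ids of every object reachable from the given roots.
// Those objects would be made persistent; everything else stays transient.
inline std::set<int> persistent_ids(const std::vector<const Obj*>& roots) {
    std::set<int> seen;
    std::vector<const Obj*> stack(roots);
    while (!stack.empty()) {
        const Obj* o = stack.back();
        stack.pop_back();
        if (!seen.insert(o->id).second) continue;  // already visited (handles cycles)
        for (const Obj* r : o->refs) stack.push_back(r);
    }
    return seen;
}
```

The traversal is the extra overhead mentioned above: the system has to follow references at commit time instead of relying on the programmer to mark objects.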

 

Object Identity and Pointers

 

n A persistent object is assigned a persistent object identifier.

n Degrees of permanence of identity:

é Intraprocedure – identity persists only during the execution of a single procedure

é Intraprogram – identity persists only during execution of a single program or query.

é Interprogram – identity persists from one program execution to another, but may change if the storage organization is changed

é Persistent – identity persists throughout program executions and structural reorganizations of data; required for object-oriented systems.

 

n In O-O languages such as C++, an object identifier is actually an in-memory pointer.

n Persistent pointer – persists beyond program execution

é can be thought of as a pointer into the database

Ø E.g. specify file identifier and offset into the file

é Problems due to database reorganization have to be dealt with by keeping forwarding pointers
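A persistent pointer of the file-plus-offset kind, together with forwarding pointers, might look like this. PersistentPtr and ForwardingTable are hypothetical names, and the sketch assumes forwarding entries never form a cycle.

```cpp
#include <map>
#include <tuple>

// Assumed layout: a persistent pointer names a file and an offset within it.
struct PersistentPtr {
    int file_id;
    long offset;
    bool operator<(const PersistentPtr& o) const {
        return std::tie(file_id, offset) < std::tie(o.file_id, o.offset);
    }
    bool operator==(const PersistentPtr& o) const {
        return file_id == o.file_id && offset == o.offset;
    }
};

class ForwardingTable {
public:
    // Records that the object formerly at `from` now lives at `to`.
    void moved(PersistentPtr from, PersistentPtr to) { forward_[from] = to; }

    // Follows forwarding pointers until a current location is reached.
    PersistentPtr resolve(PersistentPtr p) const {
        auto it = forward_.find(p);
        while (it != forward_.end()) {
            p = it->second;
            it = forward_.find(p);
        }
        return p;
    }

private:
    std::map<PersistentPtr, PersistentPtr> forward_;
};
```

Old pointers stored elsewhere in the database stay usable after a reorganization, at the cost of an extra lookup for each hop.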

 

 

 

 

 

Storage and Access of Persistent Objects

 

How to find objects in the database:

 

 

n Name objects (as you would name files)

é Cannot scale to large number of objects.

é Typically given only to class extents and other collections of objects, but not to individual objects.

n Expose object identifiers or persistent pointers to the objects

é Can be stored externally.

é All objects have object identifiers.

n Store collections of objects, and allow programs to iterate over the collections to find required objects

é Model collections of objects as collection types

é Class  extent - the collection of all objects belonging to the class; usually maintained for all classes that can have persistent objects.

 

Persistent C++ Systems

 

n C++ language allows support for persistence to be added without changing the language

é Declare a class called Persistent_Object with attributes and methods to support persistence

é Overloading – ability to redefine standard function names and operators (e.g., +, -, the pointer dereference operator ->) when applied to new types

é Template classes help to build a type-safe type system supporting collections and persistent types.

n Providing persistence without extending the C++ language is

é relatively easy to implement

é but more difficult to use

n Persistent C++ systems that add features to the C++ language have been built, as have systems that avoid changing the language
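To see how overloading and templates can add persistence without changing the language, consider a toy reference class that overloads ->. ToyStore and Ref are hypothetical stand-ins written for this sketch; they are not the classes of any real persistent C++ system.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for a database: maps object ids to stored values (assumption).
template <typename T>
struct ToyStore {
    std::map<int, T> on_disk;
    int loads = 0;  // counts simulated disk reads
};

// A reference that is used like a pointer but loads its target on first use.
template <typename T>
class Ref {
public:
    Ref(ToyStore<T>& store, int oid) : store_(&store), oid_(oid) {}

    T* operator->() {
        if (!cached_) {  // not in memory yet: fetch, then behave like a pointer
            cached_ = std::make_unique<T>(store_->on_disk.at(oid_));
            ++store_->loads;
        }
        return cached_.get();
    }

private:
    ToyStore<T>* store_;
    int oid_;
    std::unique_ptr<T> cached_;
};

struct Account {
    std::string owner;
    long balance;
};
```

Code that uses a Ref<Account> writes r->balance exactly as it would with an ordinary pointer; the template keeps the reference type-safe, and the overloaded operator hides the load.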

ODMG C++ Object Definition Language

 

n The Object Database Management Group is an industry consortium aimed at standardizing object-oriented databases

é  in particular persistent programming languages

é Includes standards for C++, Smalltalk and Java

é ODMG-93

é ODMG-2.0 and 3.0 (which is 2.0 plus extensions to Java)

Ø Our description based on ODMG-2.0

n ODMG  C++ standard avoids changes to the C++ language

é  provides functionality via template classes and class libraries

 

ODMG Types

 

n Template class d_Ref<class> used to specify references (persistent pointers)

n Template class d_Set<class> used to define sets of objects.         

é Methods include insert_element(e) and delete_element(e)

n Other collection classes such as d_Bag (set with duplicates allowed), d_List and d_Varray (variable length array) also provided.

n d_ versions of many standard types provided, e.g. d_Long and d_String

é Interpretation of these types is platform independent

é Dynamically allocated data (e.g. for d_String) allocated in the database, not in main memory

 

ODMG C++ ODL:  Example

 

    class Customer;   // forward declaration: Account refers to Customer

    class Branch : public d_Object {
        // ....
    };

    class Person : public d_Object {
    public:
        d_String    name;       // should not use String!
        d_String    address;
    };

    class Account : public d_Object {
    private:
        d_Long      balance;
    public:
        d_Long      number;
        d_Set<d_Ref<Customer>> owners;
        int         find_balance();
        int         update_balance(int delta);
    };

    class Customer : public Person {
    public:
        d_Date      member_from;
        d_Long      customer_id;
        d_Ref<Branch> home_branch;
        d_Set<d_Ref<Account>> accounts;
    };

 

Implementing Relationships

 

n Relationships between classes implemented by  references

n Special reference types enforce integrity by adding/removing inverse links.

é Type d_Rel_Ref<Class, InvRef> is a reference to Class, where attribute InvRef  of Class is the inverse reference.

é Similarly, d_Rel_Set<Class, InvRef> is used for a set of references

n Assignment method (=) of class d_Rel_Ref is overloaded

é Uses type definition to automatically find and update the inverse link

é Frees programmer from task of updating inverse links

é Eliminates possibility of inconsistent links

n Similarly, insert_element() and delete_element() methods of d_Rel_Set use type definition to find and update the inverse link automatically
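What the relationship types automate can be written by hand. The sketch below uses plain C++ sets and hypothetical methods add_owner and remove_owner; it is not the ODMG API, only an illustration of keeping both directions of a link in step.

```cpp
#include <set>
#include <string>

struct Customer;

struct Account {
    int number = 0;
    std::set<Customer*> owners;
    void add_owner(Customer& c);
    void remove_owner(Customer& c);
};

struct Customer {
    std::string name;
    std::set<Account*> accounts;
};

// Each update touches both directions, so the two sets never disagree.
inline void Account::add_owner(Customer& c) {
    owners.insert(&c);
    c.accounts.insert(this);  // inverse link
}

inline void Account::remove_owner(Customer& c) {
    owners.erase(&c);
    c.accounts.erase(this);   // inverse link
}
```

Written by hand, every relationship needs such a pair of methods and every caller has to use them; the relationship types derive the same behavior from the type definition.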

 

Implementing Relationships

n E.g.

    extern const char _owners[], _accounts[];

    class Account : public d_Object {
        // ....
        d_Rel_Set<Customer, _accounts> owners;
    };

    // .. Since strings can't be used in templates ...
    const char _owners[] = "owners";
    const char _accounts[] = "accounts";

 

ODMG C++ Object Manipulation Language

 

n Uses persistent versions of C++ operators such as new(db)
     

    d_Ref<Account> account = new(bank_db, "Account") Account;

é new allocates the object in the specified database, rather than in memory.

é The second argument ("Account") gives the type name used in the database.

n Dereference operator -> when applied on a d_Ref<Account>  reference loads the referenced object in memory (if not already present) before continuing with usual C++ dereference.

n Constructor for a class – a special method to initialize objects when they are created; called automatically on new call.

n Class extents maintained automatically on object creation and deletion

é Only for classes for which this feature has been specified

Ø Specification via user interface, not C++

é Automatic maintenance of class extents not supported in earlier versions of ODMG

 

ODMG C++OML: Database and Object Functions

 

n Class  d_Database provides methods to

é open a database:             open(databasename)

é give names to objects:      set_object_name(object, name)

é look up objects by name:  lookup_object(name)

é rename objects:                rename_object(oldname, newname)

é close a database:             close()

n Class d_Object is inherited by all persistent classes.

é provides methods to allocate and delete objects

é method mark_modified() must be called before an object is updated. 

Ø Is automatically called when object is created
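Why mark_modified() matters can be shown with a toy dirty flag. ToyObject, Counter, and commit are hypothetical names for this sketch; the real d_Object interface is richer, and the write-back is only counted here, not performed.

```cpp
// Toy base class: remembers whether the object changed since the last commit.
class ToyObject {
public:
    void mark_modified() { dirty_ = true; }
    bool is_dirty() const { return dirty_; }
    void clear_dirty() { dirty_ = false; }
private:
    bool dirty_ = true;  // a newly created object always needs to be written
};

struct Counter : ToyObject {
    int value = 0;
    void increment() {
        mark_modified();  // announce the change before making it
        ++value;
    }
};

// At commit, only objects flagged as modified are written back.
// Returns how many objects would be written.
template <typename It>
int commit(It first, It last) {
    int written = 0;
    for (; first != last; ++first) {
        if ((*first)->is_dirty()) {
            ++written;
            (*first)->clear_dirty();
        }
    }
    return written;
}
```

Assigning to value directly, without calling mark_modified(), leaves the flag clear, so the change would silently be left out of the commit.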

 

ODMG C++ OML: Example

 

int create_account_owner(String name, String address) {

    d_Database bank_db_obj;
    d_Database * bank_db = &bank_db_obj;
    bank_db->open("Bank-DB");
    d_Transaction Trans;
    Trans.begin();

    d_Ref<Account> account = new(bank_db) Account;
    d_Ref<Customer> cust = new(bank_db) Customer;
    cust->name = name;
    cust->address = address;
    cust->accounts.insert_element(account);
    // ... Code to initialize other fields

    Trans.commit();

}

 

n Class extents maintained automatically in the database.

n To access a class extent:
   
d_Extent<Customer> customerExtent(bank_db);

n Class d_Extent provides method d_Iterator<T> create_iterator() to create an iterator on the class extent

n Also provides select(pred) method to return iterator on objects that satisfy selection predicate pred.

n Iterators help step through objects in a collection or class extent.

n Collections (sets, lists etc.) also provide create_iterator() method.
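The iterate-or-select pattern can be imitated in memory. CustomerExtent below is a toy stand-in written for this sketch, not the ODMG d_Extent class, and its select returns a vector of copies rather than an iterator.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Customer {
    std::string name;
    int customer_id;
};

// Toy in-memory stand-in for a class extent.
class CustomerExtent {
public:
    void add(Customer c) { all_.push_back(std::move(c)); }

    // Iteration over the whole extent.
    std::vector<Customer>::const_iterator begin() const { return all_.begin(); }
    std::vector<Customer>::const_iterator end() const { return all_.end(); }

    // Objects that satisfy a selection predicate.
    std::vector<Customer> select(
            const std::function<bool(const Customer&)>& pred) const {
        std::vector<Customer> out;
        for (const Customer& c : all_) {
            if (pred(c)) out.push_back(c);
        }
        return out;
    }

private:
    std::vector<Customer> all_;
};
```

Passing the predicate to the extent, instead of filtering in the caller, is what would let a database evaluate the selection where the data lives.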

 

ODMG C++ OML: Example of Iterators

 

int print_customers() {
    d_Database bank_db_obj;
    d_Database * bank_db = &bank_db_obj;
    bank_db->open("Bank-DB");
    d_Transaction Trans;
    Trans.begin();

    d_Extent<Customer> all_customers(bank_db);
    d_Iterator<d_Ref<Customer>> iter;
    iter = all_customers.create_iterator();
    d_Ref<Customer> p;

    while (iter.next(p))
        print_cust(p);  // Function assumed to be defined elsewhere

    Trans.commit();

}

 

ODMG C++ Binding: Other Features

 

n Declarative query language OQL, looks like SQL

é Form query as a string, and execute it to get a set of results (actually a bag, since duplicates may be present)

    d_Set<d_Ref<Account>> result;
    d_OQL_Query q1("select a "
                   "from Customer c, c.accounts a "
                   "where c.name = 'Jones' "
                   "and a.find_balance() > 100");
    d_oql_execute(q1, result);

n Provides error handling mechanism based on C++ exceptions, through class d_Error

n Provides API for accessing the schema of a database.

 

Making Pointer Persistence Transparent

 

n Drawback of the ODMG C++ approach:

é Two types of pointers

é Programmer has to ensure mark_modified() is called, else database can become corrupted

n ObjectStore approach

é Uses exactly the same pointer type for in-memory and database objects

é Persistence is transparent to applications

Ø Except when creating objects

é Same functions can be used on in-memory and persistent objects since pointer types are the same

é Implemented by a technique called pointer-swizzling which is described in Chapter 11.

é No need to call mark_modified(), modification detected automatically.

 

Persistent Java Systems

 

n ODMG-3.0 defines extensions to Java for persistence

é Java does not support templates, so language extensions are required

n Model for persistence:  persistence by reachability

é Matches Java’s garbage collection model

é Garbage collection needed on the database also

é Only one pointer type for transient and persistent pointers

n Class is made persistence capable by running a post-processor on object code generated by the Java compiler

é Contrast with pre-processor used in C++

é Post-processor adds mark_modified() automatically

n Defines collection types DSet, DBag, DList, etc.

n Uses Java iterators, no need for new iterator class

 

 

ODMG Java

 

n Transaction must start accessing database from one of the root objects (looked up by name)

é finds other objects by following pointers from the root objects

n Objects referred to from a fetched object are allocated space in memory, but not necessarily fetched

é Fetching can be done lazily

é An object with space allocated but not yet fetched is called a hollow object

é When a hollow object is accessed, its data is fetched from disk.

 

 

 

 

 

 

End of Chapter

 
