ORACLE8 THE EURECKA FOR VLDB (VERY LARGE DATABASE)

ORACLE8 THE EURECKA FOR VLDBs (VERY LARGE DATABASE)

& A DATABASE WIZARD

Two things are certain about corporate database today: They are growing bigger and they are becoming more complex. Bigger, not just in the amount of the data they store¾ data warehouses growing hundreds of gigabytes to hundreds of terabytes and even beyond¾ but also in the number of users the applications are expected to support simultaneously. As some online transaction processing (OLTP) databases expand from departmental applications to enterprise applications, user population can skyrocket from dozens to hundreds or even thousands.

As databases grow larger, the task of managing and maintaining them increases at least proportionally, if not exponentially. Activities such as backing up, reorganizing, loading, and purging become more complex and time consuming as the amount of data increases. Often it becomes impossible to perform these maintenance operations within the window of time provided by the customer.

Complex, not only because size naturally adds complexity to managing databases, but also because the variety of applications they are now called on to support have expanded from data warehousing and decision support and OLTP to Web servers, the Internet, Intranets and extranets. And the types of data they are expected to process organize, and store has increased as well.

It is precisely these two issues¾ rapid growth and complex data types¾ that are most often on the minds of information (IT) managers in large enterprises today, database – consulting firm that advises large corporations on how to plan, implement, and manage very large databases (VLDBs) for the enterprise. Size and complexity are frequently the top issues many IT managers and database administrators (DBAs) face today.

Let’s see what are the pragmatic needs of VLDBs while facilitating the Decision support system or Data warehousing:

Physical layer data distribution
Optimum throughput for large and complex queries
High availability
Easy manageability

Physical layer data distribution: As the tendency of VLDBs is to imbibe a mammoth information base on a large resourceful platform, it is the profound mantra of database design to distribute the data across the resources. Such exercise can leverage the database throughput to a great extent. Now the question is how to perform such a classical exercise? This question mark has been the most contemplative factor for all the database Gurus. Although the architectural experience enlighten the way but the core technology of the database engine has a major role to play.

Optimum throughput for large complex queries: It is evident to have large and complex queries in the applications that are running against VLDBs. Such queries can strangulate the database even though they are optimized. Again the question here is how to eradicate such chronic bottlenecks. Don’t you think the database engine has to be robust? Let’s see what ORACLE8 has to offer for this question.

High availability: In today’s high-tech environment, database systems form such an integral parts of many business transactions that if the database is unavailable, the business comes to a halt. For example banks, airlines, and stock exchanges simply cannot operate without access to their data. Many other industries use databases for back - office operations where downtime is more tolerable; but even these businesses need to minimize downtime since it leads to wasted Resources and inefficiencies. Beyond the dollar costs, downtime also causes anger and frustration in both customers and employees who are prevented from accomplishing their goals. Transitioning from a client/server architecture to a network-computing architecture will only make high availability that much more important. Let’s see what ORACLE8 high availability architecture has to provide for minimizing the length and disruptiveness of unplanned database outages.

Easy manageability: Managing the enterprise database has been a paramount concern for all the DBAs and IT managers. Precisely with the increasing development in multi–tire architecture and distributed database, management and maintenance has become a "hard nut to crack". Its not only the data warehouse but also the magnitude of OLTP databases have gone leaps and bounds. Ironically speaking the time needs a Pandora’s Box. So what is the solution¾ is that a healthy and large number of database crew or an enterprisewide tools. ORACLE8 says "think globally and act locally".

The needs are raising the bar on database technology and providing benchmarks for the design of the next generation of databases, such as Oracle’s ORACLE8. Designed to address the largest, most – demanding data warehouse and OLTP application enterprise today, the newest version of Oracle Server breaks all previous size limits, both in database size and in user population. ORACLE8 has emerged with a new paradigm of computing that can meet all the needs of VLDBs. Viz.

Data Partitioning
Star Schemas and Queries
Parallel Queries and Servers
High – Availability Architecture
Strategic Database management

PARTITIONING STRATEGIES – DIVIDE AND CONQUER

One of the core features of Oracle8 is the ability to physically partition a table and its associated indexes. By partitioning tables and indexes into smaller components while maintaining the table as a single database entity, the management and maintenance of data in the table becomes more flexible. The data management now can be accomplished at the finer-grained, partition level, while queries still can be performed at the table level. For example, applications do not need to be modified to run against the newly partitioned tables.

The divide-and-conquer principle allows data to be manipulated at the partition level, reducing the amount of data per operation. In most cases, this standard also allows partition-level operations to be performed in parallel with the same operations on other partitions of the same table, speeding up the entire operation.

BENEFITS FROM TABLE PARTITIONING

The greatest benefits from Oracle8 partitioning are the ability to maintain and administer very large databases. These gains far outweigh any performance benefits. The following dimensions of scalability can be found in Oracle8:

Higher Availability
Greater Manageability
Enhanced Performance

Higher Availability

Using intelligent partitioning strategies, Oracle8 can help meet the increasing availability demands of VLDBs. Oracle8 reduces the amount and duration of scheduled downtime by providing the ability to perform downtime operations when the database is still open and in use. The key to higher availability is partition autonomy, the ability to design partitions to be independent entities within the table. For example, any operation performed on partition X should not impact operations on a partition Y in the same table.

Greater Manageability

Managing a database consists of moving data in and out, backing up, restoring and rearranging data due to fragmentation or performance bottlenecks. As data volumes increase, the job of data management becomes more difficult. Oracle8 supports table growth while placing data management at a finer-grain, partition level. In Oracle7, the unit of data management was the table, while in Oracle8 it is the partition. As the table grows in Oracle8, the partition need not also grow; instead the number of partitions increases. All of the data management functions in Oracle7 still exist in Oracle8. But with the ability to act against a partition, a new medium for parallel processing within a table now is available. Similar rules apply to designing a database for high availability. The key is data-segment size and to a lesser degree, autonomy.

Enhanced Performance

The strategy for enhanced performance is divide-and-conquer through parallelism. This paradigm inherently results in performance improvements because most operations performed at the table level in Oracle7 now can be achieved at the partition level in Oracle8. However, if a database is not designed correctly under Oracle8, the level of achievable parallelism and resulting performance gains will be limited. The table partitioning strategy used in Oracle7 may not necessarily be adequate to make optimal use of the Oracle8 features and functionality.

How Partition is done?

The pivotal feature of Oracle8 is explicit data partition. Orcale8 allows you to divide the storage of tables into smaller units of disk storage called partitions. Each partition of a table has the same logical attributes. For example, each table partition contains the same columns with the same data types and integrity constraints. But each table partition can have different physical characteristics. For example, Oracle8 can store each partition of a table in a separate tablespace i.e. located on different physical disk space, and each partition can have different storage settings for PCTFREE, PCTUSED, and so on.

Partitioned Tables

Oracle8 supports only range-partitioned tables. A row’s partition key value determines in which partition Oracle8 stores a row. A table’s partition key is a column or order set of columns (as many as 16) that characterizes the physical partitioning of table rows. See examples on the following page (Customers Table).

CREATE TABLE usa_custmers

( id NUMBER(5) PRIMARY KEY,

lastname VARCHAR2(50) NOT NULL,

firstname VARCHAR2(50) NOT NULL,

address VARCHAR2(100),

city VARCHAR2(50),

state VARCHAR2(2),

zipcode VARCHAR2(15),

phone VARCHAR2(15),

fax VARCHAR2(15),

email VARCHAR2(100) )

PARTITION BY RANGE ( state )

(PARTITION p1 LESS THAN (‘H’)

TABLESPACE data01

(PARTITION p2 LESS THAN (‘MI’)

TABLESPACE data02

(PARTITION p3 LESS THAN (‘NM’)

TABLESPACE data03

(PARTITION p4 LESS THAN (‘S’)

TABLESPACE data04

(PARTITION p5 LESS THAN (MAXVALUE)

TABLESPACE data05

Rows in Data Partitions:

To demonstrate Oracle8 placing rows into a range partitioned table, visualize what transpires when an application inserts a customer from the state Arkansas. Oracle8 performs an ANSI VARCHAR2 comparison with the first character in the state code AR with the value that defines the first table partition p1. Since A precedes F, Oracle8 places the new row into partition p1, which is physically stored as data of tablespace DATA01. In other words, Oracle8 creates the buckets for the different range of states. Therefore, while accessing the customers of a particular state or a range of states, Oracle8 performs searches in the bucket related to the state rather than in full table. The access is incredibly faster compared to a conventional table implementation.

MAXVALUE

Of particular interest in this example is the declaration of partition P5. This partition stores all rows with a value greater than the upper bound of the previous partition, P4. For example, when you insert a new customer from Wyoming, Oracle8 places the new row into partition P5, which is physically stored in a data file of tablespace DATA05. If the USA_CUSTOMER table did not include a partition with an upper bound MAXVALUE, Oracle8 would not allow applications to insert customers from SC, SD, TN, TX, UT, VA, VT, WA, WV, WI, and WY.

Partitioned Indexes

Oracle8 also supports range-partitioned indexes for non-clustered tables. Just as with tables, each partition of an index has the same logical attributes (index columns) but can have different physical characteristics (tablespace placement and storage settings). An index’s partition key determines in which partition Oracle8 stores index entries. An index’s partition key must include one or more of the columns that define the index. The declaration of index partition ranges is identical to declaring table partitions.

The data partitioning can incredibly accentuate the throughput of any large and complex queries since the queries against a partitioned table perform a full scan on a small partition instead of the whole table. Data partitioning is a broad new feature of Oracle8 – this article just scratches the surface. There are many other related issues to understand about table and index partitioning. Partition maintenance operations, such as moving splitting, rebuilding, and dropping partitions, are challenging topics for a DBA to master. My endeavor here is to pave the way but it is always said "last but not least".

STAR SCHEMAS

Almost all of the data in a data warehouse originated from an operational system. However, even if the data exists in a relational schema in an operational system, a transaction-oriented schema is not necessarily appropriate for a data-warehousing application. A typical operation in on-line transaction process (OLTP) is a "transaction." In most applications, a transaction consists of the retrieval, insertion, or update of only a handful of records. For example, a new purchase order requires information about the customer, and the products being ordered. Operational systems are typically optimized to allow the application to quickly locate these individual records.

The typical data warehousing operation involves a much larger number of records. The end-user of a data warehouse is not interested in a single purchase order. Instead, the end-user will want to examine all of the purchase orders for a given week, or for a given region. Data warehouses are used for a variety of purposes, from simple reporting to more complex analysis and sophisticated forecasting. But each of these data warehousing functions share the same characteristic of analyzing large amounts of data.

A highly normalized schema offers superior performance and efficient storage for OLTP schemas; and star schema provides similar benefits for data warehouses.

What is a Star Schema?

A "star schema" is a natural data representation for many data warehousing applications. A star schema contains one very large table (known as the FACT table), and multiple smaller tables (called DIMENSION tables).

Image4.gif (3318 bytes)

Example star schema

The SALES table contains information about the sales for each product at each store on a given day (e.g., the date of the transaction and the amount of total sales). The SALES table is very large, since a grocery-store chain could easily have millions of such entries per day.

In contrast, the TIME, PRODUCT, STORE, and PROMOTION tables are very small relative to the SALES table. In this schema, SALES is the FACT table; product sales are the basic units of analysis for this data warehouse. The other tables (TIME, PRODUCT, STORE, and PROMOTION) are the DIMENSION tables. These DIMENSION tables provide more details of the sales transactions, describing information about which products were sold, when and where the sales occurred, and what promotions might have influenced the sale.

This star schema is an intuitive representation of multidimensional ‘datacube’ in a relational environment. Each DIMENSION table represents a business dimension while the FACT table contains the contents of the ‘datacube,’ which in this case is the sales information for each product. For example, an end user may want to examine ‘sales by store by week,’ where STORE and TIME are dimensions of the query.

STAR QUERY PROCESSING IN ORACLE8

Oracle8’s star query algorithm represents a significant improvement over previous star query algorithms. Rather than relying on a join index, Oracle8’s algorithm utilizes single-table bitmap indexes, and integrates this bitmap technology seamlessly into a new star query join method. This new algorithm provides excellent performance, while utilizing less storage (bitmap indexes are highly compressed) and offering more flexibility in both the types of schemas and the types of queries that can be optimized. Let’s see how it works, Oracle8 processes a star query using two basic phases. The first phase retrieves exactly the necessary rows from the fact table. Because this retrieval utilizes bitmap indexes, it is very efficient. The second phase joins this "result set" from the fact table to the dimension tables. Star queries augment the throughput to great extent.

Oracle Parallel Query & Server Option

The key feature of Oracle8 is parallel query option (i.e. built in to database). This feature can accentuate the large complex queries to a great extent especially when they are running on a multi-processor server. The result is remarkable since the parallel options invoke multiple processes to utilize spare system resources to reduce the time required to complete tasks. The parallel option does not reduce the amount resources required by the processing; rather, they spread the processing burden across multiple CPUs. As the data warehouse server is comprised of 6 CPUs and a large number of disk arrays, the parallel option can have a significant impact on the performance of the database and queries. Let’s see how SMP (symmetric multiprocessors) react to OPQ:

SMP architecture entails multiple processors sharing common memory and disk storage and running under a single operating system image. The operating system leverages the available pool of CPUs and allocates tasks so that processes work in parallel to accomplish a set of tasks. The SMP architecture allows a degree of parallelism in execution. Because multiple tasks can run simultaneously, a separate processor manages each one.

The Oracle parallel query option takes advantage of the mechanism best by spreading more than one Query Server onto different CPUs so that multiple query servers and CPUs accomplish a single database request. The number of query servers is decided by the degree of parallelism defined at the object level and in the query. The degree of parallelism in this scenario is of a single dimension i.e. the single Oracle instance coordinating with multiple Query Servers. The degree of parallelism can be made two-dimension by implementing the Oracle Parallel Server. The Oracle parallel server extends tremendous leverage over multi- processor servers. More than one Oracle instance contributes computing power to the large data warehouse database. Let’s see how OPS works.

Oracle Parallel Server, an option to Oracle8 Enterprise Edition, stands alone in providing Database Management Systems functionality and performance for loosely coupled systems. Oracle Parallel Server allows users on multiple nodes to access the same database simultaneously. Users benefit from the increased transaction processing power and higher availability of multiple computer systems. Through the use of parallel cache management, Oracle Parallel Server takes advantage of high availability, high performance, and incremental growth made possible by loosely coupled systems. Parallel Database technology can make it possible to overcome the memory and CPU limitations of a particular node, enabling a single clustered configuration to support a much larger user base. The Oracle Parallel Server is designed to take full advantage of all the benefits offered by loosely coupled systems. The diagram below displays an example of an Oracle Parallel Server implementation.

xpsf3t26.gif (18844 bytes)

Oracle Parallel Server Cluster

Both the OPQ and OPS add tremendous muscle power to VLDBs.

Other New Parallel Server Performance Features for Oracle8: Introducing an extra level of DML locking via partitioned tables may affect performance of short transactions. To improve performance in the Oracle Parallel Server environment, you can turn off DML locking on selected tables with the ALTER TABLE DISABLE TABLE LOCK command. This disables both table and partition level DML locks. This allows extra traffic to the Integrated Distributed Lock Manager to be bypassed, thereby providing a performance benefit. DDL statements are not allowed on the selected tables when DML locking is disabled.

Disk affinity is further enhanced in Oracle8, providing performance improvements. Partition to node affinity information determines slave allocation and work assignment. This results in a fairly deterministic assignment of partitions to Oracle instances. For parallel DML, this results in an increased likelihood of buffer cache hits. For shared-nothing MPP systems, the Oracle Parallel Server assigns partitions to instances by taking the disk affinity into account. Affinity information that persists across transaction statements improves buffer cache hit ratios and reduces block pings between instances.

Oracle Parallel Server is a highly reliable system designed to provide high availability and scalable performance for applications, while minimizing communication overhead between cluster nodes and maximizing the performance of each cluster through parallel cache management and internal concurrency control. Oracle Parallel Server brings a number of very important benefits to the high-end and mid-range of enterprise computing. A number of major clustering architectures have emerged in the last decades and Oracle has been working closely with the key platform vendors in this field, to help the vendors provide the necessary high-speed network and shared disks required by the Oracle Parallel Server. The combination of the Oracle Parallel Server and the variety of clustering architectures in DEC Open VMS, all the major UNIX platforms, and Windows NT cluster vendors offer ideal solutions for many classes of applications which require scalability, high availability and high performance.

Oracle8 High – Availability Architecture:

The Oracle8 high availability architecture provides sophisticated features for minimizing the length and disruptiveness of unplanned database outages. By using these features, customers can greatly improve the total availability of their business systems. Oracle8 supports the following high availability features:

Standard recovery on a single node

Automatic recovery of changes made by committed transactions
Automatic rollback of uncommitted transactions

Failover to a spare node

Oracle Failsafe detects node failures and transfers database execution to a spare node

Failover to a peer node

Oracle Parallel Server allows continued data access from peer nodes after a node failure

Optimized roll-forward recovery

The checkpoint interval parameter limits the redo between checkpoints
The checkpoint timeout parameter limits the time between checkpoints
The max dirty buffers parameter limits IOs at recovery time
Incremental checkpoints minimize redo application during recovery

Optimized roll-back recovery

Deferred transaction recovery takes roll-back out of the critical path of crash recovery
Row level locks are preserved through crashes
Consistent read of unrecovered data lets readers see unrecovered data
Timeslicing of rollback across all transactions minimizes the time to release locks

Optimized recovery of bulk operations

Direct writes bypass the effects of bulk operations on checkpoints
The redo for bulk operations is skipped at recovery time
Freeing of block ranges makes the roll-back for bulk operations fast

Scalable recovery

Redo application is automatically parallelized
Undo application is automatically parallelized

Preservation of ongoing work by users

Transparent application failover makes node failures invisible to end users

Distributed caching of all data

Data skew after fail over is eliminated
Read intensive data is always accessible during recovery
Peer nodes have pre-warmed caches after failover

Failover to replica site

Snapshots allow creation of read only replicas
Symmetric replication allows creation of updateable replicas
Standby database replicates entire database using redo logs

The standby-database is a powerful feature of Oracle8 that can be used to maintain a remote copy of the database. This feature is somewhat like snapshots except that the standby database applies to the entire database instead of individual tables, and the redo log is used to propagate changes instead of a snapshot log. A disadvantage of this scheme is that both uncommitted and committed transactions are propagated to the copy database, so crash recovery must be applied at the copy site before the data can be accessed.

Strategic Database Management:

Users need systems management solutions that permit them to reduce the complexity of managing their heterogeneous environments, and help them to decrease the training, consulting and administrative costs inherent in having products from multiple vendors. The ideal solutions need to provide a rigorous combination of integration and openness: integration is required for centralized control, while openness is needed for future growth and flexibility. Administrators want to be able to guarantee their users a base level of reliability, availability and security, as well as a high level of performance. As databases become more prevalent, trends such as Very Large Databases (VLDB) and Very Many Databases (VMDB) are also causing administrators to search for specialized tools that are optimized for these specific management scenarios.

ORACLE’S SOLUTION: ORACLE ENTERPRISE MANAGER, is the cornerstone of the strategy which makes Oracle’s universal data servers the most manageable databases for the enterprise. The Oracle Enterprise Manager product family consists of next generation systems management tools designed to efficiently manage the complete Oracle environment, including systems, databases, networks and applications. Oracle Enterprise Manager supports Oracle8 databases.

Oracle Enterprise Manager: A Powerful and Comprehensive Product Suite

Oracle Enterprise Manager provides an integrated solution for managing your heterogeneous environment, with an open, extensible architecture. The architecture, which is both scalable and lightweight, consists of a centralized console, common services, and intelligent agents running on the managed nodes. Various applications reside on top of the common services, performing comprehensive system management tasks. These applications can be provided by Oracle, or by customers and other third-party software vendors. This is made possible by the product’s openness, resulting in the availability of multiple application programming interfaces (APIs) for both customer and third party integration.

Oracle Enterprise Manager is designed to fulfill the system management needs of Oracle customers, and is optimized for VLDB scenarios through caching and multi-threaded processing. It is also designed to provide the flexibility and customization required by administrators of VMDB environments, easily scaling upwards to maintain performance and automate routine tasks. In a VMDB environment, the ratio of nodes per administrator increases rapidly, requiring tools that can automate tasks through "lights out" management and proactive event monitoring. The key features of OEM are stated below.

Oracle Enterprise Manager Key Features:

Single, unified architecture for managing the complete Oracle environment
Centralized console for single point of management
Scalable console-agent architecture
Sophisticated job scheduling and event management system
Ability to perform operations on multiple objects with a single action
Common look and feel
Open APIs at every level

Oracle Enterprise Manager Standard Applications

Oracle Software Manager
Instance Manager
Schema Manager
Security Manager
Storage Manager
SQL Worksheet
Backup/Recovery Manager
Import/Export Manager
SQL*Loader

Oracle Enterprise Manager Extended Applications

Replication Manager
Mobile Agents Gateway Manager
Video Server Manager
Web Server Manager
Mail Server Manager
Parallel Server Manager

Oracle Enterprise Manager Performance Pack Applications

Oracle Performance Manager
Oracle TopSession
Oracle Tablespace Manager
Oracle Events
Oracle Trace
Oracle Expert
SNMP Monitor
Oracle System Sizer

Isn’t it, Oracle8 is the discovery of the era? Yes this is the Pandora’s Box. So let’s not wait¾ take the plunge and fly with Oracle8. It is said, "Hit the nail when it is hot" and that’s the right technology.