Thursday, March 29, 2007

Replace tape and tape drive using Veritas NetBackup

To replace a tape drive:
  • Physically replace it
  • Synchronize the S/N of the new one with the database
# tpautoconf -report_disc
# tpautoconf -replace_drive drive_name -path drive_path

To replace a tape:
  • Expire it manually first
# bpexpdate -m A00001 -d 0 -h host
  • Delete the volume using Veritas Netbackup GUI
  • Remove it's label and replace it with a new one attached to the old label
  • Import the new tape back into the library

It's difficult to do things in a proactive way

As things get older, it becomes more & more difficult to identify what is the root cause of the problem whenever there is a problem. The problem could happen at any point or multiple points down to the whole path. You have to isolate them from each other and fix them one by one. Sometimes it's just not possible to figure out the root cause. You don't know which is caused by which since they are all correlated. It's pretty much like us, human-beings. It may take several days or even several weeks to fix all the issues and get the problematic system back to normal. The system availability drops a lot during this period, of course. So proactive is always much better than reactive. Sounds like everybody knows this.

However, in a real world, usually it's hard to actively replace things until it has a problem. People would say "why do we need to replace it as there is no problem at all? look, it's running well!". Or sometimes they would say "I understand your concern. But you know, we currently have a tight budget. It's hard to get approved if we raise the request. After all, things are still running well. Let's play by ear!" But, when a serious problem happens and the system becomes unavailable, these guys just get nervous. "You must fix it ASAP! You must bring the system back by today! I don't care how you do it, you gotta get this done!" Sounds pretty similar? This happens in our life almost every day.

Wednesday, March 07, 2007

The storage representation for number zero

Yesterday I found out something interesting in an Oracle 9205 database. One column called is_mandatory is defined as NUMBER(1,0). Based on the business rule, only 0 or 1 can be stored in the column. The strange things happen to 0. There are 2 different storage representations for the number zero!
09:18:50 > select is_mandatory, dump(is_mandatory) storage from t where is_mandatory != 1;

------------ ------------------------------
0 Typ=2 Len=1: 128
0 Typ=2 Len=1: 128
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1

9 rows selected.

Elapsed: 00:00:00.01
09:19:00 > select is_mandatory, dump(is_mandatory) storage from t where is_mandatory = 0;

------------ ------------------------------
0 Typ=2 Len=1: 128
0 Typ=2 Len=1: 128

2 rows selected.

Elapsed: 00:00:00.01
In a normal case, the 0 should be always stored in Oracle like:

11:14:16 > select dump(0) from dual;

Typ=2 Len=1: 128

Elapsed: 00:00:00.08

However, there is another storage scheme for 0 in the table
Typ=2 Len=2: 193,1
This value should be zero as well based on Oracle storage internal. However, I couldn't fetch these rows using is_mandatory = 0 since 0 will be always interpreted as 'Typ=2 Len=1: 128'. Even I couldn't reproduce this scenario simply using SQL. Originally, these values are inserted/updated by a database stored procedure which takes a set of values as parameters from a front-end Java program via JDBC thin client. I am wondering under what kind of condition 0 would be stored this strange way. It seems like 'Typ=2 Len=2: 193,1' is something between 0 and 1.

11:42:02 > select is_mandatory from t where is_mandatory > 0 and is_mandatory <>
7 rows selected.
Elapsed: 00:00:00.00

I raised a question about this to ASKTOM. So this post will be updated once I got the answer from Tom.

The problem is caused by the incompatible JDBC and RDBMS version. We are running an Oracle 9205 database as we are using JDBC 8174 in the middle tier to access the database. After I point to JDBC 9205 driver and re-run the program, number zero can be stored correctly with 'Len=1: 128'. According to what one said:

'Len=2: 193,1' is definitely an invalid NUMBER; in :
SQL> select dump ( utl_raw.cast_to_number ('C101') ) from dual;

Typ=2 Len=2: 193,1

SQL> select 1 / utl_raw.cast_to_number ('C101') from dual;
select 1 / utl_raw.cast_to_number ('C101') from dual
ERROR at line 1:
ORA-03113: end-of-file on communication channel

Just an illegal bit representation that doesn't map to any NUMBER, and so produces unpredictable results (including process termination) if used.

I also did some tests by myself in Oracle SQL Developer:

select utl_raw.cast_to_number('c101') from dual


1 rows selected

select utl_raw.cast_to_number('C101')/1 from dual


1 rows selected

If the 2nd SQL is executed in SQL*Plus, seems like it will never end. A different result! Completely unpredictable, indeed!

Friday, March 02, 2007

Finally Oracle SQL Developer 1.1 patch can be applied

Oracle SQL Developer 1.1 'check for updates ...' had been pending for over a month. Seems like the option only was made available for a short while. So even there had been a 1.1 patch 1, I was not able to apply it via 'Check for updates ...'. The only way to apply the patch is to install the whole SQL Developer which I don't like. I've been waiting for the option to be made available again.

This morning, I tried to run 'Check for updates ...' again and found that the option was made available! I upgraded my SQL Developer to the latest version finally. Meanwhile, Oracle released a new patch for SQL Developer 1.1, it's called patch 2. However, there is still some problem regarding the patch application via 'Check for updates ...'. Is it the reason Oracle disabled the option for over one month? The problem is that SQL developer still tells you there is a new patch found to be applied even you've already applied it. This gets me confused. I don't know what will happen if I reapply the patch 2. Maybe nothing will happen. It's just repeating this application itself over and over again. Oracle should sort it out as soon as possible.

Thursday, March 01, 2007

XML audit trail and RMAN

Oracle 10g introduces a new feature concerning the audit. In 10g, we are able to write the audit trail into XML files on the operating system level.

alter system set audit_trail=XML;

This sounds very exciting since we could do a lot of things with XML. At least, the audit trail is well structured, isn't it?

But, there seems a problem using RMAN with this auditing feature enabled.

alter system set audit_sys_operation=true;

I only enable auditing for the privileged account 'SYS' in the statement above. Now, when I try to connect RMAN, I will get the errors below:

swong@mars:flyhorse > rman target / nocatalog

Recovery Manager: Release - Production on Thu Mar 1 10:28:10 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-09817: Write to audit file failed.

I encountered this error in Oracle This is because of an unpublished bug 4600757. You can refer to Oracle DOC ID: Note:391787.1 to get more about this problem.