隐式类型转换简介
通常ORACLE数据库存在显式类型转换(Explicit Datatype Conversion)和隐式类型转换(Implicit Datatype Conversion)两种类型转换方式。如果进行比较或运算的两个值的数据类型不同时(源数据的类型与目标数据的类型),而且此时又没有转换函数时,那么ORACLE必须将其中一个值进行类型转换,使其能够运算。这就是所谓的隐式类型转换。其中隐式类型转换是自动进行的,当然,只有在这种转换是有意义的时候,才会自动进行。
Data Conversion
Generally an expression cannot contain values of different datatypes. For example, an expression cannot multiply 5 by 10 and then add 'JAMES'. However, Oracle supports both implicit and explicit conversion of values from one datatype to another.
关于隐式类型转换,建议翻看官方文档“Data Type Comparison Rules”章节,下面是官方文档中的隐式类型转换矩阵。从下面这个表格,我们就能对哪些数据类型能进行转换一目了然。
隐式转换的规则:
其实隐式类型转换发生在很多地方,只是我们很多时候没有留意罢了,不打算一一举例,自行翻阅官方文档的介绍,摘抄隐式类型转换的一些常见的规则如下:
The following rules govern implicit data type conversions:
- During INSERT and UPDATE operations, Oracle converts the value to the data type of the affected column.
- During SELECT FROM operations, Oracle converts the data from the column to the type of the target variable.
- When manipulating numeric values, Oracle usually adjusts precision and scale to allow for maximum capacity. In such cases, the numeric data type resulting from such operations can differ from the numeric data type found in the underlying tables.
- When comparing a character value with a numeric value, Oracle converts the character data to a numeric value.
- Conversions between character values or NUMBER values and floating-point number values can be inexact, because the character types and NUMBER use decimal precision to represent the numeric value, and the floating-point numbers use binary precision.
- When converting a CLOB value into a character data type such as VARCHAR2, or converting BLOB to RAW data, if the data to be converted is larger than the target data type, then the database returns an error.
- During conversion from a timestamp value to a DATE value, the fractional seconds portion of the timestamp value is truncated. This behavior differs from earlier releases of Oracle Database, when the fractional seconds portion of the timestamp value was rounded.
- Conversions from BINARY_FLOAT to BINARY_DOUBLE are exact.
- Conversions from BINARY_DOUBLE to BINARY_FLOAT are inexact if the BINARY_DOUBLE value uses more bits of precision that supported by the BINARY_FLOAT.
- When comparing a character value with a DATE value, Oracle converts the character data to DATE.
- When you use a SQL function or operator with an argument of a data type other than the one it accepts, Oracle converts the argument to the accepted data type.
- When making assignments, Oracle converts the value on the right side of the equal sign (=) to the data type of the target of the assignment on the left side.
- During concatenation operations, Oracle converts from noncharacter data types to CHAR or NCHAR.
- During arithmetic operations on and comparisons between character and noncharacter data types, Oracle converts from any character data type to a numeric, date, or rowid, as appropriate. In arithmetic operations between CHAR/VARCHAR2 and NCHAR/NVARCHAR2, Oracle converts to a NUMBER.
- Most SQL character functions are enabled to accept CLOBs as parameters, and Oracle performs implicit conversions between CLOB and character types. Therefore, functions that are not yet enabled for CLOBs can accept CLOBs through implicit conversion. In such cases, Oracle converts the CLOBs to CHAR or VARCHAR2 before the function is invoked. If the CLOB is larger than 4000 bytes, then Oracle converts only the first 4000 bytes to CHAR.
- When converting RAW or LONG RAW data to or from character data, the binary data is represented in hexadecimal form, with one hexadecimal character representing every four bits of RAW data. Refer to "RAW and LONG RAW Data Types" for more information.
- Comparisons between CHAR and VARCHAR2 and between NCHAR and NVARCHAR2 types may entail different character sets. The default direction of conversion in such cases is from the database character set to the national character set. Table 2-9 shows the direction of implicit conversions between different character types.
对上面官方文档资料的翻译如下,如有不对或不够确切的地方,敬请指出
1. 对于INSERT和UPDATE操作,ORACLE会把插入值或者更新值隐式转换为对应字段的数据类型。
2. 对于SELECT语句,ORACLE会把字段的数据类型隐式转换为变量的数据类型。
3. 当处理数值时,ORACLE通常会调整精度和小数位,以实现最大容量。在这种情况下,由此类操作产生的数字数据类型可能与在基础表中找到的数字数据类型不同。
4. 当比较一个字符型和数值型的值时,ORACLE会把字符型的值隐式转换为数值型。
5. 字符值或NUMBER值与浮点数值之间的转换可能不准确,因为字符类型和NUMBER使用十进制精度表示数字值,而浮点数则使用二进制精度。
6. 将CLOB值转换为字符数据类型(例如VARCHAR2)或将BLOB转换为RAW数据时,如果要转换的数据大于目标数据类型,则数据库将返回错误。
7. 当timestamp类型转换为DATE时(按照第三条,隐式转换不应该把timestamp转换为date,除非insert这样的),timestamp后几位会被truncated忽略,至于忽略几位,取决于数据库版本。
8. 从BINARY_FLOAT到BINARY_DOUBLE的转换是准确的。
9. 从BINARY_DOUBLE到BINARY_FLOAT的转换是不精确的,因为BINARY_DOUBLE精度更高。
10. 当比较字符型和日期型的数据时,ORACLE会把字符型转换为日期型。
11. 如果调用函数(过程)或运算符操作时,如果输入参数的数据类型与函数(存储过程)定义的参数数据类型不一致或不是可接受的数据类型时,则ORACLE会把输入参数的数据类型转换为函数或者过程定义的数据类型。
12. 当使用赋值符号(等号)时,右边的类型转换为左边的类型
13. 当连接操作(concatenation,一般为||)时,ORACLE会隐式转换非字符型到字符型
14. 如果字符类型的数据和非字符类型的数据(如number、date、rowid等)作算术运算,则ORACLE会将字符类型的数据转换为合适的数据类型,这些数据类型可能是number、date、rowid等。
如果CHAR/VARCHAR2 和NCHAR/NVARCHAR2之间作算术运算,则ORACLE会将她们都转换为number类型的数据再做比较。
15. 比较CHAR/VARCHAR2 和NCHAR/NVARCHAR2时,如果两者字符集不一样,则默认的转换方式是将数据编码从数据库字符集转换为国家字符集
下面简单举两个例子,看看隐式转换发生的场景:
例子:
SQL> create table test(object_id varchar2(12), object_name varchar2(64));
Table created.
SQL> insert into test
2 select object_id, object_name from dba_objects;
63426 rows created.
SQL> commit;
Commit complete.
SQL> create index ix_test_n1 on test(object_id);
Index created.
SQL> select count(*) from test where object_id=20;
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID 4bh7yzj5ma0ks, child number 0
-------------------------------------
select count(*) from test where object_id=20
Plan hash value: 1950795681
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 45 (100)| |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
|* 2 | TABLE ACCESS FULL| TEST | 3 | 24 | 45 (20)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_NUMBER("OBJECT_ID")=20)
Note
-----
- dynamic sampling used for this statement
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
23 rows selected.
如上所示,这个发生隐式转换是因为这个规则: “当比较一个字符型和数值型的值时,ORACLE会把字符型的值隐式转换为数值型”(对于SELECT语句,ORACLE会把字段的数据类型隐式转换为变量的数据类型。似乎这个规则也对),此时由于隐式转换发生在OBJECT_ID字段上(TO_NUMBER("OBJECT_ID")),导致执行计划走全表扫描。如果我们稍微修改一下SQL的写法,就会发现执行计划会走INDEX RANGE SCAN。 如下所示:
SQL> select count(*) from test where object_id='20';
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 7800f6da7c909, child number 0
-------------------------------------
select count(*) from test where object_id='20'
Plan hash value: 4037411162
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 2 | INDEX RANGE SCAN| IX_TEST_N1 | 1 | 6 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"='20')
19 rows selected.
下面再介绍一个案例(当比较字符型和日期型的数据时,ORACLE会把字符型转换为日期型。),这种转换虽然大部分情况下都是正常的,但是有时候会成为一个隐藏的逻辑炸弹,当NLS_DATE_FORMAT环境变量改变时,则有可能出现错误或逻辑错误。
SQL> SELECT *
2 FROM scott.emp
3 WHERE hiredate between '01-JAN-1981' and '01-APR-1981';
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID czyc76busj56d, child number 0
-------------------------------------
SELECT * FROM scott.emp WHERE hiredate between '01-JAN-1981' and
'01-APR-1981'
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
|* 1 | TABLE ACCESS FULL| EMP | 2 | 74 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("HIREDATE"<=TO_DATE(' 1981-04-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "HIREDATE">=TO_DATE(' 1981-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')))
21 rows selected.
隐式类型转换问题
Implicit and Explicit Data Conversion
Oracle recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:
· SQL statements are easier to understand when you use explicit datatype conversion functions.
· Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.
· Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR2 value may return an unexpected year depending on the value of the NLS_DATE_FORMAT parameter.
· Algorithms for implicit conversion are subject to change across software releases and among Oracle products. Behavior of explicit conversions is more predictable.
虽然隐式转换在很多地方自动发生,但是不推荐使用隐式类型转换,Oracle官方建议指定显式类型转换,而不要依赖隐式或自动转换,主要有下面一下原因:
使用显式类型转换函数时,SQL语句更易于理解。
隐式类型转换可能会对性能产生负面影响,尤其是如果将列值的数据类型转换为常量而不是相反的数据类型转换操作时。
隐式转换取决于发生这种转换的上下文,在不同的情况下,隐式转换的工作方式可能不同。例如,从日期时间值到VARCHAR2值的隐式转换可能会返回错误(意外)的年份,具体取决于NLS_DATE_FORMAT参数的值。
隐式转换算法可能会在软件版本之间以及Oracle产品之间发生变化。明确转换的行为更容易预测。否则有可能埋下一个大坑。
如果在索引表达式中发生隐式类型转换,则Oracle数据库可能不使用索引,因为它是pre-conversion data type.。这可能会对性能产生负面影响。
Tom Kyte的这篇博文On Implicit Conversions and More,还总结了隐式数据类型转换会带来的一些问题:
The resulting code typically has logic bombs in it. The code appears to work in certain circumstances but will not work in others.
- The resulting code relies on default settings. If someone changes the default settings, the way the code works will be subject to change. (A DBA changing a setting can make your code work entirely differently from the way it does today.)
- The resulting code can very well be subject to SQL injection bugs.
- The resulting code may end up performing numerous unnecessary repeated conversions (negatively affecting performance and consuming many more resources than necessary).
- The implicit conversion may be precluding certain access paths from being available to the optimizer, resulting in suboptimal query plans. (In fact, this is exactly what is happening to you!)
隐式转换可能会阻止某些访问路径无法用于优化器,从而导致查询计划不理想。 (实际上,这正是您数据库当中正在发生的事情!)
- Implicit conversions may prevent partition elimination.
其实上面已经有相关例子介绍,下面介绍一个例子,主要用来说明,隐式类型转换不一定导致执行计划不走索引,只有当隐式转换函数出现在查询条件中的索引字段上,而且左值的类型被隐式转为了右值的类型时才会出现严重性能问题。
SQL> drop table test;
Table dropped.
SQL> create table test
2 as
3 select * from dba_objects;
Table created.
SQL> create index ix_test_n1 on test(object_id);
Index created.
SQL> select count(*) from test where object_id='20';
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 29jmhh43kkrv4, child number 0
-------------------------------------
select count(*) from test where object_id='20'
Plan hash value: 4037411162
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 2 | INDEX RANGE SCAN| IX_TEST_N1 | 10 | 130 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=20)
Note
-----
- dynamic sampling used for this statement
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
23 rows selected.
SQL>
其实SQL语句发生了隐式转换,而且转换的地方在字符串’20'上面,转换为数字20。这样的变化没有发生在OBJECT_ID列上面。其次,这种转换没有发生在左值列上面,没有影响到IX_TEST_N1的路径。
所以以后,如果遇到”隐式转换一定不走索引吗?”或”隐式类型转换一定导致索引失效吗?”这类问题,你都要辩证的来分析,不能一概而论。
下面介绍一个绑定变量发生隐式类型转换的例子:
SQL> create table test
2 as
3 select * from dba_objects;
Table created.
SQL> commit;
Commit complete.
SQL> create index ix_test_object_name on test(object_name);
Index created.
SQL> variables v_object_name nvarchar2(30);
SP2-0734: unknown command beginning "variables ..." - rest of line ignored.
SQL>
SQL> variable v_object_name nvarchar2(30);
SQL> exec :v_object_name :='I_OBJ1';
PL/SQL procedure successfully completed.
SQL> select count(*) from test where object_name=:v_object_name;
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID ft05prnxtpk9u, child number 0
-------------------------------------
select count(*) from test where object_name=:v_object_name
Plan hash value: 1950795681
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 113 (100)| |
| 1 | SORT AGGREGATE | | 1 | 66 | | |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 2 | TABLE ACCESS FULL| TEST | 10 | 660 | 113 (11)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("OBJECT_NAME")=:V_OBJECT_NAME)
Note
-----
- dynamic sampling used for this statement
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
23 rows selected.
这里发生隐式类型转换,是因为隐式类型规则:“比较CHAR/VARCHAR2 和NCHAR/NVARCHAR2时,如果两者字符集不一样,则默认的转换方式是将数据编码从数据库字符集转换为国家字符集” ,而此时是借助内部函数SYS_OP_C2C实现的
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
如何找出存在隐式转换的SQL?
有些公司可能对发布的SQL进行全面审计,能够从源头上杜绝大多数存在隐式类型转换的SQL,但是大多数公司可能没有这个能力或资源来实现这个目标,那么,最重要的就是如何找出数据库中存在隐式转换的SQL,关于如何找出存在隐式数据类型转换的SQL,一般有下面两个SQL:
SELECT
SQL_ID,
PLAN_HASH_VALUE
FROM
V$SQL_PLAN X
WHERE
X.FILTER_PREDICATES LIKE '%INTERNAL_FUNCTION%'
GROUP BY
SQL_ID,
PLAN_HASH_VALUE;
SELECT
SQL_ID,
PLAN_HASH_VALUE
FROM
V$SQL_PLAN X
WHERE
X.FILTER_PREDICATES LIKE '%SYS_OP_C2C%'
GROUP BY
SQL_ID,
PLAN_HASH_VALUE;
但是需要注意的是,即使执行计划中存在INTERNAL_FUNCTION,也不一定说明SQL语句出现了隐式数据类型转换,关于这个问题,参考我的博客“ORACLE数据库中执行计划出现INTERNAL_FUNCTION一定是隐式转换吗?”。 所以还必须对找出的相关SQL进行仔细甄别、鉴定。
另外,这篇博客“ORACLE中内部函数SYS_OP_C2C和隐式类型转换”,也值得对隐式类型转换了解不深的同学看看。
如何避免隐式类型转换呢?
1:在数据库设计阶段和写SQL期间,尽量遵循一致的原则,避免不必要的数据类型转换。
在建模时,要统一字段类型,尤其是和其它表进行关联的相关字段必须保证数据类型一致。这样可以避免不必要的隐式数据类型转换。
查询SQL中条件与字段类型保持一致,另外,确保绑定变量的数据类型。使其与对应字段的数据类型一致
2:使用转换函数,进行显示类型转换。
例如有下面一些常见的类型转换函数:
· TO_CHAR:把DATE或NUMBER转换成字符串;
· TO_DATE:把NUMBER、CHAR或VARCHAR2转换成DATE。当用到时间戳时,可以用到TO_TIMESTAMP或TO_TIMESTAMP_TZ。
· TO_NUMBER: 把CHAR或VARCHAR2转换成NUMBER。
3:创建带有SYS_OP_C2C的函数索引。
这种方法比较少用,不过确实也是特殊场景下的一种优化方法。
参考资料:
https://blogs.oracle.com/oraclemagazine/on-implicit-conversions-and-more
https://docs.oracle.com/cd/E21764_01/apirefs.1111/e12048/cql_elements.htm#CQLLR290
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Type-Comparison-Rules.html#GUID-98BE3A78-6E33-4181-B5CB-D96FD9DC1694