Wednesday, November 14, 2012

HOW TO SQL - NULL values


NULL values often generate unexpected results when they appear in SQL queries. This is because not always these NULL values are understood. So, what about those NULL  values?
LNNVL
NULLIF
NVL
NVL2

It’s probably best to begin with a few words about what NULL is NOT:
·         NULL is not the number 0
·         NULL is not the empty string
NULL is a special placeholder in SQL used to indicate that the value does not exist in the database, that the value is unknown.
Now let’s explore some unknown NULL values!
·         Comparison with null values is done with special syntax IS NULL or IS NOT NULL, and doesn’t use comparison operands like =, <>, >, <.

select employee_id from employees where manager_id=NULL;

no rows selected

SQL> select employee_id from employees where manager_id IS NULL;

EMPLOYEE_ID
-----------
        100

·         Because the value of NULL values is unknown, unknown is also the result of any operation with NULL.

select 5+7+null+9 as suma from dual;

      SUMA
----------
       NULL

·         The truth table is also influenced by NULL values as follows, an interesting case being FALSE or NULL = NULL:

Bool1           Bool2           AND              OR
TRUE             TRUE             TRUE             TRUE
TRUE             FALSE           FALSE           TRUE
TRUE             NULL            NULL            TRUE
FALSE           TRUE             FALSE           TRUE
FALSE           FALSE           FALSE           FALSE          
FALSE           NULL            FALSE           NULL
NULL            TRUE             NULL            TRUE
NULL            FALSE           FALSE           NULL
NULL            NULL            NULL            NULL
 
·         To view how unknown NULL values are treated in databases, let’s take the example of DEPARTMENTS table, where there are some departments with no manager (manager_id is NULL).

SELECT * FROM DEPARTMENTS WHERE MANAGER_ID <= 200
UNION ALL
SELECT * FROM DEPARTMENTS WHERE MANAGER_ID > 200;

This SQL, which is a UNION ALL between 2 sets of data that covers all values for MANAGER_ID will not return all departments, as someone would expect, but only those with a known value for manager_id.  

10     Administration       200    1700
30     Purchasing           114    1700
50     Shipping             121    1700
60     IT                   103    1700
80     Sales                145    2700
90     Executive            100    1700
100    Finance              108    1700
20     Marketing            201    1700
40     Human Resources      203    2700
70     Public Relations     204    2700
110    Accounting           205    1700

·         Another thing that is related to the unknown character of NULL values is that NULL values aren’t stored in INDEXES. This is why queries having condition IS NULL don’t use index, even if there is one defined for the column specified in condition. On the contrary, the index is used in queries having conditions IS NOT NULL.

select * from employees where manager_id is null;

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    69 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES |     1 |    69 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

select * from employees where manager_id is not null;

--------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   107 |  7383 |     9  
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES      |   107 |  7383 |     9  
|*  2 |   INDEX FULL SCAN           | EMP_MANAGER_IX |   107 |       |     1  
--------------------------------------------------------------------------------

·         In sorting operation, NULL values appear LAST in ASC order and FIRST in DESC order.

select * from departments order by manager_id ASC;

           90 Executive                             100        1700
           60 IT                                    103        1700
          …….
          110 Accounting                            205        1700
          230 IT Helpdesk                           NULL       1700
          240 Government Sales                      NULL       1700
          ……
          140 Control And Credit                    NULL       1700


select * from departments order by manager_id DESC;

          140 Control And Credit                    NULL       1700
          ………
          160 Benefits                              NULL       1700
          110 Accounting                            205        1700
          ……
           90 Executive                             100        1700

·         All aggregate functions except COUNT(*) and GROUPING ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT never returns null, but returns either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.

SELECT COUNT(*) FROM DEPARTMENTS;

  COUNT(*)
----------
        27

SQL> select count(manager_id) from departments;

COUNT(MANAGER_ID)
-----------------
               11

select min(manager_id) from departments;

MIN(MANAGER_ID)
---------------
            100

SQL> select max(manager_id) from departments;

MAX(MANAGER_ID)
---------------
            205

·         NULL does not equal NULL or other value

select * from a;

        C1         C2
---------- ----------
         1          1
         1          2
         1      NULL
      NULL      NULL

select * from a where c1=c2;

        C1         C2
---------- ----------
1             1

·         The NULL-related functions facilitate null handling. The NULL-related functions are:
COALESCE

·         Null have different behavior in NOT  IN and NOT EXISTS.
Take, for example, the case of returning all employees that has no managers:

SELECT * FROM HR.EMPLOYEES WHERE EMPLOYEE_ID NOT IN ( SELECT MANAGER_ID FROM HR.EMPLOYEES );

no rows selected

SELECT * FROM HR.EMPLOYEES E1 WHERE NOT EXISTS ( SELECT 1 FROM HR.EMPLOYEES E2 WHERE E2.EMPLOYEE_ID = E1.MANAGER_ID);

        100 Steven               King
SKING                     515.123.4567         17-JUN-98 AD_PRES         24000
                                     90
                or departments with no employees:

SELECT department_id FROM hr.departments
WHERE department_id
NOT IN (SELECT department_id FROM HR.EMPLOYEES);

            no rows selected
           
SELECT DEPARTMENT_ID FROM HR.DEPARTMENTS d
WHERE NOT EXISTS (SELECT 1
FROM HR.EMPLOYEES e where e.department_id = d.department_id);

DEPARTMENT_ID
-------------
          120
          130
          140
………

As you can see, this happens because one or more values in NOT IN clause are NULL (UNKNOWN), and the predicate  department_id NOT IN (x,y,NULL) evaluates to neither TRUE, nor FALSE, but to UNKNOWN (NULL).
This is why, NOT IN can be used if all the values that must be evaluated differ from NULL value.

As a conclusion of this short intro, the existence of NULL values into a database introduces a new degree of uncertainty. If not understood, a lot of guessing must be done by an SQL programmer to counter for erroneous results of NULL values in a database.