Pinned topic: Oracle to Netezza Data Loads (POC)
10 replies · Latest post 2013-06-14T15:27:02Z by David Birmingham

NV · 2013-03-15T12:53:05Z
Hi All,
 
We have to do a POC (Netezza is up and running, TF12) where we need to take a few tables from Oracle, load them into Netezza, and test some queries/reports for performance. The biggest fact table has about 70M records and 15 columns.
 
What is the best approach to move this data? The options available to us are Ab Initio and PL/SQL code.
Or can we create pipe-delimited CSV files and load them into Netezza via external tables? In that case, where should the file(s) reside?
 
I would appreciate it if someone could help with the minimum requirements for doing this POC, in terms of:
 
1. Linux/DB access privileges
2. What client tool(s) are needed to connect to the Netezza box
3. The best approach to load the data from Oracle to Netezza
4. What driver setup would be needed, or whether the drivers are already available as part of the initial install
5. Anything else that would be helpful/important to know
 
Thanks in advance, 
NV 
Updated on 2013-04-10T14:12:32Z by David Birmingham
  • nz_newb replied on 2013-03-16T00:02:41Z, in response to NV:
    We also got into NZ very recently.
    2. You can use any ODBC-capable tool to connect to the Netezza box. We use WinSQL, Data Analyst from Toad, or Aginity (exclusive to the Netezza database).
    3. We tried FastReader, Oracle Data Pump, and Informatica mappings to load data from Oracle to Netezza. We decided to go with Informatica mappings, as that is the most cost-effective for us.
    4. With Informatica, we had to install the Netezza ODBC drivers. I don't know how it would work for Ab Initio.
    5. It was all learn-as-you-go for us, so our choices may not be applicable to your application.
     
    • NV replied on 2013-03-18T14:03:10Z, in response to nz_newb:
      Thanks, nz_newb.

      Since we cannot change the distribution key once a table is created, how does this work in Dev when we are testing various combinations? Do we drop and recreate the table every time we need to try a different distribution key?

      Also, for initial testing, is this the right approach?
      1. Get the DDL for the existing tables in Oracle, strip out everything except the table creation script and the primary/foreign keys (remove the partitioning, indexing, etc.), and create the tables in Netezza with the relevant distribution key (see the sketch below).
      2. Create external tables and load the pipe-delimited files (data from the Oracle tables) into Netezza.
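      For example, I imagine step 1 would reduce an Oracle fact table to something like this (a simplified, hypothetical table; the column names are made up):

      create table sales_fact
      ( sale_id     bigint        not null,
        customer_id integer       not null,
        sale_date   date,
        amount      numeric(12,2)
      )
      distribute on (customer_id);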
       
      Thanks, 
      NV 
      • David Birmingham replied on 2013-06-14T15:21:59Z, in response to NV:

        Use nzload, not external tables.

        Initially load with random distribution; this avoids unexpected skew on the initial load. Then profile the data to find the best distribution. You don't have to throw the table away to change it, just rename it. One of our sites had made BATCH_ID the distribution key. Just one problem: BATCH_ID held the same value for all records, so everything was loading to a single data slice on the machine. Until you know what your data looks like, don't distribute the data just yet. Distribution is a physical effect, and your queries have to use the keys to get any benefit from it. If you pick a key that none of your queries use, then you are really no better off than with random distribution anyhow.

        Also, to test a distribution, you can just pick a single column and use a test table like:

        create table test as select thiscolumn from mytable distribute on (thiscolumn);

        then check the nzadmin tool for skew, or use:

        select count(*), datasliceid from test group by datasliceid order by datasliceid;

        and see whether any of the data slice record counts are way out of line. They should all hover around the same record count, give or take some minor slop.

        As you note, it is best to use delimited files; whether the pipe character actually appears in your data, only you would know. nzload is easier to program, run, and configure. External tables have specific uses, and loading is not one of them: too much infrastructure for a simple task.
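        For reference, a minimal nzload invocation for a pipe-delimited file looks roughly like this (host, database, table, file, and credentials are all placeholders):

        nzload -host nzhost -db pocdb -u admin -pw password \
               -t my_fact -df /data/my_fact.dat -delim '|' -maxErrors 10

        nzload aborts the load once -maxErrors rejected rows are hit, so a small threshold surfaces format problems early.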

         

  • Norbert Kremer replied on 2013-03-18T17:45:08Z, in response to NV:
    Hi, here are a few things to help you get started.

    Your data volumes are not too large, so do whatever is fastest in terms of your development effort, and do not worry about performance, at least on the Netezza side. NZ will ingest this volume in seconds to minutes at most.

    As for client tools, you can install the Netezza drivers and client tools on a Windows system. On Windows, you will have nzload but not nzsql. Get the drivers and client software from IBM Fix Central.

    If you want to use nzsql, you will either need to install the NZ client tools on a unix or linux system, or you can log in to the NZ host as user nz and use the nzsql command line there. Note that using the NZ host long-term for data loading and other tasks not directly related to running NPS is neither recommended nor supported by IBM, but it is often done during POCs.
     
    For database access, you'll need the ADMIN account, or else use ADMIN to set up new users with sufficient privileges to load data. See the Database User's Guide for more info. Documentation is installed along with the client tools (at least on Windows), or you can get the doc pack separately from Fix Central.
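    Just as a sketch (check the guide for the exact privilege model), setting up a dedicated load user would look something like:

    create user pocuser with password 'changeme';
    grant create table to pocuser;
    grant select, insert on table to pocuser;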

    There are several ways to migrate schema and/or data from Oracle to Netezza. If you're on Windows, the Aginity Workbench for Netezza can often do the trick and is very easy to use. It is free to use after registration. See Tools | Import | From External Database. Make sure you have a 64-bit Oracle client installed (the Oracle Instant Client is OK).

    The AWBN will also handle the DDL and data separately, if you wish. See Tools | Migrate.

    Also, a user named xgo posted a way of converting Oracle DDL to Netezza DDL here:
    http://www.enzeefrenzy.com/message/8132#8132
    I'm not sure if xgo has moved over to the new forum.

    If AWBN does not work, or if you're not on Windows, you can dump data from Oracle to flat files (typically delimited) and use nzload to bring them into Netezza. Or you can avoid landing to disk at all by sending the Oracle output to a named pipe. There are many posts on how to do this at the old site: http://www.enzeefrenzy.com/threads
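    A bare-bones sketch of the named-pipe approach (the script and object names are hypothetical; unload.sql would spool a pipe-delimited SELECT of the source table to stdout):

    mkfifo /tmp/ora_export.pipe
    sqlplus -s scott/tiger @unload.sql > /tmp/ora_export.pipe &
    nzload -db pocdb -t my_fact -df /tmp/ora_export.pipe -delim '|'
    rm /tmp/ora_export.pipe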

    You should probably use RANDOM distribution to start with. Your goal is simply to get the data moved into Netezza. Once in NZ, you can make copies of your tables with different distribution keys using a very fast CTAS operation.
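    For example (hypothetical names), redistributing a table is a one-liner:

    create table my_fact_cust as select * from my_fact distribute on (customer_id);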
     
    Try it out, and post your results here. 

    • NV replied on 2013-03-18T18:14:22Z, in response to Norbert Kremer:
      Hi Norbert,

      Thanks for this detailed info; it is extremely helpful and much appreciated.

      A couple of follow-up questions.

      When you say
      "If AWBN does not work, or if you're not on Windows,"
      you mean the Windows client, right?

      When you say
      "If you want to use nzsql, you will either need to install nz client tools on a unix or linux system, or you can log in to the nz host as user nz, and use the nzsql command line there,"
      I am planning to use PuTTY to connect to the NZ command line and then use nzsql. Is this what you mean?
       
      Thanks, 
      NV 
      • Norbert Kremer replied on 2013-03-18T18:47:34Z, in response to NV:
        Yes, for client-side tools. AWBN is the Aginity Workbench for Netezza. It is available only for Windows, although it also works well on a Mac if you have Windows running in a VM. As this tool is excellent and free (after registration), it is what most people use.
         
        IBM supplies a GUI client tool called NZAdmin, also only for Windows. The full name of this tool is "IBM Netezza Server Administrator". On Fix Central, look for the Windows client software matching the version of the NPS server that you have. In the zip file, you will find three folders: NZAdmin, drivers, and documentation.
         
        For AWBN, you must install the ODBC or OLE DB drivers, or both. For NZAdmin, you need not install any drivers; it seems to use the NZ wire protocol (I'm 99% sure about this; someone please confirm). A JDBC driver is also available if you want to use SQuirreL SQL or another Java-based client.

        When you install NZAdmin, you also get some command-line tools, but they may not be on the PATH. Look in C:\Program Files (x86)\IBM Netezza Tools\Bin for these. You will not find nzsql there; for nzsql, you must use a unix or linux client.
         
        Yes, you may use PuTTY to connect to the NZ host system. This is fine for a POC.

        Longer-term, you may want a separate unix or linux system in your environment to handle data loading, unless you are planning to use an ETL tool. Netezza client tool bundles are available for Solaris, Linux, and a couple of other unix flavors, all on Fix Central. If you are using an ETL server, you will need to install at least the drivers, and probably the full client package, onto the ETL server systems. On unix or linux, this means using ODBC or JDBC, depending on the tool. If ODBC, you may have to install a driver manager as well as the driver itself. This is covered in detail in the manual Netezza_odbc_jdbc_guide.pdf.
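        On Linux with unixODBC, for instance, the DSN entry ends up looking roughly like this (the path, host, and database here are examples; check the guide for the settings your driver version expects):

        [NZSQL]
        Driver     = /usr/local/nz/lib64/libnzodbc.so
        Servername = nzhost
        Port       = 5480
        Database   = pocdb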
         
        • NV replied on 2013-03-18T19:09:16Z, in response to Norbert Kremer:
          Thanks once again, Norbert. This is very helpful.

          We are planning to use Ab Initio for the ETL piece going forward, and I believe it does provide connectivity to Netezza, although I am not an expert in that.

          Maybe someone who has used Ab Initio with Netezza can help/confirm/suggest.
           
          Thanks, 
          NV 
          • David Birmingham replied on 2013-04-10T14:12:32Z, in response to NV:
            Yes, Ab Initio can load into Netezza, and it can do so transparently, in parallel.
             
            For your initial push, just extract the data from the Oracle platform into delimited files on a file server that is visible to the Netezza machine. Barring that, stand up a Linux server (or use one you already have) and install the NZ client on it. You will want to use nzsql and nzload. nzsql will help you set up the databases in plain-vanilla form; use the nz account with ADMIN privileges. (This is all for a POC, right?)
             
            Use nzload to load up the tables. Don't bother with external tables; they have very specific utility, and general loading is not one of them. nzload is more flexible, simpler, and has less overhead.
             
            Then, as noted above, make yourself a staging database (in Netezza, a database is effectively a schema) and put tables in it that are identical to the Oracle tables, with RANDOM distribution. You will want random for the initial loading to avoid designating a distribution key that might have high skew and slow the loading process. Once the data is inside, profile it by finding highs/lows/counts-of-distincts, etc. You can validate the load with simple summaries such as counts of distinct dates/varchars and summaries of the numerics, along the lines sketched below.
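            For example (hypothetical table and column names):

            select count(*) from stage_fact;
            select count(distinct txn_date), min(txn_date), max(txn_date) from stage_fact;
            select sum(amount), min(amount), max(amount) from stage_fact;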
             
            Then set up a target database with the schema that you want to move into. This can be a scratchpad for the purposes of the POC. But you don't have to drop-and-recreate each time you test a distribution. Just build out a table with one distribution and put a view on it with "select * from tablename"; when you want to build out another table, just use another table name and then create-or-replace the view to point at the new table (see the sketch below). But keep the old table in place for now, as you might want to go back to it for comparison. Keeping it around does not hinder performance.
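            A sketch of that pattern (names hypothetical; queries always go through the view):

            create table sales_t1 as select * from stage_sales distribute on (customer_id);
            create or replace view sales as select * from sales_t1;
            create table sales_t2 as select * from stage_sales distribute on (store_id);
            create or replace view sales as select * from sales_t2;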
             
            As for experimenting with a distribution, find out which candidates have good skew by using just the candidate key column, not the whole table. For example:
             
            create table mytest as select candidate_column from that_table distribute on (candidate_column);  
             
            Then check the skew on the table. Always center on distributions that are useful for your data integration, and avoid manufacturing one that you will never use in a join phrase. Distribution should evenly distribute the data, but it is only useful for co-located joins if the participating tables share a distribution key and that key is actually used in the join.
             
            Also avoid using multiple distribution keys. They are hashes, not indexes. If you use multiple keys on one table, you will have to use the same multiple keys on the other table to get a co-located join.
             
            Fact tables should share the distribution of your largest (or most active) dimension. That is, if your customer dimension is the largest and is distributed on customer_id, then your fact table should also be distributed on customer_id to get co-located joins between them, as sketched below. Never, ever put a fact table on RANDOM distribution or on a manufactured surrogate/sequence that will never be used in a join.
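            Concretely (hypothetical names), the join below is co-located because both tables hash on the join key:

            create table customer_dim ( customer_id integer, customer_name varchar(100) ) distribute on (customer_id);
            create table sales_fact ( sale_id bigint, customer_id integer, amount numeric(12,2) ) distribute on (customer_id);

            select d.customer_name, sum(f.amount)
            from sales_fact f join customer_dim d on f.customer_id = d.customer_id
            group by d.customer_name;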
             
            Never, ever distribute on a date, for any reason whatsoever. Doing so would require a significant business or technical justification, because dates invariably end up with process skew (focusing all the work on one CPU, where all the dates are, rather than spreading the work out).
             
             
            From the Enzee Universe Best Practice sessions, we assembled these four presentations for Enzee users as one focused presentation.
          • David Birmingham replied on 2013-06-14T15:27:02Z, in response to NV:

            Yes, Ab Initio does connect with Netezza through ODBC. Our experience with it was unfortunate in that IBM helped with the ODBC setup, but Ab Initio support was not helpful at all. My client basically "sat in the water" for an unnecessarily protracted period of time, and then when I came on site we got everyone into the "room", so to speak, and hammered it out in a couple of days. ODBC is straightforward on Netezza, but with AbI it seemed a bit harder than it needed to be. I would say make no assumptions, and if you hit a wall, do not presume that it's simple and you just haven't figured it out yet. Get both vendors together and start a conversation, because email exchanges will only delay that eventuality.

  • Glenn Steffler replied on 2013-04-09T20:52:26Z, in response to NV:
    IBM provides data-movement solutions that replicate only the "changed data" from the Oracle system into Netezza.