When I started using Salesforce.com, honestly, the API really scared me because I was completely unfamiliar with how it works or what exactly I needed to do. I new that using the APEX data loader to download daily data was not the best way to make a backup of my organizations data and that CSV files were not the optimal way to accomplish this.
Most large companies use Oracle for the database needs and it really works well with Oracle SQL*Loader. I use this along with the APEX CLI data loader on Linux to create an automated data process which runs on a CRON job every 4 hours to dump the tables which I have designated in a local Oracle database which I created using an outstanding tool called the
Developers SnapShot Pro:
Developers' SnapShot Pro, from Upside Outcomes, is a client application for people making enhancements to a Salesforce.com implementation. SnapShot provides a means by which users may produce a detailed report on their specific salesforce.com schema in text, HTML, Microsoft Word or Oracle SQL formats.
I worked with
Gareth Davies, the founder of
Upside-Outcomes, and explained what I was looking for and he made the necessary changes that would connect to Salesforce.com's API and change the data types into a valid Oracle scheme, which in turn would allow the creation of the table needed to hold your organizations data from your instance of Salesforce.
First I installed the Salesforce Data Loader on Linux and created some shell scripts to login to the API and connect to salesforce. Then using and modifying the example from the data loader example, I was able to use their SOQL (salesforce SQL Language) to create the queries needed to download the data. Once I troubleshooted a few small issues I was able to set the process on a CRON job and schedule it to run as many times as necessary.
Once this was running without any errors I worked with a DBA to create SQL loader process which would take the data and populate the tables each time the process was completed. This part of the process truncates the table in question and then uses the data from the CSV files and inserts all the records into the your local Oracle database. Again I had to troubleshoot a few problems, but those were minimal. Once this part of the process was complete, I added it to the shell script. SQL *Loader cannot work with CSV files, so you can just rename the CSV files to DAT files (e.g. accounts.csv to accounts.dat) and now the loader can use the files to populate your local copy.
So the process works like this every 4 hours: 1) it connects to Salesforce.com using the API and data loader. 2) It downloads the data into CSV/DAT files. 3) SQL *Loader then truncates the tables and re-populates the data from the files into your local database.
Now not only do you have a backup of your data, but you can use this data to compare and write queries to compare what is in salesforce and your other organizations data, which in my companies case, has internal systems which are the database of record to populate our salesforce instance. We use this data to do inserts, updates, and upserts to our instance of salesforce.com