I tried building a table 200 or so columns wide (that took a while!), but MySQL choked, complaining that the total row size exceeds the 65,535-byte limit. LOAD DATA runs extremely quickly, but things don’t work smoothly with this many columns, and I would still need a second phase to move the imported data into normalized tables. A two-hundred-column table sounds like awfully bad data design to me anyway!
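For reference, the bulk load I was testing was shaped roughly like this (the file path and table name are placeholders, not my real ones):

```sql
-- Hypothetical staging load; file and table names are made up.
LOAD DATA LOCAL INFILE '/tmp/import.csv'
INTO TABLE staging_import
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row
```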
I’m also finding errors in the import data, which need to be accounted for, so line-by-line brute force is looking like the most reliable solution. It’s also looking like the specific columns may change over time, which makes the 200-column table sound even more unwieldy.
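In case it helps anyone, the brute-force loop is shaped something like this (file name and checks are invented; the real validation is more involved):

```php
<?php
// Sketch of the line-by-line import; the path and the
// validation rules here are placeholders, not the real code.
$handle = fopen('/tmp/import.csv', 'r');
$header = fgetcsv($handle);          // column names from the first line
$lineNo = 1;
while (($row = fgetcsv($handle)) !== false) {
    $lineNo++;
    if (count($row) !== count($header)) {
        error_log("Line $lineNo: wrong field count, skipped");
        continue;                    // account for malformed lines
    }
    $record = array_combine($header, $row);
    // ... validate individual fields, then insert into the
    // main table and the key/value table ...
}
fclose($handle);
```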
For what it’s worth, my revised data design has a main table with very few columns, about 200,000 rows per import, and then a second table (a one-to-many relationship) holding key/value pairs that represent each of the columns for a given row, about 41,000,000 rows per import.
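Roughly, the schema looks like this (names and types are illustrative, not my actual DDL):

```sql
-- Illustrative two-table design; actual names/types differ.
CREATE TABLE import_row (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    import_id  INT UNSIGNED NOT NULL,   -- which overnight batch
    -- ... a handful of other fixed columns ...
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE import_value (
    row_id  INT UNSIGNED NOT NULL,      -- FK to import_row.id
    name    VARCHAR(64)  NOT NULL,      -- the "column" name
    value   VARCHAR(255) NULL,
    PRIMARY KEY (row_id, name),
    FOREIGN KEY (row_id) REFERENCES import_row (id)
) ENGINE=InnoDB;
```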
The imports are an overnight batch process, and from that I generate various reports; the report generation is also a batch process. For anything we might want to look at interactively (i.e., Yii-based web pages), I wrangle the data into reporting tables. In effect, any "view" of interest gets built as a separate table.
That’s the cost of storing everything as key/value pairs: joins and the like get hard! But only a few data items matter for the summaries, so it’s easy enough to extract them into tables designed for that purpose.
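The extraction into a reporting table is basically a pivot, something like this (the item names 'foo' and 'bar' are invented, and I'm reusing the illustrative table names from above):

```sql
-- Build a reporting table by pivoting just the few
-- key/value pairs the summaries actually need.
CREATE TABLE report_summary AS
SELECT r.id,
       MAX(CASE WHEN v.name = 'foo' THEN v.value END) AS foo,
       MAX(CASE WHEN v.name = 'bar' THEN v.value END) AS bar
FROM import_row   r
JOIN import_value v ON v.row_id = r.id
WHERE v.name IN ('foo', 'bar')
GROUP BY r.id;
```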
That’s the luxury of doing the batch process overnight. I can wrangle the data once, and read-only interaction during the day can be fast and responsive.
For what it’s worth, I’ve been driving my development with PHPUnit. The batch functionality lives in components/ and can be called by PHPUnit directly. The actual commands (./yiic mystuff mystuff) can be invoked from PHPUnit via exec(), which lets the tests check the exit code but not much else.
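The exec()-based tests are minimal, roughly like this (the test class name is a placeholder; the real coverage comes from calling the component classes directly):

```php
<?php
// Minimal sketch of smoke-testing the console command via exec();
// all it can really assert on is the exit code and captured output.
class MystuffCommandTest extends PHPUnit_Framework_TestCase
{
    public function testCommandExitsCleanly()
    {
        exec('./yiic mystuff mystuff', $output, $exitCode);
        $this->assertSame(0, $exitCode, implode("\n", $output));
    }
}
```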