I would not get into that. I would simply create a mapping in the database to store the original filename, then use md5(filename) as the actual filename on disk and as the lookup key. PHP is not really good at Unicode encoding at the moment, although it’s getting better, and there are some filesystem issues as well.
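A minimal sketch of that mapping idea, written in Python only to keep it short (PHP’s md5() would play the role of hashlib.md5 here). The names stored_name, register_upload and name_map are made up for illustration, and the dictionary merely stands in for the database table.

```python
import hashlib

def stored_name(original_name: str) -> str:
    """Return an ASCII-only name derived from the original filename."""
    # Encode explicitly so the digest does not depend on platform defaults.
    return hashlib.md5(original_name.encode("utf-8")).hexdigest()

# Stand-in for the database table: stored name -> original name.
name_map = {}

def register_upload(original_name: str) -> str:
    """Record the original name and return the name to use on disk."""
    key = stored_name(original_name)
    name_map[key] = original_name
    return key

key = register_upload("αναφορά.pdf")
print(key, "->", name_map[key])
```

Whatever the original name contains, the name on disk is always 32 hexadecimal characters, so the file system never sees a non-Latin character.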
Basically, users may access the folder directly (through HTTP or FTP). According to your suggestion, I would have to regenerate the whole folder structure with the files under their original names (using the mapping table), right?
Is there another way? (The simplest would be to upload the file under the same name, but how?)
As I said at the start of this topic, there are encoding issues…
This is quite normal and I don’t see anything wrong. With the first line in your example you told PHP to convert from Greek to UTF-8 (or the other way round, it doesn’t matter here), so it could properly handle only filenames with English (no special characters) and Greek characters. There is no way this could handle Chinese or any other special characters (Polish, for example). I would be very surprised if it worked.
If you used the same line but gave a Chinese charset as the second parameter to iconv, your script would handle English and Chinese characters without any problems, but would fail on Greek and everything else. This seems logical (at least to me).
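To make the point concrete, here is a small Python sketch (not the original PHP) that checks which filenames fit into a single Greek charset. ISO-8859-7 is an assumption, since the thread does not say which Greek charset was passed to iconv, and fits_charset is a hypothetical helper.

```python
def fits_charset(name: str, charset: str) -> bool:
    """Return True if every character of name exists in the given charset."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# English, Greek, Chinese and Polish sample names.
names = ["report.pdf", "αναφορά.pdf", "报告.pdf", "żółw.txt"]

for name in names:
    # Compare a Greek-only charset with UTF-8.
    print(name, fits_charset(name, "iso-8859-7"), fits_charset(name, "utf-8"))
```

Only the English and Greek names fit into the Greek charset, while all four can be expressed in UTF-8. A conversion pinned to one national charset cannot cover the rest.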
First of all, storing files on a server under names in any character set other than English is a hellish idea and complete madness! Your PHP and Apache support UTF-8, but your file system certainly does not. You’ll end up with duplicated files, files with incorrect filenames, non-downloadable files, etc. You’re asking for real trouble. Are you bored and looking for some challenges? :]
Even if you can ensure that your server’s (Linux?) file system is 100% UTF-8 ready and can write UTF-8 encoded filenames, HTTP upload will cause another large bunch of troubles (some of which you have already tasted) if you attempt to transfer files with non-English characters in their names.
If you wanted to support all languages with the above (iconv) method, you would have to:
find a way to determine in which language or alphabet the filename is written (is that possible at all?),
transfer this language setting along with the transmitted file,
set iconv’s second parameter according to the transmitted value of the detected language.
This is madness, let me underline this again. For example, pilot is a valid word in English, Polish and probably many other languages. The same goes for stop. You can name hundreds of such examples. How are you going to detect the language of a filename correctly in this case? Take some time and test Google Translate with language autodetection enabled to see how often it makes mistakes.
You can ask the user to set the language of his file’s name (using a combobox, for example). But what if he makes the wrong selection? This is even bigger madness.
My advice: the only reasonable solution here is to store filenames in English only and abort the file transfer if you detect that the filename contains non-English (actually non-Latin) characters.
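One possible way to do that check, sketched in Python rather than PHP. The whitelist (ASCII letters, digits, dot, underscore, hyphen) and the name is_plain_ascii_name are my own assumptions about what “English only” should mean, so adjust them to taste.

```python
import re

# Must start with a letter or digit; only ASCII letters, digits,
# dots, underscores and hyphens may follow.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")

def is_plain_ascii_name(name: str) -> bool:
    """Return True only for filenames made of the whitelisted ASCII characters."""
    return SAFE_NAME.fullmatch(name) is not None

for candidate in ["report_2.pdf", "αναφορά.pdf", "résumé.doc", "../etc"]:
    verdict = "accept" if is_plain_ascii_name(candidate) else "reject"
    print(candidate, verdict)
```

A whitelist is stricter than merely rejecting non-Latin characters: it also refuses spaces, slashes and names starting with a dot, which tends to be what you want for anything written to disk.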
If you really need to support non-Latin filenames and there is no other way (kill the project manager, tell the customer that implementing this will cost a million dollars and hire someone to write a new PHP for you), you could maybe consider letting users transfer files directly via FTP (no HTTP file upload) and somehow bind the files transferred this way to your application. This is also madness. Take some time and test Total Commander (which has a quite good FTP client on board) to see how many times it goes wacko if you try to upload or download any file with a non-Latin filename.
I had so many problems with the simple French “e” (they’ve got five different ones there, with and without accents pointing left, right, etc.), which made my server go completely wacko, generate two separate files (one with the accented “e” and one with the unaccented “e”) and do a lot more stupid things. I then said to myself: no f*ing way! Hell will freeze over before I let users upload files with non-Latin characters in filenames.
I thought it would be easier than this! The database (using utf8_general_ci) is compatible with any character.
It seems the file system is much more complex (across Linux and Windows operating systems, URLs, etc.).
If the server’s file system is UTF-8, does it have the same problems?
I just tested it with Gmail by attaching files with non-Latin characters in their names. The files download correctly (with the original filename), but the URLs do not point directly to a path on the server (an HTTP header is used).
So the solution is to use a map between the original filenames and the stored (encoded) files?
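For what it’s worth, here is a sketch of how such a header could be built, assuming the header in question is Content-Disposition with the RFC 6266 filename* parameter (the thread does not name it). It is Python rather than PHP, and content_disposition is a hypothetical helper.

```python
from urllib.parse import quote

def content_disposition(original_name: str) -> str:
    """Build a Content-Disposition value carrying a non-ASCII filename."""
    # ASCII fallback for old clients: every non-ASCII character becomes "_".
    fallback = original_name.encode("ascii", "replace").decode("ascii")
    for unsafe in ('?', '"', '\\'):
        fallback = fallback.replace(unsafe, "_")
    # Percent-encode the UTF-8 bytes for the extended filename* parameter.
    encoded = quote(original_name, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"

print(content_disposition("αβ.txt"))
# attachment; filename="__.txt"; filename*=UTF-8''%CE%B1%CE%B2.txt
```

With this approach the file on disk can have any safe name; the browser takes the download name from the header, not from the URL.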
I can’t tell you! For me, even thinking about this causes a headache! :]
First of all, this is Google. This looks like the mapping alirz23 was talking about. But it may only look like mapping and in fact have nothing to do with it. The entire file you attach may be stored in a database, with nothing on the server’s file system. They have endless free space and insanely big databases.
Second of all, this is Google. They store, measure, count, track and log everything. And they have special tools for that, so their upload tool is most likely something sophisticated and you can’t compare anything to it.
Third of all, this is Google. I don’t like them. Which doesn’t change the fact that I can’t live without their tools. The world is sad! :[
I’ve recalled why I myself went the way alirz23 suggested.
The biggest con of saving files under their original names is that a user can easily overwrite an existing file with an uploaded one because of a filename collision.
After all, it seemed to me that saving files under unique names (using a hash, timestamp, serial number or something else) and storing the original filenames in the file management database would be simpler, more robust and more flexible.
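A rough sketch of that scheme, in Python with an in-memory SQLite database standing in for the file management database. The table layout and the helpers save_upload and original_name_of are assumptions for illustration, and a random UUID is used here as the “something else” for generating unique names.

```python
import sqlite3
import uuid

# In-memory database standing in for the real file management database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (stored_name TEXT PRIMARY KEY, original_name TEXT NOT NULL)"
)

def save_upload(original_name: str) -> str:
    """Assign a unique ASCII name and remember the original one."""
    stored = uuid.uuid4().hex  # 32 hex characters, safe on any file system
    db.execute(
        "INSERT INTO files (stored_name, original_name) VALUES (?, ?)",
        (stored, original_name),
    )
    return stored

def original_name_of(stored: str) -> str:
    """Look up the name the user originally uploaded the file under."""
    row = db.execute(
        "SELECT original_name FROM files WHERE stored_name = ?", (stored,)
    ).fetchone()
    return row[0]

# Two uploads with the same original name no longer collide.
first = save_upload("αναφορά.pdf")
second = save_upload("αναφορά.pdf")
print(first != second, original_name_of(first))
```

Because the stored name does not depend on the original name, two users uploading files with the same name each get their own file on disk instead of overwriting one another.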
Very difficult to implement. Even if you get it working, there are going to be other problems too: the filesystem for one, your PHP, Apache, headers … this is a nightmare. I would rather write a client to handle the file downloads and uploads and just avoid FTP completely.