Downloading files with UTF-8 names from an FTP server that does not support them, using System.Net.FtpWebRequest

I have written a Windows service in .NET that monitors an FTP location for new files, which it then downloads to a local folder for processing. To be precise, the tool renames each file before downloading it. It uses the standard FtpWebRequest class from System.Net. Every once in a while a file could not be downloaded, and the FTP response would be 550 – Action not taken, file unavailable. This happened especially when the file name had “special characters” in it, one example being a file named ANDRÉ.XML. The workaround was simple: I’d launch an FTP application, rename the file to ANDRE.XML, and it would quickly be picked up by my tool again.

It bothered me that somehow the FTP application was able to rename the file, while FtpWebRequest was not. (It also bothered me that every once in a while I had to fix a stuck file like this, but let’s not go down that road here.) So I attached a trace log to see what was going on.

Attaching a trace log

Now, how do you do that? First, make sure you have the DEBUG and TRACE constants set in your project options. They are set by default if you build in Debug mode and haven’t customized anything. Then open up the application’s config file and add a source named “System.Net” to the (existing) system.diagnostics / sources section:

<system.diagnostics>
  <sources>
    <!-- added lines below -->
    <source name="System.Net">
      <listeners>
        <add name="System.Net" />
      </listeners>
    </source>

A bit further down, in the switches section, add System.Net below DefaultSwitch:

<switches>
  <add name="DefaultSwitch" value="Information"/>
  <!-- added line below -->
  <add name="System.Net" value="Verbose" />

Now start the application again. Set a breakpoint on the line where the 550 is received. As soon as it hits, switch to the Output window of the debugger and check what you see.

Slightly edited for brevity, it looked a bit like this:

System.Net Verbose: 0 : [6052] WebRequest::Create(ftp://myftpserver/in/ANDRÉ.XML)
System.Net Information: 0 : [6052] FtpWebRequest#33506938::.ctor(ftp://myftpserver/in/ANDRÉ.XML)
System.Net Verbose: 0 : [6052] Exiting WebRequest::Create()     -> FtpWebRequest#33506938
System.Net Verbose: 0 : [6052] FtpWebRequest#33506938::GetResponse()
System.Net Information: 0 : [6052] FtpWebRequest#33506938::GetResponse(Method=RENAME.)
...
System.Net Information: 0 : [6052] Associating FtpWebRequest#33506938 with FtpControlStream#56641426
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [RNFR in/ANDRÉ.XML]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [550 RNFR command failed.]
System.Net Information: 0 : [6052] FtpWebRequest#33506938::(Releasing FTP connection#56641426.)
System.Net Error: 0 : [6052] Exception in the FtpWebRequest#33506938::GetResponse - The remote server returned an error: (550) File unavailable (e.g., file not found, no access).

Not much new information here. So I scrolled up a bit to the part where the connection was made and the directory listing retrieved.

System.Net Information: 0 : [6052] FtpWebRequest#33163964::GetResponse(Method=NLST.)
...
System.Net Information: 0 : [6052] Associating FtpWebRequest#33163964 with FtpControlStream#56641426
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [220 (vsFTPd 2.0.5)]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [USER username]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [331 Please specify the password.]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [PASS ********]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [230 Login successful.]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [OPTS utf8 on]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [501 Option not understood.]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [PWD]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [257 "/"]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [TYPE I]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [200 Switching to Binary mode.]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [PASV]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [227 Entering Passive Mode (...)]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [NLST in]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [150 Here comes the directory listing.]
System.Net Verbose: 0 : [6052] Exiting FtpWebRequest#33163964::GetResponse()
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [226 Directory send OK.]
System.Net Information: 0 : [6052] FtpWebRequest#33163964::(Releasing FTP connection#56641426.)

One remarkable thing here:

System.Net Information: 0 : [6052] FtpControlStream#56641426 - Sending command [OPTS utf8 on]
System.Net Information: 0 : [6052] FtpControlStream#56641426 - Received response [501 Option not understood.]

The server does not seem to support UTF-8. We expected as much, but it is nice to see it confirmed. FtpWebRequest apparently does not accommodate that: whatever it sends for the É in the RNFR command does not match the name the server has stored. Still, other clients could rename the file without a problem, so they are probably sending the file name in a different way. Hmmm… so, what if I fake it?

Fixing the download

As UTF-8 bytes, ANDRÉ looks like this: (65, 78, 68, 82, 195, 137). I wrote a small function that turns each of those bytes into a literal character and returns the result as a string:

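private string usableFileName(string remoteFileName)
{
	// Encode the name as UTF-8, then map every byte to the char with the same value (0-255).
	byte[] utf8array = System.Text.Encoding.UTF8.GetBytes(remoteFileName);
	var result = new System.Text.StringBuilder(utf8array.Length);
	foreach (byte b in utf8array) {
		result.Append((char)b);
	}
	return result.ToString();
}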

When printed, that looks like garbage (ANDRÃ followed by an unprintable control character), but we don’t care, as long as the FTP server understands it when we ask it to rename the file. I also made a tiny function, cleanupFileName, that replaces anything we don’t like in a file name with an underscore, so that we have a valid file name to rename to.
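The cleanup function itself is not shown here. A minimal sketch of what such a helper could look like, assuming that “anything we don’t like” means everything except ASCII letters, digits, dots, dashes and underscores (the class name and the exact whitelist are hypothetical, not the code from my tool):

```csharp
using System.Text.RegularExpressions;

static class FileNameCleanup
{
    // Hypothetical whitelist: ASCII letters, digits, '.', '_' and '-'.
    // Every other character is replaced by a single underscore.
    public static string CleanupFileName(string name)
    {
        return Regex.Replace(name, @"[^A-Za-z0-9._-]", "_");
    }
}
```

With this whitelist, ANDRÉ.XML becomes ANDR_.XML. If the helper is fed the converted name instead, the two characters that stand in for É each become an underscore, giving ANDR__.XML.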

In place, that now looks like:

FtpWebRequest renameRequest = (FtpWebRequest)WebRequest.Create(new Uri(Request.RequestUri.AbsoluteUri + "/" + usableFileName(Candidate)));
renameRequest.Method = WebRequestMethods.Ftp.Rename;
renameRequest.RenameTo = cleanupFileName(Candidate) + ".busy";

And try again.
Hey, now it works! The log now looks like this:

System.Net Verbose: 0 : [4052] WebRequest::Create(ftp://myftpserver/in/ANDRÉ.XML)
System.Net Information: 0 : [4052] FtpWebRequest#49762782::.ctor(ftp://myftpserver/in/ANDRÉ.XML)
System.Net Verbose: 0 : [4052] Exiting WebRequest::Create()     -> FtpWebRequest#49762782
System.Net Verbose: 0 : [4052] FtpWebRequest#49762782::GetResponse()
System.Net Information: 0 : [4052] FtpWebRequest#49762782::GetResponse(Method=RENAME.)
...
System.Net Information: 0 : [4052] Associating FtpWebRequest#49762782 with FtpControlStream#3314626
System.Net Information: 0 : [4052] FtpControlStream#3314626 - Sending command [RNFR in/ANDRÉ.XML]
System.Net Information: 0 : [4052] FtpControlStream#3314626 - Received response [350 Ready for RNTO.]
System.Net Information: 0 : [4052] FtpControlStream#3314626 - Sending command [RNTO in/ANDR__.XML.busy]
System.Net Information: 0 : [4052] FtpControlStream#3314626 - Received response [250 Rename successful.]
System.Net Information: 0 : [4052] FtpWebRequest#49762782::(Releasing FTP connection#3314626.)
System.Net Verbose: 0 : [4052] Exiting FtpWebRequest#49762782::GetResponse() 
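One way to see why this probably works is to compare bytes. The sketch below assumes that, once OPTS utf8 is refused, the client encodes command text with a single-byte code page that behaves like ISO-8859-1 for these characters. That is an assumption that fits the logs, not something the trace confirms. The class and method names are made up for the illustration.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // ISO-8859-1 maps chars U+0000..U+00FF to the byte with the same value.
    public static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Same result as the usableFileName loop: one char per UTF-8 byte.
    public static string UsableFileName(string name)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(name));
    }

    // Formats bytes as dash-separated hex for easy comparison.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

For ANDRÉ, Hex(Encoding.UTF8.GetBytes(name)) gives 41-4E-44-52-C3-89, which is presumably what the server has stored. Hex(EncodingCheck.Latin1.GetBytes(name)) gives 41-4E-44-52-C9, a different name as far as the server is concerned. Hex(EncodingCheck.Latin1.GetBytes(EncodingCheck.UsableFileName(name))) gives 41-4E-44-52-C3-89 again, so the converted name goes over the wire as the original UTF-8 bytes.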

But, if you want, instead of renaming it first like I did, you can also download it directly. It will even show the correct file name on your local Windows (NTFS?) file system.
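A rough sketch of such a direct download, under a few assumptions: the class, method and parameter names are hypothetical, credentials and error handling are left out, and the local file is saved under the original (readable) name while the request uses the converted one.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class FtpDownloadSketch
{
    // Same idea as usableFileName: one char per UTF-8 byte.
    public static string UsableFileName(string remoteFileName)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(remoteFileName))
        {
            sb.Append((char)b);
        }
        return sb.ToString();
    }

    // Downloads remoteFolder/fileName into localFolder, keeping the readable name locally.
    public static void Download(Uri remoteFolder, string fileName, string localFolder)
    {
        string remoteUri = remoteFolder.AbsoluteUri.TrimEnd('/') + "/" + UsableFileName(fileName);
        var request = (FtpWebRequest)WebRequest.Create(new Uri(remoteUri));
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        // request.Credentials would be set here; omitted in this sketch.

        using (var response = (FtpWebResponse)request.GetResponse())
        using (Stream remote = response.GetResponseStream())
        using (FileStream local = File.Create(Path.Combine(localFolder, fileName)))
        {
            remote.CopyTo(local);
        }
    }
}
```

A call would look something like FtpDownloadSketch.Download(new Uri("ftp://myftpserver/in"), "ANDRÉ.XML", localFolder), with localFolder pointing at wherever you process the files.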

The bottom line: if your FTP server doesn’t understand UTF-8, just spell it out, byte by byte.

