Sat, Mar 22, 2014

Using NodeJS and FTP with Promises

I’ve played with Node in the past, but as of the new year I decided to make a more concerted effort to get stuck into it properly. I went back to the beginning to get a better appreciation for the language, so I read “JavaScript: The Good Parts” by Douglas Crockford. I found that exercise fulfilling, and it resulted in a few light bulb moments that joined some dots up, so I’d recommend reading it if you haven’t already.

Real World App

As I said, I have already played with Node using Express, and I have read quite a bit about Node along with many examples, but I wanted to write a non-web app as I felt this would give me a better opportunity to get to grips with both the language and Node. Express lets you get up and running very quickly without too much head scratching, so I felt a standalone script would give me more exposure to things.

During the previous couple of weeks at work I wrote a console app that downloaded a zip file from an FTP server, extracted the contents, read data from an XML file inside the zip, did some string matching and uploaded the zip to another FTP server. I figured this would be a good app to replicate in Node, so off I went.

After a bit of npm research I found the modules I needed and managed to get to the point of downloading files pretty easily with the below code:

var path = require('path');
var fs = require('fs');
var Promise = require('bluebird');
var Client = require('ftp');

var c = new Client();

var connectionProperties = {
    host: "myhost",
    user: "myuser",
    password: "mypwd"
};

c.on('ready', function () {
    console.log('ready');
    c.list(function (err, list) {
        if (err) throw err;
        list.forEach(function (element, index, array) {
            //Ignore directories
            if (element.type === 'd') {
                console.log('ignoring directory ' + element.name);
                return;
            }
            //Ignore non zips
            if (path.extname(element.name) !== '.zip') {
                console.log('ignoring file ' + element.name);
                return;
            }
            //Download files
            c.get(element.name, function (err, stream) {
                if (err) throw err;
                stream.once('close', function () {
                    c.end();
                });
                stream.pipe(fs.createWriteStream(element.name));
            });
        });
    });
});

c.connect(connectionProperties);

However, I originally had that code in a function, and I wanted to call it and then call another function to read the files I had downloaded. What I found was callback hell.
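
To illustrate the problem: with plain callbacks, knowing when every download has finished means counting completions by hand. The following is only a minimal sketch. `fetchOne` is a hypothetical stand-in for an FTP download and `fetchAll` is an illustrative helper; neither is part of the `ftp` module.

```javascript
// Hypothetical stand-in for an FTP download: calls back asynchronously.
var fetchOne = function (name, callback) {
    setTimeout(function () {
        callback(null, name + ' saved');
    }, 10);
};

// Calls done(err, results) once every item has called back,
// or as soon as the first error is reported.
var fetchAll = function (names, fetch, done) {
    var results = [];
    var pending = names.length;
    var failed = false;
    if (pending === 0) {
        return done(null, results);
    }
    names.forEach(function (name, index) {
        fetch(name, function (err, result) {
            if (failed) return;
            if (err) {
                failed = true;
                return done(err);
            }
            results[index] = result;
            pending -= 1;
            if (pending === 0) {
                done(null, results);
            }
        });
    });
};

fetchAll(['a.zip', 'b.zip'], fetchOne, function (err, results) {
    if (err) throw err;
    console.log(results);
});
```

Every further step (read the directory, extract, upload) would need the same bookkeeping nested one level deeper, which is where the pain comes from.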

Enter Promises

I needed to know that all the files had downloaded before I could read the files in a directory ready for zip extraction, but I couldn’t work out how. I discovered promises, and probably didn’t read enough about all their ins and outs, but I remembered Glenn Block giving a talk about async programming in Node, so I pestered him on Twitter. He kindly helped me out and pointed me towards his code and slides, from which I decided to use Bluebird, the promise library. Unfortunately I just couldn’t get the files downloaded: it would download one file but not the other, and then close the streams.
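
For context, Bluebird’s `promisifyAll` is what provides `Async`-suffixed methods such as `listAsync` and `getAsync` on a callback-style client. The sketch below shows roughly what promisifying one such method involves. It uses the built-in `Promise` as a stand-in for Bluebird, and `promisify` and `fakeClient` are hypothetical names for illustration, not the `ftp` module’s API.

```javascript
// Hand-rolled equivalent of promisifying one Node-style callback function.
var promisify = function (fn, context) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        return new Promise(function (resolve, reject) {
            // Append the (err, value) callback the wrapped function expects
            args.push(function (err, value) {
                if (err) return reject(err);
                resolve(value);
            });
            fn.apply(context, args);
        });
    };
};

// Hypothetical client with one callback-style method
var fakeClient = {
    files: ['a.zip', 'notes.txt'],
    list: function (callback) {
        var self = this;
        setTimeout(function () {
            callback(null, self.files);
        }, 10);
    }
};

var listAsync = promisify(fakeClient.list, fakeClient);

listAsync().then(function (list) {
    console.log(list);
});
```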

Here is a snippet of what I had (brace yourself):

var processListing = function (directoryItems) {
    var itemsToDownload = [];
    directoryItems.forEach(function (element, index, array) {
        //Ignore directories
        if (element.type === 'd') {
            console.log('directory ' + element.name);
            return;
        }
        //Ignore non zips
        if (path.extname(element.name) !== '.zip') {
            console.log('ignoring ' + element.name);
            return;
        }
        //Download zip
        itemsToDownload.push({
            source: element.name,
            destination: element.name
        });
    });
    return itemsToDownload;
};

var processItem = function (object) {
    return aFtpClient.getAsync(object.source);
};

var downloadFiles = function () {
    console.log('downloading files');
    aFtpClient.
    listAsync().
    then(processListing).
    map(function (object) {
        return processItem(object).then(function (processResult) {
            return {
                input: object,
                result: processResult
            };
        });
    }).
    map(function (downloadItem) {
        downloadItem.result.pipe(fs.createWriteStream(process.cwd() + "/zips/" + downloadItem.input.destination));
        return new Promise(function (resolve, reject) {
            downloadItem.result.once("close", function () {
                console.log('closed');
                resolve();
            });
        });
    }).done()
};

Not only is that a tad complicated, but I could not for the life of me understand what the hell was happening or why it wasn’t downloading all the files. I reached out to @PrabirShrestha, who agreed it was a tad overcomplicated and tried to help, but recommended I take a look at Reactive Extensions. Maybe I will in the future, but at this point my frustration had kicked in and I wanted to give up. I went through a mixture of emotions: frustration, which led to anger, fuming anger, denial, then apathy. Those emotions passed, and after a couple of questions on Stack Overflow that helped but didn’t give me the solution, I explained the issue to a colleague and we both took a look. I went through a few iterations with no luck, and after a bit more reading I think we were each closing in on it, but I was beaten to it. All hail @iamnerdfury, who produced this:

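// Assumes the client has been promisified first, e.g. Promise.promisifyAll(c),
// so that the *Async methods used below exist.
var connect = function() {
    c.connect(connectionProperties);
    return c.onAsync('ready');
};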

var getList = function() {
    return c.listAsync();
};

var zipFiles = function(element) {
    return element.type !== 'd' && path.extname(element.name) === '.zip';
};

var current = Promise.resolve();

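var downloadFiles = function(file) {
    // Extend the chain so each get starts only after the previous step has run
    current = current.then(function() {
        return c.getAsync(file.name);
    }).then(function(stream) {
        stream.pipe(fs.createWriteStream(file.name));
        // Logged once the stream is piped; the write may still be in progress
        console.log(file.name + ' downloaded..');
    });
    return current;
};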

connect().then(getList).filter(zipFiles).map(downloadFiles).done();

I think the issue I had before was that I was calling resolve() after the first file downloaded, which is not what you want when it is called multiple times, as a promise can only resolve once. I needed some way of chaining onto a promise for each file that is downloaded. I looked at the all() method but I couldn’t get it to fit, but @iamnerdfury found that you can do this by creating an already-resolved promise with Promise.resolve() and then reassigning it, with another then() chained on, for each file that needs to be downloaded.
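
The chaining trick can be seen in isolation with a stand-in for the download. This is a minimal sketch under assumptions: `fakeDownload`, `queueDownload` and the `delayMs` field are hypothetical, and the built-in `Promise` is used in place of Bluebird.

```javascript
var finished = [];

// Hypothetical stand-in for c.getAsync plus piping: resolves after a delay.
var fakeDownload = function (name, delayMs) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            finished.push(name);
            resolve(name);
        }, delayMs);
    });
};

// Start with an already-resolved promise and keep extending the chain
var current = Promise.resolve();

var queueDownload = function (file) {
    current = current.then(function () {
        return fakeDownload(file.name, file.delayMs);
    });
    return current;
};

var files = [
    { name: 'slow.zip', delayMs: 30 },
    { name: 'fast.zip', delayMs: 5 }
];

var allDone = Promise.all(files.map(queueDownload)).then(function () {
    // slow.zip was queued first, so it finishes first despite its longer delay
    console.log(finished.join(', '));
    return finished;
});
```

Because each call waits on the previous link in the chain, only one download is in flight at a time, and `allDone` settles after the last one.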

Now that I know the files are downloaded, I can chain more functions to read the file system, extract each zip, read the XML and upload to a new server.
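
One possible refinement before chaining further steps, offered as a sketch and not a definitive fix: `pipe()` returns straight away, so a promise that is meant to say "this file is on disk" may need to wait for the write stream’s `finish` event. The helper name `saveStream`, the temp-file path and the `PassThrough` stream standing in for what `c.getAsync()` would yield are all assumptions for illustration.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var stream = require('stream');

// Resolves once everything from `source` has been flushed to `destination`.
var saveStream = function (source, destination) {
    return new Promise(function (resolve, reject) {
        var target = fs.createWriteStream(destination);
        source.once('error', reject);
        target.once('error', reject);
        target.once('finish', function () {
            resolve(destination);
        });
        source.pipe(target);
    });
};

// Stand-in for the stream an FTP get would hand back
var source = new stream.PassThrough();
var destination = path.join(os.tmpdir(), 'save-stream-example.zip');

var saved = saveStream(source, destination).then(function (file) {
    console.log(file + ' written, ' + fs.statSync(file).size + ' bytes');
    return file;
});

source.end('pretend zip contents');
```

Returning such a promise from the download step would mean anything chained afterwards runs only once the write has finished.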

I hope this helps someone else, because it wound me up something chronic. Whilst I get pissed off with JavaScript when things like this happen, I will keep at it, because I think Node is now becoming a serious contender and we developers need to keep a finger in many pies.

(If you think there is a way to improve the solution above, I’d love to hear it.)