WillemSleegers.com

Streets ahead.

Website update

13 April 2016

A few website updates:

Next up: My thoughts on the new iPhone SE.


R Markdown

29 March 2016

I discovered R Markdown! This amazing tool allows you to write documents in Markdown* and embed R code snippets in the text, so you can code and write at the same time. It’s not yet nearly as user-friendly as I would like it to be, but it did allow me to do one thing that I had needed to do for a long time: update my website.

This website is now made possible by R Markdown. The index page is created in R, the new Archive page (yay!) is created in R, and each post has been converted to Markdown.

I think I will devote some future posts to how to make a website using R (some information can be found here). It wasn’t an easy process, although it can be a lot easier if you simply want a static (i.e., not a blog) website. For now, I’ve succeeded, and I hope this motivates me to write more posts!

*Fun fact: Markdown was created by John Gruber, a huge Apple fan.


Eye tracking part 1 - Reading in data

November 11, 2015

As mentioned in my first post, some of the posts on this website will be about how to work with eye tracker data. This post is the first in a series on how to go from raw, untouched eye tracker data to a prepared data set, ready for data visualization or statistical analysis.

The first step in this process is to read your data into a statistical computing and visualization environment. My preferred program is RStudio, a powerful IDE for the programming language R. It’s free, flexible, and increasing in popularity, which means there is a lot of support available when you need to figure something out. The only relevant downside, in my opinion, is that R has a steep learning curve. If you do not yet know R, it will be quite challenging to use at the beginning, especially if you’re used to working in a different statistics environment. This post will be relatively difficult if you are new to R. Perhaps I will write some tutorial posts in the future to make this less of a problem. For now, I hope you can bear with me!

Let’s begin by looking at what our data files look like.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")
list.files() # Prints all the names of the files in the working directory to the console
##  [1] "file 1.gazedata"  "file 10.gazedata" "file 2.gazedata" 
##  [4] "file 3.gazedata"  "file 4.gazedata"  "file 5.gazedata" 
##  [7] "file 6.gazedata"  "file 7.gazedata"  "file 8.gazedata" 
## [10] "file 9.gazedata"

We see a total of 10 files with ‘.gazedata’ as the file extension. Usually we have more files than that, but this should be enough for illustrative purposes.

The eye tracker that we use in our lab is a Tobii T60. When you collect data with this device, you will likely end up with multiple files containing the eye tracker data you are interested in. With R it is easy enough to load in multiple files, but the challenge is that the size of your data quickly adds up. The Tobii T60 records at a rate of 60 Hz. This means you get 60 rows of data every second, or 1 row every 16.67 ms. It is not out of the ordinary to end up with a data set that contains several million rows of data. Because we will be dealing with large amounts of data, we want to know the most efficient way to read it in, so that we do not waste time whenever we need to do so.
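To get a feel for how quickly this adds up, here is a back-of-the-envelope calculation; the session length and number of participants are made-up numbers, purely for illustration.

```r
# Rough estimate of the number of rows of eye tracker data in a study.
# The session length and participant count are hypothetical.
sampling_rate <- 60        # Hz: rows of data per second
session_length <- 30 * 60  # a 30-minute session, in seconds
participants <- 50

rows_total <- sampling_rate * session_length * participants
rows_total
## [1] 5400000
```

A modest study like this already produces over five million rows, so the speed of the reading step matters.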

There are multiple functions in R that you can use to read in data. Let’s take a look at these functions and see which one is the fastest.

Method 1: read.table()

Let’s start with R’s standard function for reading in data: read.table(). We start by defining a data frame to store the data in, then we use R’s system.time() function to see how long it takes to read in all the data. We read in the data using a for loop that loops over all the files in the working directory, reads each one into a temporary data frame, and then appends it to our overall data frame, called ‘data’.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")

data <- NULL

system.time(
    for (file in list.files()) {
        temp <- read.table(file, header = TRUE, sep = "\t")
        data <- rbind(data, temp)
    }
) -> method1_time

method1_time
##    user  system elapsed 
##  36.884   2.809  41.415

It took 41.415 seconds to load in all the data files using read.table().

Method 2: import()

There is a very nifty package in R called rio. It is a Swiss Army knife package for reading in data: it supports many different file formats, making it an interesting method to check out. To read in data, you use the import() function. Usually you only need to give the function the location of the data, but in this case we also need to specify the file format because it does not understand .gazedata as an extension. To remedy this, we specify the format by saying format = "tsv". “tsv” stands for tab-separated values, which is exactly how values are separated in our .gazedata files.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")

library(rio)

data <- NULL

system.time(
    for (file in list.files()) {
        temp <- import(file, format = "tsv")
        data <- rbind(data, temp)
    }
) -> method2_time

method2_time
##    user  system elapsed 
##   5.925   1.533   7.877

It took 7.877 seconds to load in all the data files using rio’s import().

Method 3: fread()

Using rio’s import() we saw a huge decrease in the time needed to load in the data, but perhaps we can do better. Another useful package to know about is data.table. This package has a function, fread(), that is very fast at reading in data. Let’s check it out.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")

library(data.table)

data <- NULL

system.time(
    for (file in list.files()) {
        temp <- fread(file)
        data <- rbind(data, temp)
    }
) -> method3_time

method3_time
##    user  system elapsed 
##   2.333   0.618   3.207

It took 3.207 seconds to load in all the data files using data.table’s fread(). Curiously, the import() function from the rio package is supposed to use fread() under the hood for data files such as ours, yet our timings show that calling fread() directly is considerably faster than going through import().

At this point we see that we should simply use fread() to read in our data. There are a few more optimizations that can be performed, but I have not found them to be worth it for the data that I have. Nonetheless, let’s take a look at what they are.

Method 4: fread() with colClasses

The first optimization is to specify the classes of each variable before reading in all the data. This should speed things up, at least when each separate data file is very large, and there is the added benefit of making sure that each variable is exactly the class you want it to be.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")

library(data.table)

data <- NULL

temp <- fread(list.files()[1])
classes <- sapply(temp, class)

system.time(
    for (file in list.files()) {
        temp <- fread(file, colClasses = classes)
        data <- rbind(data, temp)
    }
) -> method4_time

method4_time
##    user  system elapsed 
##   2.377   0.502   2.972

It took 2.972 seconds to read in all the data files.

Method 5: fread() with colClasses and rbindlist()

A final optimization relates to how we bind the data of each file together. Up until now we have been using R’s rbind() function to combine the data files. This is not an optimized function. Fortunately, data.table’s rbindlist() function is.

setwd("~/Dropbox/Website/R willemsleegers.com/R/data/reading eye tracker data/")

library(data.table)

data <- NULL

temp <- fread(list.files()[1])
classes <- sapply(temp, class)

system.time(
    for (file in list.files()) {
        temp <- fread(file, colClasses = classes)
        data <- rbindlist(list(data, temp), use.names = FALSE, fill = FALSE)
    }
) -> method5_time

method5_time
##    user  system elapsed 
##   2.365   0.538   3.108

Now it took 3.108 seconds. These latter optimizations do not radically improve the reading times (here they even seem slightly worse!), but with larger and larger data sets these small changes can have substantial benefits.

In conclusion, it seems that the fastest way to read in multiple large data sets is to use data.table’s fread() function, specify the classes of each variable via colClasses, and combine the data files with rbindlist().
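The whole procedure can be condensed into a single helper function. This is my own compact restatement of the approach above, not code from the benchmarks; the function name is made up, and it assumes the .gazedata files are tab-separated with identical columns. Reading all files with lapply() and binding them once with rbindlist() also avoids growing the data table inside the loop.

```r
library(data.table)

# Read every .gazedata file in a directory and bind the results into
# one data table. A sketch of the approach above; assumes all files
# are tab-separated and share the same columns.
read_gazedata <- function(path = ".", pattern = "\\.gazedata$") {
    files <- list.files(path, pattern = pattern, full.names = TRUE)
    # Determine the column classes once, from the first file.
    classes <- sapply(fread(files[1]), class)
    # Read all files with the known classes, then bind them in one go.
    rbindlist(lapply(files, fread, colClasses = classes))
}
```

Called as read_gazedata(), this replaces the for loop entirely, and binding a list of data tables once is generally faster than repeatedly binding inside a loop.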

Updated on April 12, 2016.


Progress!

November 6, 2015

The About section is up! Now you can find out a little bit about me.


Apple brings Harry Potter magic to the real world

September 24, 2015

I did not realize it before but a new feature of the iPhone 6S, Live Photos, very closely resembles some Harry Potter magic!

Harry Potter magic

Apple ‘Hey Siri’ Keynote thoughts

September 22, 2015

I am a bit late to write a post on what I think about the latest keynote from Apple, but those who know me know it must be done. The reason for this delay, to jump straight into the conclusion, is that I simply wasn’t that enthused about the latest keynote. In fact, I stopped watching somewhere in the iPhone section; but more on that later.

Apple Watch

More colors and bands. Apple simply continues their positioning of the watch as a fashion product. Personally I can’t care, but it seems like a smart strategy. Apple is in the high end market and this fits perfectly with that strategy. watchOS 2 was mentioned again, with some cool examples of real life use by doctors. I have yet to see that actually happening on a relatively large scale, but seeing Apple put effort in showing how their devices can be used in important areas of life, well, that’s just great.

iPad Pro

A new iPad that is a lot bigger and heavier than the original iPad, and with a new input device that Apple eschewed before this product. I am not a big fan. My biggest reason is that this product seems tailored to a very specific group of users—designers and artists—while Apple hasn’t yet succeeded in positioning the iPad as the laptop replacement for the masses.

The iPad is a great device. It’s an absolute joy to use on the couch, in bed, anywhere but at a desk. Yet its sales seem to be slipping a bit, and instead of doubling down on making this the device for normal people (i.e., non-nerds), they go for the professional market. To me, Apple is about creating products that remove the everyday annoyances that we put up with in order to have fun. They greatly improved MP3 players with the iPod and phones with the iPhone. Laptops and desktops have been annoying people ever since their inception, as evidenced by people continuously asking their computer-savvy friends to help with their computer problems. Hence, Apple created the iPad. It is this strategy that I like seeing them continue, but the iPad Pro seems to go against it. I hope that this is merely a momentary distraction, since they do have markets to disrupt. That brings me to the next product.

Apple TV

Their big move in the TV market is the new Apple TV. Its biggest feature is the App Store, which makes it possible for third parties to develop apps for the Apple TV. According to Apple, the future of TV is apps. Personally, I am not entirely sure yet. Apps on an Apple-developed platform are a huge step up from whatever exists at the moment. Current TVs are garbage, software-wise; in fact, I have not heard of a single TV that is not despised by someone who actually cares about user experience. So, having apps that adhere to Apple’s human interface guidelines will be great. But it won’t solve the actual problem.

Third-party apps will bring the content we want to the TV in a less horrible way, but I am not convinced that the eventual result will not still be horrible by absolute standards. Take one of the most popular TV content companies: Netflix. Their website is atrociously slow and browsing is almost useless. How can we make sure that these companies actually make good apps? Apple can only go so far with rejecting shitty apps; they can’t reject apps for not being as great as they could be. I predict there will be dozens of apps, one for each TV service, and a great portion of them will be annoying in one way or another. And slow, they will find a way to be slow.

This also brings up another limitation. There are dozens of TV content services, which means we need to know what each of them offers so that we know where to go when we want to watch TV show X. Apple is apparently trying to combine multiple services into one convenient overview, but I’m afraid this is nothing more than a shell: once you make your selection you will be thrown into that service’s app, and that is where the experience goes to die.

Additionally, as someone from the Netherlands, the Netflix library here is only a small fraction of the US Netflix library. At this point in time, I would say that is just ridiculous. I’m sure this is a problem outside of their control, but I was hoping that Apple had become big enough to put some pressure on these kinds of issues.

So, I think the TV experience will be somewhat improved with the new Apple TV, but we still have a long way to go.

As a final note on the Apple TV: the App Store brings games to the Apple TV. This should be fun, but it will have zero impact on the console market. Except maybe for Nintendo, since they are still living in the early 2000s.

iPhone 6S

At this point I was getting sick of the keynote. It was running long. Eddy Cue was making mistakes. The stream was failing. I was about to give up.

I managed to see that they introduced 3D Touch, and with it a new level of complexity, to the phone. In some ways, I think this is great. The app shortcuts that appear when you press on an app’s icon are fantastic. However, I also think I saw a lot of delayed responses and functionality of questionable use, such as previewing e-mails. I’m curious to see whether it will pan out.

After that, the stream died and I stopped watching.


Tumblr Release Notes

September 7, 2015

Their release notes are amazingly funny:

‘4.3?’ roared David Karp across the boardroom table. He spun on his heels, turning his back to the board. His shoulder muscles rippled through his gingham shirt.

‘4.3? We can do better than that. We HAVE to do better than that.’


First post!

September 5, 2015

Welcome to my very first personal website! On this website you will, in the hopefully very near future, find a section about who I am, my resume and publications, and posts on various topics. These topics will range from random snippets on things I find on the internet to more extensive posts on, for example, how to analyze eye tracker data. For now this website does not contain much (such as who I am), but I will be adding more to it, bit by bit (pun?).

I am coding the website myself because I want to learn HTML and CSS. I use various resources for this, such as this great book by Jon Duckett and the usual Google searches, which often lead me to the awesome website that is Stack Overflow. My rule is that I need to understand every single line of code I use to create this website. As a result, I might be able to write a few posts on how I solve different issues in website design, such as having a responsive menu that takes into account the available screen real estate. Hopefully these will be useful to others who try to code their own website.

So, expect regular posts and an increasingly complete personal website. Next step: create the ‘About’ section.