ZJU Technology

June 12th, 2013, 10:59 pm PST by Greg

I have been taking my own laptop to lecture. There are computers in every room, but I haven’t had any use for them. I decided to have a look at one the other day. I took this screenshot (click through for the full-size version):

screenshot

What you see there is:

  • Windows XP SP2, as released in 2004.
  • Internet Explorer 6, the most hated browser in the world, which even Microsoft thinks should be long dead.
  • Old school Active Desktop, long-since retired.
  • Acrobat Reader 8, from 2006 and three versions out of date.
  • MS Office 2003, three versions out of date.
  • Adobe CS3 from 2007, three versions out of date.
  • R 2.15 from 2012.

To get the screenshot off, I inserted a clean USB key. When I got the key back to my computer, a clamav virus scan had this to say:

----------- SCAN SUMMARY -----------
Known viruses: 2381770
Engine version: 0.97.8
Scanned directories: 1
Scanned files: 14
Infected files: 11
Data scanned: 0.95 MB
Data read: 6.05 MB (ratio 0.16:1)
Time: 5.086 sec (0 m 5 s)

It looks like there are four distinct viral infections on the key.

My best guess is that they imaged the machines when they installed them, possibly when the building was built. Since then, everyone has logged in as an administrator (one installing his own version of R, which explains the one piece of new software), and whatever has happened has happened. Even though there is a learning technology office 20 metres from the classroom, they can’t seem to be bothered with the computers.

So why does IE6 have such a huge market share in China? There is a Chinese expression “随遇而安” which Google translates as “go with the flow”. I blame too much 随遇而安.

The Chinese Web Market

April 30th, 2013, 9:19 am PST by Greg

Since I have been in China, I have been thinking a lot about the web in China (and of course, living with a Chinese Internet connection). I know my share of web entrepreneurs, so one of the things that has been sitting in the back of my head is the question “how can foreign web companies expand into the Chinese market?” I know some excellent people thinking about doing just that.

After a few months, my honest advice to any web company thinking about China is: Don’t. It’s not worth the risk.

One unavoidable danger is being blocked by the firewall and completely losing any investment in China. The other is being cloned by a Chinese developer: shanzhai or copy-to-China. I propose to convince you here that these aren’t two independent risks, but are highly correlated.

Let’s look at the history of some prominent shanzhai sites and their foreign inspiration:

The pattern is clear here: a foreign company does something innovative, a Chinese company clones them and grows to be a viable competitor in the Chinese market. The foreign site becomes immoral and is blocked.

In each of these cases, the Chinese market moved quickly to the shanzhai site. I hear a little grumbling about the degraded Google service, and see the occasional Chinese kid on Facebook, but mostly the users moved smoothly to the Chinese-owned clone.

The reasons given for Internet censorship in China are generally prohibition of illegal material and promotion of national unity. There seems to be a clear side benefit: to neutralize foreign competitors when they become inconvenient to a local company.

I don’t think there is any quid pro quo there. I don’t think the Baidu founders went to the government and asked if it wouldn’t mind eliminating their competition, but the outcome seems identical to what it would be if they had.

So my advice on China is that there is too much danger of your entire investment being lost to the throw of an administrative switch on the firewall. Too much danger of a local clone. Too much danger of them both happening simultaneously.

Edit 03-2014: There has been some quid pro quo, at least on the scale of posts.

Network Sanitization

April 23rd, 2013, 11:48 pm PST by Greg

I have been spending a fair amount of time working in coffee shops in Hangzhou. The culture seems to be that buying a coffee also buys me several hours sitting at a table doing whatever I damned well please. It’s a nice change of scenery from my apartment. They usually have wi-fi, but it would be pragmatic to assume that whatever traffic goes over that connection is beamed directly to a billboard outside. I generally feel the same way about hotel internet, free airport wi-fi, and other dodgy connections: I just don’t trust that they have any interest in protecting my privacy.

I really want to encrypt all of my traffic over those links. I always encrypted my mail client connections anyway, and SSH is inherently encrypted. That really leaves my browser as the weak link in my average-day networking.

After considering some options, I ended up with just about the simplest solution, although it does take a touch of technical know-how to get going. The basic idea is that SSH can provide an encrypted SOCKS server. Using it basically involves setting my browser to use the SOCKS tunnel for everything, and starting up the SOCKS tunnel with a command like this:

ssh -C -D 1080 userid@someserver.example.com

It’s also possible to do this on Windows with PuTTY and on a Mac from the Terminal.
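For anyone who would rather script this than retype it, here is a minimal Python sketch that builds the same command. The function name and its parameters are made up for illustration; the only ssh flags used are the -C and -D from the command above.

```python
import shlex


def socks_tunnel_command(user, host, port=1080, compress=True):
    """Build the argument list for an SSH dynamic (SOCKS) tunnel.

    Hypothetical helper: it only assembles the command shown above.
    """
    args = ["ssh"]
    if compress:
        args.append("-C")            # compress traffic over the link
    args += ["-D", str(port)]        # local port where the SOCKS proxy listens
    args.append(f"{user}@{host}")
    return args


if __name__ == "__main__":
    cmd = socks_tunnel_command("userid", "someserver.example.com")
    # Print the command; passing cmd to subprocess.run() would open the tunnel.
    print(shlex.join(cmd))
```

The browser still has to be pointed at the SOCKS proxy on localhost at the chosen port; the script does nothing about that part.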

In theory, this can speed up a slow connection a little. It removes the TCP handshake from their network, and the compression (-C) might help for the right kind of traffic.

Of course, you need a server to SSH to. If I’m working, I use a computer in the department at SFU. I figure that’s kosher. Another option is Amazon: an Amazon Web Services free tier account should stay free if you use a micro instance and keep the bandwidth under control. As I recall, I just used their most generic looking Ubuntu image and changed just about nothing.

Your privacy is, of course, only as good as your endpoint. Sooner or later, your unencrypted web traffic has to get out there into the big-bad internet. It’s not that I particularly trust Amazon, but I don’t trust any other provider much more.

I have also experimented with sshuttle. It pushes your entire network interface over the SSH connection. That’s technically better, but the SOCKS tunnel usually passes the “good enough” bar for me.

Edit: …and Proxy Selector to flip the SOCKS proxy on when I need it.

ZJU Internet

February 26th, 2013, 6:21 am PST by Greg

After more than a week, I have Internet in my apartment. Until now, I was subsisting on my phone’s data plan.

Here is how I seem to have to connect to the Internet in ZJU residences (and I think elsewhere on campus):

  1. My netbook doesn’t have a built-in Ethernet port, but came with a USB ethernet dongle. Plug it in.
  2. Plug an ethernet cable into the wall jack (not the one on the side of the room close to my computer, the other one).
  3. Manually set an IP address that was given to me by someone in a residence office and is linked to my hardware (MAC) address. It took a week to get this, because everyone was on vacation. It took overnight to activate it once it was assigned to me.

    At this point, I can access an utterly nonsensical collection of sites that are perhaps whitelisted somehow. These include SFU’s web server, google.com.hk, and this site, but not SFU Connect or Renren. These connections seem to be HTTP only, so there is probably no easy way to tunnel through.

  4. Connect to a campus VPN server over IPSEC.
  5. Log in to the IPSEC layer with a campus VPN username/password. I don’t have one of these, and nobody seems to know how to get me one. Luckily, someone was kind enough to lend me theirs.
  6. The VPN server then tunnels over L2TP. This provides Internet access that is as complete as one might reasonably hope in the current locale.

Even given the national demand to keep track of who accesses what, there is at least one layer too many in there. There’s some crazy design-by-committee going on to think of all that. Can anyone spot the weak point?

Those last three steps are supported in Windows only, and early attempts to get the VPN working in Linux have failed. There is also a campus proxy server that can be accessed without the VPN, but it seems to use some entirely different account and I can’t log into it.

Thus my proposed further steps may be:

  1. Open a virtual machine running Linux in Windows. Let the VM’s network magic bridge the Windows network into the VM.
  2. Probably use sshuttle or similar to secure the whole stack back to a host I trust. There are too many moving parts and possible points of privacy loss in there.
  3. Internet.

So that’s about 8 steps between my computer and some Internet. Any bets on the fraction of the time all of those actually work?
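To put a rough number on that bet, here is a back-of-the-envelope sketch. It assumes that each step fails independently and that every step has the same reliability; both are simplifications, and the 95% figure is a guess chosen purely for illustration.

```python
def chance_all_steps_work(per_step_reliability, steps):
    """Probability that every step in a chain works, assuming the
    steps fail independently and are all equally reliable."""
    return per_step_reliability ** steps


# Assumed numbers: 8 steps, each working 95% of the time.
print(round(chance_all_steps_work(0.95, 8), 2))  # 0.66
```

So even with fairly dependable individual pieces, the whole chain would be up only about two thirds of the time under those assumptions.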

Idea 1: The Useless Traveler

January 29th, 2013, 8:25 pm PST by Greg

[While traveling, I have thought of a couple of things that need to exist. This is #1.]

I’m sure I’m not the only one who has this ideal of what would happen when I get to a new country: I would know a few words of the language, and be able to say at least “yes”, “no”, “good”, “bad”, “the cheque please”, etc.

So, I end up buying a traveler’s phrasebook, and I hate them sooooo much. The Southeast Asian phrasebook beside me has, on page one “yes”, “no”, “please”, “thank you”, “goodbye”. Fine, but with poor phonetic transcriptions. On page two it has “I am a businessman/businesswoman/doctor/journalist/manual worker/administrator/scientist/student/teacher.”

If I didn’t know “hello” a page ago, how is that useful? Even if I had to convey my profession, I wouldn’t say that: I’d point at myself and say “teacher” and everyone on the planet would understand. The extra grammar is just there to give me something to screw up, and the nine-way alternation makes it impossible to actually use the translation.

A few pages later, “Do you accept travellers cheques/credit cards?” Once again, nobody in the world needs that translation: hold up your credit card and see if they take it. Also, the translations use “krub/ka” without explaining that which you use depends on the gender of the speaker.

What I want is like a spreadsheet with columns like “English”, “Thai (written)”, “Phonetic”. I want to be able to select words/phrases to populate the page, and then print it so I can either study it or point to it, as the situation dictates. I might reasonably learn the words for “hello” and “thank you”, but I want “Can you please help me order?” written so I can just point at it and hope the waitress has a sense of humour.

Thus I propose a web site with:

  • Crowdsourced translations of words/phrases that users want.
  • A nice interface to select the columns for each user’s needs. For example, in Chinese I’d want a column “Pinyin” since “xièxie” is useful to me, but many English speakers would want a “Rough Phonetics” column with “shay shay”.
  • A similarly-nice interface to build a collection of phrases that you’re interested in.
  • The ability to export that as a PDF for printing.
  • A non-free phone app where you can export the table of translations for use electronically.
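
The core of that idea, a table of phrases plus per-user choices of rows and columns, can be sketched in a few lines of Python. Everything here is illustrative: the function names are made up, and the sample entries (especially the “Rough Phonetics” values other than “shay shay”) are placeholders of the kind that crowdsourcing would supply.

```python
# Illustrative sample data; a real site would store crowdsourced entries.
PHRASES = [
    {"English": "thank you", "Chinese (written)": "谢谢",
     "Pinyin": "xièxie", "Rough Phonetics": "shay shay"},
    {"English": "hello", "Chinese (written)": "你好",
     "Pinyin": "nǐ hǎo", "Rough Phonetics": "nee how"},
    {"English": "goodbye", "Chinese (written)": "再见",
     "Pinyin": "zàijiàn", "Rough Phonetics": "dzai jyen"},
]


def build_sheet(phrases, wanted, columns):
    """Return rows (header first) for the chosen phrases and columns."""
    by_english = {p["English"]: p for p in phrases}
    rows = [list(columns)]
    for english in wanted:
        entry = by_english[english]
        rows.append([entry[col] for col in columns])
    return rows


def render(rows):
    """Render rows as tab-separated text, one phrase per line."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    sheet = build_sheet(PHRASES, ["hello", "thank you"],
                        ["English", "Chinese (written)", "Pinyin"])
    print(render(sheet))
```

The tab-separated output pastes straight into a spreadsheet; the PDF export and the nice selection interface are the parts that would take the real work.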

As far as I know, this is an unfilled niche. I do know that I probably don’t have the time to do it. Somebody make it for me, okay?

Need a CMPT 470 instructor

August 25th, 2011, 3:20 pm PST by Greg

As many of you know, I’m the most frequent instructor for CMPT 470, Web-based information systems, at SFU. I love teaching the course, but it’s just not possible to do it every semester. In particular, it’s not possible for me to teach it in the spring (Jan-Apr 2012).

So, it will likely be posted as a sessional (contract) instructor position. We have had sessionals do the course in the past, but I reckon I can make things a little more interesting: there’s a good web development community in Vancouver, and there are a lot of people who would do a good job with the course. There is also an increasingly-large group of CMPT 470 alumni who have been out there in the world for a few years getting some experience: some of them would be good at this too.

I just have to find somebody and get them to actually apply. So, I’m putting the call out: anybody interested or know anybody who’d be good?

The course (as I approach it) is a survey of web development topics: markup and style, HTTP, server-side programming, client-side programming, architecture/speed/backend stuff, and whatever else I feel like talking about that semester. The big piece of work for the students is a group-based project which makes up a big chunk of their final mark.

Officially the appointment requires a masters degree, but a case can be made for somebody with industrial (and even better, teaching) experience. The course is scheduled in the evenings (Mondays 5:30-8:30) on the Burnaby campus, so shouldn’t interfere too directly with a day job. Pay is around $8500 plus benefits.

Of course, anybody teaching the course is welcome to my lecture notes, assignments, web materials, and anything else I have that would be useful. There’s no official posting yet, but I figure it’s a good time for people to start thinking about it. I’m happy to talk to anybody about the course.

Edit: I should point out that I’m not the one making the hiring decisions. I’m just an interested third party.

Version Control conundrum

June 4th, 2011, 12:27 am PST by Greg

As most of you know, the School’s new course management system is my baby. It keeps track of many things, but what I care about right now is (1) who is in a course, and (2) what groups have been formed for assignments/projects/whatever.

Given those things, I have had this idea: It would make perfectly good sense for each of those things (every student in a course; every group in a course) to have a version control repository automatically created for them. The instructor and TAs would also have access, but wouldn’t have to set anything up. Students could use the repositories even in courses where the instructor doesn’t know what technology is.

I have used Subversion repositories for the project groups in CMPT 470 for years. The benefits from my point of view:

  1. Groups can collaborate in the way that version control systems allow.
  2. Students can work on code (even individually) in multiple locations and with versions kept.
  3. All of their code is safely backed-up on a server that we kind of trust.
  4. I can review which members of the group contributed which code.
  5. It’s a nice and easy way to submit code: just give me the SVN URL.

When contemplating technologies to implement my scheme, I went first to GIT (or possibly some other distributed version control system, since they’re all the rage). GIT also has a pile of nice management tools like gitolite that make creating thousands of repositories surprisingly easy.

But while experimenting, I realized that GIT inherently trusts the user-provided information about who they are. If I claim to be “Barack Obama <president@whitehouse.gov>” in my commits, then GIT lets me push those commits just fine, no matter who I have authenticated as at the central server. So, I pretty much lose benefit (4) in the worst cases (which are the cases I’m usually concerned with), which is pretty much a deal-breaker by itself.
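
One conceivable mitigation is a server-side check that compares the author claimed in each pushed commit with the account that authenticated. The sketch below shows only the comparison logic; the Commit class and function name are hypothetical, and it assumes something else (for example a server-side hook reading the output of git log --format=%ae) supplies the claimed author emails.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    author_email: str  # whatever the committer typed into their own config


def commits_not_by_pusher(commits, authenticated_email):
    """Return the commits whose claimed author differs from the pusher.

    Hypothetical policy check: a strict server could reject the push
    if this list is non-empty.
    """
    wanted = authenticated_email.strip().lower()
    return [c for c in commits
            if c.author_email.strip().lower() != wanted]


if __name__ == "__main__":
    pushed = [
        Commit("a1b2c3", "student1@example.edu"),
        Commit("d4e5f6", "president@whitehouse.gov"),
    ]
    for commit in commits_not_by_pusher(pushed, "student1@example.edu"):
        print("claimed author does not match pusher:", commit.sha)
```

A rule this strict would also reject a student who legitimately pushes a teammate’s merged commits, so at best it trades one problem for another.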

The “distributed” nature of any DVCS gets me this problem one way or another—anybody could push the whole group’s work since they could be working for weeks without touching the central server. And having made that realization, I have to admit that (3) also disappears: they don’t have to push to the server very often, so a crash on their end could lose a lot of work.

Finally, knowing students the way I do, (5) is gone too. I’d give a lot to not have this conversation five times a semester: “I got a zero.” “You didn’t submit any code.” “Yes, I committed it.” “You committed it, but did you push it to the server?” “Yes, I pushed it.” “You typed the command ‘git push’?” “No, I use ‘git commit’. That puts the code on the server.” “No it doesn’t. You didn’t put any code on the server where I can get it.” “Yes I did… I committed it.”

Also, it’s my understanding that it’s not possible to give a URL to a subtree of a GIT repository: the only URL is to the project itself. That makes submitting with GIT much harder.

So, I’m left with this: distributed version control is at least as good for developers, but it’s very bad for instructors.

According to Wikipedia’s comparison of revision control software, the only open source, “actively-developed”, “client-server” VCS is Subversion. So it looks like I’m back to the totally-uncool and old-fashioned SVN?

Does anybody want to refute any of that?

What programming language should I learn?

May 20th, 2011, 11:56 am PST by Greg

I recently had a former CMPT 165 student email me and ask essentially if Python was the best language to learn [first] from a practical/employment standpoint. This was my response, which I think was good and would like to expand on here:

Certainly the traditional view of the world is “C/C++/Java for big projects or where speed matters; higher-level languages like Python/Perl/VB for smaller projects or automation.” Certainly many of my colleagues continue to see the world in this way.

The programming language world has changed in some subtle ways in the last few years and I don’t think that attitude is really valid anymore. If I was starting a big project today (like writing a word processor or something), I would probably start with Python (or something similar): it’s easier to write and get things done and it’s possible to bridge to code in Java or C if you need to.

If I had to honestly summarize the world today, I’d say “C++/Java/C# for big companies who want to make a ‘safe’ choice of programming language; Python/Ruby/JavaScript/Scala/etc on smaller projects where the developers make the choice and want to get things done and enjoy their lives.”

A few footnotes on that: (1) the result is there are probably more Java/C# jobs in the world than other languages; (2) the Python/Ruby/Javascript jobs tend to be in smaller companies and are probably more fun; (3) after you learn to program, learning a new language isn’t nearly as big a deal as learning your first–most of the concepts are always the same.

By “the programming language world has changed in some subtle ways”, I mean mostly:

  1. Languages we always thought of as “slow” have been made shockingly fast by just-in-time compilers like V8 and PyPy.
  2. Mixing languages in a project (e.g. calling C from Python, or using one language’s standard library from another) seems, to me at least, to be an easier and more mainstream thing to do if you need to.
  3. Frameworks/libraries are used much more heavily. If you spend 90% of your time calling some GUI library, the speed of your code doesn’t matter much: the speed of the GUI library is what matters. (And, who’s to say the library is written in the same language you’re writing? See 2.)
  4. C isn’t the “fast” language anymore. That’s probably more controversial, but basically, C is really good at single-threaded performance, but multithreading and heterogeneous processor environments are a real pain. Today’s reality is that new processors aren’t improving single-threaded speed by very much. Those who want their computation to happen really, really fast seem to be increasingly reaching for computation-specialized tools like Go, OpenCL, or Hadoop. It turns out that the explicitness of C starts to become a burden if you have to smack the mutexes around by hand.
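
As a small illustration of point 2, here is one way to call a C function from Python using the standard ctypes module. It is a sketch that assumes a POSIX system (Linux or macOS) where the C standard library can be loaded this way; the wrapper name c_strlen is made up.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# falls back to symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Length of a byte string as computed by C's strlen."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"hello"))  # 5
```

Nothing here needed a compiler or a build step, which is most of what “easier and more mainstream” means in practice.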

My assertion that “C++/Java/C# for big companies who want to make a ‘safe’ choice” is really just a gut feeling. I understand and even agree with the desire for static typing in a huge project, but I honestly don’t think that’s why companies choose Java or C#. Companies choose these languages because they are enterprisey: they are the kind of language that checks all of the CIO’s boxes and have comforting professional certifications that the HR department can look for.

Also, big companies probably think Oracle’s ownership of Java is a good thing. They haven’t reached the conclusion that I (and, I suspect, many others) have: Oracle will slowly strangle the life out of Java until it truly becomes the new Cobol.

So where does that leave us?

You might as well look for a language that’s (1) fun to write, and (2) easy to actually get shit done with. For me that’s Python, but I can certainly accept Ruby, Lua, Scheme, and friends. I could accept PHP and VB (if I had enough drinks in me) or even C# and Java (if you had an explanation that was grounded in the language design/features and didn’t contain the words “enterprise” or “corporate”).

Wackiest spam ever

May 11th, 2011, 11:19 pm PST by Greg

So I got this random email reporting a broken link on my CMPT 470 web site. It’s a little unusual to get an email like that from someone who was apparently not a student, but not totally crazy.

From: angela.hill88@gmail.com
Subject: Found a broken link on your page

Hey Greg,

I found a broken link on
[web page on my course site] and since I
was researching computer science and needed the page, I found an updated
article online. 

The broken link to "Howto for Python" is
[the once-working link] and I found an article on the front
page of [some not-totally-related web site] if you wanted to fix it.
Click the tab "Beginner Python Tutorials" to get to the article.

Figure I'd send you an email because others may need that link for the same
reasons!

Thanks!

Angela Hill
angela.hill88@gmail.com

I googled the sender, and found this essentially-identical broken link report with the same “correct” URL. There are a few other examples a Google away. It’s freakin’ link spam!

If anybody really wants to find the target page (which I’m not going to link to, so as not to bump their PageRank for any reason), it’s “onlinecomputersciencedegree” with a “www” and a “com”. The site itself is entirely content-free: all external links to other pages.

Somebody’s plan must be:

  1. Crawl tech link pages.
  2. Link-check all of their links. (The examples I have are actually broken links.)
  3. Find the creator’s email and first name on the page. (or accessible somewhere else nearby?)
  4. Email that address with the spam link come-on.
  5. Hope they blindly link to your site without noticing that it’s entirely worthless.
  6. Profit?

How is this possibly a thing? Am I missing something?

P ≠ NP

August 7th, 2010, 8:21 pm PST by Greg

An email I was recently forwarded (a couple of steps removed) from Vinay Deolalikar from HP Labs:

Dear Fellow Researchers,

I am pleased to announce a proof that P is not equal to NP, which is attached in 10pt and 12pt fonts.

The proof required the piecing together of principles from multiple areas within mathematics. The major effort in constructing this proof was uncovering a chain of conceptual links between various fields and viewing them through a common lens. Second to this were the technical hurdles faced at each stage in the proof.

This work builds upon fundamental contributions many esteemed researchers have made to their fields. In the presentation of this paper, it was my intention to provide the reader with an understanding of the global framework for this proof. Technical and computational details within chapters were minimized as much as possible.

This work was pursued independently of my duties as a HP Labs researcher, and without the knowledge of others. I made several unsuccessful attempts these past two years trying other combinations of ideas before I began this work.

Comments and suggestions for improvements to the paper are highly welcomed.

The paper is about 100 pages, and looks serious (but being a decade away from last thinking about complexity, I am unable to give any more useful evaluation than that). I’ll refrain from posting the paper itself.

Whether P ≠ NP is a Millennium Prize Problem, and I don’t think I’d get much argument if I said it is the biggest open problem in computing science.

Update: I see Deolalikar has uploaded the paper. I should point out that in the email thread I got, Stephen Cook said “This appears to be a relatively serious claim to have solved P vs NP.”

Update: Huh, slashdotted. I think “broke” the story is a little strong, but anyway… any media wanting comment on this story, I’d suggest my colleagues David Mitchell (whose work was cited by Deolalikar in this paper), Valentine Kabanets, or Pavol Hell (who also do research in this area).

Update 08/09: Richard Lipton is posting excellent commentary in his blog.
