My latest project: web lint

October 15th, 2009, 11:30 pm PDT by Greg

I have alluded to this in a status update, but I think it’s time to look more widely for feedback…

A while ago, I started thinking about all of the annoying things my CMPT 165 students do in their HTML, and then started thinking about ways to get them to stop. I started working on an automated checker to give them as much personalized feedback as possible without me actually having to talk to them.

They already use an HTML validator which checks documents against the HTML/XHTML syntax, but it’s amazing what kind of things actually pass the validator. In the list: resizing images with width/height on <img />; saving their source as UTF-16 (no idea how they do it); putting spaces in their URLs; using class names like “red” instead of “important”; not specifying the natural language/character encoding of the document; etc.
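To make the idea concrete, here is a minimal sketch of how a few of these checks could be automated. It is not the tool's actual code: the class name `LintSketch`, the `lint` function, the warning wording, and the short list of colour names are all assumptions made for illustration. It uses Python's standard `html.parser` and covers only three items from the list (spaces in URLs, colour-style class names, and a missing language declaration).

```python
from html.parser import HTMLParser

# Hypothetical list of "presentational" class names; a real checker would need more.
COLOUR_NAMES = {"red", "green", "blue", "yellow", "black", "white"}

# Which attribute holds the URL for each tag we look at.
URL_ATTRS = {"a": "href", "link": "href", "img": "src"}


class LintSketch(HTMLParser):
    """Collects (line, message) warnings for markup that validates but is suspect."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.saw_lang = False

    def warn(self, message):
        line, _column = self.getpos()
        self.warnings.append((line, message))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)

        # Natural language of the document should be declared on <html>.
        if tag == "html" and ("lang" in attrs or "xml:lang" in attrs):
            self.saw_lang = True

        # Class names should describe meaning ("important"), not appearance ("red").
        for name in (attrs.get("class") or "").split():
            if name.lower() in COLOUR_NAMES:
                self.warn(f"class name {name!r} describes appearance, not meaning")

        # Unencoded spaces in URLs pass the validator but are a common mistake.
        url_attr = URL_ATTRS.get(tag)
        url = attrs.get(url_attr) if url_attr else None
        if url and " " in url.strip():
            self.warn(f"space in URL {url!r}")


def lint(html_text):
    """Return a list of (line, message) warnings for the given HTML source."""
    checker = LintSketch()
    checker.feed(html_text)
    checker.close()
    if not checker.saw_lang:
        checker.warnings.append((1, "no lang attribute on <html>"))
    return checker.warnings
```

For example, `lint('<html><body><p class="red"><a href="my page.html">x</a></p></body></html>')` yields three warnings: one for the class name, one for the URL, and one for the missing language. Checks such as resized images or UTF-16 source need more than the markup itself (the image dimensions, the raw bytes), so they are left out of this sketch.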

As the list became longer, the thing became sort of a general HTML lint: the thing you go to after your code is valid to check for other common problems, annoyances, and omissions. The more I look at it, the more I think it’s a useful tool for CMPT 165 students as well as a good way to make others think a little more about the code they are producing.

I’m now at the point of wanting some feedback. There are still some missing strings and help text, but hopefully you get the idea. I don’t want to guarantee that this link will exist forever, but have a look at my web lint.

As with any “lint”, the goal here probably isn’t for authors to get zero warnings, but just to think about why they are ignoring the warnings that remain. (No, I don’t need you to tell me that some of my pages produce some warnings.)

At this point, I’m most interested in:

  • Links to input that causes an exception (500 Internal Server Error) or other truly broken behaviour.
  • Feedback on the warnings presented and their “level”. I have deliberately hidden levels 4 and 5 in the default display: I’m aware that the tool is pretty anal-retentive.
  • Are there things you can think of (that could be automatically checkable) that should get a warning but don’t? I have a few more on my list, but the core is in there.
  • I don’t think the URL validation (for <a>, <link>, <img>) is perfect: I still need to go back to the RFC and check the details. Reports of any URLs that should pass but don’t would be appreciated.
  • Any spelling/grammar errors?
  • I’m trying not to duplicate functionality of the HTML validators: they already do their job well. But, notice the links to “other checkers” on the right. Didn’t know about all of them, did you? Any others I should include?

My intention is to GPL the code and CC license the text, but let’s take one step at a time.

5 Responses to “My latest project: web lint”

  1. Godfrey Says:

    This would be a pretty interesting assignment topic for 165… just saying 😀

    Not sure if it’s already there, but here is my list:
    – b/i/u (…and blink?!) tags
    – strong tag followed by a br, or a strong in its own p or div tag (should be using headers?)
    – div tag wrapping just text (should be using paragraphs?)

    Maybe I’ll have the time to actually test it tomorrow.

  2. Hora Says:

    This is pretty damn cool, although it’s kinda annoying when you have the same warning a million times. Maybe you should group the warnings by type? Something like ‘Warning X: 15 times’, ‘Warning Y: 4 times’, and show you where?

    This also got me to think about a few things I didn’t before..

  3. Allen Pike Says:

    If you run it on, it complains that I have a with an invalid link. The problem is, it’s an inline style (@import, to be exact) so there is no link to speak of.

  4. Allen Pike Says:

    Of course it strips the HTML in the previous comment. It’s complaining about a style tag.

  5. Greg Says:

    The tool checks style tags to see if they are just @import. Since I was already doing that, it was easy to treat the imported URL as if it was a link and validate the details.

    But, my regex missed the optional whitespace inside the parens (which you have). Fixed in my development version.
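
The whitespace issue in the last comment can be illustrated with a short sketch. This is not the tool's actual regex, which isn't shown in the post: the pattern `IMPORT_URL` and the helper `imported_url` are hypothetical names, and the sketch only handles the `@import url(...)` form. The point is the pair of `\s*` terms just inside the parentheses, which is the part the comment describes as having been missed.

```python
import re

# Assumed pattern for a style block that contains nothing but one @import url(...).
IMPORT_URL = re.compile(
    r"""^\s*@import\s+url\(
        \s*                      # optional whitespace after the opening paren
        (["']?)([^"'()\s]+)\1    # the URL itself, optionally quoted
        \s*                      # optional whitespace before the closing paren
        \)\s*;?\s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def imported_url(style_text):
    """Return the imported URL if the style text is just an @import, else None."""
    match = IMPORT_URL.match(style_text)
    return match.group(2) if match else None
```

With this pattern, `imported_url('@import url( "style.css" );')` and `imported_url('@import url(style.css);')` both give `style.css`, which can then be validated like any other link, while ordinary CSS rules give `None`.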