technology Archives

October 29, 2004

MySQL & XML Output

Pulled this from DECAFBAD... A nice tasty article all about the -X command line switch in mysql. Wonderful, I thought: XML-based power in the world's most popular free relational database. I heard the sounds of dreams coming true... well, maybe ;) There had to be a catch, however.

Just when I thought everything was peachy, I started playing with this feature, with less than spectacular results. Some intensive googling yielded the answer.
It seems that the -X command line option for exporting data in XML format produces invalid XML. It assumes the data in the DB is already XML-escaped! On what grounds??? MySQL only encloses the query results in XML element tags; it doesn't encode the contents inside the tags.


In XML, characters such as <, > and & can't be used literally inside an element's content. If you want to use one of those characters, you have to use the respective entity instead. MySQL doesn't seem to do that, so selecting tagged data or markup like "<foo>red & green</foo>" with the -X command line option will always lead to invalid XML.
An uncool workaround would be to perform some string replacements for every selected column when using the -X option:


  • replace all & by &amp;

  • replace all < by &lt;

  • replace all > by &gt;

  • replace all " by &quot;

  • replace all ' by &#39;


Other stuff, like language-specific characters (umlauts etc.), has to be encoded as well, or handled by defining or applying a different character set when post-processing the XML output.
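The replacements listed above are easy to get subtly wrong, because the order matters. Here's a minimal sketch in Python of what that post-processing could look like. The function names and the one-element-per-column layout are made up for illustration and are not what mysql itself emits.

```python
def escape_xml(value: str) -> str:
    """Escape the five characters from the list above."""
    # '&' has to go first; otherwise the ampersands introduced by the
    # later replacements would themselves be escaped a second time.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#39;"),
    ]
    for char, entity in replacements:
        value = value.replace(char, entity)
    return value


def to_field(name: str, value: str) -> str:
    """Wrap one column value in an element (hypothetical layout)."""
    return f"<{name}>{escape_xml(value)}</{name}>"


print(to_field("colour", "<foo>red & green</foo>"))
```

Character-set issues (the umlauts) are a separate problem and aren't touched by this sketch.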

So the command produces invalid XML because invalid chars haven't been escaped... Now this is a shame, as writing some code to escape it in the DB server could have been done quite easily. A combination of escapes and different charsets (perhaps as a command line option), along the lines of mysql --xml --xmlcharset=mycharset, would be sweet. We'll see what happens in the next release.

October 30, 2004

Web Services standardisation (or trying to pass a herd of overkeen elephants through the eye of a needle)

In a previous life as a research manager in an Irish research group called TSSG I wrote a piece about semantic web for a technology column in a local paper. It's the usual non-critical high-level look at a technology but the excitement at the promise of semantic web is very real.


However, I'm less than convinced about the current web services standardisation effort. In comments to another blog I was scathingly critical of the original WS technology (SOAP & XML-RPC) and the malaise of WS standards and specs. I've also been keenly following the wise words of Steve Vinoski, Chief Engineer of IONA Technologies, another company I used to work for. Before I digress onto another topic entirely, I'm going to reiterate some of my original comments about WS standardisation, mirroring Steve's feelings about the lessons that can be learned from CORBA regarding tool & vendor support. So, without wanting to offend too many of the great people involved in the process, here are my considered thoughts:

  • "Web Services" is a brand name for a range of disparate and relatively unfocused technologies.

  • The technology was hugely overhyped without accepted standards to back it up

  • XML messages were touted as human-readable. If you know that many humans who read large XML schemas in their spare time you need to get yourself and your friends "to a nunnery". OK, maybe not but you get the point ;-)

  • It often seems that around 20 years of distributed systems thinking was ignored in their creation. Hence SOAP was misnamed "Simple". "Incomplete" would have been more appropriate.

  • With usefulness comes complexity. With complexity comes unwieldiness and with unwieldiness comes confusion. The secret is normally appropriate abstraction but it's early days yet

  • The standardisation effort is frustrating and feels uncoordinated. All too often standards are hurriedly created to plug holes in other standards. Often it feels like the wheel is being reinvented, as if nobody in the effort knows that RPC has been done before. I hear Vinoski's cries for an overarching Architecture spec so that we have both a map and a flashlight.

  • Almost none of this matters, as the major industry players are now behind it in a bid to recapture the goldrush of the late 90s with a 'must-have' technology. For this reason alone the tool support will hide much of the complexity and encourage utilisation. This is already happening. Thank you Microsoft, IBM, HP, BEA, IONA, SUN etc.

  • The most loosely coupled thing about WS/SOA is often the standardisation process. There could be trouble ahead



However, there's hope for us all in the form of REST. It may correct several issues with web services (including the lengthy standardisation process). WS piping is so incredibly powerful that it can't be overlooked. Also, REST provides some neat answers to security issues, automation, semantic web & may just bring about world peace given an appropriate level of vendor support.


Arguably the URI is the reason the web took off in the 1st place. There were better transport and application layer protocols and more elegant markup grammars, but the idea of the URI is compelling. Arguably, with REST, semantic web & canonical URIs we may just be getting somewhere. I believe that these technologies will determine the success or failure of the web service initiative, and everything else is pretty much window dressing.

November 1, 2004

Native XML support in ECMAScript (E4X)

Yet another interesting nugget of information pulled from Jon Udell's site. Makes you wonder how many bloggers are merely human blog aggregators of other people's blogs. Eventually there's 1 part content and an O(n^n) level of repetition, like P2P only worse, as info is wrapped with 'opinion' by each subsequent blogger. There's a study that could be done here using a combination of the google API and bloglines. Blog information is distributed virally? Discuss...
E4X is a native data type for XML in ECMAScript. More information here

P2P traffic's effect on ISPs

The Internet was designed as a content access system which is predominantly client/server and asymmetrically biased towards downstream traffic (downloads etc.). With P2P exchange of data, the creation of decentralized groups allows information to flow over the public Internet in an anonymous logical fashion. The individual users of these applications are shielded via this anonymity. There are obvious issues with IPR here but also more subtle issues regarding the categories and topology of P2P traffic. (I'll provide a more rigorous mathematical look at this soon.) With this form of information exchange, service providers no longer have the ability to forecast network capacity based on historical subscriber usage patterns. There are four key areas where service providers are feeling the pinch:


  1. Upstream/downstream traffic is flipped, with upstream traffic much larger than downstream traffic. This results in network congestion on the upstream link that was never planned for in initial broadband deployments.

  2. Time of day usage statistics no longer apply. Previously, service providers could assume peak usage at certain times of the day and lower usage at other times. With P2P applications, the computers are often left to transfer data throughout the day in an unattended fashion.

  3. Previously, peering traffic always traversed the Internet to another location. In today’s world, two home users can form a direct connection.

  4. Over-subscription assumptions no longer apply. A handful of power users can “hog” all of the bandwidth deployed for a much larger usage base.


Thanks to network world for some pointers in this post.

November 3, 2004

IDC information society index

This index was established in the mid 90s and provides a statistical analysis of the degree of IT access and absorption within 53 countries worldwide. Ireland can only manage 23rd spot, which is less than impressive considering we're a small nation with such a disproportionate amount of our Gross Domestic Product (GDP) coming from IT. (For a cold hard look at our GDP/GNP comparisons read this.) Our neighbours in the UK fare better in 10th, while the tech-savvy Danes and Swedes claim 1st and 2nd place respectively.

November 4, 2004

Browser Identities

Browser incompatibilities are definitely the bane of a web developer's life. Having spent much of my development life messing around with command lines, I'm now spending a lot of time looking at the CSS section of w3schools, grappling with CSS positioning & layout issues.


I decided that I'd solve some of these browser incompatibilities on the server side rather than with client side javascript... MT's natty Perl-plugin interface looked the best bet, and I whipped up a few quick lines of Perl to pull the HTTP_USER_AGENT from the env and parse it. Easy-peasy, I thought, having read all about browser identities here (skipped the RFC)... This turned out to be no fun. I learned a lot about writing plugins, which are a really great feature, but when I outputted the browser ID for both IE and Opera I got guess what?
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]
Not exactly what I was expecting. A diff of the two confirmed that I wasn't going nuts. They're the same, so my plugin is effectively useless for sorting out CSS layout issues between IExploder, Opera and Nutscrape... So HTTP_USER_AGENT is apparently not the thing to use. The appName in javascript would be more reliable, apparently. So much for sorting out the problem on the server side. Ah well... de nouveau au conseil de dessin as they say in pidgin french :P
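If the two strings really are byte-for-byte identical, nothing on the server side can separate them. Where Opera appends its own token to the IE-style string, though, as at the end of the string shown above, a check that looks for "Opera" before "MSIE" might still work. A rough sketch in Python rather than the Perl plugin; the function name is made up and the patterns are only an assumption based on that one string.

```python
import re


def identify_browser(user_agent: str) -> str:
    """Best-effort guess at the browser behind a User-Agent string."""
    # Opera can masquerade as IE, so test for its own token first.
    opera = re.search(r"Opera[ /](\d+(?:\.\d+)?)", user_agent)
    if opera:
        return f"Opera {opera.group(1)}"
    msie = re.search(r"MSIE (\d+(?:\.\d+)?)", user_agent)
    if msie:
        return f"MSIE {msie.group(1)}"
    return "unknown"


print(identify_browser(
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
))
```

User agent strings can be changed by the user at will, so this is a hint at best.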

November 5, 2004

Steve Vinoski's comments on the WS* standardisation track

Following on from my earlier post about WS standardisation: Steve Vinoski points out that traditional standardisation efforts are often too slow and overly political. In this month's IEEE Distributed Systems Online (DSO) he discusses WS-NonexistentStandards. There's lots of standardisation work, but where are the accepted standards, and how does the process facilitate the creation and adoption of practical standards?

To get around these problems, WS-* authors appear to be taking a different approach toward standardization:



  1. Write a specification and make it publicly available.

  2. Invite interested parties to one or more private workshops where they can learn more details about the specification and provide feedback.

  3. Iterate steps 1 and 2 until chosen feedback from the workshop participants has been incorporated, and the specification is considered finished.

  4. Submit the specification to an official standards body with the hope of fast tracking it to actual standardization with minimal changes.


Overall, this approach reduces the number of participants involved, which can be a good thing because it reduces the overall volume of communication required to create the specification and resulting standard. However, it can also reduce the resulting standard’s effectiveness, even rendering it useless, because it circumvents at least some of the process of building consensus by not being a truly open process. A standard that is not generally agreed on is a standard on paper only.

This definitely seems to be part of the problem. It's in marked contrast to the IETF standardisation process which often appears much more open and perhaps democratic. However, it's a fine line to walk. I can't help but feel that 2 modifications to the process would significantly improve matters.

  1. The creation of WS-arch so we can categorically say what piece of the WS-jigsaw goes WS-where? ;-)

  2. Incentivised involvement of independent s/w developers in the standardisation process. Spec consumers rather than spec producer/pushers who can't provide neutral guidance. Maybe even some decisions could be put to general developers using a web-based voting system.


Probably/definitely need to think about this more...

Design By Contract in C

Charlie Mills creates a Design-By-Contract library for C (which could equally be used for C++ with minor changes) in his most recent OnLamp article. DBC views functions and methods as contractual agreements between the caller and the object/module providing the function. Charlie's implementation is a really neat idea, using Object Constraint Language (OCL) to describe:


  • function preconditions

  • function postconditions

  • type and function invariants


The implementation is hacked up using Ruby and Racc and is available here. I'm currently playing around with DBC for Java using iContracts and I'll post the inevitable success stories here soon...

November 9, 2004

Putting the brakes on spammers

Pulled this off benzedrine.cx. It's a tasty, easy to replicate mechanism for dealing with spammers, safe in the knowledge that you're slowing down their grubby, stinking little operations in the process. As most spammers get paid by volume, this reduces the money they make from slowing down the internet, helping to spread viruses and generally being complete assholes. The author advocates the creation of a tarpit using spamd, which is basically a fake MTA that keeps SMTP connections open but slows responses down to a C-R-A-W-L... Throw in the use of SpamAssassin for some dynamic spam detection, together with the creation of a blacklist for tarpit redirection using information from an authoritative site like spews.org, and you have a reliable system that kicks the majority of spammers where it hurts. The original text is shown below.
To quote Bill Hicks: "just trying to plant seeds"


Introduction
I don't like getting spam. The problem is not detecting it automatically, that works very well with tools like SpamAssassin and bmf . Even though I can automatically delete spam without reading it, the spammers still successfully deliver their mails and get paid by volume. I want to hurt them. They should not be able to deliver their mails, and waste as much of their resources as possible attempting to do so.

Tarpits
Tarpits like spamd are fake SMTP servers, which accept connections but don't deliver mail. Instead, they keep the connections open and reply very slowly. If the peer is patient enough to actually complete the SMTP dialogue (which will take ten minutes or more), the tarpit returns a 'temporary error' code (4xx), which indicates that the mail could not be delivered successfully and that the sender should keep the mail in his queue and retry again later. If he does, the same procedure repeats. Until, after several attempts, wasting both his queue space and socket handles for several days, he gives up. The resources I have to waste to do this are minimal.

If the sender is badly configured, an uncooperative recipient might actually delay his entire queue handling for several minutes each time he connects to the tarpit. And many spammers use badly configured open relays.

Obviously, I only want known spammers to get connected to my tarpit instead of my real MTA.
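To make "reply very slowly" concrete, here is a toy Python sketch of the idea, not spamd's actual code: the reply is dripped out one byte at a time with a pause before each byte, and it carries a temporary 4xx code so that a well-behaved sender queues the mail and comes back. The function name, the reply text and the one-second default are all assumptions for illustration.

```python
import time


def drip_reply(send, reply=b"451 Temporary failure, please try again later\r\n",
               delay=1.0, sleep=time.sleep):
    """Send an SMTP reply one byte at a time, pausing before each byte.

    `send` is any callable that accepts bytes (for example socket.sendall).
    `sleep` is injectable so the pauses can be faked when experimenting.
    """
    for i in range(len(reply)):
        sleep(delay)
        send(reply[i:i + 1])


# Demonstration without a network: collect the bytes and the pauses.
sent, pauses = [], []
drip_reply(sent.append, reply=b"451 x\r\n", delay=2.0, sleep=pauses.append)
print(b"".join(sent), sum(pauses))
```

With the default reply and delay, a single response line keeps the peer waiting for the better part of a minute.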

Blacklists
I can use an externally maintained list of spammers like spews.org to redirect senders to the tarpit selectively. But such lists may be either too slow to include new spamming hosts, or too aggressive for my taste. Some blacklists will not only include single hosts, but entire networks that contain a single spamming host, willingly hurting innocent customers of an ISP to pressure the ISP to terminate the spammer. The blacklist maintainers document such policies, and if I agree with them, it's my decision to block mail from such networks by using their blacklist.

But even if I'm comfortable with blocking mail from innocent bystanders and use the most aggressive blacklists combined, there will still be spammers getting mails delivered to me through newly discovered open relays. Those spam mails will of course be detected by my spam filters, so I'd like to use these IP addresses to build my own blacklist.

Building my own blacklist
Assume I have the following procmail configuration in place to detect (and file) spam:

:0fw
| /usr/local/bin/bmf -m maildir -p
:0:
* ^X-Spam-Status: Yes
in-x-spam

:0fw
| /usr/local/bin/spamc
:0:
* ^X-Spam-Status: Yes
in-x-spam

Each incoming mail is piped through the two spam detectors. If either one of them classifies the mail as spam, the message gets stored in a separate file. I could delete them instead, but I might want to check the mails for false positives every once in a while. Once the classifiers are tuned right, there will be almost no false positives, and almost all spam is detected. I'm reaching 99.95% accuracy here, with maybe 0.01% false positives, which is fine for me.

Analyzing Received: headers
I'm using one additional tool, relaydb , to build a database of all hosts that send me mail. This is done after the classification by the spam detectors, so I can tell the database whether the sender was sending spam or legitimate mail.

I add the following parts to my procmail configuration:

:0fw
| /usr/local/bin/bmf -m maildir -p
:0c
* ^X-Spam-Status: Yes
| /home/dhartmei/bin/relaydb -b
:0:
* ^X-Spam-Status: Yes
in-x-spam

:0fw
| /usr/local/bin/spamc
:0c
* ^X-Spam-Status: Yes
| /home/dhartmei/bin/relaydb -b
:0:
* ^X-Spam-Status: Yes
in-x-spam

:0c
| /home/dhartmei/bin/relaydb -w

So, detected spam gets piped through relaydb -b (blacklist), and legitimate mail through relaydb -w (whitelist). Note that only copies of mails get piped through relaydb, the program never modifies or drops a mail. All it does is build a database of hosts that sent me mail, counting spam and legitimate mail from each one.

relaydb traverses all Received: headers in a mail from top (nearest relay) to bottom. It only acts on valid numerical IP addresses in [] brackets, which is the only reliable part. And it's only reliable when I trust the previous relay in the chain, as spammers often add fake Received: headers. So relaydb starts with the top-most relay in the header and consults its database to see whether it is a known host, and if so, whether it sent me legitimate mail before. If that's the case, it increases the respective counter (spam or legitimate, as told through the -b/-w option) for that host and continues with the next relay found in the header. If the relay is a known spammer, traversal ends, as further headers cannot be trusted.
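The first step described above, pulling out only the valid numerical addresses in [] brackets from top to bottom, can be sketched in a few lines of Python. This is an illustration using the standard library, not relaydb's own code; the function name is hypothetical, and the trust logic (stopping at a known spammer) is left out.

```python
import ipaddress
import re
from email import message_from_string

# Anything between square brackets is a candidate address.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def relay_addresses(raw_mail: str) -> list:
    """Return bracketed IP addresses from Received: headers, top to bottom."""
    msg = message_from_string(raw_mail)
    addresses = []
    for header in msg.get_all("Received", []):
        for candidate in BRACKETED.findall(header):
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a valid numerical address, ignore it
            addresses.append(candidate)
    return addresses


raw = (
    "Received: from mx.example.org (mx.example.org [192.0.2.10])\n"
    "\tby mail.example.net; Tue, 9 Nov 2004 10:00:00 +0000\n"
    "Received: from bogus ([not-an-ip]) by mx.example.org\n"
    "Received: from dialup (unknown [198.51.100.7]) by bogus\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(relay_addresses(raw))
```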

After I run this setup for a while, relaydb has built both a blacklist and a whitelist. One important detail is that a legitimate mail has more weight than a spam mail. I regularly receive spam through mailing lists. Of course, I don't consider the mailing list server a spamming host. Yet, each spam I receive through it will increase the spam counter for that server. Therefore, relaydb only reports hosts as blacklisted when their spam counter is at least three times as high as the counter for legitimate mail (and the factor can be adjusted, of course). So a relay doesn't get blacklisted as long as it sends me legitimate mail to compensate for spam it sends, which covers mailing list servers. But if I get a spam from a host that never sent me anything before, that will cause it to get blacklisted immediately (1 >= 0*3).
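The weighting rule is just one comparison. A small Python sketch with hypothetical names; the extra requirement of at least one spam is an assumption added here so that a host with no mail at all (0 >= 0*3) isn't listed.

```python
def is_blacklisted(spam_count: int, legit_count: int, factor: int = 3) -> bool:
    """Apply the 'at least `factor` times as much spam as legitimate mail' rule."""
    # Assumption: a host that never sent any spam is never blacklisted.
    return spam_count > 0 and spam_count >= legit_count * factor


print(is_blacklisted(1, 0))    # new host, first mail is spam
print(is_blacklisted(5, 40))   # mailing list server with occasional spam
```

The first call reports the host as blacklisted, the second does not, which matches the mailing list case described above.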

Completing the puzzle
Now I'm building my own blacklist, based on the evidence I've seen myself, classified by my own spam detector configuration. The only politics involved in someone getting blacklisted are my own, I don't have to trust a third party to make fair decisions.

And I use this blacklist to redirect hosts to the tarpit, using pf and some cronjobs:


$ pfctl -sn
rdr inet proto tcp from <spammers> to any port 25 -> 127.0.0.1 port 8025

$ relaydb -lb | pfctl -t spammers -T replace -f -

This requires a recent OpenBSD -current system.

Instead of just loading the relaydb blacklist to redirect to spamd, I could combine it with spews. Or I can use the whitelist to prevent hosts which have sent me legitimate mail before from getting redirected to spamd due to a spews listing, etc. There are many interesting combinations.

And how well does it work?
I'm getting several dozen connections redirected to the tarpit per hour, and most peers waste about ten minutes per connection, and retry several times, for multiple days. The impact on my own resources is minimal.

Best of all, I regularly get spam through a mailing list and the sender (not the mailing list server!) gets blacklisted. Then the same spammer connects to me directly, too, as it harvested my address like the one of the mailing list. And it gets stuck in the tarpit. For long. And many times.

Remember, I'm doing all of this not to reduce the amount of incoming spam. That gets detected and filed very reliably, anyway. The sole purpose is to hurt the spammers. And I'm thoroughly enjoying watching my spamd log now, as I'm perfectly sure that each of those connections comes from a spammer who has spammed me before.

"Spam me once, shame on you. Spam me twice, shame on me." :)

If you have questions or comments, write to daniel@benzedrine.cx . And all you spammers harvesting email addresses from pages like this, please spam me. My trap is awaiting you.


Thanks to benzedrine and fif3. Also thanks to my mate Kieran for pointing me towards the original link. Cheers!

November 10, 2004

VoIP battle is really heating up in the US ... but where's the FCC going with all this?

The Federal Communications Commission (FCC) has decided that individual states cannot impose additional restrictions on VoIP service providers. This follows an attempt by the Minnesota Public Utilities Commission to force Vonage to abide by the same rules as existing telephony service providers. The FCC overruled it, deeming that this stance was "inconsistent with the FCC's deregulatory policies". More information on the reg. This is a fascinating story, as the implications of this ruling are unclear. The FCC's policies to date regarding VoIP are supportive but not coherent. It's very much a wait-and-see approach rather than a strategy promoting adoption of VoIP while reasonably compensating existing operators for the use of their networks. That kind of sustainable policy is required to ensure that VoIP services are deployed in a safe and responsible manner, with the reliability and security that users expect.

How to distribute an atomic bomb!

This post is actually about U2's new record "How to Dismantle an Atomic Bomb", which is currently spreading like wildfire on certain well-known P2P networks. The problem, apart from the obvious copyright infringements, is that the record hasn't even been released yet. It's due for release on the 22nd of November. However, a copy of the album disappeared at a photo shoot, and since then there's been intense speculation about whether the band would bring forward the release date. No decision has been made as yet. More info at the reg.

Which command in DOS

I've been told that I should add more of the little programming hints and tips that I used to come up with during my research days to this site. Well, here's something I was playing around with today that's useful for many Windows developers. Like many programmers, I'm often more comfortable at the command line than using some funky GUI where I have to drag (or learn so many command alias key-strokes that I may as well be at the console anyway).
I was stuck for a Windows version of the UNIX which command. According to man which, this command


Which takes a series of program names, and prints out the
full pathname of the program that the shell would call to
execute it. It does this by simulating the shell's searching
of the $PATH environment variable.

Replicating this functionality using DOS batch ain't that bad...

@ECHO OFF
rem Sanity check OS version and arguments.
IF "%OS%"=="Windows_NT" (SETLOCAL) ELSE (GOTO Syntax)
IF "%~1"=="" GOTO Syntax
IF NOT "%~2"=="" GOTO Syntax
ECHO.%1 | FIND /V ":" | FIND /V "\" | FIND /V "*" | FIND /V "?" | FIND /V "," | FIND /V ";" | FIND /V "/" | FIND "%1" >NUL
IF ERRORLEVEL 1 GOTO Syntax


SET Found=
rem Get the short name for the current directory
COMMAND /C REM
rem Search CurrentDir, path and pathext for the file
FOR %%A IN (%CD%;%Path%) DO FOR %%B IN (.;%PathExt%) DO IF EXIST "%%~A.\%~1%%~B" CALL :Found "%%~A.\%~1%%~B"
rem Display the result
ECHO.
IF DEFINED Found (ECHO.%Found%) ELSE (ECHO -None-)
rem Done
GOTO End


:Found
IF DEFINED Found GOTO:EOF
rem Store the first match found
SET Found=%~f1
GOTO:EOF
:Syntax
ECHO.
ECHO WHICH, Version 2.00
ECHO UNIX-like WHICH utility for Windows NT 4 / 2000 / XP
ECHO.
ECHO Usage: WHICH program_name
ECHO.
ECHO Specify program_name with or without
ECHO extension and without a drive or path.
ECHO Just like the UNIX command. (no wildcards please)


:End
IF "%OS%"=="Windows_NT" ENDLOCAL
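For comparison, the same search order (current directory first, then each PATH entry; the bare name first, then each PATHEXT extension) can be sketched in Python. This is a simplified illustration with a made-up function signature, not a port of the batch file. Python's standard library already ships shutil.which for real use.

```python
import os


def which(program, path=None, pathext=None):
    """Return the full path of the first match for `program`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        pathext = os.environ.get("PATHEXT", "")
    # Current directory first, then every PATH entry, as in the batch file.
    directories = [os.curdir] + [d for d in path.split(os.pathsep) if d]
    # Try the name as given, then with each PATHEXT extension appended.
    extensions = [""] + [e for e in pathext.split(os.pathsep) if e]
    for directory in directories:
        for extension in extensions:
            candidate = os.path.join(directory, program + extension)
            if os.path.isfile(candidate):
                return os.path.abspath(candidate)
    return None


print(which("cmd") or "-None-")
```

Unlike the batch version, this sketch does no argument sanity checking, so wildcards and path separators in the name are simply passed through.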

November 12, 2004

Static substitution (Fowler Style)

Martin Fowler has a neat little article on refactoring class statics using instance variables. Most languages can't support polymorphism for static methods. e.g.

class A {
    public static void doInitStuff() { /* do stuff necessary for static init of A objects */ }
    ...
}
class B extends A {
    public static void doInitStuff() { /* do other stuff necessary for static init of B objects */ }
    ...
}
...
A a = new B();
a.doInitStuff(); /* resolved at compile time to A.doInitStuff(), but I'd quite like to polymorphically call B.doInitStuff(); Actually, could be trouble! */

Martin's solution is very elegant.

Creating shim libraries in Linux

Anybody who's done a bit of device driver development will know that occasionally system logs just don't provide enough information about the various problems you'll encounter, and you have to hack up a shim library which sits between a problem library and its loader/calling module. Linux Journal has a very nice article this month on creating just such a library for libusb. This could be useful for anybody developing an application or driver which needs to communicate with a USB device, like writing a sync tool for a PDA, MP3 player or somesuch.

iPod wireless transceivers (only in Japan as yet)

Japanese Ratoc Systems Corp. has a new lineup of wireless audio products called the "REX-Link" series, and some of them are even specifically designed to fit your 3/4G iPod or iPod mini.


There are four products in the series: two "receivers" and two "transmitters." Receivers come in the form of the "CR-RX01" with optical and analog audio outputs or the "REX-WHP1" headphones. For your transmitter, you have two options: the "CR-TXB01" USB transmitter, or the "CR-TXB02" USB/analog transmitter (which also attaches to the back of 3/4G/mini iPods). These four products are matched to give three available packages: one with CR-TXB02 analog/USB transmitter and CR-RX01 receiver, one with CR-TXB01 USB transmitter and REX-WHP1 headphones, and one with CR-TXB02 analog/USB transmitter and REX-WHP1 headphones.
Original link from Gizmodo

November 15, 2004

Microsoft Shell (MSH)

Read about Microsoft Shell a few months ago and it seems like it's getting some serious attention. (maybe not as much as MS's search engine pitch but I'll hold fire for the moment).
MSH is a genuinely great idea from Microsoft. Not an unusual thing in itself but this is a bit different. To quote Udell


System administration has always been Windows' Achilles' heel. The graphical tools that simplify basic chores just get in the way when there's heavy lifting to be done. And CMD.EXE, the hapless command shell, pales in comparison to the Unix shells that inspired it. Win32 Perl has been my ace in the hole, combining a powerful scripting language with extensions that can wield Windows' directory, registry, event log, and COM services. But I've always thought there should be a better way.

Jeffrey Snover thought so, too. He's the architect of Monad, aka MSH (Microsoft Shell), the radical new Windows command shell first shown at the Professional Developers Conference last fall.

MSH is quirky, complex, delightful, and utterly addictive. You can, for example, convert objects to and from XML so that programs that don't natively speak .Net can have a crack at them. There's SQL-like sorting and grouping. You write ad hoc extensions in a built-in scripting language that feels vaguely Perlish. (sd: reminds me a bit of bash scripting) For more permanent extensions, called cmdlets, you use .Net languages.


This will really appeal to hardcore MS administrators and Winadmins coming from a Unix background. It's also potentially a really cool tool for policy-based management of collections of Windows boxes using .NET cmdlets. Very tasty... Thank you Microsoft, just what the doc ordered.

More on MSH

Just reading the complete Udell article again.
Can't help but feel that getting an XML representation of system processes over a certain size using a command like:

MSH> get-process | pick-object name,vs | where { $_.vs -gt 150000000} | convert-xml

is extremely neat. Sample results are listed below. I'm less than convinced about the two-part name/type syntax for the XML representation (it's a bit clunky) but this is a small quibble.


<MshObjects>
<MshObject ReferenceID="ReferenceId-0" Version="1.1">
<MemberSet>
<Note Name="name" IsHidden="false" IsInstance="true" IsSettable="true">
<string> firefox</string>
</Note>
<Note Name="vs" IsHidden="false" IsInstance="true" IsSettable="true">
<int> 220983296</int>
</Note>
</MemberSet>
</MshObject>
</MshObjects>
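The two-part name/type syntax is clunky to read but simple enough to consume. As a rough illustration, here is how the sample above could be flattened into plain name/value pairs with Python's ElementTree. The function name is invented and the handling of only string and int is an assumption based on this one sample, not on any MSH schema.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<MshObjects>
<MshObject ReferenceID="ReferenceId-0" Version="1.1">
<MemberSet>
<Note Name="name" IsHidden="false" IsInstance="true" IsSettable="true">
<string> firefox</string>
</Note>
<Note Name="vs" IsHidden="false" IsInstance="true" IsSettable="true">
<int> 220983296</int>
</Note>
</MemberSet>
</MshObject>
</MshObjects>
"""


def read_msh_objects(xml_text):
    """Turn each MshObject into a dict of note name -> typed value."""
    converters = {"int": int, "string": str}  # only the types seen in the sample
    objects = []
    for obj in ET.fromstring(xml_text).iter("MshObject"):
        record = {}
        for note in obj.iter("Note"):
            value = note[0]  # the single typed child element of the Note
            convert = converters.get(value.tag, str)
            record[note.get("Name")] = convert((value.text or "").strip())
        objects.append(record)
    return objects


print(read_msh_objects(SAMPLE))
```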

November 16, 2004

Escaping entities using XSLT

While writing the last post I didn't fancy the idea of hand-escaping the MSH XML output into HTML entities. So I cheated, using a funky little piece of XSLT that I cooked up earlier tonight. It's listed below...

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"> </xsl:output>
<xsl:template match="/">
<xsl:call-template name="escapexml">
<xsl:with-param name="block" select="."> </xsl:with-param>
</xsl:call-template>
</xsl:template>
<xsl:template name="escapexml">
<xsl:param name="block"> </xsl:param>
<xsl:for-each select="$block/*|$block/text()">
<xsl:choose>
<xsl:when test="self::text()">
<xsl:value-of select="."> </xsl:value-of>
</xsl:when>
<xsl:otherwise>
<xsl:text> &lt;</xsl:text>
<xsl:value-of select="name(.)"> </xsl:value-of>
<xsl:for-each select="@*">
<xsl:value-of select="concat(' ', name())">
</xsl:value-of>
<xsl:text> ="</xsl:text>
<xsl:value-of select="."> </xsl:value-of>
<xsl:text> "</xsl:text>
</xsl:for-each>
<xsl:text> &gt; </xsl:text>
<xsl:choose>
<xsl:when test="*">
<xsl:call-template name="escapexml">
<xsl:with-param name="block" select="."> </xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."> </xsl:value-of>
</xsl:otherwise>
</xsl:choose>
<xsl:text> &lt;/</xsl:text>
<xsl:value-of select="name(.)"> </xsl:value-of>
<xsl:text> &gt; </xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

I'll produce a tidier, more beautifying version when I get the time, but it's OK for a first attempt, and I know of at least one person who's asked me to do something similar in the past. And on that note... it could be time to hit the hay ;-)

November 25, 2004

SUN's Java 6 (May the source be with you)

Picked up a useful piece of information from Xyling Java blog about Java 6 becoming increasingly open source
Building on the success of the snapshot release process started during the development of J2SE 5.0, SUN will make early snapshots available for J2SE 6. This will afford developers the chance to make a greater contribution to the Java development lifecycle. Only fair, considering developers generally have an idea or two about improvements they'd like to see in languages, libraries, APIs etc. Well done to SUN on this developer-friendly initiative.
(thinking of coming up with a "developer-friendly" logo similar to the dolphin-friendly one you see on some cans of tuna! Put suggestions on the comments page )

SUN's Java coding conventions on one page

Save time writing a difficult and boring Java style guide! Refer to William Blake's handy page, which covers everything from Java exceptions to method and variable naming. I used to have something similar in my lecturing days but this version is definitely better.

December 4, 2004

Web Service Ports

Just read an interesting post on Steve Vinoski's Middleware Matters about the lack of multiple-port support in the End-Point Reference (EPR) currently under review by the WS-Addressing working group. The EPR augments WSDL 1.1 by allowing for more dynamic usage patterns, but it doesn't currently support multiple ports. Ports, for those that remember the original WSDL spec, enable a web service to be accessible through multiple protocol/transport/format alternatives. Steve Vinoski proposes a useful "business card" analogy to explain the practicality of multiple ports covered by one EPR. Personal addressing on the internet has arguably evolved this way anyway. Here are some examples of ports associated with the person Shane Dempsey (of Geesan Tech, for SPAM avoidance purposes) with basic URI: sdempsey@geesan.com


  • EMAIL: mailto:sdempsey@geesan.com (TCP, port 25)

  • MULTIMEDIA SESSIONS: sip:sdempsey@geesan.com (UDP/TCP, port 5060)

  • SIP INSTANT MESSAGING: im:sdempsey@geesan.com (UDP, port 5060)

  • JABBER INSTANT MESSAGING: sdempsey@geesan.com (TCP, port 5222)


In some cases the URL scheme is provided, indicating a particular port (e.g. mailto: implies SMTP). The use of schemes is far from uniform, however, so there is no direct port-scheme correlation. In the service domain the situation is better. For example, a SOAP service where information is transferred over an alternative application-layer protocol such as SIP or SMTP is possible. A hyperlink to such a SOAP service would take a form similar to mailto:soap@mydomain.com
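
The scheme-to-port idea in the list above can be sketched as a simple lookup. This is only an illustration: the class name PortCard and the wording of the descriptions are my own invention, and the table holds just the three entries from the list that carry a scheme. The scheme-less Jabber address falls through to "unknown", which is the non-uniformity complained about above.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a URI scheme to the transport/port pairs listed in the post.
class PortCard {
    private static final Map<String, String> PORTS = new LinkedHashMap<>();
    static {
        PORTS.put("mailto", "SMTP over TCP, port 25");
        PORTS.put("sip", "SIP over UDP/TCP, port 5060");
        PORTS.put("im", "SIP instant messaging over UDP, port 5060");
    }

    // Returns the port description for an address, or "unknown" when the
    // scheme is missing or not in the table (e.g. the scheme-less Jabber address).
    static String describe(String address) {
        String scheme = URI.create(address).getScheme();
        if (scheme == null) {
            return "unknown";
        }
        return PORTS.getOrDefault(scheme.toLowerCase(), "unknown");
    }

    public static void main(String[] args) {
        System.out.println(describe("mailto:sdempsey@geesan.com"));
        System.out.println(describe("sip:sdempsey@geesan.com"));
        System.out.println(describe("sdempsey@geesan.com"));
    }
}
```

A multi-port EPR would, in effect, carry a table like this alongside the one address rather than leaving each client to guess it from the scheme.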

December 5, 2004

Can't Add, Can't Post!

Picked up the following link from Jon Udell about a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) for preventing blog spam. This is a really (should that be raelly) tasty idea from Rael Dornfest. It can be summed up as can't add, can't post. He uses the Blosxom Writeback function, which provides weblog comments via writebacks. An arithmetic sum is embedded in the writeback form and no comments are allowed unless the answer is posted correctly. An example of this is
5 + 2 =
This neatly sidesteps the more general blog spambots. The numbers are generated randomly. A definite improvement would be image obfuscation (a la CAPTCHA!) and a bigger range. He currently only uses 0-9, so the sum can only be one of 19 values (0-18) and a random guess has roughly a 1 in 20 chance of being right. I'm not sure I want to encourage blogspammers to brute force my site, especially when a post is so tantalisingly close.
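
To make the "can't add, can't post" idea and its weakness concrete, here is a rough Java sketch. It is not Rael's code (his is a Blosxom plugin); the class SumChallenge and its methods are hypothetical names of my own. It generates a sum, checks a submitted answer, and counts how often a lazy bot that always posts the same number would get through.

```java
import java.util.Random;

// Hypothetical sketch of a "can't add, can't post" check; not Rael's actual code.
class SumChallenge {
    final int a;
    final int b;

    SumChallenge(Random rng, int maxOperand) {
        // Operands are drawn uniformly from 0..maxOperand inclusive.
        this.a = rng.nextInt(maxOperand + 1);
        this.b = rng.nextInt(maxOperand + 1);
    }

    String question() {
        return a + " + " + b + " =";
    }

    // Accepts the comment only if the submitted text parses to the right sum.
    boolean accepts(String answer) {
        try {
            return Integer.parseInt(answer.trim()) == a + b;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Fraction of operand pairs for which a bot that always answers `guess` passes.
    static double hitRate(int guess, int maxOperand) {
        int hits = 0;
        for (int x = 0; x <= maxOperand; x++) {
            for (int y = 0; y <= maxOperand; y++) {
                if (x + y == guess) {
                    hits++;
                }
            }
        }
        int total = (maxOperand + 1) * (maxOperand + 1);
        return (double) hits / total;
    }

    public static void main(String[] args) {
        SumChallenge c = new SumChallenge(new Random(), 9);
        System.out.println(c.question());
        System.out.println("Always answering 9 passes at rate " + hitRate(9, 9));
    }
}
```

With operands 0-9 the sums are not equally likely: 10 of the 100 operand pairs add up to 9, so a bot that always answers 9 gets through about 1 time in 10. That is the argument for a bigger range.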
I'm working on my own interesting weapon in the battle against blogspam. It currently has the catchy title of blogassasin (apologies to jmason & the rest of the SpamAssassin team). Also, it doesn't kill blogs but early versions come close. Active blacklist generation is another tidy feature. So spammers should think before they HTTPiss Off innocent bloggers. Personally I don't believe that my blog (or anyone else's for that matter) needs to become any less relevant or increasingly grbled. So let's say NO to blogspam ;-)

December 8, 2004

Blog trading

It had to happen at some stage I suppose (although I'm still not quite sure why???). Blogshares enables users to trade shares in blogs, similar to a fantasy stockbroker game.
Blogs are assigned monetary values based on the number of incoming and outgoing links to other blogs. It's similar to Google in that it measures 'connectedness'. Currently this blog is very lowly ranked :( Probably due to the fact that most of my friends don't actually maintain blogs, so the usual web of trackbacks is missing. It could also be that I haven't said anything interesting. I hope not. Also my blogroll is generated using JavaScript and it appears that the Blogshares parser has failed to pick up these links...

December 9, 2004

Is spam driving you mad?

Evolutiontwo has a profane and funny response to all spammers. I hear you brother. Like the rest of the sane world he has no intention of passing his bank a/c details over to some spammer claiming to be from Africa, buying a fake Rolex or using a super-cheap online pharmacy to buy drugs to enlarge various body parts. Also, to give a lot of net users credit, they're bright enough to know that it was a spammer rather than an online lottery that harvested their email address.

December 10, 2004

Fun with Regular Expressions

I was playing around with regular expressions in Java. AFAIK these have only been around since JDK 1.4 and are therefore quite new. As a sometime Perl programmer I've some experience with these.
However, all this hacking reminded me of the most amazing regular expression I ever saw. I saw this on ActiveState's RX cookbook some time ago.
It's actually a useful and logically sound solution to a common problem... How to match all RFC 1738 compliant URLs and turn them into hyperlinks! It was posted by Abigail to comp.lang.perl.misc on 08/14/2000. Abigail, I love you!!!


$string =~ s<
(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.
)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)
){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?
:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-
fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-
)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?
:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!
*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:
(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(
?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](
?:[a-zA-Z\d]|[_.+-])*)|\*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d
])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z
\d]|[_.+-])*)(?:/(?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a
-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d]
)?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:
(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:
(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+
))?)(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\d$
\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]
{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%
[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]
|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:
(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-zA-Z
\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)
*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:(?:(?
:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-
zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d
$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d])
)|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%2
0)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?
:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID
|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])
?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(
?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(
?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|o
id)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(
?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:
%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(
?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:
\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a
-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%2
0)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-f
A-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d$\-_.+!*'(
),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs])://(?:(?:(?
:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?
:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))
?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:
[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\?(?:(?:[a-zA-Z\d$\-_
.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\d$\-_.+!*'(),
]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA
-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)
?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=
])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@
&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=]
)*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z
\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:
\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d
]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(
?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[
Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2
}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])
?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:
\d+)){3}))(?::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:
%[a-fA-F\d]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|
[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))
|[&=~:@/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-
9]\d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~
:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*
)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn
]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)))?))
)?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-
Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:
\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'
(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),
])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$
\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-
Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))
><a href = "$1">$1</a>>gx;

Needless to say, anything this complex requires a license to say that it may not work, which is reprinted here (even though it logically should work at all times). Wow, and wholehearted respect to Abigail...

The Gaisan regular expression toolkit

If you want to match URLs reliably without creating a regexp monster so big that you need to connect up the digital projector just so you can work on it, then this is something tasty I've come up with. Demonstrated in Java, my language of choice.


import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scheme prefix (or a bare "www."), then a hostname or dotted-quad IP,
// then an optional port, path segments and query string.
Pattern urlPattern = Pattern.compile("(((URL:|url:|http:|htt:)\\/\\/)|www\\.)(((([A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9]"+
"|[A-Za-z0-9])\\.)*([a-zA-Z][A-Za-z0-9-]*[A-Za-z0-9]|[a-zA-Z]))|([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+))"+
"(:[0-9]+)?(\\/([a-zA-Z0-9$_.+!*'(,);:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*(\\/([a-zA-Z0-9$_.+!*'(,)"+
";:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*)*(\\?([a-zA-Z0-9$_.+!*'(,);:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*)?)?");
Matcher urlMatcher = urlPattern.matcher("http://streamserver.gaisan.com/ourapplication?sd=234324&cam=1");
// matches() only succeeds if the whole input string is a URL
boolean matches = urlMatcher.matches();
System.out.println("Match should be true:\t" + matches);
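
For the other half of the problem Abigail's monster solves, turning matched URLs into hyperlinks, a rough Java sketch follows. To keep it short and self-contained it uses a deliberately loose stand-in pattern (SIMPLE_URL, my own invention, http only) rather than the full pattern above, and the class name Linkify is hypothetical. The point is just the Matcher.replaceAll call, where $0 stands for the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; SIMPLE_URL is a deliberately loose stand-in pattern,
// not the full URL pattern from the post.
class Linkify {
    private static final Pattern SIMPLE_URL =
            Pattern.compile("http://[A-Za-z0-9.-]+(:[0-9]+)?(/[^\\s<>\"]*)?");

    // Wraps every match in an anchor tag; $0 refers to the whole matched URL.
    static String linkify(String text) {
        Matcher m = SIMPLE_URL.matcher(text);
        return m.replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "See http://streamserver.gaisan.com/ourapplication?sd=234324&cam=1 for details"));
    }
}
```

Swapping in the stricter pattern should just be a matter of changing the Pattern.compile argument. Real output destined for a web page would also want the & characters in the URL escaped as &amp;.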

December 15, 2004

Ban P2P applications

Perhaps not. I've commented in the past about the effects that P2P networks have upon ISP traffic topologies (timing, upstream/downstream biases etc.) and we all know they can be used to illegally share copyrighted files. However, I strongly believe that P2P applications are the prototype for the next generation of highly resilient and scalable internet applications. In my former job as a telecoms researcher at TSSG we came up with quite a novel approach to integrating active networking and peer-2-peer apps at the top of the stack. I'm not sure what became of that work but my faith in the technology hasn't wavered.
I guess that's why I was so fascinated by the following post on Boing Boing about 2 Princeton researchers who've cooked up a P2P app in 15 lines of concise Python code. The original post is located on Ed Felten's blog. It was damn funny to see someone hack up a Perl version in 9 lines. Without disrespect, the Python implementation is more legible but the Perl code wins my "tight code" Award for 2004. Matthew Skala has a well-used styrofoam cup with a strategically embedded half-fried 2MB DIMM (circa 1993) winging its way to him at this very moment. Enjoy! What a prize and what a hack :D

Where old computers go to die :(

While I was typing the last entry I wondered if there were any websites devoted to old PC technology. My sense of nostalgia overwhelmed me when I visited Old-Computers.com. In particular this article on the IBM PC AT brought it all back. I remember using one of these in school when I was younger. It had the 286 processor (which really kicked ass in its day), an outlandish 1MB of RAM and 16-bit expansion slots, of course. This had a type 2 mobo with 4 standardised 256k slots instead of 128.. Those were the days.. when computers were dumb, real men used DOS, we played footie in the park, jumpers for goalposts...
Apparently the 128k slots were a bit weirder than they seemed initially.

The first AT used 128k chips, which appeared to be two 64k chips stacked. It used two DMA chips, which tended to fail in tandem. It also used a second IRQ controller. If the AT had more than 640k of RAM, the CMOS would only allocate the first 512 as Conventional, the rest as Extended.
Only 17 hard drive types were supported in the CMOS, causing no end of headaches when Seagate released their 40 meg half height. The 1.2 meg floppy drive could read and write 360's, but if you formatted one, it couldn't be read by a regular double density drive.

December 20, 2004

headmap

I must confess to being absolutely fascinated by headmap.org. I'm fascinated by the idea of smart spaces which infer user intent based on learned context. For example, an office space that learns to automatically adjust the heating in a room based on predictions about a meeting occurring. Lights that switch themselves off when there's nobody around etc. The most value is achieved when the ambient intelligence is fully integrated with other organisational information systems such as email and IM servers, project management tools, data and profile repositories etc. I fancy the idea that every node in an increasingly networked world could dynamically negotiate new cooperative strategies and operations based on an understanding of user intent. A true User Oriented Architecture. This could be communicated using a standardised information markup with transforms for heterogeneous devices to address capability differences. I'm straying into agents territory here but there was a lot of value in that research. In essence, it means extending the human computer interface (HCI) throughout the user environment. In particular I like the idea of capturing memories at locations, augmenting the real world with location/memory tags, a bit like the virtual worlds created by multi-player games. The possibilities are amazing & the results would perhaps be indistinguishable from magic...

December 21, 2004

p2(very small)p

This is getting ridiculous. Following my recent post about Ed Felten's P2P program in 15 lines of Python, there's been a P2P app done in 9 lines of Perl and now (wait for it) a full peer-2-peer application in 6 lines of Ruby with a few lines of comments.
Just to show everybody how nuts this has become the code is reproduced below...

# Server: ruby p2p.rb password server server-uri merge-servers
# Sample: ruby p2p.rb foobar server druby://localhost:1337 druby://foo.bar:1337
# Client: ruby p2p.rb password client server-uri download-pattern
# Sample: ruby p2p.rb foobar client druby://localhost:1337 *.rb
require'drb';F,D,C,P,M,U,*O=File,Class,Dir,*ARGV;def s(p)F.split(p[/[^|].*/])[-1
]end;def c(u);DRbObject.new((),u)end;def x(u)[P,u].hash;end;M=="client"&&c(U).f(
x(U)).each{|n|p,c=x(n),c(n);(c.f(p,O[0],0).map{|f|s f}-D["*"]).each{|f|F.open(f,
"w"){|o|o<<c.f(p,f,1)}}}||(DRb.start_service U,C.new{def f(c,a=[],t=2)c==x(U)&&(
t==0&&D[s(a)]||t==1&&F.read(s(a))||p(a))end;def y()(p(U)+p).each{|u|c(u).f(x(u),
p(U))rescue()};self;end;private;def p(x=[]);O.push(*x).uniq!;O;end}.new.y;sleep)

I think I've had more than enough of this. Pick a suitably high-level language, use single character variable names and some whacky formatting to exchange a file over a socket and call it P2P. Next someone is gonna write a Java program using SUN's JXTA that just inits a class or two, format it all on about 4 lines and say, wow it's the shortest P2P app ever... More interesting would be a P2P application written in a declarative language like Prolog or a functional language like Haskell or Hope. Haven't done much Haskell programming in a while (damn rusty and for some reason don't feel like breaking out the books) but Prolog looks tempting. Expect a post. I may have to use some file IO/socket programming but it sounds like an interesting project. I'll let readers know how I get on ;-)

December 30, 2004

Changing the administrator password on an NT/2k/XP box

A friend (honest) forgot their pwd recently and asked me to hack into their machine and change their admin pwd. I found the following really tasty application which does the trick.
The Offline Password and NT registry editor by Petter Nordahl-Hagen. This is a very useful utility which I've used in the past and which has proven very effective. There are bootable floppy and CD images on the site that you can use to edit your windoze box's passwords, stored in the reg's SAM file. For more hints and tips check here. Merry Christmas and a happy new year to everyone...

January 3, 2005