Author Archive

Zombies banging at the WordPress door

Zombies are impossible to avoid in this day and age, even (maybe especially) online. Anyone who maintains a publicly accessible server and pays even passing attention to their security logs can attest to the repeated brute-force login attempts against everything from SSH to WordPress admin panels. Having a strong SSH password, disabling root logins, and changing the default port work wonders, but none of that helps with my WordPress installation, which is where I most often get hit these days.

In the past, these naive attacks were fairly easy to block using tools like fail2ban or using plugins like Limit Login Attempts, which put IP addresses on a blacklist for too many failed login attempts. Unfortunately, of late it seems like the crackers have caught on, and are now bringing the full force of the zombie hordes to bear: each zombie computer only tries logging in to a particular website once, preventing an IP blacklist from having any effect.

Thus, I've had to resort to more drastic action: an IP whitelist, set up by placing the following code in the .htaccess file of my WordPress directory:

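# Restrict the WordPress login page to whitelisted addresses
<Files wp-login.php>
# Evaluate deny rules first, then let allow rules override them
order deny,allow
deny from all
# Replace with your own IP address
allow from 127.0.0.1
</Files>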

127.0.0.1 should obviously be replaced with your IP address unless you only access WordPress Admin from the server itself (either directly or through a proxy such as a SOCKS5 tunnel). I have a dynamic IP address, so I now use a SOCKS5 SSH tunnel for working on WordPress. Anyone else trying to log in will now get a 403 Forbidden page, which not only adds to my peace of mind, but also saves me the processing time and bandwidth of serving up the login page to would-be crackers. 😀
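For anyone who hasn't set up such a tunnel before, here is a minimal sketch of the SSH side, plus a quick check that the whitelist is doing its job. The login, site URL, and port are placeholders, and the function names are hypothetical; only the ssh and curl options themselves are standard.

```shell
# Hypothetical values: substitute your own SSH login, site URL, and any free local port.
SSH_HOST="user@example.com"
SITE_URL="https://example.com"
SOCKS_PORT=1080

# Open a SOCKS5 proxy on localhost:$SOCKS_PORT over SSH.
# -D sets up dynamic port forwarding; -N skips running a remote command.
start_tunnel() {
    ssh -N -D "$SOCKS_PORT" "$SSH_HOST"
}

# Print the HTTP status code returned for wp-login.php.
# With "proxy" as the first argument the request goes through the tunnel;
# otherwise it is sent directly.
login_status() {
    if [ "${1:-}" = "proxy" ]; then
        curl -s -o /dev/null -w '%{http_code}\n' \
            --socks5-hostname "localhost:$SOCKS_PORT" "$SITE_URL/wp-login.php"
    else
        curl -s -o /dev/null -w '%{http_code}\n' "$SITE_URL/wp-login.php"
    fi
}
```

With start_tunnel running in one terminal, login_status from a non-whitelisted machine should print 403, while login_status proxy should only succeed if the server sees the proxied request as coming from a whitelisted address.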

OWA and Mobile message composition conflicts

My school webmail uses Outlook Web Access (OWA) by default to access Microsoft Exchange 2010. Although this is clearly suboptimal, I do find myself using OWA occasionally when I'm not in my office and need to quickly send an email (I don't particularly like drafting emails on my phone).

Unfortunately, I would randomly get the error message "The action couldn't be completed. You might have updated the item on another computer or a mobile phone."
[Screenshot of the error message]
It usually only happened for longer messages, but would occasionally pop up nearly immediately after I began composing. I'd then have to copy and paste the entire message and start up a new message, hoping that I'd have enough time to hit send. A cursory Google search wasn't too helpful, and since I only rarely used OWA, I didn't bother to actually troubleshoot the issue.

Now that I've had a little more time after finals, some quick testing revealed that the problem is trivially solved by telling my phone not to access the "Drafts" folder. It turns out that I'd enabled push synchronisation for the Drafts folder in K-9 Mail on my Android phone. When OWA saved the draft of the message being composed, K-9 Mail would automatically download it; even without any editing, that seemed to be sufficient to make OWA unhappy.

My quick hack has been to create a Drafts-Mobile folder for my phone email client to use, so that it never tries to access in-progress compositions from OWA. However, it still seems just a bit strange that OWA would be so sensitive to syncing from a mobile phone without any modification of the draft. After all, I didn't actually "update" the item on another computer or phone, and it seems to me like the target audience for Microsoft Exchange is fairly likely to have smartphones. Easy enough to work around, I suppose, but here's to hoping Microsoft fixes this in a future release!

Phantom Google Apps/Google Groups Subscription

Most modern mailing list software is pretty good about making unsubscribing relatively painless. For most reputable mailing lists, gone are the days when unsubscribing meant having to email sometimes inattentive administrators and ask to be taken off. Unfortunately, while the automated mailing list managers usually operate without a hitch, when errors do occur, it can sometimes be hard to convince the software to take you off.

This most recently happened to me in the form of Google Groups. I had been signed up to a group under a Google Apps email account (e.g. liam@example.com), and wanted to unsubscribe. This is normally easy enough. However, after removing my membership in the handy "Edit my membership" page, I still kept on receiving emails.

Multiple times I tried unsubscribing by sending emails to groupname+unsubscribe@googlegroups.com, but the reply always stated that because I wasn't subscribed, no action was taken. When I tried to troubleshoot, the only suggestion I could find was that I had signed up under an email forwarder. However, not only did I have no forwarder set up to that address, but careful examination of the email headers also confirmed that Google Groups was still sending emails directly to liam@example.com. This was several months ago, and I resigned myself to failure and just set up a message filter.

This story has a happy ending though: earlier today, I was reminded that when setting up Google Apps for Domains, one is assigned a temporary email address of the form liam%example.com@gtempaccount.com while waiting for the MX records to propagate! I hadn't thought of this, because I had subscribed through the Google Groups web interface, and just assumed that it would use liam@example.com, which it's done every single other time. Lo and behold, after sending an unsubscription request from liam%example.com@gtempaccount.com, I was promptly rewarded with an unsubscription confirmation! 😀

Because I have used Google Groups successfully a number of times before in conjunction with Google Apps, I am fairly certain that this was a one-time fluke in the system, whereby the wrong address was accidentally used. However, because I wasn't able to find relief anywhere else on the Inter-webs, I thought I'd write up my experiences here, in case anyone else suffers from the same problem and cannot get the attention of the list admin to be manually removed. That's it, that's all!

Networking Epidemics

I originally wrote this op-ed this summer for A Global Village, a student-run journal on international affairs at Imperial College aimed at a non-specialist audience. The full issue can be found here. Obviously, the article is now a bit dated, and some figures are missing from this version (see original nicely formatted PDF).

Although it is online social networking sites like Myspace and Facebook that garner the majority of media attention these days, social networks themselves have always been a part of our lives. From the small hunter-gatherer communities of long ago to our present overlapping family, school, and work groups, humans arguably were and still are defined by their relationships to one another. While the extent to which those connections have evolved over time is debatable, a crucial difference in the modern world is how much quantitative data has been compiled. Even as privacy advocates grow increasingly concerned about data rights, it remains the case that such data could be a veritable gold mine for policy-makers working with phenomena that spread via network effects.

One such phenomenon that is of considerable interest to public policy is the epidemic spread of disease, for obvious reasons. Traditional models of epidemics compartmentalise the population into broad categories – e.g. those who are susceptible to being infected, those who have already been infected, and those who have been infected and are now immune. However, these models assume that there is random mixing within the population; that is to say, an infected individual is equally likely to infect any member of the susceptible category. This is clearly a somewhat unnatural assumption; in the real world we are far more likely to transmit an infectious disease – for instance, swine flu – to people with whom we come into close contact. Thus, mapping of the network of social interactions could provide insights into both the path and rate of spread of a disease and provide potential tools for halting or interrupting that spread.

Thresholds

Human interaction networks are, however, extraordinarily complex and thus difficult to quantify accurately. Thus, although epidemic modelling historically has roots in the study of human diseases – with seminal work being conducted by Imperial College academics – let us first turn to an analogous but simpler problem: the spread of computer viruses for large networks (such as the Internet).

While different from biological systems in certain key respects, computers can also become infected by viruses (normally much to our dismay and detriment), spread the infection to connected computers, and then recover after a time (whether through anti-virus software or by a hard-disk format). What's more, the network topology – the structure of the connected computer network – is far better characterized.

Scientists at Carnegie Mellon University were able to use a nonlinear dynamical systems model to relate the connectivity of a network to whether or not an epidemic spreads or dies out [1]. They found that, given the complete network of connections between computer systems, it is possible to derive an epidemic threshold for the virus below which the epidemic dies out exponentially [2]. In tackling such virus spread, identification of critical nodes, or computers, for 'immunization' is key: immunizing a particular computer rewires the network and, if chosen correctly, may stop the spread of infection.

One can compute the change in the epidemic threshold before and after immunization to determine if the node was effectively chosen to stop the spread of the infection. Ideally, if anti-viral resources ('vaccines' in the biological disease case) are limited, one wishes to immunize only those computers whose removal yields a rewired network with the highest possible epidemic threshold, so that the infection dies out over the widest range of conditions.

By constructing a similar 'contact network' for human disease transmission, it would theoretically be possible to make similar recommendations for vaccination regimes. Often, when a new epidemic is sweeping through a community, vaccine supplies are limited and policy-makers are forced to make unpalatable choices about where efforts should be directed. Network science could one day be used to advise authorities on the optimal allocation of available resources.

Building the Network

Unfortunately, it is somewhat more difficult to generate similar network maps for human diseases. Despite the staggering amount of data available from Facebook, limitations persist [3]. Scientists must first characterize the types of interactions that contribute to increased transmission rates. In some cases, e.g. for HIV/AIDS, it is well known what sorts of interactions should 'count' in constructing the network. However, for many other diseases, it is unclear a priori which types of contact are linked to transmission. Human interactions are also ephemeral in nature, such that any network of interactions is constantly evolving.

Over the past few decades researchers have been working on just that: figuring out what sorts of interactions make up a contact network – this is done largely by making copious observations during instances of outbreak of disease. This type of data is almost always imperfect, but statistical inference may permit insight into modes of transmission. For instance, in a study by the Pennsylvania H1N1 working group (composed of researchers from Imperial College and the US Centers for Disease Control and Prevention), researchers examined the rates of swine flu transmission in relation to different features of primary (grade) school life [4]. While it was impossible to pinpoint directly how particular children were infected, after constructing social networks corresponding to the interaction patterns of the school children, the team could determine accurate probabilities of transmission as correlated with a number of different factors. Unsurprisingly, sex-related mixing patterns played a role – it was hypothesized that because children of the same sex in a class tend to play together, there is an increased likelihood of transmitting disease to one another. However, contrary to what may be popular belief, sitting next to an infected individual in the classroom did not significantly increase the probability of infection.

Although still a far cry from the extent to which we understand the spread of computer viruses, data of this sort continues to be compiled. As epidemiological modelling continues to advance, it will likely one day be possible to accurately map human disease transmission networks and make policy recommendations such as the one highlighted in the last section.

Brave New World

Of course, while the behaviour of infected computer systems might be in agreement with some models of disease epidemic propagation, such models do not take into account the vagaries of human nature. Unfortunately, while the humanities and social sciences have much of value to say about the human condition, and often advise what people or indeed policy-makers ought to do (whether from a moral, philosophical, or economic perspective), seldom are those fields quantitatively perfectly predictive of actual actions.

One obvious solution would be to ask actual human beings to inhabit the role of agents in an online 'epidemic' game. Although this may be an unusual approach, there are plausible advantages to characterizing behaviour in a simulation involving actual people, despite the fact that one's risk analysis changes considerably when one's own life – as opposed to one's game avatar's life – is at stake! This isn't a hypothetical; in late 2005, a glitch in the popular online game World of Warcraft resulted in the so-called 'Corrupted Blood incident', during which players' characters could be infected by a deadly 'disease'. Players invest hundreds of hours, monthly subscription fees, and significant portions of their social lives into the game; the virtual epidemic was able to provoke a wide range of responses from players, from flight to more sparsely populated areas to some characters trying to help others. Several epidemiologists have attempted to draw insight from these events, mapping them onto potential behaviour of people in real-life epidemic disease scenarios [5].

As our online and offline worlds slowly merge, it is not altogether surprising that we should look from one to the other for insight and prediction. Although currently of largely academic interest, it is only a matter of time before social networks become an essential tool for government strategy in epidemiology and beyond, playing a role not only in dissemination of information, but also in policy and action.

  1. Chakrabarti, D. et al. (2008). Epidemic Thresholds in Real Networks. ACM Transactions on Information and System Security.
  2. More precisely, we encode the connectivity of the network into an adjacency matrix A, which specifies which computers are connected to one another for purposes of viral transmission. The epidemic threshold is then the inverse of the dominant eigenvalue of A, easily computable even for extremely large matrices using numerical techniques.
  3. It is possible to get interesting results even in the absence of a known contact network. For example, because of the friendship paradox ('your friends have more friends than you do'), the named friends of randomly selected people tend to become infected earlier in an epidemic than the randomly selected people themselves, which provides an early-detection mechanism. Christakis, N.A. & Fowler, J.H. (2010). Social Network Sensors for Early Detection of Contagious Outbreaks. PLoS ONE.
  4. Cauchemez, S. et al. (2011). Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc. Natl. Acad. Sci.
  5. Lofgren, E.T. and Fefferman, N.H. (2007). The untapped potential of virtual game worlds to shed light on real world epidemics. The Lancet Infectious Diseases.
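Footnote 2 can be made concrete with a small worked example. This is only a sketch: the symbols β (infection rate) and δ (recovery rate) are not used in the article itself and are introduced here as assumptions, as is the star-shaped toy network of one hub connected to n = 100 leaves. The first line states the threshold condition, the second gives the threshold for the star network, the third shows the effect of immunizing a single leaf, and the last shows the effect of immunizing the hub.

```latex
\[ \tau = \frac{1}{\lambda_{1}(A)}, \qquad \frac{\beta}{\delta} < \tau \;\Longrightarrow\; \mathrm{die\ out} \]

\[ \lambda_{1}(A_{\mathrm{star}}) = \sqrt{n} \;\Longrightarrow\; \tau_{\mathrm{star}} = \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.1 \]

\[ \tau_{\mathrm{leaf\ removed}} = \frac{1}{\sqrt{n-1}} = \frac{1}{\sqrt{99}} \approx 0.1005 \]

\[ \lambda_{1}(A_{\mathrm{hub\ removed}}) = 0 \;\Longrightarrow\; \tau_{\mathrm{hub\ removed}} \to \infty \]
```

Immunizing a leaf barely moves the threshold, whereas immunizing the hub disconnects the network entirely, so no infection can spread at all. That is the sense in which comparing thresholds before and after immunization identifies the critical nodes.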

Cyber-Space: The Next Frontier

I originally wrote this op-ed last winter for A Global Village, a student-run journal on international affairs at Imperial College aimed at a non-specialist audience. The full issue can be found here. Obviously, the article is now a bit dated, and some figures are missing from this version (see original nicely formatted PDF).

Ranging from Iran's nuclear facilities to thousands of American diplomatic cables, recent high profile breaches of IT systems have highlighted the growing importance of cyber-security for this Information Age. Cyber-crime crosses national boundaries, and the issue is further exacerbated by the anonymity of attackers and the disproportionate potential for damage. While a notable problem in its own right, cyber-crime presages the inevitable conflicts that will arise from the close contact afforded by the Internet between varying cultural norms.

The advent of the Information Age has enabled unprecedented connectivity not only between individuals around the globe, but also across organizational scales. Large governments and corporations may quickly – and cheaply – directly reach and be reached by almost anyone with Internet access, as information transmission even to non-networked systems is greatly facilitated by common software platforms used throughout the world. Such access, while beneficial to all, comes at a potential price.

Under Attack

Perhaps the most popularly reviled incarnation of cyber-crime comes in the form of malware. It is a rare computer user who hasn't at some point dealt with viruses, worms, Trojan horses, spyware, and other programs of their ilk (see table). These generally unwanted inhabitants of our IT systems, while personally devastating – as anyone who has lost data to particularly virulent malware can testify – are almost always undirected, causing damage, stealing information, and taking over computers throughout the computing world wherever improperly secured systems can be found.

Originally the province of hobbyists and academics, organized cyber-crime has, in the last decade, been the driver of much development in this field. However, the appearance of Stuxnet in mid-2010 suggests the growing involvement of national governments both as instigator and target of cyber-attacks. A very sophisticated computer worm, almost certainly requiring the efforts of multiple skilled programmers, Stuxnet appears to have targeted centrifuges in Iran's nuclear facilities, thus delaying the uranium enrichment program. Given the lack of obvious commercial motive and the significant investment in the creation of Stuxnet, many people believe it to be the work of a Western power, possibly the United States or Israel, though no direct evidence has surfaced to indicate one way or the other.

More directed attacks against particular IT systems are most commonly seen against large, juicy targets like corporate servers, government databases, etc. These range from the technologically simple Distributed Denial of Service (DDoS), which involves simply overwhelming a server with junk traffic, slowing or even completely blocking legitimate traffic, to carefully crafted 'hacking' of data servers to steal information. Successful attacks can result in the exposure of important personal information, such as credit card numbers, for thousands of people or the crippling of Internet services; as we come to rely more and more upon the Internet for everything from communication to payment, the potential for damage only grows.

Very recently, highly publicized DDoS attack attempts have been made by 'hacktivists' against companies perceived to have crossed Wikileaks, such as Paypal, Visa, and Mastercard. Not long afterwards, the self-styled group 'Gnosis' stole over a million user emails and encrypted passwords from Gawker Media, which runs a number of fairly popular websites. National governments are not immune either, as demonstrated by the defacement of Georgian government websites during the 2008 South Ossetia war.

Conversely, Google made a splash in early 2010 when it announced a pullout from operations in mainland China due to the hacking of Chinese human rights activists' Gmail accounts. Google, the American government, and the Western press generally level suspicion at the Chinese government, yet a concrete link was never made. Leaked American diplomatic cables suggest that these incidents were part of a more extensive network of cyber-attacks traced to hackers geographically located in China using Chinese-language keyboards.

As damaging as the immediate ramifications of critical infrastructure and data systems being under adversarial control may be, the ease of duplication and dissemination of information on the Internet, and the subsequent irreversibility of damage, only compounds the problem. There has been much in the news lately regarding the dissemination by Wikileaks of confidential American diplomatic cables, almost certainly leaked by someone authorized to access SIPRNet, a 'secure' classified network used by the US Department of Defense and Department of State. While Washington's response was perhaps of debatable justification, despite its best efforts, all of the leaked information is still online and will likely remain so.

Insecure by Design

Why is it that the IT landscape proves so resistant to securing? One reason is that many of the network protocols and paradigms used in today's Internet date back to a bygone era in which it was reasonable to assume that no one was acting maliciously or deceptively. For instance, SMTP, the protocol by which nearly all email is sent, allows the sender to arbitrarily specify the 'from' address; similar types of spoofing are possible with other protocols, hiding the actual originator of a transmission. There are of course many proposed technical solutions to these sorts of issues, but the need to retain backwards compatibility, coupled with the uneven adoption of new protocols, has limited the success of such mechanisms.

Yet there exists an even more intractable issue: us, the general populace. As Internet access becomes both cheap and increasingly necessary for day-to-day affairs, a growing number of people have always-on high-speed connections. Unfortunately, this makes personal computers a target not only in their own right, but also for use in further cyber-crime. By assembling large teams of compromised 'zombie' computers, or 'botnets', malicious agents are able not only to significantly increase their available network firepower for sending spam email or performing DDoS attacks, but also to better cover their tracks – the attacks come directly from the systems of innocent but insufficiently security-conscious bystanders. It is thus impossible to identify the actual adversary in most cases.

While it is easy to blame naive end users for not securing their personal computers, adopting such an attitude is of little use. It is almost trite to refer to the annoying persistence of Windows Vista security prompts, but this exemplifies how willing people are to repeatedly trade minor intangible risks for more immediate concrete rewards. Although the results of individual security breaches at the personal level are relatively minor, except when in the aggregate form of botnets, similar patterns of behaviour by privileged users, for instance members of the armed forces, can have far greater immediate impact. Around the turn of the century, the virus Melissa was able to jump from the wider Internet to the American military's closed network after less than 24 hours, most likely due to a careless user who connected the same system to both networks. More recently, Stuxnet was designed to spread via USB memory sticks, easily hopping to the targeted Iranian centrifuge control computers, despite a rational effort to sequester them from the Internet.

Even where network connectivity is not involved, the mere existence of other similar systems provides vectors for malware transmission and a cloak of anonymity for the originator. If the Iranian computer systems had not been running the same software and operating system as thousands of other industrial plants, it would have been difficult for Stuxnet to spread as it did, hopping from plant to plant while solely activating its destructive payload on specific targets. In this scenario, a more directed attack would likely have been required, one which may have revealed the perpetrators.

The Great Leveller

Before the current Information Age, barring occasional exceptions, it was difficult for a single disaffected individual to acquire huge influence over the world or damage critical infrastructure without significant personal risk to life and limb. The Internet has changed all of that: a single person with moderate technical skills can, from the relative safety of his or her home, direct a botnet to temporarily cripple an e-commerce site through a DDoS attack, as was attempted against Visa, Mastercard, and Paypal following their decisions to suspend payments to Wikileaks.

Similarly, a single disaffected individual with access to classified information was able to make public thousands of diplomatic cables, in a manner that has proven exceedingly difficult for the US Government to suppress. Although the source may have been found and arrested in this particular case, the damage had already been done. With multiple copies of the data scattered on servers throughout the world and downloaded on individual computers, it is near impossible to prevent further dissemination.

Even something as complicated as Stuxnet, whose construction involved three zero-day exploits, two stolen security certificates, and detailed knowledge of the Programmable Logic Controllers used in industrial systems, could conceivably have been designed by a small group of vigilantes. Given the complexity of the operation, the deliberate targeting and the obvious motivators for a major Western power, this is a somewhat unlikely explanation. Indeed, a far scarier scenario would have been a malicious individual simply seeking to wreak havoc on industrial plants and using something similar but untargeted. It remains a fact that there is not, and is unlikely ever to be, any direct evidence of involvement by a State.

Internet Without Borders

National governments have begun recognizing the challenges posed. Indeed, in the 2010 UK National Security Strategy review, 'hostile attacks upon UK cyber-space' is categorized as a Tier One priority risk. However, the dual issues of the lack of accountability and the disproportionate potential impact of single individuals present significant difficulties for governmental responses. While it has long been possible for covert operations agencies to achieve plausible deniability, most major operations could be reasonably understood not to be the work of individuals acting alone. This is no longer the case. How much of the hacking done from within Russia is actually sanctioned by Moscow? What is the appropriate response to a nearly untraceable attack like Stuxnet? Some nations have begun requiring the use of real names online, which if perfectly implemented could take away the anonymity that malicious parties hide behind. However, even ignoring the applicable free speech considerations, it is highly unlikely that any such system would work in practice, as technical systems are themselves susceptible to being misled and zombie computers could still be used.

Another possible solution is for governments to hold others accountable for all hacking activity originating from within their countries. Indeed, in an analogous fashion, Beijing has already warned that they would hold Washington responsible for terrorist attacks conducted with the assistance of Google Earth, as the US Government has not complied with China's requests for Google to lower the image resolution of sensitive areas.

However, while it may be reasonable to state a priori that each nation should police within national borders, this is technically very difficult when it comes to the Internet; cyber-crime almost inherently crosses national boundaries. How, for example, should a government respond to cyber-warfare waged by a botnet primarily situated in Britain, responding to orders given through servers in Russia, and controlled by someone located in Brazil – assuming that the attack could be traced that far back in the first place? What share of the blame should the owners of the compromised British computers take for not having properly secured their systems? In Germany, Internet customers are personally liable if they do not properly secure their wifi networks, which are then used for illegal file sharing; unknowing participation in cyber-warfare would presumably be treated with more gravity.

Difficult as the jurisdictional and enforcement issues will be, an even thornier issue arises from the fact that citizens from various nations are in direct contact with the governments of others: the cultural norms and laws differ considerably from one nation to another. For example, much to the consternation of American free speech advocates, an English court claimed jurisdiction in the 2004 libel case surrounding the book Funding Evil – which was not published in the UK – based on the reasoning that 23 copies were purchased in England from online retailers and a chapter was made available on the Internet. Several American states have since passed laws specifically aimed at protecting against 'libel judgments in countries whose laws are inconsistent with the freedom of speech granted by the [US] Constitution'. What might happen to an American who aimed to disseminate information 'harmful to the spiritual, moral, and cultural sphere of' China, as the Shanghai Cooperation Organisation has chosen to define 'information war'? The United States would probably regard such dissemination of information as falling under freedom of speech, yet China may consider it an instance of cyber-terrorism; and, should the United States fail to take appropriate action, construe the American government's response as an act of cyber-aggression itself!

The Next Frontier

Cyber-space arguably represents the next frontier in the development of international relations, as nations cope with the challenges of being able to immediately and directly influence the infrastructure, culture, and the lives of people throughout the world, and more importantly, the possibility of being subject to such influences in return. The cross-boundary nature of the Internet is beginning to come into conflict with the existence of differing national laws and cultural norms, spurred on by the obvious difficulties in dealing with cyber-security on a global stage. Thus, in parallel with technical and educational measures to enhance cyber-security, diplomatic norms will have to be altered to account for the powers afforded to individuals by the anonymity and interconnectedness of the Internet.

GNU screen / Bash: start commands in interactive sessions

I run nearly everything from the terminal, including my mail client (mutt), music player (cmus), custom simulation software (Python/Matlab/C), and some hacked together daemon-like mail and server scripts—my browser (Firefox) is the only always-on GUI application. Thus, GNU screen is an essential tool in my workflow, since it allows for easy multiplexing and detaching of sessions.

One of its many features is the ability to start multiple windows and automatically run a command in each of them. This isn't necessary for my server and workstation, since I just run screen persistently. However, my MacBook Pro is a different story, and not having to start up each program individually is a great convenience. The canonical way to do so is to create an alternate screenrc:

# $HOME/.screen/startuprc
source $HOME/.screenrc
screen -t cmus 0 cmus
screen -t mutt 1 mutt
screen -t bash 2
screen -t custom_title 9 custom_script.sh

and to then invoke screen -c ~/.screen/startuprc.

This works great with stable programs like cmus and mutt, but I oftentimes need to restart my hacked together scripts, maybe after a crash or if I just want to tweak the code. Unfortunately, as set up, once an automatically started program finishes running, screen (correctly) exits the session. Starting up a new session each time can get a bit old.

The easiest solution I found was to create a custom Bash rcfile for each program, which would first source ~/.bashrc and then run the command at hand (e.g. custom_script.sh). This way, an interactive Bash session remains after exiting the program.

# $HOME/.screen/startuprc
source $HOME/.screenrc
screen -t cmus 0 bash --rcfile $HOME/.screen/cmus_rcfile
screen -t mutt 1 bash --rcfile $HOME/.screen/mutt_rcfile
screen -t bash 2
screen -t custom_title 9 bash --rcfile $HOME/.screen/custom_script_rcfile

This is almost sufficient, but there's still one quirk left. Because Bash is running the script from an rcfile, it doesn't store the command in its history buffer. A minor thing, to be sure, but it means I can't use the four-keystroke sequence ctrl-C, <up>, <enter> to restart programs. Luckily, Bash provides a means of deliberately inserting commands into history without running them, so the simple addition of history -s "command" to the appropriate rcfile fixes that:

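# $HOME/.screen/custom_script_rcfile
# --rcfile replaces the default startup file, so pull in the usual settings first
source $HOME/.bashrc
custom_script.sh
history -s "custom_script.sh"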

And that's it! I'd love to hear if anyone has come up with a more elegant solution. For now though, the above provides everything I need, and is relatively simple to set up. Happy hacking!

Mindhack – Mental math pseudo-random number generators

Human minds are great for many things, but picking random numbers is not one of them1. At one point, these sorts of cognitive biases were used to support arguments for determinism, to the point where some experimental psychologists even undertook to train subjects to produce statistically random-looking output2. Determinism is beyond the scope of this post, but in keeping with the spirit of the last, we'll be looking at simple mathematical tricks to generate pseudo-random numbers.

DISCLAIMER: These techniques are not suitable for practical applications, and especially not any serious cryptography. I am not an expert in the field, and have just been playing with these ideas in my spare time.

Background

Computer programs often need random numbers for everything from scientific simulations to cryptography; unfortunately, true randomness is difficult to extract, and usually limited in quantity. However, not all applications require the same level of randomness: cryptography needs to protect against a dedicated human adversary, and so has the greatest need for numbers being actually random3, while scientific simulations only require that the numbers be indistinguishable from random ones by certain statistical tests, but need lots of them as quickly as possible, even if they aren't quite as "random".

To fill those niches, mathematicians and computer scientists have developed a wide range of different pseudo random number generators (PRNG), each with their pluses and minuses. What exactly are PRNGs? Simply put, they're sets of mathematical rules for deterministically producing random-looking output. Because they are deterministic, they require an initial seed number, and always produce the same output for the same seed4. However, to most humans who aren't statisticians or cryptanalysts, they're sufficiently good.

Here, we present three simple PRNG algorithms, implementations of the Lehmer PRNG5, that you can carry around in the back of your head for when you need to wow an audience with your ability to randomise. For the record, I am not to be held liable if you lose all of your friends, stutter at interviews, or get hit by a bus because you're too busy updating your internal PRNG.

Lehmer PRNG

First some (skippable) maths: given a large prime p, a multiplier m, and a seed number x_0, a Lehmer PRNG can be defined by the recursive function x_{n+1} = m \cdot x_n \mod p 6. By a judicious choice of m, the series \{x_i\} does not repeat for a long time, ideally going through all of the integers from 1 to p-1 7. The output stream is then some function of the current state x_i. For computer programs, p and m are usually chosen for ease of binary computation. Furthermore, computers can handle far bigger numbers than most of us mere humans can: the historically popular choice8 2^{31} -1 is a lovely number, but I'd hate to be trying to divide by it in my head. Luckily, there are lots of primes, and good choices make it far easier to do modular arithmetic in decimal notation, reducing everything to basic addition/multiplication on the digits. Also, instead of taking the bit parity or least/most significant bit, we use the units digit for the output stream.
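For anyone who would rather check the arithmetic by machine, here is a minimal Python sketch of the recurrence. The function names `lehmer` and `period` and the toy parameters p=7, m=3 are my own choices for illustration, not part of any standard library.

```python
from itertools import islice

def lehmer(p, m, seed):
    """Yield successive states x_{n+1} = m * x_n mod p, starting after the seed."""
    x = seed % p
    if x == 0:
        raise ValueError("seed must not be a multiple of p")
    while True:
        x = (m * x) % p
        yield x

def period(p, m, seed=1):
    """Count the steps until the state first returns to the seed."""
    start = seed % p
    for steps, x in enumerate(lehmer(p, m, seed), start=1):
        if x == start:
            return steps

# Toy example: states and units digits for p = 7, m = 3, seed = 1.
toy_states = list(islice(lehmer(7, 3, 1), 6))
toy_digits = [x % 10 for x in toy_states]
```

With such a small prime the states and their units digits coincide; the larger moduli are where taking only the last digit starts to hide the state.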

One such pair is p=59, m=6 9. Because 60 \equiv 1 \mod 59, each iteration is a simple matter of multiplying the ones digit by 6 and adding the tens digit. To illustrate, a sequence starting from 17 would continue 43, 22, 14, 25, 32, 15, 31, 9, 54, 29,..., and taking just the units digit, that turns into 3, 2, 4, 5, 2, 5, 1, 9, 4, 9, which looks pretty random. Naturally, since the modulus is only 59, the sequence repeats itself every 58 iterations. Also, the numbers 9 and 0 are slightly less well-represented in the stream than the others. In most everyday contexts neither would be a problem, unless your friends have a habit of forcing you to spit out hundreds of random numbers and then running statistical analysis10, and if that's the case, I suggest you find new friends. However, there are other, slightly harder to compute pairs if you want a bit more.
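The digit trick for p=59, m=6 can be checked with a short Python sketch. The helper name `next_state_59` is hypothetical; the code only restates the rule above (six times the ones digit plus the tens digit) and compares it against plain modular multiplication.

```python
def next_state_59(x):
    """Mental shortcut for 6 * x mod 59: six times the ones digit plus the tens digit."""
    tens, ones = divmod(x, 10)
    return 6 * ones + tens

# Reproduce the sequence starting from 17.
states = [17]
for _ in range(10):
    states.append(next_state_59(states[-1]))
output = [x % 10 for x in states[1:]]

# The shortcut agrees with ordinary modular multiplication on every valid state.
shortcut_matches = all(next_state_59(x) == (6 * x) % 59 for x in range(1, 59))

# Count the steps needed to come back to the starting state.
x, cycle_length = next_state_59(17), 1
while x != 17:
    x = next_state_59(x)
    cycle_length += 1
```

Here `states[1:]` is the run 43, 22, 14, ... quoted above, and `cycle_length` comes out as 58.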

Next up is p=101, m=50. Because p=101, this construction has the advantage that the sequence goes through every number from 1 to 100, making the distribution of the units digit in the output stream uniform over 0 to 9 and providing a longer period. The choice of multiplier also simplifies computation: if the current state x_i is even, the next number x_{i+1} = 101 - \frac{x_i}{2}, and if odd, x_{i+1} = 50 - \frac{x_i-1}{2}. Not as nice as the previous example, but still not bad.
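The halving rule is easy to get wrong by one, so here is a small Python sketch that checks it against 50 \cdot x \mod 101 and tallies the units digits over one full cycle. The name `next_state_101` and the seed 17 are arbitrary choices for the illustration.

```python
from collections import Counter

def next_state_101(x):
    """Mental shortcut for 50 * x mod 101 using halving."""
    if x % 2 == 0:
        return 101 - x // 2
    return 50 - (x - 1) // 2

# The shortcut agrees with ordinary modular multiplication on every valid state.
shortcut_matches = all(next_state_101(x) == (50 * x) % 101 for x in range(1, 101))

# Walk one full cycle from an arbitrary seed and tally the units digits.
seed = 17
cycle = [seed]
while True:
    nxt = next_state_101(cycle[-1])
    if nxt == seed:
        break
    cycle.append(nxt)

digit_counts = Counter(x % 10 for x in cycle)
```

The cycle visits all 100 states before repeating, so every units digit turns up exactly ten times.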

If you have far too much time on your hands, you might consider an even larger prime to increase the period, for example the pair p=1999, m=20. Unlike the previous two examples, the period of this Lehmer PRNG isn't actually the full 1998, but rather "only" 999. While another choice of multiplier would give a longer period, 20 has the advantage of making computing far simpler. Besides, it's fairly likely that you'll make a math error before getting to 999 that'll put you in the other group of 999 anyways. For the calculation, first break up x_i = 100 a_i + b_i, where a_i and b_i are two-digit numbers. Then x_{i+1} = 20\cdot b_i + a_i. Or equivalently, take the smallest two digits, multiply them by 2 and shift them to the left one place, and then add the larger two digits. Some mental gymnastics required, but still do-able.
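As a sanity check on the split-and-add rule and on the period, here is a minimal Python sketch. The helper names are hypothetical, and the seeds 1 and 1998 are just convenient representatives of the two groups of 999 states.

```python
def next_state_1999(x):
    """Mental shortcut for 20 * x mod 1999: split x = 100*a + b, return 20*b + a."""
    a, b = divmod(x, 100)
    return 20 * b + a

# The shortcut agrees with ordinary modular multiplication on every valid state.
shortcut_matches = all(next_state_1999(x) == (20 * x) % 1999 for x in range(1, 1999))

def orbit(seed):
    """Collect every state visited before the sequence returns to the seed."""
    seen = [seed]
    x = next_state_1999(seed)
    while x != seed:
        seen.append(x)
        x = next_state_1999(x)
    return seen

group_one = orbit(1)
group_two = orbit(1998)
```

Both orbits have 999 members and share no state, which is the "other group of 999" mentioned above.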

Visual analysis of (p=1999, m=20). Bitmap image generated by linking pixel brightness to the values of output stream. 9 corresponds to white, 0 to black, and shades of grey to everything in between.

And of course, there are an infinite number of primes, so if none of the above suits your fancy, pick another. For ease of computation, I've found it best to choose primes close to k \cdot 10^n, because with those you can simplify the arithmetic as we've done above. However, the sky's the limit.

Seeding your internal PRNG

No, I'm not talking metaphorically (though the referenced article does propose a good idea). Now that you have an internal PRNG, you'll have to start worrying about the initial values. Luckily, unlike a computer, you have the entire world around you, and there are lots of options. Maybe look at the seconds hand on a clock, count the number of clouds in the sky, or just throw some dice. Just add the values to the current state, and voila, you've jumped to a completely different section in the output stream. This is especially suggested if you use the (1999,20) Lehmer PRNG, because otherwise you'll always be stuck on only 999 of the possible states.

Indeed, for those of you who aren't happy with the idea of trying to forever keep a 2-4 digit number in your head, you can just use these methods as transformations of common numbers, a substitution cipher for numbers to make them seem more random. Just think of the date/time whenever someone asks you for a random number, use it as your current state, and give the next iteration to as many digits as required. Won't quite be random, but it's probably still more than enough. Happy number crunching!
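A rough Python sketch of the clock-based variant follows. It assumes the (1999, 20) rule and reads the time as the four-digit number hhmm; both choices, along with the function name `quick_random`, are mine for the sake of illustration.

```python
from datetime import time

def quick_random(now, digits=1):
    """Treat the clock reading hhmm as the state and return the tail of the next state."""
    state = now.hour * 100 + now.minute
    # Same shortcut as for (p=1999, m=20): split into two-digit halves.
    a, b = divmod(state, 100)
    next_state = 20 * b + a
    # Note: midnight (00:00) maps to 0, the one degenerate state.
    return next_state % 10 ** digits

one_digit = quick_random(time(14, 37))
two_digits = quick_random(time(14, 37), digits=2)
```

At 14:37 the state 1437 steps to 20 \cdot 37 + 14 = 754, so you would answer 4, or 54 if asked for two digits.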


As an aside, some parapsychologists (both cranks and more reputable researchers, like the Global Consciousness Project) believe that it's possible for people to mentally influence random numbers through quantum effects. I'm not going to give my opinion on the validity of those claims, but if you use these PRNGs, because they're completely deterministic, you're safe from their influence!

  1. Wagenaar, W. A. Generation of random sequences by human subjects: A critical survey of literature. Psychological Bulletin, 77, 65-72 (1972) (URI).
  2. Neuringer, A. Can People Behave "Randomly?": The Role of Feedback. Journal of Experimental Psychology, 115, 62-75 (1986).
  3. For crypto, either true random numbers from an external source or specialised cryptographically secure PRNGs are needed. Cryptographically secure PRNG algorithms tend to be far too complicated for mental math, though I was really intrigued by Blum Blum Shub.
  4. That PRNGs always produce the same output for the same seed is exploited by stream ciphers for encryption.
  5. I had wanted to do a Linear Feedback Shift Register as well, but unfortunately those don't work well in base 10, since 10 isn't prime.
  6. I'm simplifying the maths deliberately; Lehmer PRNGs are actually a slightly larger class of methods.
  7. m needs to have high multiplicative order modulo p, and \{x_i\} = \{i\in \mathbb{Z}: 1\le i \le p-1\} when m is a primitive root modulo p.
  8. Park, S.K. & Miller, K.W. Random number generators: good ones are hard to find. Comm. of the ACM 31, 1192-1201 (1988) (URI).
  9. Thanks to an online post a while back by the late Professor Marsaglia. Unfortunately, I can't seem to readily find the reference anymore. Edit: found by courtesy of Josh Jordan.
  10. It's actually extremely easy to cryptanalyse Lehmer PRNGs, given the right tools, which is why they shouldn't be used for anything secure.

A simple Chrome extension to remove external URL mangling by Facebook

Update 2012-12-09: Facebook changed their URI mangling code, so this probably won't work as written anymore. However, a similar scheme should still work if you take the time to read the FB source code to figure out how to remove the new mangling code. (I no longer actively use FB, so I haven't updated the extension).

One of the things that's been irritating me lately about Facebook is its habit of mangling the URLs of all links going away from the site. The status bar displays the correct URL, for instance

http://www.xkcd.com/

but upon actually clicking the link, some clever javascript substitutes

http://www.facebook.com/l.php?u=http%3A%2F%2Fwww.xkcd.com%2F&h=IAQCKSgbY

which is on Facebook's server, but then immediately redirects the user to XKCD. Facebook is thus able to track the links clicked, which presumably lets them better personalise the News Feed content.
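The original address is still sitting in the mangled link, percent-encoded in the u parameter. As a quick illustration outside the browser, here is a minimal Python sketch that pulls it back out; the function name `demangle` is hypothetical, and the code assumes the l.php?u=... layout shown above.

```python
from urllib.parse import urlsplit, parse_qs

def demangle(url):
    """Return the target stored in the `u` query parameter, or the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc.endswith("facebook.com") and parts.path == "/l.php":
        # parse_qs also undoes the percent-encoding of the target.
        target = parse_qs(parts.query).get("u")
        if target:
            return target[0]
    return url

mangled = "http://www.facebook.com/l.php?u=http%3A%2F%2Fwww.xkcd.com%2F&h=IAQCKSgbY"
original = demangle(mangled)
```

This only recovers the address after the fact; it does nothing about the click having been routed through Facebook in the first place.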

This shouldn't surprise anyone, as most (if not all) of the major Internet players do some sort of click tracking, and Facebook in particular is well known for its stance on privacy. I'm not too bothered about it from a privacy standpoint, as I choose to give away a lot more personal information to Facebook anyways. Indeed, other net giants Google and Yahoo use a similar technique for all of their search results1, and Bing is smart enough to track clicks without resorting to link mangling. Click tracking is an integral part of online advertising, and since I'm using free services, I won't fault a profit-making company for employing it.

However, the difference is that Facebook's links don't work properly when copied and pasted from one browser instance to another2. They instead send you to a redirection page requiring a manual click through, which is all right every so often but irritating in repetition.

Facebook Redirection Page

Luckily, modern browsers come with a wide array of tools, and in this post, I'll be using a simple Chrome extension3 to remove the link mangling, allowing me to copy and paste my links in peace. It also has the side effect of possibly keeping Facebook from tracking my external clicks, but that's another matter entirely4.

I should warn you at this point that there will be some minimal amount of coding involved, but if you are using multiple browser instances, then I'll assume you're up to the task. First, you'll need to create a folder to contain the extension files, say FacebookLinkDeMangler. In that folder, create manifest.json as

// manifest.json
// Describes the extension & tells Chrome what scripts to run
{
  "name": "Facebook Link De-Mangler",
  "version": "0.1",
  "description": "Disables Facebook external link mangling",
  "content_scripts": [
    {
      "matches": ["*://*.facebook.com/*"],
      "js": ["jquery-1.6.2.min.js","main.js"],
      "all_frames": true
    }
  ]
}

The first couple of lines are just description; content_scripts is where actual instructions go. In this case, we're telling Chrome that for all pages on Facebook, run jquery-1.6.2.min.js and main.js.

Of course, those two files need to exist for the extension to function properly. First, go and grab yourself the latest version of jQuery, which is a free JavaScript library enabling all sorts of cool tricks5. Then just create main.js:

function modifier()
{
  // The following line is optional and will highlight all external links in yellow
  $("a[href][onmousedown^='UntrustedLink.bootstrap']").css({'background-color': 'yellow'});
  // Clear the handler that rewrites the URL when the link is clicked
  $("a[href][onmousedown^='UntrustedLink.bootstrap']").attr('onmousedown','');
}

// Pass the function itself rather than a string to be evaluated
var intID = setInterval(modifier, 5000);

All of Facebook's links that go to external pages call UntrustedLink.bootstrap in order to alter the URLs when the mouse is clicked on it. The modifier function gets rid of the onmousedown attribute, so the link remains unmangled. It runs every 5 seconds, which is necessary because page content loads dynamically in Facebook, and not just once.

And that's it; I told you the coding was minimal. Of course, the new extension needs to be loaded. To do that, go to the Google Chrome extensions manager. If there's a "+" next to "Developer Mode", you'll need to expand that. Then just "Load Unpacked Extension" and direct Chrome over to the folder you saved the files in (e.g. /path/to/FacebookLinkDeMangler). Now for the final step: copy and paste links from Facebook to your heart's content. That's all folks!

  1. To be fair, Google only mangles links when a user is logged in.
  2. I oftentimes have multiple instances of Chrome and Firefox running to be simultaneously signed in under different accounts.
  3. For those using Firefox, it should be trivial to write a similar script using Greasemonkey, though I haven't tested it myself.
  4. Facebook could also be doing something similar to what Bing does, and do remember that any time you visit a site that's enabled Facebook's Social Plugin, your visit is probably sent to their servers.
  5. If it's a later version of jQuery, the filename in manifest.json will obviously need to be updated to match.