Auto-login to web proxies using NetworkManager

My ISP uses a web proxy that one has to log into to access the Internet. Logging in is a manual, repetitive process that is easily automated. So I embarked on a project of a few hours: write a script that reaches the proxy and supplies the login credentials, then configure NetworkManager to run that script each time a connection comes up.

It’s not just ISPs: hotel and airport wifi networks use the same kind of web-based proxy that one has to log in to before the ‘net becomes accessible. So the steps I followed can easily be adapted by others to add auto-login support for such web proxies.

I’ll get to the details in a bit, but I’ll first point to the code (licensed under the GPL, v2). It’s written in Python, a language that’s relatively new to me. I’ve written a couple of small programs earlier, but those were just enough to remind me of the syntax; I had to frequently look up the Python docs for a lot of the details, like interacting with HTTP servers, cookie management, config file management and so on. My C-style writing of the Python script might be evident: it should be possible for someone with more experience in Python to shorten or optimise the script.

My ISP, Tikona Digital Networks, uses a somewhat roundabout way to bring up the login page: for any URL accessed before the proxy login, it first displays an HTTP page that has a redirect URL and a ‘Please wait while login page is loaded’ message. The page to be redirected to is then loaded. This page shows another ‘Please wait’ message, sets a cookie and does a POST to the real login page after a short timeout (500 ms, going by the script shown in Step 2). The real login page asks for the username and password. After providing that info, one has to click on the Login button, which triggers a JavaScript-based POST request, and if the username/password provided match the ones in their database, we’re authenticated to the web proxy. The web proxy doesn’t interfere with any further ‘net access.

Now that I’ve gone through the rough overview of the approach to take, I’ll detail the steps I took to get this script ready:

Step 1: Follow the redirect URL

Open a browser, type in some URL — say ‘www.google.com’. This always resulted in a page that asked me to wait while it went to the login page.

OK, so time for a short python script to check what’s happening:

import urllib

f = urllib.urlopen("http://www.google.com")
s = f.read()
f.close()

print s

This snippet accesses the google.com website and dumps on the screen the result of the http request.

Here’s the dump that I get before the login.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><title>Please wait while the login page is loaded...</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"/><META HTTP-EQUIV="EXPIRES" CONTENT="-1"/><META HTTP-EQUIV="Refresh" CONTENT="2;URL=https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943"></head><body><p align="center">Please wait...<p>Please wait while the login page is loaded...<!---<msc><login_url><![CDATA[https://login.tikona.in/userportal/NSCLOGIN.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></login_url><logout_url><![CDATA[https://login.tikona.in/userportal/NSCLOGOUT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></logout_url><status_url><![CDATA[https://login.tikona.in/userportal/NSCSTATUS.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></status_url><update_url><![CDATA[https://login.tikona.in/userportal/NSCUPDATE.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></update_url><content_url><![CDATA[https://login.tikona.in/userportal/NSCCONTENT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></content_url></msc>-->

</body></html>

This shows there’s a redirect that’ll happen after the timeout (the META HTTP-EQUIV="Refresh" line). The redirect is to the link shown.

Step 2: Get the redirect link

So now our task is to get the link from the http-equiv meta tag and open that later. Using regular expressions, we can remove the text around the link and just obtain the link:

from re import search

# Single quotes outside, so the double quotes in the tag need no escaping
refresh_url_pattern = 'HTTP-EQUIV="Refresh" CONTENT="2;URL=(.*)">'
refresh_url = search(refresh_url_pattern, s)

The URL to access is then available in refresh_url.group(1). group(1) contains the matched string in parentheses above in the pattern searched.
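As a rough Python 3 illustration of the same extraction (the snippets in this post are Python 2), the sketch below runs a similar pattern against a shortened, made-up copy of the dump from Step 1. The sample string and the helper name extract_refresh_url are mine, not part of the script. I've also used a non-greedy (.*?) and \d+ for the delay, on the assumption that the page might arrive on a single line or with a different timeout.

```python
import re

# Shortened stand-in for the page in Step 1; the URL is abbreviated for illustration.
SAMPLE_PAGE = (
    '<META HTTP-EQUIV="EXPIRES" CONTENT="-1"/>'
    '<META HTTP-EQUIV="Refresh" CONTENT="2;URL=https://login.tikona.in/userportal/'
    '?requesturi=http%3a%2f%2fgoogle%2ecom%2f&nas=tikonapune">'
    '</head><body><p align="center">Please wait...'
)

# Non-greedy capture: stop at the first '">' after the URL.
REFRESH_PATTERN = re.compile(r'HTTP-EQUIV="Refresh" CONTENT="\d+;URL=(.*?)">')


def extract_refresh_url(page):
    """Return the meta-refresh target, or None if the page has no such tag."""
    match = REFRESH_PATTERN.search(page)
    return match.group(1) if match else None


print(extract_refresh_url(SAMPLE_PAGE))
```

A None result is also a cheap way to tell that we are not behind this kind of proxy at all.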

Now open the page obtained in the refresh URL:

f = urllib.urlopen(refresh_url.group(1))
s = f.read()

s now contains:

<html><head><title>Powered by Inventum</title><SCRIPT>function moveToLogin() {setTimeout("loadForm()",500);}function loadForm(){document.forms[0].action="login.do?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null";document.forms[0].method="post";document.forms[0].submit();}</SCRIPT> </head><body onload="moveToLogin();"><FORM>Loading the login page...</FORM></body></html>

Step 3: Get the base URL, open login page

So this page does an HTTP POST request. The URL of the new page being loaded is relative to the current one, so we have to extract the base URL from the previous redirect URL obtained.

# The '?' must be escaped to match a literal question mark
base_url_pattern = r"(http.*/)(\?.*)$"
base_url = search(base_url_pattern, refresh_url.group(0))

The base URL is then available via base_url.group(1). This regular expression pattern isolates the text before the first ‘?’, as is found in the refresh URL above.
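A quick way to convince yourself of what the pattern captures is to run it on a sample URL. The sketch below is Python 3 and uses an abbreviated version of the refresh URL; the function name split_base_url is only for illustration.

```python
import re

# Abbreviated form of the refresh URL from Step 2 (query string shortened).
REFRESH_URL = (
    "https://login.tikona.in/userportal/"
    "?requesturi=http%3a%2f%2fgoogle%2ecom%2f&nas=tikonapune"
)


def split_base_url(url):
    """Return (base ending in '/', query starting with '?')."""
    match = re.search(r"(http.*/)(\?.*)$", url)
    if match is None:
        raise ValueError("no '/?' boundary found in %r" % url)
    return match.group(1), match.group(2)


base, query = split_base_url(REFRESH_URL)
print(base)   # https://login.tikona.in/userportal/
```

The slashes inside the query string are percent-encoded (%2f), which is why the greedy match cannot run past the real path.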

So now we have to load the page login.do, which lives under the base URL ‘https://login.tikona.in/userportal/’ and which is to be passed the parameters ‘?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null’. This calls for another regular expression to isolate the ‘login.do…’ part from the ‘action’ assignment in the script above.

# Capture whatever the script assigns to the form's action
load_form_pattern = '.*action="(.*)";'
load_form_id = search(load_form_pattern, s)
load_form_url = base_url.group(1) + load_form_id.group(1)

load_form_url is now the URL we need to access to get to the login page:

f = urllib.urlopen(load_form_url)
s = f.read()
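An alternative to gluing the base URL and the relative path together by hand is urljoin from the Python 3 standard library, which resolves a relative reference against the URL of the page it came from. The sketch below assumes the loadForm() script looks like the one in the dump above; the sample strings and the helper name form_action_url are illustrative only, and the pattern is made non-greedy in case the script sits on one line.

```python
import re
from urllib.parse import urljoin

# Abbreviated copies of the refresh URL and the intermediate page.
REFRESH_URL = "https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f"
INTERMEDIATE_PAGE = (
    'function loadForm(){document.forms[0].action='
    '"login.do?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null";'
    'document.forms[0].method="post";document.forms[0].submit();}'
)


def form_action_url(page_url, page):
    """Find the form action set by the script and resolve it against page_url."""
    match = re.search(r'action="(.*?)";', page)
    if match is None:
        return None
    return urljoin(page_url, match.group(1))


print(form_action_url(REFRESH_URL, INTERMEDIATE_PAGE))
```

With urljoin there is no need for a separate base-URL regular expression, at the cost of hiding a step that is instructive to see spelled out.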

This should get our login page.

But it doesn’t. I spent some time checking and double-checking what was happening, but couldn’t see anything going wrong. There was just one more thing to try: cookies. I disabled cookies in Firefox and tried accessing the page. Voila, no login page.

Step 4: Enable cookie handling

So we now have to enable cookies in our python script to be able to enter login information. The urllib2 and cookielib libraries do that for us, so a slight re-write of the code gets us to this:

import urllib, urllib2, cookielib, ConfigParser, os
from re import search

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

f = opener.open("http://google.com")
s = f.read()

All other open calls (urllib.urlopen) are now replaced by opener.open. This way cookies are handled for the session and the login page appears after accessing the load_form_url:
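For readers on Python 3, where urllib2 and cookielib were folded into urllib.request and http.cookiejar, an equivalent setup might look like the sketch below. The function name make_session is made up for this example, and the actual fetch is left commented out because it only makes sense behind such a proxy.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return (opener, jar); every request made via opener shares the jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = make_session()
# page = opener.open("http://google.com").read().decode("iso-8859-1")
# Cookies set by any response end up in `jar` and are sent on later requests.
print(len(jar))   # 0 until a response sets a cookie
```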

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /><title>Tikona Digital Networks</title><link rel="stylesheet" type="text/css" href="/userportal/pages/css/style.css" /><script language="JavaScript" src="/userportal/pages/js/cookie.js"></script><script language="JavaScript" src="/userportal/pages/js/common.js"></script>

</head>

<body><form name="form1"><div id="wrap"><div class="background_login"><div class="logo_header"><div class="logoimg"><img src="/userportal/pages/images/logo.jpg" alt="Tikona Digital Networks" /></div><div class="sitelink"><a href="http://www.tikona.in" target="_blank">www.tikona.in</a></div>

</div><div class="clear"></div><div class="login_box"><div id="right_curved_block"><div class="blue_head"><div class="blue_head_right"><div class="blue_head_left">&nbsp;</div><div class="hdng">Login</div></div></div>

<div class="clear"></div><div class="block_content"><div class="form"><table height="100%" border="0" cellpadding="0" cellspacing="0">

<tr>

<td width="126"><label>Service Type</label></td><td width="200" align="left" valign="middle"><select name="type"><option value="1">Check Account Details</option>

<option value="2" selected="selected">Internet Access</option></select></td>

</tr><tr><td width="126"><label>User Name</label></td><td width="200" align="left" valign="middle"><input type="text" name="username" value="" class="logintext">

</td></tr><tr><td width="126"><label>Password</label></td><td width="200" align="left" valign="middle"><input type="password" name="password" value="" class="loginpassword"></td></tr><tr><td width="126"><label>Remember me</label></td>

<td width="200" align="left" valign="middle"><div style=" width:30%; float:left;"><input name="remeberme" id="rememberme" type="checkbox" class="checkbox"/></div><div style=" width:70%; float:right;"><a href="javascript:savesettings()"><img src="/userportal/pages/images/login.gif" alt="" width="117" height="30" hspace="0" vspace="0" border="0" align="right" /></a></div></td></tr></table></div></div><div class="clear"> </div><div class="white_bottom"></div></div></div>

<div class="tips_box"><div class="v_box"><div id="tips_block"><div class="white_head_v"><div class="blue_head_right"><div class="white_head_left_v">&nbsp;</div><div class="wbs_version">&nbsp;</div></div></div><div class="clear"></div><div class="block_content"><div class="scrol"><h1>Importance of Billing Account Number</h1><br />

<font size="2"><ul><li>Billing Account Number (BAN) is a 9 digit unique identification number of your Tikona Wi-Bro service billaccount. It is mentioned below your name and address in the bill.</li><li>Bill payments done through cheque or demand draft should mandatorily have BAN mentioned on them. <br /><span style="color:#558ed5">Example:</span> Cheque or demand draft should be drawn in the name of &lsquo;Tikona Digital Networks Pvt. Ltd. a/c xxx xxx xxx&rsquo;. Here &lsquo;xxx xxx xxx&rsquo; denotes your BAN.</li><li>If the BAN is not mentioned or incorrectly mentioned on the cheque or demand draft, the bill amount does not get credited against your Tikona Wi-Bro service account.</li><li>In case you have paid bill through cheque or demand draft without mentioning BAN on it and the amount is not credited to your Tikona billing account, then please contact TikonaCare at 1800 20 94276. Kindly furnish your cheque number, service ID, BAN and bank statement for payment verification.</li></ul>    </font><br /></div></div><div class="clear"> </div><div class="white_bottom">&nbsp;</div></div></div>

</div><div style="padding:110px 0 0 0; float:left; width:100%;"><div class="helpline">Tikona Care: 1800 20 94276  | <a href="mailto:customercare@tikona.in">customercare@tikona.in</a></div></div>

<div class="footer_line">&nbsp;</div><div class="footer_blueline"></div>

<div class="footer">Copyright &copy; 2009. Tikona Digital Networks. All right Reserved.</div></div></div><input type="hidden" name="act" value="null"></form></body></html>

Step 5: Login

OK, this page doesn’t say what exactly to do after the username/password is entered: the form has no action of its own. Instead, clicking the login.gif image calls the savesettings() function, which is in the cookie.js file:

function savesettings(){
    if (document.forms[0].rememberme.checked){
        createCookie('nasusername',document.forms[0].username.value,2);
        createCookie('type',document.forms[0].type.value,2);
        createCookie('nasrememberme',1,2);
    }else{
        eraseCookie('nasusername');
        eraseCookie('type');
        eraseCookie('nasrememberme');
    }
    document.forms[0].action = "newlogin.do?phone=0";
    document.forms[0].method = "post";
    document.forms[0].submit();
    return true;
}

OK, so the page ‘newlogin.do’ is to be opened as a response to the clicking of the login button. And the username and password info has to be passed along, of course.

We already have the base url for the login page that we just used. Now we have to combine the base url with the ‘newlogin.do’ page instead of the ‘login.do’ page that we accessed earlier:

login_form_id = "newlogin.do?phone=0"
# "2" is the 'Internet Access' option in the form's Service Type menu
type = "2"

login_form_url = base_url.group(1) + login_form_id

login_data = urllib.urlencode({'username': username, 'password': password, 'type': type})

# Passing data makes this a POST request
f = opener.open(login_form_url, login_data)
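In Python 3 the same request needs the form body as bytes. The sketch below only builds the request object, so it can be inspected without a network; the credentials are placeholders and build_login_request is a name I made up for this example.

```python
import urllib.parse
import urllib.request


def build_login_request(base_url, username, password, service_type="2"):
    """Build the POST that savesettings() would submit from the browser."""
    url = urllib.parse.urljoin(base_url, "newlogin.do?phone=0")
    body = urllib.parse.urlencode(
        {"username": username, "password": password, "type": service_type}
    ).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=body)


req = build_login_request(
    "https://login.tikona.in/userportal/", "alice", "s3cret"
)
print(req.get_method(), req.full_url)
# To actually log in, pass req to the cookie-aware opener's open() method.
```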

… and success! This is enough to get the login done. I added config file handling to the final code so that the username/password are stored in a config file. The final code also ensures that we’re on a Tikona network before proceeding with the steps of logging in (by checking if the redirect URL is obtained in Step 1). See the latest code here.
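The config handling can be quite small. As a hedged illustration only (the section name, key names and file layout below are my assumptions for this sketch, not necessarily what the published script uses), reading credentials with Python 3's configparser might look like this:

```python
import configparser

# Hypothetical file contents; the real script's layout may differ.
SAMPLE_CONFIG = """
[login]
username = alice
password = s3cret
"""


def read_credentials(text):
    """Parse INI-style text and return (username, password)."""
    # interpolation=None so a '%' in a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser.get("login", "username"), parser.get("login", "password")


print(read_credentials(SAMPLE_CONFIG))
```

For a file on disk, parser.read(path) takes the place of read_string; since the file holds a password, it is worth keeping it readable only by its owner.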

Step 6: Auto-login on successful connection

Just one last step remains: a NetworkManager dispatcher script that will invoke this login program each time a network becomes ready:

#!/bin/sh

if [ "$2" = "up" ]; then
    /home/amit/bin/tikona-auto-login || :
fi

Put this in /etc/NetworkManager/dispatcher.d with the appropriate permissions (744) and we’re good to go!
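NetworkManager runs each dispatcher script with the interface name as the first argument and the action as the second, which is what the "$2" test relies on. If you would rather keep everything in one language, the same dispatcher could be sketched in Python 3 as below; the function names are mine and the program path is simply the one from the shell script.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Path taken from the shell script above; adjust for your own setup.
LOGIN_PROGRAM = "/home/amit/bin/tikona-auto-login"


def should_login(argv):
    """argv[1] is the interface name, argv[2] the action (e.g. 'up')."""
    return len(argv) >= 3 and argv[2] == "up"


def main(argv):
    if should_login(argv):
        try:
            # The exit status is ignored, like `|| :` in the shell version.
            subprocess.call([LOGIN_PROGRAM])
        except OSError:
            pass  # login program missing or not executable
    return 0


if __name__ == "__main__":
    main(sys.argv)
```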

Next steps:
The project surely isn’t complete: a lot of support has to be added to NetworkManager itself to present a good UI to enable/disable these dispatcher scripts and also to prompt for a username/password instead of storing in a config file. This and several other TODO items are listed in the README file. If you plan on adding new networks that can be auto-logged in to, it’s easy to follow these steps or feel free to email me for guidance.

Communication between Guests and Hosts

Guest and Host communication should be a simple affair — the venerable TCP/IP sockets should be the first answer to any remote communication.  However, it’s not so simple once some special virtualisation-related constraints are added to the mix:

  • the guest and host are different machines, managed differently
  • the guest administrator and the host administrator may be different people
  • the guest administrator might inadvertently block IP-based communication channels to the host via firewall rules, rendering the TCP/IP-based communication channels unusable

The last point needs some elaboration: system administrators want to be really conservative in what they “open” to the outside world.  In this sense, the guest and host administrators are actively hostile to each other.  Also, rightly, neither should trust the other, given that a lot of the data that used to live on individual machines is now stored within clouds, and any leak of the data could prove disastrous to the administrators and their employers.

So what’s really needed is a special communication channel between guests and hosts that is not susceptible to being blocked by either side, and that is a very special-purpose, low-bandwidth channel that doesn’t try to re-implement TCP/IP.  Some other requirements are mentioned on this page.

After several iterations, we settled on one particular implementation: virtio-serial.  The virtio-serial infrastructure rides on top of virtio, a generic para-virtual bus that enables exposing custom devices to guests.  virtio devices are abstracted enough so that guest drivers need not know what kind of bus they’re actually riding on: they are PCI devices on x86 and native devices on s390 under the hood.  What this means is the same guest driver can be used to communicate with a virtio-serial device under x86 as well as s390.  Behind the scenes, the virtio layer, depending on the guest architecture type, works with the host virtio-pci device or virtio-s390 device.

The host device is coded in qemu.  One host virtio-serial device is capable of hosting multiple channels or ports on the same device.  The number of ports that can ride on top of a virtio-serial device is currently arbitrarily limited to 31, but one device can very well support 2^31 ports.  The device is available since upstream qemu release 0.13 as well as in Fedora from release 13 onwards.

The guest driver is written for Linux and Windows guests.  The API exposed includes the open, read, write, poll and close calls.  For the Linux guest, ports can be opened in blocking as well as non-blocking modes.  The driver is included upstream from Linux kernel version 2.6.35.  Kernel 2.6.37 will also have asynchronous IO support, i.e., SIGIO will be delivered to interested userspace apps whenever the host-side connection is established or closed, or when a port gets hot-unplugged.
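Because a port behaves like an ordinary file descriptor in a Linux guest, the usual poll-then-read idiom applies.  The Python sketch below shows that idiom; the helper name wait_and_read is mine, and a pipe stands in for the port so the example runs on any Linux machine, not only inside a guest.

```python
import os
import select


def wait_and_read(fd, timeout_ms=1000, max_bytes=4096):
    """Wait until fd has data to read, then read it; None if nothing arrived."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not any(event & select.POLLIN for _, event in events):
        return None
    return os.read(fd, max_bytes)


# In a guest, fd would come from something like:
#   os.open("/dev/virtio-ports/<port name>", os.O_RDWR | os.O_NONBLOCK)
# Here a pipe stands in for the port.
r, w = os.pipe()
os.write(w, b"ping")
print(wait_and_read(r))   # b'ping'
```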

Using the ports is simple: when using qemu from the command line directly, add:

-chardev socket,path=/tmp/port0,server,nowait,id=port0-char
-device virtio-serial
-device virtserialport,id=port1,name=org.fedoraproject.port.0,chardev=port0-char

This creates one device with one port and exposes to the guest the name ‘org.fedoraproject.port.0’.  Guest apps can then open /dev/virtio-ports/org.fedoraproject.port.0 and start communicating with the host.  Host apps can open the /tmp/port0 unix domain socket to communicate with the guest.  Of course, there are other qemu chardev backends that can be used other than unix domain sockets.  There also is an in-qemu API that can be used.
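To make the two endpoints concrete, here is a small Python sketch of both sides.  It assumes the qemu options above, so the host side connects to the /tmp/port0 unix socket (qemu is the listening end because of the server option) and the guest side opens the named port under /dev/virtio-ports.  The function names are mine, and error handling is kept to a minimum.

```python
import os
import socket

HOST_SOCKET = "/tmp/port0"
GUEST_PORT = "/dev/virtio-ports/org.fedoraproject.port.0"


def host_send(message, path=HOST_SOCKET):
    """Run on the host: connect to qemu's chardev socket and send bytes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(message)


def guest_read(max_bytes=4096, path=GUEST_PORT):
    """Run in the guest: open the port and do one blocking read."""
    fd = os.open(path, os.O_RDWR)
    try:
        return os.read(fd, max_bytes)
    finally:
        os.close(fd)
```

With the guest running, calling host_send(b"hello\n") on the host and guest_read() inside the guest should move the bytes across, with no network configuration involved on either side.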

More invocation options and examples are given in the invocation and how to test sections. 

There is sample C code for the guest as well as sample python code from the test suites.  The original test suite, written to verify the functionality of the user-kernel interface, will in the near future be moved to autotest, enabling faster addition of more tests and tests that not just check for correctness, but also regressions and bugs.

virtio-serial is already in use by the Matahari, Spice, libguestfs and Anaconda projects.  I’ll briefly mention how Anaconda is going to use virtio-serial: starting with Fedora 14, guest installs of Fedora will automatically send Anaconda logs to the host if a virtio-serial port with the name of ‘org.fedoraproject.anaconda.log.0’ is found.  virt-install is modified to create such a virtio-serial port.  This means debugging early Anaconda output will be easier with the logs available on the host (without worrying about guest file system corruption during install, or about network drivers not being available before a crash).

Further use: There are many more uses of virtio-serial, which should be pretty easy to code:

  • shutting down or suspending VMs when a host is shut down
  • clipboard copy/paste between hosts and guests (this is in progress by the Spice team)
  • lock a desktop session in the guest when a vnc/spice connection is closed
  • fetch cpu/memory/power usage rates at regular intervals for monitoring