Auto-login to web proxies using NetworkManager

My ISP uses a web proxy that one has to log in to before accessing the Internet. Logging in is a manual, repetitive process that's easily automated. So I embarked on a project of a few hours: write a script that reaches the proxy's login page and supplies the login credentials, and configure NetworkManager to run that script each time a connection comes up.

It's not just ISPs: hotel and airport Wi-Fi networks also use such web-based proxies, which one has to log in to before the 'net becomes accessible. Others can easily follow the steps I took to add auto-login support for such web proxies.

I'll get to the details in a bit, but first I'll point to the code (licensed under the GPL, v2). It's written in Python, a language that's relatively new to me. I had written a couple of small programs before, but those were just enough to remind me of the syntax; I had to look up the Python docs frequently for many of the details, like talking to HTTP servers, cookie management, config file handling and so on. My C-style Python may be evident: someone with more Python experience could probably shorten or optimise the script.

My ISP, Tikona Digital Networks, uses a somewhat roundabout way to bring up the login page. For any URL accessed before logging in to the proxy, it first serves an HTTP page with a redirect URL and a 'Please wait while the login page is loaded' message. The redirect target is then loaded. That page shows another 'Please wait' message, sets a cookie and, after a short JavaScript timeout (500 ms), does a POST to the real login page. The real login page asks for a username and password. Clicking the Login button triggers a JavaScript-based POST request, and if the username and password match the ones in their database, we're authenticated to the web proxy. The web proxy doesn't interfere with any further 'net access.

Now that I've given a rough overview of the login flow, here are the steps I took to get this script ready:

Step 1: Follow the redirect URL

Opening a browser and typing in any URL, say 'www.google.com', always resulted in a page that asked me to wait while it loaded the login page.

OK, so time for a short Python script to check what's happening:

import urllib

f = urllib.urlopen("http://www.google.com")
s = f.read()
f.close()

print s

This snippet requests the google.com home page and prints the body of the HTTP response.
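The snippet above is Python 2. On Python 3, urllib.urlopen became urllib.request.urlopen and read() returns bytes, so an equivalent has to decode the body. A minimal sketch, assuming it's acceptable to fall back to ISO-8859-1 when the server names no charset (the pre-login page declares iso-8859-1 in its meta tag); the fetch() helper is a name I made up:

```python
import urllib.request


def fetch(url, opener=None):
    """Fetch url and return the response body as text (hypothetical helper)."""
    open_url = opener.open if opener is not None else urllib.request.urlopen
    with open_url(url) as f:
        # Prefer the charset from the Content-Type header; otherwise fall back
        # to ISO-8859-1, which the portal's pages declare.
        charset = f.headers.get_content_charset() or "iso-8859-1"
        return f.read().decode(charset)
```

Calling print(fetch("http://www.google.com")) reproduces the dump below; the optional opener argument lets the same helper reuse a cookie-aware opener later on.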

Here's the dump I get before logging in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Please wait while the login page is loaded...</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"/>
<META HTTP-EQUIV="EXPIRES" CONTENT="-1"/>
<META HTTP-EQUIV="Refresh" CONTENT="2;URL=https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943">
</head>
<body>
<p align="center">Please wait...<p>
Please wait while the login page is loaded...
<!---
<msc>
<login_url><![CDATA[https://login.tikona.in/userportal/NSCLOGIN.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></login_url>
<logout_url><![CDATA[https://login.tikona.in/userportal/NSCLOGOUT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></logout_url>
<status_url><![CDATA[https://login.tikona.in/userportal/NSCSTATUS.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></status_url>
<update_url><![CDATA[https://login.tikona.in/userportal/NSCUPDATE.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></update_url>
<content_url><![CDATA[https://login.tikona.in/userportal/NSCCONTENT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></content_url>
</msc>
-->

</body>
</html>

The META HTTP-EQUIV="Refresh" line is the redirect: after a 2-second delay, the browser loads the URL it contains.
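Matching that tag with a regular expression depends on the exact capitalisation and spacing the portal happens to use. As an alternative (not what my script does), Python 3's html.parser can pick out the refresh tag regardless of case. A minimal sketch; the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Records the delay and target URL of a <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.delay = None
        self.url = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # CONTENT looks like "2;URL=https://..."
        delay, _, rest = (attrs.get("content") or "").partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.delay = int(delay.strip() or 0)
            self.url = url.strip()


def find_refresh(html):
    """Return (delay_seconds, url), or (None, None) if there is no refresh tag."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    return parser.delay, parser.url
```

Fed the dump above, find_refresh() returns 2 and the login.tikona.in/userportal/ URL.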

Step 2: Get the redirect link

So our task now is to extract the link from the http-equiv meta tag and open it. A regular expression strips away the surrounding text and leaves just the link:

from re import search
# Capture everything between "URL=" and the closing quote of CONTENT
refresh_url_pattern = 'HTTP-EQUIV="Refresh" CONTENT="2;URL=([^"]*)">'
refresh_url = search(refresh_url_pattern, s)

The URL to open is then available in refresh_url.group(1): group(1) holds the text matched by the first parenthesised group in the pattern.
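To make the group numbering concrete, here is the same pattern run against the META line from the dump (Python 3, URL shortened). group(0) is the whole match, tag text included; group(1) is only the part inside the parentheses:

```python
import re

meta = ('<META HTTP-EQUIV="Refresh" CONTENT="2;URL=https://login.tikona.in/'
        'userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f&sc=5a54">')

refresh_url_pattern = r'HTTP-EQUIV="Refresh" CONTENT="2;URL=([^"]*)">'
refresh_url = re.search(refresh_url_pattern, meta)

whole = refresh_url.group(0)  # the full match, from HTTP-EQUIV= through ">
url = refresh_url.group(1)    # just the URL between "URL=" and the quote
print(url)
```

Using [^"]* rather than .* keeps the match from running past the closing quote if more attributes ever follow on the same line.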

Now open the page obtained in the refresh URL:

f = urllib.urlopen(refresh_url.group(1))
s = f.read()

s now contains:

<html>
<head>
<title>Powered by Inventum</title>
<SCRIPT>
function moveToLogin() {
setTimeout("loadForm()",500);
}
function loadForm(){
document.forms[0].action="login.do?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null";
document.forms[0].method="post";
document.forms[0].submit();
}
</SCRIPT>
</head>
<body onload="moveToLogin();">
<FORM>
Loading the login page...
</FORM>
</body>
</html>

Step 3: Get the base URL, open login page

This page submits an (empty) form via an HTTP POST request after half a second. The URL it posts to is relative to the current page, so we have to extract the base URL from the redirect URL obtained earlier.

base_url_pattern = r"(http.*/)(\?.*)$"
base_url = search(base_url_pattern, refresh_url.group(1))

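The base URL is then available via base_url.group(1). The pattern captures everything up to and including the '/' just before the query string's '?' in the refresh URL above. The '?' has to be escaped: unescaped, '(?' starts a special group and the pattern won't compile.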
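Here is what the two groups capture on the refresh URL from Step 1 (Python 3, URL shortened). The greedy .*/ backs off to the last literal '/' that is directly followed by '?'; the slashes inside the query are percent-encoded as %2f, so they don't interfere:

```python
import re

refresh = ("https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle"
           "%2ecom%2f&ip=113%2e193%2e150%2e95&nas=tikonapune")

# Group 1: scheme, host and path up to the '/' just before the '?'.
# Group 2: the query string, starting at the (escaped) '?'.
base_url_pattern = r"(http.*/)(\?.*)$"
base_url = re.search(base_url_pattern, refresh)

print(base_url.group(1))  # https://login.tikona.in/userportal/
```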

So now we have to load the page login.do, which lives under 'https://login.tikona.in/userportal/', passing it the parameters '?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null'. Another regular expression isolates the 'login.do...' part from the 'action' assignment in the JavaScript above.

load_form_pattern = 'action="([^"]*)";'
load_form_id = search(load_form_pattern, s)
load_form_url = base_url.group(1) + load_form_id.group(1)
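Gluing the base URL and the action together works here because the base ends in '/' and the action is a plain relative path. A more general option, swapped in rather than taken from my script, is Python 3's urllib.parse.urljoin, which resolves a relative URL against the page's own URL the way a browser does, including absolute paths such as '/userportal/...'. A minimal sketch using the page from Step 2:

```python
import re
from urllib.parse import urljoin

page = '''<SCRIPT>
function loadForm(){
document.forms[0].action="login.do?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null";
document.forms[0].method="post";
document.forms[0].submit();
}
</SCRIPT>'''
refresh = "https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f"

# Pull the relative target out of the JavaScript form.action assignment.
action = re.search(r'action="([^"]*)";', page).group(1)

# Resolve it against the URL of the page it came from.
load_form_url = urljoin(refresh, action)
print(load_form_url)
```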

load_form_url is now the URL we need to access to get to the login page:

f = urllib.urlopen(load_form_url)
s = f.read()

This should get us the login page.

But it doesn't. After checking and double-checking what was happening, I couldn't see anything going wrong. There was just one more thing to try: cookies. I disabled cookies in Firefox and tried accessing the page. Voilà, no login page.
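My reading of this: the portal only serves the login form when the cookie it set on an earlier page comes back, and plain urllib.urlopen doesn't keep cookies between calls. The sketch below simulates that with a small, entirely hypothetical local server (paths /wait and /login.do, cookie session=abc123), in Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar:

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakePortal(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the portal: /wait sets a session cookie,
    any other path serves the form only if that cookie is sent back."""

    def do_GET(self):
        if self.path == "/wait":
            body = b"Loading the login page..."
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        else:
            cookie = self.headers.get("Cookie") or ""
            body = b"login form" if "session=abc123" in cookie else b"no login page"
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_login_page(base, use_cookies):
    """Visit the 'please wait' page, then ask for the login form."""
    # Ignore any http_proxy environment setting; talk to the local server directly.
    handlers = [urllib.request.ProxyHandler({})]
    if use_cookies:
        # Python 3 spelling of urllib2.HTTPCookieProcessor(cookielib.CookieJar())
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(base + "/wait") as f:
        f.read()
    with opener.open(base + "/login.do") as f:
        return f.read().decode()


server = HTTPServer(("127.0.0.1", 0), FakePortal)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

without_cookies = fetch_login_page(base, use_cookies=False)
with_cookies = fetch_login_page(base, use_cookies=True)

server.shutdown()
server.server_close()
print(without_cookies, "|", with_cookies)
```

Without the cookie processor the second request arrives with no Cookie header, which mirrors what I saw in Firefox with cookies disabled.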

Step 4: Enable cookie handling

So we now have to enable cookie handling in our Python script before we can reach the login form. The urllib2 and cookielib modules do that for us, so a slight rewrite of the code gets us this:

import urllib, urllib2, cookielib, ConfigParser, os
from re import search

# The CookieJar stores cookies the portal sets and sends them back on later requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

f = opener.open("http://google.com")
s = f.read()

All the other urllib.urlopen calls are now replaced by opener.open, so cookies persist across the session and the login page appears after accessing load_form_url:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Tikona Digital Networks</title>
<link rel="stylesheet" type="text/css" href="/userportal/pages/css/style.css" />
<script language="JavaScript" src="/userportal/pages/js/cookie.js"></script>
<script language="JavaScript" src="/userportal/pages/js/common.js"></script>

</head>

<body>
<form name="form1">
<div id="wrap">
<div class="background_login">
<div class="logo_header">
<div class="logoimg"><img src="/userportal/pages/images/logo.jpg" alt="Tikona Digital Networks" /></div>
<div class="sitelink"><a href="http://www.tikona.in" target="_blank">www.tikona.in</a></div>

</div>
<div class="clear"></div>
<div class="login_box">
<div id="right_curved_block">
<div class="blue_head">
<div class="blue_head_right">
<div class="blue_head_left">&nbsp;</div>
<div class="hdng">Login</div>
</div>
</div>

<div class="clear"></div>
<div class="block_content">
<div class="form">
<table height="100%" border="0" cellpadding="0" cellspacing="0">

<tr>


<td width="126"><label>Service Type</label></td>
<td width="200" align="left" valign="middle">
<select name="type"><option value="1">Check Account Details</option>

<option value="2" selected="selected">Internet Access</option></select>
</td>

</tr>
<tr>
<td width="126"><label>User Name</label></td>
<td width="200" align="left" valign="middle"><input type="text" name="username" value="" class="logintext">

</td>
</tr>
<tr>
<td width="126"><label>Password</label></td>
<td width="200" align="left" valign="middle"><input type="password" name="password" value="" class="loginpassword"></td>
</tr>
<tr>
<td width="126"><label>Remember me</label></td>

<td width="200" align="left" valign="middle"><div style=" width:30%; float:left;"><input name="remeberme" id="rememberme" type="checkbox" class="checkbox"/></div>
<div style=" width:70%; float:right;"><a href="javascript:savesettings()"><img src="/userportal/pages/images/login.gif" alt="" width="117" height="30" hspace="0" vspace="0" border="0" align="right" /></a></div></td>
</tr>
</table>
</div>
</div>
<div class="clear"> </div>
<div class="white_bottom">
</div>
</div>
</div>

<div class="tips_box">
<div class="v_box">
<div id="tips_block">
<div class="white_head_v">
<div class="blue_head_right">
<div class="white_head_left_v">&nbsp;</div>
<div class="wbs_version">&nbsp;</div>
</div>
</div>
<div class="clear"></div>
<div class="block_content">
<div class="scrol">
<h1>Importance of Billing Account Number</h1>
<br />

<font size="2">
<ul>
<li>Billing Account Number (BAN) is a 9 digit unique identification number of your Tikona Wi-Bro service bill
account. It is mentioned below your name and address in the bill.</li>
<li>Bill payments done through cheque or demand draft should mandatorily have BAN mentioned on them. <br />
<span style="color:#558ed5">Example:</span> Cheque or demand draft should be drawn in the name of &lsquo;
Tikona Digital Networks Pvt. Ltd. a/c xxx xxx xxx&rsquo;. Here &lsquo;xxx xxx xxx&rsquo; denotes your BAN.
</li>
<li>If the BAN is not mentioned or incorrectly mentioned on the cheque or demand draft, the bill amount does
not get credited against your Tikona Wi-Bro service account.</li>
<li>In case you have paid bill through cheque or demand draft without mentioning BAN on it and the amount is
not credited to your Tikona billing account, then please contact TikonaCare at 1800 20 94276. Kindly furnish
your cheque number, service ID, BAN and bank statement for payment verification.</li>
</ul>
</font><br />
</div>
</div>
<div class="clear"> </div>
<div class="white_bottom">&nbsp;</div>
</div>
</div>

</div>
<div style="padding:110px 0 0 0; float:left; width:100%;">
<div class="helpline">
Tikona Care: 1800 20 94276 | <a href="mailto:customercare@tikona.in">customercare@tikona.in</a></div>
</div>

<div class="footer_line">&nbsp;</div>
<div class="footer_blueline"></div>

<div class="footer">
Copyright &copy; 2009. Tikona Digital Networks. All right Reserved.
</div>
</div>
</div>
<input type="hidden" name="act" value="null">
</form>
</body>
</html>

Step 5: Login

OK, this page doesn't say what exactly to do after the username and password are entered: the form has no action. Instead, clicking the login.gif image calls the savesettings() function, which is in the cookie.js file:

function savesettings()
{

if (document.forms[0].rememberme.checked)
{
createCookie('nasusername',document.forms[0].username.value,2);
createCookie('type',document.forms[0].type.value,2);
createCookie('nasrememberme',1,2);

}
else{
eraseCookie('nasusername');
eraseCookie('type');
eraseCookie('nasrememberme');
}
document.forms[0].action = "newlogin.do?phone=0";
document.forms[0].method = "post";
document.forms[0].submit();
return true;
}

OK, so clicking the Login button POSTs the form to 'newlogin.do?phone=0'. And the username and password have to be passed along, of course.
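What savesettings() boils down to is a form-encoded POST carrying the form's fields. In urllib it's attaching a request body that turns a request into a POST; the Python 3 sketch below only builds the request (it doesn't send it) to make that visible. The credentials are placeholders, and it sends only the three fields my script sends (the form also has the hidden act field and the remeberme checkbox):

```python
import urllib.parse
import urllib.request

login_form_url = "https://login.tikona.in/userportal/newlogin.do?phone=0"

# Placeholder credentials; the real script reads them from its config file.
fields = {"username": "myuser", "password": "s3cret&more", "type": "2"}

# urlencode() escapes special characters ('&' becomes %26);
# Python 3's urllib wants the body as bytes.
login_data = urllib.parse.urlencode(fields).encode("ascii")

req = urllib.request.Request(login_form_url, data=login_data)
print(req.get_method())  # POST, because a body is attached
# opener.open(req) would actually submit the login.
```

When the request is sent, urllib adds a Content-Type of application/x-www-form-urlencoded unless one is already set, which matches what a browser sends for such a form.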

We already have the base url for the login page that we just used. Now we have to combine the base url with the 'newlogin.do' page instead of the 'login.do' page that we accessed earlier:

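login_form_id = "newlogin.do?phone=0"
service_type = "2"  # "Internet Access" in the Service Type drop-down

login_form_url = base_url.group(1) + login_form_id

# username and password come from the config file; passing data makes this a POST
login_data = urllib.urlencode({'username': username, 'password': password,
                               'type': service_type})

f = opener.open(login_form_url, login_data)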

... and success! This is enough to get the login done. I added config file handling to the final code so that the username/password are stored in a config file. The final code also ensures that we're on a Tikona network before proceeding with the steps of logging in (by checking if the redirect URL is obtained in Step 1). See the latest code here.
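For readers who want the two additions without digging through the full script, here is a Python 3 sketch of how they might look. The [tikona] section, the key names and both helper names are hypothetical and not necessarily what the final code uses; looks_like_tikona_portal() just reuses the Step 2 pattern, treating a missing login.tikona.in redirect as "not on a Tikona network":

```python
import configparser
import re

# Hypothetical config format, e.g. the contents of ~/.tikona-auto-login.conf
SAMPLE_CONFIG = """
[tikona]
username = myuser
password = 50%off
"""


def load_credentials(text):
    """Return (username, password) from an INI-style config string."""
    # interpolation=None so that a '%' in a password is taken literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return config.get("tikona", "username"), config.get("tikona", "password")


def looks_like_tikona_portal(page):
    """True if the page carries the portal's refresh redirect (see Step 2)."""
    match = re.search(r'HTTP-EQUIV="Refresh" CONTENT="\d+;URL=([^"]*)">', page)
    return bool(match) and match.group(1).startswith("https://login.tikona.in/")


username, password = load_credentials(SAMPLE_CONFIG)
```

To read from an actual file, config.read(os.path.expanduser(path)) replaces read_string(); if the check fails, the script can simply exit without attempting a login.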

Step 6: Auto-login on successful connection

Just one last step remains: a NetworkManager dispatcher script that will invoke this login program each time a network becomes ready:

#!/bin/sh

if [ "$2" = "up" ]; then
/home/amit/bin/tikona-auto-login || :
fi

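Put this in /etc/NetworkManager/dispatcher.d, owned by root and executable (e.g. mode 744), and we're good to go! NetworkManager passes the interface name as $1 and the action (such as "up") as $2, and it ignores scripts that are writable by group or others.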

Next steps:
The project surely isn't complete: a lot of support has to be added to NetworkManager itself to present a good UI for enabling and disabling these dispatcher scripts, and to prompt for a username and password instead of storing them in a config file. This and several other TODO items are listed in the README file. If you'd like to add support for other networks, these steps should be easy to follow; feel free to email me for guidance.