
Fedora Miniconf and foss.in/2010

A very delayed post on the Fedora Miniconf and foss.in/2010.

foss.in/2010 was held on the 15th, 16th and 17th of this month in Bengaluru. I could confirm my attendance very late, so I missed out on the CfP and a chance at speaking in the main conference, but I managed to get a speaking slot in the Fedora miniconf. Thanks to Rahul for accommodating me on short notice.

One of the main things I was looking forward to was meeting my team-mate Juan Quintela. Though we met recently at the KVM Forum 2010, I was going to use this opportunity to catch him and discuss some of the things I’m working on that overlap with his domain, virtual machine live migration, and get things going.

The other thing was to get to know more people — Fedora users and developers from India who I’ve spoken with on the IRC channel but not met, other developers and users of free software from around the world. Add to that a few people who I’ve worked with and not met, and also people whose software I use daily and who I want to thank for working on what they do. It was also nice meeting the familiar faces from the IBM LTC in Bengaluru — Balbir Singh, Kamalesh Babulal, Vaidy, Aneesh K. V., et al.

It’s always a certainty that there will be users of the virtualization (particularly KVM) stack, and it’s nice to get a feel of how many people are using KVM, in what ways, how well it works for them, and so on. That’s always a motivation.

The Fedora miniconf was on the 16th. The schedules for talks for miniconfs aren’t published by the foss.in people, so it was left to us to do our advertising and crowd-pulling. Rahul had listed the speakers and the talks on the Fedora foss.in/2010 wiki page. I went ahead and took out a few print-outs for the talks and assigned time slots for each talk depending on the suggested length given by the speakers for their talks as well as the slot allotted to the Fedora Project for the miniconf. The print-outs of the schedules were meant to be pasted around the venue to attract attention to the remotest section that was to host the miniconf, Hall C. However, we just ended up keeping the printouts as handouts at the Fedora stall that we set up. The Fedora stall was quite a crowd-puller. And since it was set up on the second day, we didn’t have to compete with the other stalls since they had their share of attendance on the first day.

The other members of the Fedora crowd, Rahul, Saleem, Arun, Shreyank, Aditya, Suchakra, Siddhesh, Neependra, … have written about the Fedora stall and their experiences earlier (and linked to from the Fedora foss.in/2010 page).

The Fedora miniconf was a great success, going by the attendance and the participation we had. My talk was the first, and I could see we had a full house. I think my talk went quite well. It could have been a little disappointing for people who expected demos, but I wanted to aim this talk at people who had a general sense of using and deploying Fedora virt as well as Fedora on the cloud, and at people who would go and do stuff themselves rather than being given everything on a silver platter. This also resonates with the foss.in philosophy of recent years of being a contributor-oriented conference rather than a user-oriented one, so I didn’t mind doing that. Gauging by the response I got after the talk, I believe I was right in doing that. (I even got an email from the CEO of a company mentioning it was a great talk.)

The other talks from the Fedora miniconf were engaging; I learnt quite a bit about what the others are up to. Arun’s talk on packaging emacs extensions was entertaining. He connects with the audience; I liked that about him.

Aditya’s talk on Fedora Summer Coding was a good call to students to participate in the free software world via Fedora’s internship programme. He narrated his own experience as a Fedora Project intern, which touched the right chords with the intended audience. I think doing more such talks will get him over the jitters of presenting to a big crowd.

Suchakra’s doing good work on accessing an embedded Linux box via a console inside a browser tab — it’s a very interesting project.

Neependra’s talk was a good walk-through of using tracing commands to see what really happens in the kernel when a userspace program runs. He walked through the ‘mkdir’ command and showed the call trace. This was a good demo. He spoke about the various situations in which tracing tools could be used, not just for debugging, and that should have set people’s thoughts in motion as to how they could get more information on how the system behaves instead of just using a system.

Shreyank’s talk on creating a web tool for managing student projects and the Fedora Summer of Code was interesting as well. It was nice to see the way an actual student project was designed and developed and how it’s going to make future students’ and mentors’ lives easier. This talk should have served as a good introduction to the flow and process students have to go through in applying, starting, reviewing and completing their project.

Apart from the Fedora miniconf, I attended a few sessions in the main conf. James Morris’s keynote on the history of the security subsystem in the Linux kernel was very informative. Rahul’s keynote on the ‘Failures of Fedora’ was totally packed with anecdotes and analyses of the decisions taken by the Fedora project and their impact on the users and developers. Fedora (earlier Red Hat Linux) is one of the oldest distributions around, and any insights into its functioning, and data on what works and what does not, are a great source of information for building engaged communities of users and contributors.

Lennart’s two talks on systemd and the state of surround sound on Linux were not very new to me. However, there were a few bits in there that provided some food for thought.

Juan’s talk on live migration was packed full of experiences in getting qemu to a state where migration works fairly well. He also spoke about all the work that’s left to do. It was totally technical, and I think the people who were misguided by it being labelled a ‘sysadmin’ talk, or by the title (expecting to migrate from an older physical machine to a newer physical machine w/o downtime), quickly left the hall. Those who stayed back were either people who work on QEMU/KVM (esp. the folks from the IBM LTC) or people too polite to walk out.

Dimitris Glezos’s talk on building large-scale web applications was a very informative one for me. I’ve never done web programming (except for HTML, CSS and a bit of PHP ages ago), and this was a good intro for me to understand what various web development frameworks there are, their pros and cons, the way to deploy them, the way to structure them, etc. It was evident he took a lot of effort to prepare the slides and the talk; it was totally worth it.

Danese Cooper’s keynote on the Wikimedia Foundation was an equally informative talk. She spoke on a wide range of topics, including the team that makes up Wikimedia, their servers and datacentres, their load balancing strategy, their backup systems, their editing process, their localisation efforts, their search for a new mirror site in the APAC region, etc. I was interested in one aspect, machine-readable Wikipedia content, to which they had a satisfactory answer: they’re migrating to semantic web content and would look at a machine-readable API once they’re done adding semantics to their content.

The rest of the time was spent at the Fedora booth and talking to Juan and other friends.

The foss.in team announced this would be the last foss.in, so thanks to them for hanging around so long. To fill the void, we’re going to have to step up and organise a platform for like-minded people from the free/open source software community around here. I’ve been part of organising some events earlier in different capacities, and I’m looking forward to being part of an effort that provides such a platform. There’s a FUDCon being planned for next year in Pune, I’ll be involved in it, and will take things along from there.

Auto-login to web proxies using NetworkManager

My ISP uses a web proxy that one has to log into to access the Internet. This logging in is a manual, repetitive process, which is easily automatable. So I embarked on a project of a few hours: get to the proxy, supply login credentials, and configure NetworkManager to auto-login by running the script each time a connection goes up.

It’s not just ISPs — hotel wifi networks, airport wifis, all use such web-based proxies that one has to log into before the ‘net becomes accessible. So the steps I followed can be easily followed by others to add support for auto-logging into such web proxies.

I’ll get to the details in a bit, but I’ll first point to the code (licensed under the GPL, v2). It’s written in Python, a language that’s relatively new for me. I’ve written a couple of small programs earlier, but those were just enough to remind me of the syntax; I had to frequently look up the Python docs to get a lot of the details, like interacting with http servers, cookie management, config file management and so on. My C-style writing of the Python script might be evident: it should be possible for someone with more experience in Python to shorten or optimise the script.

My ISP, Tikona Digital Networks, uses a somewhat roundabout way to bring up the login page: for any URL accessed before the proxy login, it first displays an http page that has a redirect URL and a ‘Please wait while login page is loaded’ message. The page to be redirected to is then loaded. This page shows another ‘Please wait’ message, sets a cookie and does a POST action to the real login page after a 5-second timeout. The real login page asks for the username and password. After providing that info, one has to click on the Login button, which translates to a javascript-based POST request, and if the username/password provided match the ones in their database, we’re authenticated to the web proxy. The web proxy doesn’t interfere with any further ‘net access.

Now that I’ve gone through the rough overview of the approach to take, I’ll detail the steps I took to get this script ready:

Step 1: Follow the redirect URL

Open a browser, type in some URL — say ‘www.google.com’. This always resulted in a page that asked me to wait while it went to the login page.

OK, so time for a short Python script to check what’s happening:

import urllib

f = urllib.urlopen("http://www.google.com")
s = f.read()
f.close()

print s

This snippet accesses the google.com website and dumps on the screen the result of the http request.

Here’s the dump that I get before the login.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Please wait while the login page is loaded...</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"/>
<META HTTP-EQUIV="EXPIRES" CONTENT="-1"/>
<META HTTP-EQUIV="Refresh" CONTENT="2;URL=https://login.tikona.in/userportal/?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943">
</head>
<body>
<p align="center">Please wait...<p>
Please wait while the login page is loaded...
<!---
<msc>
<login_url><![CDATA[https://login.tikona.in/userportal/NSCLOGIN.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></login_url>
<logout_url><![CDATA[https://login.tikona.in/userportal/NSCLOGOUT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></logout_url>
<status_url><![CDATA[https://login.tikona.in/userportal/NSCSTATUS.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></status_url>
<update_url><![CDATA[https://login.tikona.in/userportal/NSCUPDATE.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></update_url>
<content_url><![CDATA[https://login.tikona.in/userportal/NSCCONTENT.do?requesturi=http%3a%2f%2fgoogle%2ecom%2f&ip=113%2e193%2e150%2e95&mac=00%3a16%3a01%3a8e%3a06%3a92&nas=tikonapune&requestip=google%2ecom&sc=5a54aa1fd2de7a9c2b92a865de55b943]]></content_url>
</msc>
-->

</body>
</html>

This shows there’s a redirect that’ll happen after the timeout (the META HTTP-EQUIV=”Refresh” line). The redirect is to the link shown.

Step 2: Get the redirect link

So now our task is to get the link from the http-equiv header and open that later. Using regular expressions, we can remove the text around the link and just obtain the link:

from re import search

refresh_url_pattern = 'HTTP-EQUIV="Refresh" CONTENT="2;URL=(.*)">'
refresh_url = search(refresh_url_pattern, s)

The URL to access is then available in refresh_url.group(1); group(1) contains the substring matched by the parenthesised part of the pattern.

Now open the page obtained in the refresh URL:

f = urllib.urlopen(refresh_url.group(1))
s = f.read()

s now contains:

<html>
<head>
<title>Powered by Inventum</title>
<SCRIPT>
function moveToLogin() {
setTimeout("loadForm()",500);
}
function loadForm(){
document.forms[0].action="login.do?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null";
document.forms[0].method="post";
document.forms[0].submit();
}
</SCRIPT>
</head>
<body onload="moveToLogin();">
<FORM>
Loading the login page...
</FORM>
</body>
</html>

Step 3: Get the base URL, open login page

So this page does an HTTP POST request. The URL of the new page being loaded is relative to the current one, so we have to extract the base URL from the previous redirect URL obtained.

base_url_pattern = r"(http.*/)(\?.*)$"
base_url = search(base_url_pattern, refresh_url.group(0))

The base URL is then available via base_url.group(1). This regular expression pattern isolates the text before the first ‘?’, as is found in the refresh URL above.

So now we have to load the page login.do, which is at address ‘https://login.tikona.in’ and which is to be passed the parameters ‘?requesturi=http%3A%2F%2Fgoogle.com%2F&act=null’. This calls for another regular expression by which we can isolate the ‘login.do…’ part from the ‘action’ part of the POST request above.

load_form_pattern = '.*action="(.*)";'
load_form_id = search(load_form_pattern, s)
load_form_url = base_url.group(1) + load_form_id.group(1)

load_form_url is now the URL we need to access to get to the login page:

f = urllib.urlopen(load_form_url)
s = f.read()

This should get us the login page.

But it doesn’t. After spending some time checking and double-checking what was happening, I couldn’t see anything going wrong. There was just one more thing to try: cookies. I disabled cookies in Firefox and tried accessing the page. Voila, no login page.

Step 4: Enable cookie handling

So we now have to enable cookies in our Python script to be able to enter login information. The urllib2 and cookielib libraries do that for us, so a slight re-write of the code gets us to this:

import urllib, urllib2, cookielib, ConfigParser, os
from re import search

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

f = opener.open("http://google.com")
s = f.read()

All other open calls (urllib.urlopen) are now replaced by opener.open. This way cookies are handled for the session and the login page appears after accessing the load_form_url:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Tikona Digital Networks</title>
<link rel="stylesheet" type="text/css" href="/userportal/pages/css/style.css" />
<script language="JavaScript" src="/userportal/pages/js/cookie.js"></script>
<script language="JavaScript" src="/userportal/pages/js/common.js"></script>

</head>

<body>
<form name="form1">
<div id="wrap">
<div class="background_login">
<div class="logo_header">
<div class="logoimg"><img src="/userportal/pages/images/logo.jpg" alt="Tikona Digital Networks" /></div>
<div class="sitelink"><a href="http://www.tikona.in" target="_blank">www.tikona.in</a></div>

</div>
<div class="clear"></div>
<div class="login_box">
<div id="right_curved_block">
<div class="blue_head">
<div class="blue_head_right">
<div class="blue_head_left">&nbsp;</div>
<div class="hdng">Login</div>
</div>
</div>

<div class="clear"></div>
<div class="block_content">
<div class="form">
<table height="100%" border="0" cellpadding="0" cellspacing="0">

<tr>


<td width="126"><label>Service Type</label></td>
<td width="200" align="left" valign="middle">
<select name="type"><option value="1">Check Account Details</option>

<option value="2" selected="selected">Internet Access</option></select>
</td>

</tr>
<tr>
<td width="126"><label>User Name</label></td>
<td width="200" align="left" valign="middle"><input type="text" name="username" value="" class="logintext">

</td>
</tr>
<tr>
<td width="126"><label>Password</label></td>
<td width="200" align="left" valign="middle"><input type="password" name="password" value="" class="loginpassword"></td>
</tr>
<tr>
<td width="126"><label>Remember me</label></td>

<td width="200" align="left" valign="middle"><div style=" width:30%; float:left;"><input name="remeberme" id="rememberme" type="checkbox" class="checkbox"/></div>
<div style=" width:70%; float:right;"><a href="javascript:savesettings()"><img src="/userportal/pages/images/login.gif" alt="" width="117" height="30" hspace="0" vspace="0" border="0" align="right" /></a></div></td>
</tr>
</table>
</div>
</div>
<div class="clear"> </div>
<div class="white_bottom">
</div>
</div>
</div>

<div class="tips_box">
<div class="v_box">
<div id="tips_block">
<div class="white_head_v">
<div class="blue_head_right">
<div class="white_head_left_v">&nbsp;</div>
<div class="wbs_version">&nbsp;</div>
</div>
</div>
<div class="clear"></div>
<div class="block_content">
<div class="scrol">
<h1>Importance of Billing Account Number</h1>
<br />

<font size="2">
<ul>
<li>Billing Account Number (BAN) is a 9 digit unique identification number of your Tikona Wi-Bro service bill
account. It is mentioned below your name and address in the bill.</li>
<li>Bill payments done through cheque or demand draft should mandatorily have BAN mentioned on them. <br />
<span style="color:#558ed5">Example:</span> Cheque or demand draft should be drawn in the name of &lsquo;
Tikona Digital Networks Pvt. Ltd. a/c xxx xxx xxx&rsquo;. Here &lsquo;xxx xxx xxx&rsquo; denotes your BAN.
</li>
<li>If the BAN is not mentioned or incorrectly mentioned on the cheque or demand draft, the bill amount does
not get credited against your Tikona Wi-Bro service account.</li>
<li>In case you have paid bill through cheque or demand draft without mentioning BAN on it and the amount is
not credited to your Tikona billing account, then please contact TikonaCare at 1800 20 94276. Kindly furnish
your cheque number, service ID, BAN and bank statement for payment verification.</li>
</ul>
</font><br />
</div>
</div>
<div class="clear"> </div>
<div class="white_bottom">&nbsp;</div>
</div>
</div>

</div>
<div style="padding:110px 0 0 0; float:left; width:100%;">
<div class="helpline">
Tikona Care: 1800 20 94276 | <a href="mailto:customercare@tikona.in">customercare@tikona.in</a></div>
</div>

<div class="footer_line">&nbsp;</div>
<div class="footer_blueline"></div>

<div class="footer">
Copyright &copy; 2009. Tikona Digital Networks. All right Reserved.
</div>
</div>
</div>
<input type="hidden" name="act" value="null">
</form>
</body>
</html>

Step 5: Login

OK, this page doesn’t say what exactly to do after the username/password is entered. There’s no POST action. Instead, what they do is call the savesettings() function on clicking the login.gif image. savesettings() is in the cookie.js file:

function savesettings()
{
    if (document.forms[0].rememberme.checked)
    {
        createCookie('nasusername',document.forms[0].username.value,2);
        createCookie('type',document.forms[0].type.value,2);
        createCookie('nasrememberme',1,2);
    }
    else {
        eraseCookie('nasusername');
        eraseCookie('type');
        eraseCookie('nasrememberme');
    }
    document.forms[0].action = "newlogin.do?phone=0";
    document.forms[0].method = "post";
    document.forms[0].submit();
    return true;
}

OK, so the page ‘newlogin.do’ is to be opened as a response to the clicking of the login button. And the username and password info has to be passed along, of course.

We already have the base URL for the login page that we just used. Now we have to combine the base URL with the ‘newlogin.do’ page instead of the ‘login.do’ page that we accessed earlier:

login_form_id = "newlogin.do?phone=0"
type = "2"    # '2' selects 'Internet Access' in the service-type dropdown

login_form_url = base_url.group(1) + login_form_id

# username and password are read from a config file (see below)
login_data = urllib.urlencode({'username': username, 'password': password,
                               'type': type})

f = opener.open(login_form_url, login_data)

… and success! This is enough to get the login done. I added config file handling to the final code so that the username/password are stored in a config file. The final code also ensures that we’re on a Tikona network before proceeding with the steps of logging in (by checking if the redirect URL is obtained in Step 1). See the latest code here.
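
For illustration, the config file handling can be as simple as ConfigParser reading the credentials; a minimal sketch (the config path and the section/option names here are made up for the example; the actual script may organise things differently):

import ConfigParser, os

config = ConfigParser.ConfigParser()
config.read(os.path.expanduser("~/.tikona-auto-login.conf"))

# Hypothetical section/option names; see the actual script for the real layout
username = config.get("login", "username")
password = config.get("login", "password")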

Step 6: Auto-login on successful connection

Just one last step remains: a NetworkManager dispatcher script that will invoke this login program each time a network becomes ready:

#!/bin/sh

if [ "$2" = "up" ]; then
/home/amit/bin/tikona-auto-login || :
fi

Put this in /etc/NetworkManager/dispatcher.d with the appropriate permissions (744) and we’re good to go!

Next steps:
The project surely isn’t complete: a lot of support has to be added to NetworkManager itself to present a good UI to enable/disable these dispatcher scripts and also to prompt for a username/password instead of storing them in a config file. This and several other TODO items are listed in the README file. If you plan on adding new networks that can be auto-logged into, it’s easy to follow these steps, or feel free to email me for guidance.

Communication between Guests and Hosts

Guest and Host communication should be a simple affair — the venerable TCP/IP sockets should be the first answer to any remote communication.  However, it’s not so simple once some special virtualisation-related constraints are added to the mix:

  • the guest and host are different machines, managed differently
  • the guest administrator and the host administrator may be different people
  • the guest administrator might inadvertently block IP-based communication channels to the host via firewall rules, rendering the TCP/IP-based communication channels unusable

The last point needs some elaboration: system administrators want to be really conservative in what they “open” to the outside world.  In this sense, the guest and host administrators are actively hostile to each other.  Also, rightly, neither should trust each other, given that a lot of the data stored in operating systems are now stored within clouds and any leak of the data could prove disastrous to the administrators and their employers.

So what’s really needed is a special communication channel between guests and hosts that is not susceptible to being blocked by either the guest or the host, and that is a special-purpose, low-bandwidth channel that doesn’t seek to re-implement TCP/IP.  Some other requirements are mentioned on this page.

After several iterations, we settled on one particular implementation: virtio-serial.  The virtio-serial infrastructure rides on top of virtio, a generic para-virtual bus that enables exposing custom devices to guests.  virtio devices are abstracted enough so that guest drivers need not know what kind of bus they’re actually riding on: they are PCI devices on x86 and native devices on s390 under the hood.  What this means is the same guest driver can be used to communicate with a virtio-serial device under x86 as well as s390.  Behind the scenes, the virtio layer, depending on the guest architecture type, works with the host virtio-pci device or virtio-s390 device.

The host device is coded in qemu.  One host virtio-serial device is capable of hosting multiple channels or ports on the same device.  The number of ports that can ride on top of a virtio-serial device is currently arbitrarily limited to 31, but one device can very well support 2^31 ports.  The device is available since upstream qemu release 0.13 as well as in Fedora from release 13 onwards.

The guest driver is written for Linux and Windows guests.  The API exposed includes open, read, write, poll and close calls.  For the Linux guest, ports can be opened in blocking as well as non-blocking modes.  The driver is included upstream from Linux kernel version 2.6.35.  Kernel 2.6.37 will also have asynchronous IO support — i.e., SIGIO will be delivered to interested userspace apps whenever the host-side connection is established or closed, or when a port gets hot-unplugged.

Using the ports is simple: when using qemu from the command line directly, add:

-chardev socket,path=/tmp/port0,server,nowait,id=port0-char \
-device virtio-serial \
-device virtserialport,id=port1,name=org.fedoraproject.port.0,chardev=port0-char

This creates one device with one port and exposes to the guest the name ‘org.fedoraproject.port.0’.  Guest apps can then open /dev/virtio-ports/org.fedoraproject.port.0 and start communicating with the host.  Host apps can open the /tmp/port0 unix domain socket to communicate with the guest.  Of course, there are other qemu chardev backends that can be used other than unix domain sockets.  There also is an in-qemu API that can be used.

More invocation options and examples are given in the invocation and how to test sections.
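
To make the flow concrete, here is a minimal sketch of both ends of the channel in Python. This is my own example rather than one from the docs above; it assumes the qemu invocation shown earlier and a guest with the driver loaded:

# Guest side: open the port by the name qemu exposed, then exchange data
import os

fd = os.open("/dev/virtio-ports/org.fedoraproject.port.0", os.O_RDWR)
os.write(fd, b"hello host\n")
print(os.read(fd, 4096))    # blocks until the host writes something
os.close(fd)

# Host side (a separate program): connect to the socket backing the chardev
import socket

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/tmp/port0")  # the path given to -chardev above
print(sock.recv(4096))
sock.sendall(b"hello guest\n")
sock.close()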

There is sample C code for the guest as well as sample Python code from the test suites.  The original test suite, written to verify the functionality of the user-kernel interface, will in the near future be moved to autotest, enabling faster addition of more tests, tests that check not just for correctness but also for regressions and bugs.

virtio-serial is already in use by the Matahari, Spice, libguestfs and Anaconda projects.  I’ll briefly mention how Anaconda is going to use virtio-serial: starting with Fedora 14, guest installs of Fedora will automatically send Anaconda logs to the host if a virtio-serial port with the name ‘org.fedoraproject.anaconda.log.0’ is found.  virt-install is modified to create such a virtio-serial port.  This means debugging early Anaconda output will be easier, with the logs available on the host (and no worrying about guest file system corruption during install, or network drivers not being available before a crash).

Further use: There are many more uses of virtio-serial, which should be pretty easy to code:

  • shutting down or suspending VMs when a host is shut down
  • clipboard copy/paste between hosts and guests (this is in progress by the Spice team)
  • lock a desktop session in the guest when a vnc/spice connection is closed
  • fetch cpu/memory/power usage rates at regular intervals for monitoring (sketched below)
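
As a sketch of the last item: a small guest agent could push memory stats over a dedicated port at a fixed interval. The port name below is made up for the example; the host side would read the stats from the corresponding chardev, as in the earlier example:

import os, time

PORT = "/dev/virtio-ports/org.example.stats.0"    # hypothetical port name

def mem_free_kb():
    # Pick the MemFree value (in kB) out of /proc/meminfo
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemFree:"):
                return int(line.split()[1])

fd = os.open(PORT, os.O_WRONLY)
while True:
    os.write(fd, ("memfree_kb %d\n" % mem_free_kb()).encode())
    time.sleep(10)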

Upgrading from Fedora 11 to Fedora 13

Having already installed (what would be) F13 on my work and personal laptops the traditional way — by installing a fresh copy (since I wanted to modify the partition layout), I tried an upgrade on my desktop.

My desktop was running Fedora 11 and I moved it to Fedora 13. I wanted to test how the upgrade functionality works: does it run into any errors (esp. since it’s from 11 -> 13, skipping 12 entirely), is the experience smooth, etc.

I started out by downloading the RC compose from http://alt.fedoraproject.org/. Since all my installs are for the x86-64 architecture, I downloaded the DVD.iso. I then loopback-mounted the DVD on my laptop:

# mount -o loop /home/amit/Downloads/Fedora-13-x86_64-DVD.iso /mnt/F13

I then exported the contents of the mount via NFS: edit /etc/exports and add the following line:

/mnt/F13 172.31.10.*

This ensures the mount is only available to users on my local network.

Then, ensure the nfs services are running:

# service nfs start
# service nfslock start

On my desktop which was to be upgraded, I mounted the NFS export:

# mount -t nfs 172.31.1.12:/mnt/F13 /mnt

And copied the kernel and initrd images to boot into:

# cp /mnt/isolinux/vmlinuz /boot
# cp /mnt/isolinux/initrd.img /boot

Then update the grub config with this new kernel that we’ll boot into for the upgrade. Edit /boot/grub/grub.conf and add:

title Fedora 13 install
    root (hd0,0)
    kernel /vmlinuz
    initrd /initrd.img

Once that’s done, reboot and select the entry we just put in the grub.conf file. The install process starts and asks where the files are located for the install. Select NFS and provide the details: Server 172.31.1.12 and directory /mnt/F13.

The first surprise for me was seeing the updated graphics for the Anaconda installer. They were changed in the time since I installed F13 (beta) on my laptops. The new artwork certainly looks very good and smooth. More white, less blue is a departure from the usual Fedora artwork, but it does look nice.

I then proceeded to select ‘upgrade’; it found my old F11 install and everything after that ‘just worked’. I was skeptical about this while it was running: I had some rpmfusion.org repositories enabled and some packages installed from those repositories. I was wondering if those packages would be upgraded as well, or if they would be left in their current state, which could create dependency problems, or if they would be completely removed. I had to wait for the install to finish, which took a while. The post-install process took more than half an hour, and when it was done, I selected ‘Reboot’. Half-expecting something to have broken or to not work, I logged in, and voila, I was presented with the shiny new GNOME 2.30 desktop. The temporary install kernel that I had put in as the default boot kernel was also removed. A small thing in itself, but great for usability.

Everything looked and felt right, no sign of breakage, no error messages, no warnings, just some good seamless upgrade.

I can’t say I really expected this. Coming from a die-hard Debian fan, that’s saying something: distribution upgrades used to be the forte of just Debian. For now. The Fedora developers have done a really good job of making this process extremely easy to use and extremely reliable. Kudos to them!

While the Fedora 13 release has been pushed back a week for an install-over-NFS bug, it needs a certain combination of misfortunes to trigger, and luckily, I didn’t hit it. However, when trying the F13 beta install on my laptop, I had hit a couple of Anaconda bugs: one is now resolved for F14 (a crash when upgrading without a bootloader configuration), and the other (no UI refresh when switching between virtual consoles until a package finishes installing — really felt while installing over a slow network link) is a known problem with the design of Anaconda; hopefully the devs get to it.

Overall, a really nice experience and I can now comfortably say Fedora has really rocketed ahead (all puns intended) since the old times when even installing packages used to be a nightmare. This is good progress indeed, and I’m glad to note that the future of the Linux desktop is in very good hands.

Cheers to the entire team!

Virtualisation (on Fedora)

A few volunteers from India associated with the Fedora Project wrote articles for Linux For You’s March 2010 Virtualisation Special. Those articles, and a few others, are put up on the Fedora wiki space at Magazine Articles on Virtualization. Thanks to LFY for letting us upload the pdfs!

We’re always looking for more content, in the form of how-tos, articles, experiences, tips, etc., so feel free to upload content to the wiki or blog about it.

We also have contact with some magazine publishers so if you’re interested in writing for online or print magazines, let the marketing folks know!

Debian moving to time-based releases

http://www.debian.org/News/2009/20090729

I have used Debian for several years now and have always been either on the ‘testing’ or the ‘sid’ releases on my desktops / laptops. I never felt the need to switch to ‘stable’, as even sid was stable enough for my regular usage (with a few scripts to keep out buggy new debs).

I’ve seen, over time, people move to Ubuntu though. That means people really liked Debian, but they also wanted ‘stable’ releases at predictable times. If one stayed on a Debian stable release, ‘bleeding edge’ or ‘new software’ was never possible. By the time a new Debian release was out, upstreams would have moved one or two major releases ahead.

So Ubuntu captured the desktop share away from Debian. The server folks wouldn’t complain about a lack of new features. So would this really make any difference?

Will the folks who migrated to Ubuntu go back to Debian?

(I’ve since moved the majority of my machines to Fedora though — but that’s a different topic)

We open if we die

I wrote a few comments about introducing “guarantees” in software — how do you assure your customers that they won’t be left in the lurch if you go down. It generated a healthy discussion and that gave me an opportunity to fine-tune the definition of “insurance” in software. Openness is such an advantage to foster great discussions and free dialogue.

So reading this piece of news this morning via phoronix about a company called pogoplug has me really excited. I’d feel vindicated if they could increase their customer base by that announcement. I hope they don’t go down; but I’d also like to see them go open regardless of their financial health; if an idea is out in the market, there’ll be people copying it and implementing it in different ways anyway. If, instead, they open up their code right away, they can engage a much wider community in enhancing their software and prevent variants from springing up which might even offer competing features.

Re-comparing file systems

The previous attempt at comparing file systems based on the ability to allocate large files and zero them met with some interesting feedback. I was asked why I didn’t add reiserfs to the tests and also if I could test with larger files.

The test itself had a few problems, making the results unfair:

- I had different partitions for different file systems. So the hard drive geometry and seek times would play a part in the test results

- One can never be sure that the data that was requested to be written to the hard disk was actually written unless one unmounts the partition

- Other data that was in the cache before starting the test could be in the process of being written out to the disk and that could also interfere with the results

All these have been addressed in the newer results.

There are a few more goodies too:
- A gnuplot script to ease the charting of data
- A script to automate testing on various file systems
- A big bug fix that affected the results for the chunk-writing cases (4k and 8k): this bug existed right from the time I first wrote the test and was the result of using the wrong parameter for calculating chunk size. It was spotted by Mike Galbraith on lkml.

Browse the sources here

or git-clone them by

git clone git://git.fedorapeople.org/~amitshah/alloc-perf.git

So in addition to ext3, ext4, xfs and btrfs, I’ve added ext2 and reiserfs, and expanded the ext3 test to cover the three journalling modes: writeback, ordered and guarded. guarded is the new mode that’s being proposed (it’s not yet in the Linux kernel). It’s to have the speed of writeback and the consistency of ordered.

I’ve also run these tests twice, once with a user logged in and a full desktop on. This is to measure the times that a user will see when actually working on the system and some app tries allocating files.

I also ran the tests in single-user mode, so that no background services were running and the effect of other processes on the tests would not be seen. This gives cleaner timings. The fragmentation will of course remain more or less the same; that’s not a property of system load.

It’s also important to note that I created this test suite to mainly find out how fragmented the files are when allocating them using different methods on different file systems. The comparison of performance is a side-effect. This test is also not useful for any kind of stress-testing file systems. There are other suites that do a good job of it.

That said, the results suggest that btrfs, xfs and ext4 are the best when it comes to keeping fragmentation lowest. Reiserfs looks really bad in these tests. Time-wise, the file systems that support the fallocate() syscall perform the best, using almost no time in allocating files of any size. ext4, xfs and btrfs support this syscall.

On to the tests. I created a 4GiB file for each test. The tests are: posix_fallocate(), mmap+memset, writing 4k-sized chunks and writing 8k-sized chunks. These tests are repeated inside the same 20GiB partition; the script reformats the partition for the appropriate fs before each run.
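
The test program itself is written in C (see the sources linked above). As a rough illustration of what the four methods do, here is a Python re-creation; treat it as a sketch only (it assumes Python 3 for os.posix_fallocate, and it skips the sync/remount steps the real test uses between runs):

import mmap, os, time

SIZE = 4 * 1024**3    # 4GiB, as in the tests

def alloc_fallocate(path):
    # Kernel-backed allocation: one call, no data actually written
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.posix_fallocate(fd, 0, SIZE)
    os.close(fd)

def alloc_mmap(path):
    # Extend the file, map it, then dirty every page (the memset step)
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.ftruncate(fd, SIZE)
    m = mmap.mmap(fd, SIZE)
    step = 1 << 20
    zero = b"\0" * step
    for off in range(0, SIZE, step):
        m[off:off + step] = zero
    m.close()
    os.close(fd)

def alloc_chunks(path, chunk):
    # Write zeroes chunk by chunk (4096 and 8192 in the tests)
    zero = b"\0" * chunk
    with open(path, "wb") as f:
        for _ in range(SIZE // chunk):
            f.write(zero)

for name, fn in (("posix-fallocate", alloc_fallocate),
                 ("mmap", alloc_mmap),
                 ("chunk-4096", lambda p: alloc_chunks(p, 4096)),
                 ("chunk-8192", lambda p: alloc_chunks(p, 8192))):
    start = time.time()
    fn("testfile")
    print("%s: %.1fs" % (name, time.time() - start))
    os.unlink("testfile")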

The results:

The first 4 columns show the times (in seconds) and the last four columns show the fragments resulting from the corresponding test.

The results, in text form, are:

# 4GiB file
# Desktop on
# left half: time taken (seconds); right half: number of fragments
filesystem      fallocate  mmap  chunk-4k  chunk-8k | fallocate  mmap  chunk-4k  chunk-8k
ext2                   73    96        77        80 |        34    39        39        36
ext3-writeback         89   104        89        93 |        34    36        37        37
ext3-ordered           87    98        89        92 |        34    35        37        36
ext3-guarded           89   102        90        93 |        34    35        36        36
ext4                    0    84        74        79 |         1    10         9         7
xfs                     0    81        75        81 |         1     2         2         2
reiserfs               85    86        89        93 |       938    35       953       956
btrfs                   0    85        79        82 |         1     1         1         1

# 4GiB file
# Single-user mode
filesystem      fallocate  mmap  chunk-4k  chunk-8k | fallocate  mmap  chunk-4k  chunk-8k
ext2                   71    85        73        77 |        33    37        35        36
ext3-writeback         84    91        86        90 |        34    35        37        36
ext3-ordered           85    85        87        91 |        34    34        37        36
ext3-guarded           84    85        86        90 |        34    34        38        37
ext4                    0    74        72        76 |         1    10         9         7
xfs                     0    72        73        77 |         1     2         2         2
reiserfs               83    75        86        91 |       938    35       953       956
btrfs                   0    74        76        80 |         1     1         1         1

Fig. 1: number of fragments. reiserfs performs really badly here.

Fig. 2: the same results, but without reiserfs.

Fig. 3: time results, with the desktop on.

Fig. 4: time results, without the desktop, in single-user mode.

So in conclusion, as noted above, btrfs, xfs and ext4 are the best when it comes to keeping fragments at the lowest. Reiserfs really looks bad in these tests. Time-wise, the file systems that support the fallocate() syscall perform the best, using almost no time in allocating files of any size. ext4, xfs and btrfs support this syscall.

Comparison of File Systems And Speeding Up Applications

Update: I’ve done a newer article on this subject at http://log.amitshah.net/2009/04/re-comparing-file-systems.html that removes some of the deficiencies in the tests mentioned here and has newer, more accurate results along with some new file systems.

How should one allocate disk space for a file for later writing? ftruncate() (or lseek() followed by write()) creates sparse files, which is not what is needed. A traditional way is to write zeroes to the file till it reaches the desired file size. Doing things this way has a few drawbacks:

  • Slow, as small chunks are written one at a time by the write() syscall
  • Lots of fragmentation

posix_fallocate() is a library call that handles the chunking of writes in one batch; the application developer need not code their own block-by-block writes. But this still happens in userspace.

Linux 2.6.23 introduced the fallocate() system call. The allocation is then moved to kernel space and hence is faster. New file systems that support extents make this call very fast indeed: a single extent is to be marked as being allocated on disk (as traditionally blocks were being marked as ‘used’). Fragmentation too is reduced as file systems will now keep track of extents, instead of smaller blocks.

posix_fallocate() will internally use fallocate() if the syscall exists in the running kernel.

So I thought it would be a good idea to make libvirt use posix_fallocate() so that systems with the newer file systems will directly benefit when allocating disk space for virtual machines. I wasn’t sure of what method libvirt already used to allocate the space. I found out that it allocated blocks in 4KiB sized chunks.

So I sent a patch to the libvir-list to convert to posix_fallocate() and danpb asked me about what the benefits of this approach were and also asked about using alternative approaches if not writing in 4K chunks. I didn’t have any data to back up my claims of “this approach will be fast and will result in less fragmentation, which is desirable”. So I set out to do some benchmarking. To do that, though, I first had to make some empty disk space to create a few file systems of sufficiently large sizes. Hunting for a test machine with spare disk space proved futie, so I went about resizing my ext3 partition and creating about 15 GB of free disk space. I intended to test ext3, ext4, xfs and btrfs. I could use my existing ext3 partition for the testing, but that would not give honest results about the fragmentation (existing file systems may already be fragmented, causing big new files surely to be fragmented whereas on a fresh fs, I won’t run into that risk).

So I sent a patch to the libvir-list to convert to posix_fallocate(), and danpb asked me what the benefits of this approach were, and also about alternative approaches if not writing in 4K chunks. I didn’t have any data to back up my claims of “this approach will be fast and will result in less fragmentation, which is desirable”. So I set out to do some benchmarking. To do that, though, I first had to make some empty disk space to create a few file systems of sufficiently large sizes. Hunting for a test machine with spare disk space proved futile, so I went about resizing my ext3 partition and creating about 15 GB of free disk space. I intended to test ext3, ext4, xfs and btrfs. I could have used my existing ext3 partition for the testing, but that would not give honest results about fragmentation (an existing file system may already be fragmented, causing big new files to surely be fragmented, whereas on a fresh fs I wouldn’t run into that risk). Though even creating separate partitions on rotating storage and testing file system performance won’t give perfectly honest results, I figured that if the percentage difference in the results was quite high, it wouldn’t matter. I grabbed the latest Linus tree and the latest dev trees for the userspace utilities for all the file systems and created about 5GB partitions for each fs.

I then wrote a program that created a file, allocated disk space for it, closed it, and calculated the time taken in doing so. This was done multiple times for different allocation methods: posix_fallocate(), mmap() + memset(), and writing zeroes in 4096-byte and 8192-byte chunks.

I had four methods of allocating files and 5GB partitions, so I decided to check the performance by creating a 1GiB file for each allocation method.

The program is here. The results, here. The git tree is here.

I was quite surprised to see poor performance for posix_fallocate() on ext4. On digging a bit, I realised mkfs.ext4 hadn’t created the file system with extents enabled. I reformatted the partition, but that data was valuable to have as well: it shows how much better a file system does with extents support.

Graphically, it looks like this:
Notice that ext4, xfs and btrfs take only a few microseconds to complete posix_fallocate().

The number of fragments created:

btrfs doesn’t yet have the ioctl implemented for calculating fragments.
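
For reference, the classic way to count fragments is the FIBMAP ioctl, which maps each logical block of a file to its physical block; counting the discontinuities gives the fragment count. Here is a minimal sketch (it needs root; whether the test suite uses FIBMAP or a different interface is an implementation detail, so treat this as illustrative):

import fcntl, os, struct

FIBMAP = 1      # from <linux/fs.h>: map a logical block to its physical block
FIGETBSZ = 2    # from <linux/fs.h>: get the filesystem block size

def count_fragments(path):
    fd = os.open(path, os.O_RDONLY)
    bsz = struct.unpack("i", fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0)))[0]
    nblocks = (os.fstat(fd).st_size + bsz - 1) // bsz
    frags, last = 0, None
    for blk in range(nblocks):
        phys = struct.unpack("i", fcntl.ioctl(fd, FIBMAP, struct.pack("i", blk)))[0]
        if phys == 0:
            continue                   # hole in a sparse file
        if last is None or phys != last + 1:
            frags += 1                 # discontiguity: a new fragment starts
        last = phys
    os.close(fd)
    return frags

print(count_fragments("testfile"))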

The results are very impressive and the final patches to libvirt were finalised pretty quickly. They’re now in libvirt’s development branch. Coming soon to a virtual machine management application near you.

Use of posix_fallocate() will be beneficial to programs that know in advance the size of the file being created, like torrent clients, ftp clients, browsers, download managers, etc. It won’t be beneficial in the speed sense, as data is only written when it’s downloaded, but it’s beneficial in the as-less-fragmentation-as-possible sense.