Category Archives: ideas

Idea: Faster Metadata Downloads With Yum and Git

The presto plugin for yum has worked great for me so far.  It's been very useful, not so much for staying within download limits as for the time saved in getting the bits downloaded.  The time saved is significant when the bandwidth is not too good (it never is).

However, I've observed that in some cases the presto metadata is larger than the actual package itself.  Take a font package, say 21KB in size: if it has a deltarpm of 3KB, that's a saving of 18KB, or a very impressive 85%.  However, the presto metadata itself can be more than 400KB, nullifying the advantage of the drpm.  In this corner case, we're effectively downloading 418KB instead of 21KB: almost 20 times the size of the actual package.

So here’s an idea: why not let git handle the metadata for us?  The metadata is a text (or sqlite) file that lists package names, their dependencies, version numbers and so on.  Since text can be very easily handled by git, it should be a breeze fetching metadata updates from a git server.  At install-time (or upgrade-time), the metadata git repository for a particular Fedora version can be cloned, and on each update, all that’s necessary for yum to do is invoke ‘git pull’ and it gets all the latest metadata.  Downloads: a few KB each day instead of a few MBs.
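
To make this concrete, here's a rough sketch (in Python, since yum is Python) of what the fetch path could look like.  The repository URL and cache path below are made up purely for illustration; the real scheme would be up to the Fedora infrastructure folks.

import os
import subprocess

# Hypothetical per-release metadata repository and local cache path;
# both names are invented for this sketch.
METADATA_REMOTE = "git://metadata.example.fedoraproject.org/f11.git"
LOCAL_CACHE = "/var/cache/yum/metadata-git/f11"

def refresh_metadata():
    """Clone the metadata repository on first use, 'git pull' afterwards."""
    if not os.path.isdir(os.path.join(LOCAL_CACHE, ".git")):
        subprocess.check_call(["git", "clone", METADATA_REMOTE, LOCAL_CACHE])
    else:
        subprocess.check_call(["git", "pull"], cwd=LOCAL_CACHE)

refresh_metadata()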

The advantages are numerous:

  • Saves server bandwidth
  • Uses very few server resources when the git protocol is used
  • Scales really well
  • Compresses really well
  • Makes yum faster for users
    • I think this is the biggest win: not having to wait ages for a ‘yum search’ to finish every day has to get anyone interested.  Makes old-time Debian users like me very happy.

There are some challenges to be considered as well:

  • Should the yum metadata be served by just one canonical git server, while the packages get served by mirrors?  Not every mirror may have the git protocol enabled, nor can the Fedora project ask each mirror to configure a git server.
    • A single metadata server can also get ahead of slow mirrors, which would then be unable to serve the packages that the latest metadata refers to
    • This can be mitigated by serving the repository over git's HTTP transport, which any plain web server on a mirror can host
  • The metadata can keep growing
    • This can be mitigated by having a separate git repository for the metadata belonging to each release.  Multiple git repos can be set up easily for extra repositories (e.g., for external repos or for multiple version repos while doing an upgrade).
  • The mirror list has to be updated to also include git repositories, which can then be managed with ‘git remote’.

I’ve filed an RFE for this feature.  For someone looking for a weekend hack for yum in python, this should be a good opportunity to jump right in!  If you intend to take this up, get in touch with the developers, make sure no one else is working on this yet (or collaborate with others) and update the details on the Fedora Feature Page.

We open if we die

I wrote a few comments about introducing “guarantees” in software: how do you assure your customers that they won’t be left in the lurch if you go down?  It generated a healthy discussion, and that gave me an opportunity to fine-tune the definition of “insurance” in software.  Openness is a great advantage in fostering discussion and free dialogue.

So reading this piece of news this morning via Phoronix about a company called Pogoplug has me really excited.  I’d feel vindicated if they could increase their customer base through that announcement.  I hope they don’t go down; but I’d also like to see them go open regardless of their financial health.  If an idea is out in the market, there will be people copying it and implementing it in different ways anyway.  If, instead, they open up their code right away, they can engage a much wider community in enhancing their software and prevent variants from springing up that might even offer competing features.

Re-comparing file systems

The previous attempt at comparing file systems based on the ability to allocate large files and zero them met with some interesting feedback. I was asked why I didn’t add reiserfs to the tests and also if I could test with larger files.

The test itself had a few problems, making the results unfair:

- I had different partitions for different file systems. So the hard drive geometry and seek times would play a part in the test results

- One can never be sure that the data that was requested to be written to the hard disk was actually written unless one unmounts the partition

- Other data that was in the cache before starting the test could be in the process of being written out to the disk and that could also interfere with the results

All these have been addressed in the newer results.
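
For the curious, the shape of the new harness is roughly the following.  This is an illustrative sketch, not the actual script from the repository: the device, mountpoint, file-system list and test command are all placeholders.

import subprocess

DEVICE = "/dev/sda5"      # placeholder for the 20GiB test partition
MOUNTPOINT = "/mnt/test"

# (fs, mkfs command, extra mount options) -- trimmed list; ext3 gets
# one run per journalling mode via the data= mount option.
FILESYSTEMS = [
    ("ext2", ["mkfs.ext2"], None),
    ("ext3", ["mkfs.ext3"], "data=writeback"),
    ("ext3", ["mkfs.ext3"], "data=ordered"),
    ("ext4", ["mkfs.ext4"], None),
    ("xfs",  ["mkfs.xfs", "-f"], None),
]

def run(cmd):
    subprocess.check_call(cmd)

for fs, mkfs, opts in FILESYSTEMS:
    # Reformatting the same partition keeps drive geometry and seek
    # times constant across file systems.
    run(mkfs + [DEVICE])
    mount = ["mount", "-t", fs, DEVICE, MOUNTPOINT]
    if opts:
        mount[1:1] = ["-o", opts]
    run(mount)
    run(["./alloc-test", MOUNTPOINT])  # placeholder for the allocation tests
    # sync + umount makes sure everything really hit the disk before
    # the next run, and flushes any stale cached data.
    run(["sync"])
    run(["umount", MOUNTPOINT])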

There are a few more goodies too:
- gnuplot script to ease the charting of data
- A script to automate testing on various file systems
- A big bug fixed that affected the results for the chunk-writing cases (4k and 8k): this existed right from the time I first wrote the test and was the result of using the wrong parameter for calculating chunk size. This was spotted by Mike Galbraith on lkml.

Browse the sources here

or git-clone them by

git clone git://git.fedorapeople.org/~amitshah/alloc-perf.git

So in addition to ext3, ext4, xfs and btrfs, I’ve added ext2 and reiserfs, and expanded the ext3 test to cover three journalling modes: ordered, writeback and guarded.  guarded is the new mode being proposed (it’s not yet in the Linux kernel); it aims to offer the speed of writeback with the consistency of ordered.

I’ve also run these tests twice: once with a user logged in and a full desktop on, to measure the times a user will see when actually working on the system and some app tries allocating files.

The second run was in single-user mode, with no background services running, so that the effect of other processes on the tests is not seen.  This matters only for the timing; the fragmentation will remain more or less the same either way, as it is not a property of system load.

It’s also important to note that I created this test suite mainly to find out how fragmented files end up when allocated using different methods on different file systems.  The performance comparison is a side-effect.  This test is also not useful for any kind of stress-testing of file systems; there are other suites that do a good job of that.

That said, the results suggest that btrfs, xfs and ext4 are the best when it comes to keeping fragmentation at its lowest.  Reiserfs really looks bad in these tests.  Time-wise, the file systems that support the fallocate() syscall perform the best, using almost no time in allocating files of any size.  ext4, xfs and btrfs support this syscall.

On to the tests.  I created a 4GiB file for each test.  The tests are: posix_fallocate(), mmap()+memset(), writing 4k-sized chunks and writing 8k-sized chunks.  All the tests are run inside the same 20GiB partition; the script reformats the partition for the appropriate fs before each run.
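
Sketched in Python for brevity, the four methods look roughly like this (the real test program lives in the alloc-perf tree; os.posix_fallocate needs Python 3.3+, and the paths are placeholders):

import mmap
import os
import time

SIZE = 4 * 1024 ** 3          # 4GiB, as in the tests
TESTFILE = "/mnt/test/file"   # placeholder inside the test partition

def timed(label, allocate):
    """Run one allocation method against a fresh file and time it."""
    start = time.time()
    allocate(TESTFILE)
    print("%-16s %6.2fs" % (label, time.time() - start))
    os.remove(TESTFILE)

def do_fallocate(path):
    # posix_fallocate(): the kernel allocates the blocks; on an
    # extent-based fs this returns almost immediately.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.posix_fallocate(fd, 0, SIZE)
    os.close(fd)

def do_mmap(path):
    # mmap the file and touch every page, like mmap()+memset() in C.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.ftruncate(fd, SIZE)
    m = mmap.mmap(fd, SIZE)
    page = b"\0" * mmap.PAGESIZE
    for off in range(0, SIZE, mmap.PAGESIZE):
        m[off:off + mmap.PAGESIZE] = page
    m.close()
    os.close(fd)

def do_chunks(chunk):
    # The traditional method: write zeroes chunk by chunk.
    def allocate(path):
        buf = b"\0" * chunk
        with open(path, "wb") as f:
            for _ in range(SIZE // chunk):
                f.write(buf)
    return allocate

timed("posix-fallocate", do_fallocate)
timed("mmap", do_mmap)
timed("chunk-4096", do_chunks(4096))
timed("chunk-8192", do_chunks(8192))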

The results:

The first 4 columns show the times (in seconds) and the last four columns show the fragments resulting from the corresponding test.

The results, in text form, are:

# 4GiB file
# Desktop on
filesystem       posix-fallocate  mmap  chunk-4096  chunk-8192 | posix-fallocate  mmap  chunk-4096  chunk-8192
ext2                          73    96          77          80 |              34    39          39          36
ext3-writeback                89   104          89          93 |              34    36          37          37
ext3-ordered                  87    98          89          92 |              34    35          37          36
ext3-guarded                  89   102          90          93 |              34    35          36          36
ext4                           0    84          74          79 |               1    10           9           7
xfs                            0    81          75          81 |               1     2           2           2
reiserfs                      85    86          89          93 |             938    35         953         956
btrfs                          0    85          79          82 |               1     1           1           1

# 4GiB file
# Single
filesystem       posix-fallocate  mmap  chunk-4096  chunk-8192 | posix-fallocate  mmap  chunk-4096  chunk-8192
ext2                          71    85          73          77 |              33    37          35          36
ext3-writeback                84    91          86          90 |              34    35          37          36
ext3-ordered                  85    85          87          91 |              34    34          37          36
ext3-guarded                  84    85          86          90 |              34    34          38          37
ext4                           0    74          72          76 |               1    10           9           7
xfs                            0    72          73          77 |               1     2           2           2
reiserfs                      83    75          86          91 |             938    35         953         956
btrfs                          0    74          76          80 |               1     1           1           1


Fig. 1: number of fragments. reiserfs performs really badly here.

Fig. 2: the same results, but without reiserfs.

Fig. 3: time results, with desktop on.

Fig. 4: time results, without desktop, in single-user mode.

So in conclusion, as noted above, btrfs, xfs and ext4 are the best when it comes to keeping fragments at the lowest. Reiserfs really looks bad in these tests. Time-wise, the file systems that support the fallocate() syscall perform the best, using almost no time in allocating files of any size. ext4, xfs and btrfs support this syscall.

Comparison of File Systems And Speeding Up Applications

Update: I’ve done a newer article on this subject at http://log.amitshah.net/2009/04/re-comparing-file-systems.html that removes some of the deficiencies in the tests mentioned here and has newer, more accurate results along with some new file systems.

How should one allocate disk space for a file that is to be written later?  ftruncate() (or lseek() followed by write()) creates sparse files, which is not what is needed.  The traditional way is to write zeroes to the file till it reaches the desired size.  Doing things this way has a few drawbacks:

  • Slow, as small chunks are written one at a time by the write() syscall
  • Lots of fragmentation

posix_fallocate() is a library call that handles the chunking of writes in one batch; the application doesn’t have to code its own block-by-block writes.  But this still happens in userspace.

Linux 2.6.23 introduced the fallocate() system call.  The allocation is then done in kernel space and hence is faster.  New file systems that support extents make this call very fast indeed: a single extent is marked as allocated on disk, where traditionally each block had to be marked as ‘used’.  Fragmentation too is reduced, as file systems now keep track of extents instead of smaller blocks.

posix_fallocate() will internally use fallocate() if the syscall exists in the running kernel.
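
As an illustration, preallocating a disk image then becomes a single call.  This sketch uses Python’s os.posix_fallocate wrapper over the same library call (Python 3.3+); the file name and size are arbitrary:

import os

# Preallocate a 1GiB disk image in one call.  On a kernel with
# fallocate() and an extent-based fs (ext4, xfs, btrfs) this returns
# almost immediately; elsewhere the library falls back to writing the
# blocks itself, so it still works, only more slowly.
fd = os.open("vm-disk.img", os.O_CREAT | os.O_WRONLY, 0o644)
os.posix_fallocate(fd, 0, 1024 ** 3)   # offset 0, length 1GiB
os.close(fd)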

So I thought it would be a good idea to make libvirt use posix_fallocate(), so that systems with the newer file systems benefit directly when allocating disk space for virtual machines.  I wasn’t sure what method libvirt already used to allocate the space; it turned out it allocated blocks in 4KiB-sized chunks.

So I sent a patch to the libvir-list to convert to posix_fallocate(), and danpb asked me what the benefits of this approach were, and also about alternative approaches to writing in 4K chunks.  I didn’t have any data to back up my claims of “this approach will be fast and will result in less fragmentation, which is desirable”, so I set out to do some benchmarking.  To do that, though, I first had to make some empty disk space to create a few file systems of sufficiently large sizes.  Hunting for a test machine with spare disk space proved futile, so I went about resizing my ext3 partition and creating about 15 GB of free disk space.  I intended to test ext3, ext4, xfs and btrfs.  I could have used my existing ext3 partition for the testing, but that would not give honest results about fragmentation (an existing file system may already be fragmented, so big new files would surely be fragmented, whereas on a fresh fs I won’t run that risk).

Though even creating separate partitions on rotating storage and testing file system performance won’t give perfectly honest results, I figured that if the percentage difference in the results was quite high, it wouldn’t matter.  I grabbed the latest Linus tree and the latest dev trees of the userspace utilities for all the file systems, and created a roughly 5GB partition for each fs.

I then wrote a program that created a file, allocated disk space, closed it, and calculated the time taken to do so.  This was done multiple times for different allocation methods: posix_fallocate(), mmap()+memset(), and writing zeroes in 4096-byte and 8192-byte chunks.

With four methods of allocating files and 5GB partitions, I decided to check the performance by creating a 1GiB file with each allocation method.

The program is here. The results, here. The git tree is here.

I was quite surprised to see poor performance for posix_fallocate() on ext4.  On digging a bit, I realised mkfs.ext4 hadn’t created the file system with extents enabled.  I reformatted the partition, but that data was valuable to have as well: it shows how much better a file system does with extents support.

Graphically, it looks like this:
Notice that ext4, xfs and btrfs take only a few microseconds to complete posix_fallocate().

The number of fragments created:

btrfs doesn’t yet have the ioctl implemented for calculating fragments.
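
As an aside, an easy way to get a comparable fragment count for a file is e2fsprogs’ filefrag tool, which prints the number of extents.  A small sketch (the path is a placeholder):

import re
import subprocess

def count_fragments(path):
    """filefrag prints a line like 'path: 34 extents found'."""
    out = subprocess.check_output(["filefrag", path]).decode()
    return int(re.search(r"(\d+) extents? found", out).group(1))

print(count_fragments("/mnt/test/file"))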

The results are very impressive and the final patches to libvirt were finalised pretty quickly.  They’re now in libvirt’s development branch.  Coming soon to a virtual machine management application near you.

Use of posix_fallocate() will be beneficial to programs that know the size of the file being created in advance, like torrent clients, ftp clients, browsers, download managers, etc.  It won’t help in the speed sense, as data is only written as it’s downloaded, but it will help in the as-little-fragmentation-as-possible sense.

Startups in 14 sentences

Paul Graham has an article on the top 13 things to keep in mind for entrepreneurs. I have one to add (for software startups):

- Going open source can help
You might have a brilliant idea and a cool new product.  It most likely will be disruptive technology.  You might think of changing the world.  But people might have to modify the way they do things.  What if you run out of funds midway, or some other unforeseen event forces your company to shut shop?  Customers will be wary of deploying solutions from startups for fear of them going down.  If the customers are given access to the source code, they’re at least insured that they can have control over the software if your company is unable to support it.  And letting them know this can win some additional customers, who knows!

Making Suspend Safer for File Systems

I saw the file system freezing patches that got merged into Linus’ tree yesterday and instantly thought that they could be used to freeze file systems before going into a suspended state.  At the recent foss.in/2008, I met Christoph Hellwig, and one of the things we discussed was how he would never trust any file system to be in a consistent state after attempting suspend-to-disk.

The freezing patches are aimed at snapshotting as of now; extending the suspend routines to make use of them is something I still have to look at.  While working with file systems isn’t entirely new to me (I’ve worked on the Mosix File System earlier), it’s been a really long time.  It’ll be quite interesting to work on this.

I had a brief chat with hch about this idea, and while he says it still won’t convince him to use suspend-to-disk, it could be a good thing for suspend-to-RAM: if the laptop runs out of power while suspended, the fs would still be in a good state.  I agree, though I’d like to use s-t-d with this!
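
To make the idea concrete, here’s a rough userspace sketch of the freeze-suspend-thaw sequence.  The FIFREEZE/FITHAW numbers are computed from their _IOWR('X', 119/120, int) definitions, the mountpoint is a placeholder, and a real implementation would of course live in the kernel’s suspend path, not in a script like this:

import fcntl
import os

FIFREEZE = 0xC0045877   # _IOWR('X', 119, int)
FITHAW = 0xC0045878     # _IOWR('X', 120, int)

def freeze(mountpoint, thaw=False):
    fd = os.open(mountpoint, os.O_RDONLY)   # a directory fd is enough
    try:
        fcntl.ioctl(fd, FITHAW if thaw else FIFREEZE, 0)
    finally:
        os.close(fd)

freeze("/home")                  # flush dirty data, block new writes
try:
    with open("/sys/power/state", "w") as f:
        f.write("mem")           # suspend to RAM; resumes here on wake
finally:
    freeze("/home", thaw=True)   # always thaw, even if the suspend fails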

I’ve had many ideas slip by without blogging about them for ages, and later seen them implemented by others.  In this case, even if I don’t end up implementing something, I’d at least have the satisfaction of having penned it down first.

Laptops: The New Desktops

With laptop sales outpacing desktop sales and laptops becoming more and more capable, it’s no wonder that laptops are now the preferred choice for a computer.  The prices of laptops have fallen dramatically to aid this trend.

However, with this growing trend there now comes a need for even smaller portable computers; laptops are now “too big”.  So we have the onslaught of netbooks and mobile phones doubling as handheld computers.  So what exactly is it that people need?  They want big screens, but the device should be portable.  They want more processing power, but in a small device that doesn’t heat up.  It’s going to be very interesting watching this space in the next few years.

While discussing this topic with Vijay today, he mentioned he wanted to take just the laptop screen around, without having to undock his laptop for presentations or short meetings.  Tablets don’t work for him for various reasons.  He just wanted the screen to go with him and communicate with the “base” wirelessly.  I thought that should be possible with a low-power, low-speed processor on the screen itself, running something off the RAM.  Anyway, with the cloud-computing phenomenon, all one would need is a browser and a handful of other software (mainly browser plugins).  Either the OSes will have to evolve to support ASMP, or the processor manufacturers will have to come out with low-power chips that can share the bus with a stronger processor (Intel’s Atom does seem like it could fit here).  The OS has to evolve in either case, along with the chipset.

The desktop software will have to support this, of course: you would click something like “detach the screen safely” and the necessary plugins would be transferred to the screen’s RAM.  Or the screen could have some flash storage, and the browser, presentation software, etc. could be stored natively all the time.

Anyway, is this still the most-desired gadget?  Once this is done (as it has been; Toshiba had a prototype along the same lines two years back), will people stop wanting more?  Even in today’s world, I can imagine people wanting their super mobile phones to work as the “screen”: a low-power computer for when they’re not in front of their laptops.  Phones can easily be hooked up to projectors, they can be carried around, and they can be hooked up to bigger monitors for more screen space.  What’s stopping us from doing that now?

Piracy

The Indian movie industry (and that’s not just “Bollywood”) is plagued with piracy of movies as well as music. I’ve had several friends staying abroad telling me about recent releases they saw “on the Internet”. Of course, songs are always to be downloaded and not bought.

A movie I saw recently had a note at the end of the screening: “Please buy original CDs. Do not download music.” There was laughter in the sparsely-populated movie hall (on the 2nd day of the screening of the movie that talked about youth and music, no less).

That got me thinking: we spend quite a lot of money these days to watch movies in multiplexes.  It’s about 5x-6x what I used to pay about 10 years back, and even that doesn’t guarantee a seat in the “balcony”.  These days, movie halls usually have flat pricing, no matter where you’re seated.  You could be 5 feet away from the screen or 50.

So it’s no wonder people don’t want to go to movie theatres. They just walk across the street and buy a DVD for Rs 30 that has 3 or 4 of the latest releases. And they can always download the music or buy MP3 CDs that cost about the same but have music from 50 recent releases. Original audio CDs cost about the same it costs to watch the movie in the movie hall.

I was thinking about what could help curb this piracy, and one thing that came to mind was that the distributors and producers of movies could give away audio CDs of the movie just after the screening, either for free or for a very small token amount, like Rs 30.

If this were done, people would actually go to the theatre to watch movies, since the cost of the ticket would get them not only the movie but also the CD of the songs they’ve already listened to (and liked?) (side note: movies in India usually run more on the strength of the music and actors than the story or reviews).  Also, music gets distributed and listened to legally instead of being pirated.

The producers need not worry about losing out on income via audio CD sales; I wonder how much they make from those anyway.  Also, if this drives more people to the theatres, it’s only going to be good for them.  People who do not want to watch the movie but want the CD can buy the CDs as they had been doing previously.  For people who wanted the music but did not buy it, there’s no negative in this model for the producer, only a positive: they’re enticed to go watch the movie and get a chance to get the CD.

So it came as a welcome surprise (though I don’t know how well the idea will take off) when I saw that Google announced putting links to the songs in YouTube videos.

I’ve had (non-Indian) friends tell me they don’t download music any more, since they can get songs for just under a dollar from the various online stores.  It hardly makes any difference to their bottom lines, plus they get legal music and are free of any hassles they might later get into for downloading illegally.

This might work elsewhere, but in India the mentality hasn’t changed enough that people will buy something instead of getting it for free or from a very cheap alternative.  Adding “buy the music you just liked here” links won’t take off.  I’d like to be proven wrong, though.

There’s a lot to be gained in this model for everyone involved. Even the movie halls will see more traffic and hence more income for the various food courts and shopping plazas that are bundled in the movie hall complexes these days.

If this is implemented and takes off, the producers can then think about giving away DVDs of the movie for, let’s say, 50% of the original price.  Why not?

Update: xkcd on piracy

Traffic

Traffic of the vehicular kind, not the kind I see on this blog.  It’s not just traffic; it’s terrorism on the roads.  Every day.  Every minute.  I’m petrified of road accidents, and the people of Pune (or anywhere else in India, for that matter) don’t let me breathe easy whenever I travel.

I had written about traffic earlier:

Just read an article on Wired on ideas for smoother traffic flow. Some ideas suggested are not having any signs or signals at intersections, so drivers are more cautious and so on. Well, it kind of works that way here in Pune, with people ignoring signals, pedestrians walking on streets instead of footpaths, cyclists and bikers zipping through the footpaths at times to avoid a long queue of vehicles and so on.

And plus, the potholes, speed-breakers and gutters (which are not aligned with the road height) do ensure that vehicles don’t race beyond a particular limit.

I guess the Indian road authorities had this all figured out much, much earlier!

However, there’s more to add to the terrorism aspect of traffic:

Terror

- Everyone has a birthright to honk, and honk hard. People blow their horns to signal to others:
– that they’re overtaking from the wrong side
– that they’re overtaking from the right side
– that they’re speeding at a crossing and everyone else who hears the horn has to stop, or else…
– to communicate with passers-by
– that they’re about to cross the just-turned-red signal and everyone else should wait for them to cross

- Everyone has a right to spit out of their cars / buses / trucks.
– While waiting at a signal or passing a bus, make sure you alert the passengers of your existence else you’ll be spat upon.

- No one respects road dividers (even if they exist).  If someone has to turn onto the road on the other side of a divider, they don’t go slightly further and make a U-turn; they cut across and drive against the oncoming traffic

- Cyclists never wear light clothes or reflectors.  It’s up to others to detect their presence and avoid them at night

- Everyone drives with high beam at nights, blinding the traffic on the other side. Pedestrians, cyclists — never safe.

- Buses / trucks never stop in the space allotted to them. They occupy the whole road.

Did I mention I’m petrified of road accidents? I’ve seen people involved in road accidents scream in agony in hospitals. Young people. Those who like to speed across the roads, flaunting their flashy bikes and mobiles. Ending up in hospitals and screaming. I wonder if death is better than such a state after a road accident.

Government Apathy

I was headed out in my new car, waiting at a junction for the signal to turn green.  It went green, I moved ahead, and bam!  A biker from my left rammed straight into the car.  He had jumped the signal.  He justified it by saying “left is a free turn” and that he could take it any time.  What’s alarming is that he had zero visibility of my car because of another car in front of him.  Why do people race at crossings, and into blind territory at that?  I wonder how many people overtake buses, cars, etc., only to find someone crossing, and dash right into them?

Immediately after this accident, I was surrounded by the biker’s colleagues.  A trainee cop did come up to the scene of the accident, but he was shooed away by the mob.  I had no option but to get the dents fixed at my own expense.  I just kept wondering: I pay so much tax each year; what do I get in return?  Bad roads and no justice.  Is there nothing a law-abiding citizen can expect his elected representatives to do for him?

Idea for a smoother traffic flow

In spite of all this, I remain optimistic.  With the traffic in Pune (and every Indian city) growing by the minute, I thought of something that is eco-friendly, traffic-friendly as well as people-friendly: trams.  Why not install trams in Pune that ferry people between hotspots and the central areas?  For example, the Hinjewadi tech park to Shivajinagar, Magarpatta to Pune Station or Swargate, and then connect Shivajinagar, Pune Station, Swargate, Deccan, Kothrud, Aundh, etc.  There’s no risk of trams stopping in the middle of the road causing inconvenience.  They can run on electricity, which can be generated from nuclear, wind or solar energy (or a combination of them all).  Travel within the city is already a pain, with the auto rickshaw people being very uncooperative and the buses killing people and breaking traffic rules.  This would also take a lot of private vehicles off the roads, along with the shuttles that companies ply in the city for employees.  Pollution (air and noise) levels would drop drastically.

Please, please do something like this.