Idea: Faster Metadata Downloads With Yum and Git

The presto plugin for yum has worked great for me so far.  It's been very useful, not because of download limits, but because of the time it saves in getting the bits downloaded.  The savings are significant when bandwidth is poor (and it always is).

However, I've observed that in some cases, for example a font update, the presto metadata is larger than the package itself.  If a font package, say 21KB in size, has a 3KB deltarpm, using the drpm saves 18KB of downloads, a very impressive saving of about 86%.  However, the presto metadata itself could be more than 400KB, nullifying the advantage of the drpm.  In this corner case we're effectively downloading over 403KB (400KB of metadata plus the 3KB drpm) instead of 21KB, more than 19 times the size of the package itself.
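The arithmetic above can be checked with a few lines of Python.  This is a back-of-the-envelope sketch, not part of the presto plugin; `download_sizes_kb` is a hypothetical helper, and it assumes the presto metadata is fetched once per update run, with its cost shared by every package updated in that run.

```python
"""Back-of-the-envelope check: does presto save bandwidth for an update run?

Assumption: the presto metadata is fetched once per run and its cost is
shared by every package updated in that run.
"""


def download_sizes_kb(updates, presto_metadata_kb):
    """Return (without_presto, with_presto) download totals in KB.

    `updates` is a list of (full_rpm_kb, drpm_kb) pairs, one per package.
    """
    without_presto = sum(full for full, _ in updates)
    with_presto = presto_metadata_kb + sum(delta for _, delta in updates)
    return without_presto, with_presto


if __name__ == "__main__":
    # The font example: a 21KB package, a 3KB deltarpm, 400KB of metadata.
    plain, presto = download_sizes_kb([(21, 3)], presto_metadata_kb=400)
    print(f"drpm saving alone: {(21 - 3) / 21:.0%}")              # 86%
    print(f"without presto: {plain}KB, with presto: {presto}KB")  # 21KB, 403KB
    print(f"ratio: {presto / plain:.1f}x")                        # 19.2x
```

Because the metadata is a fixed cost per run, it hurts most when only small packages are updated.  If the same run also updated a 500KB package with a 40KB deltarpm, the totals would be 521KB without presto and 443KB with it.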

So here's an idea: why not let git handle the metadata for us?  The metadata is XML text (or sqlite) listing package names, their dependencies, version numbers and so on.  Git handles text very well and only transfers what has changed since the last fetch, so fetching metadata updates from a git server should be a breeze.  At install time (or upgrade time), the metadata git repository for a particular Fedora release can be cloned; on each update, all yum needs to do is run 'git pull' to get the latest metadata.  Downloads: a few KB each day instead of a few MB.
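The clone-once, pull-on-update flow could look roughly like the sketch below.  It is only an illustration: the repository URL and cache directory are made-up examples rather than real Fedora endpoints, and `sync_command` and `sync_metadata` are hypothetical helpers, not existing yum functions.

```python
"""Sketch: keep yum metadata in a local git clone (hypothetical layout).

METADATA_URL and CACHE_DIR are made-up examples, not real Fedora endpoints.
The first run clones the metadata repository; later runs do a
fast-forward-only pull, which fetches just the new commits.
"""
import os
import subprocess

# Hypothetical values, for illustration only.
METADATA_URL = "https://git.example.org/fedora-metadata/f12.git"
CACHE_DIR = "/var/cache/yum/git-metadata/fedora-f12"


def sync_command(repo_dir, url):
    """Return the git command that brings repo_dir up to date with url."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        # Existing clone: fetch and fast-forward, never create merge commits.
        return ["git", "-C", repo_dir, "pull", "--ff-only"]
    # First run (install or upgrade time): clone the whole repository.
    return ["git", "clone", url, repo_dir]


def sync_metadata(repo_dir=CACHE_DIR, url=METADATA_URL):
    """Run the clone-or-pull command; raises CalledProcessError on failure."""
    subprocess.run(sync_command(repo_dir, url), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, to avoid touching the network.
    print(" ".join(sync_command(CACHE_DIR, METADATA_URL)))
```

The `--ff-only` choice reflects that yum would never commit to its local copy, so every pull should be a fast-forward.  If it is not, something is wrong on the server side, and failing loudly seems safer than silently merging metadata.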

The advantages are numerous:

  • Saves server bandwidth
  • Uses very few server resources when served over the git protocol
  • Scales really well
  • Compresses really well
  • Makes yum faster for users
    • I think this is the biggest win: not having to wait ages every day for a 'yum search' to finish should interest anyone.  It would make old-time Debian users like me very happy.

There are some challenges to be considered as well:

  • Should the yum metadata be served by just one canonical git server, while the packages get served by mirrors?  Not every mirror will have the git protocol enabled, and the Fedora project cannot ask every mirror to configure git on its server.
    • Doing this could leave slow mirrors unable to serve the packages listed in the latest metadata, since they may not have synced those packages yet
    • This can be mitigated by having the mirrors serve the git repositories over plain HTTP, which does not require running a git daemon
  • The metadata can keep growing
    • This can be mitigated by having a separate git repository for each release's metadata.  Additional git repositories can easily be set up for extra yum repositories (e.g., external repos, or the repos of both releases while doing an upgrade).
  • The mirror list would have to be extended to include git repositories that can be added with 'git remote'.
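The last two points, one repository per release and mirrors listed as git remotes, might fit together as in this sketch.  Both the mirror-list format (a plain list of base URLs) and the `<base>/metadata/f<release>.git` path layout are hypothetical assumptions, not existing Fedora conventions, and `repo_url` and `remote_add_commands` are illustrative helper names.

```python
"""Sketch: one metadata repository per release, with mirrors as git remotes.

Assumes a hypothetical mirror-list format (a plain list of base URLs) and a
hypothetical path layout "<base>/metadata/f<release>.git".
"""


def repo_url(base_url, release):
    """Per-release repository URL, so an old release's history can be dropped."""
    return f"{base_url.rstrip('/')}/metadata/f{release}.git"


def remote_add_commands(repo_dir, release, mirror_bases):
    """Build one 'git remote add' command per mirror in the mirror list."""
    commands = []
    for index, base in enumerate(mirror_bases):
        name = f"mirror{index}"
        commands.append(
            ["git", "-C", repo_dir, "remote", "add", name, repo_url(base, release)]
        )
    return commands


if __name__ == "__main__":
    mirrors = [
        "http://mirror-a.example.com/fedora/",
        "http://mirror-b.example.net/pub/fedora",
    ]
    repo_dir = "/var/cache/yum/git-metadata/fedora-f12"  # hypothetical path
    for cmd in remote_add_commands(repo_dir, 12, mirrors):
        print(" ".join(cmd))
```

With the remotes in place, yum could fetch from whichever mirror responds fastest.  During an upgrade, it would simply keep one such repository per release side by side.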

I've filed an RFE for this feature.  For anyone looking for a weekend hack on yum in Python, this should be a good opportunity to jump right in!  If you intend to take this up, get in touch with the developers, make sure no one else is already working on it (or collaborate with those who are), and update the details on the Fedora Feature Page.

3 thoughts on “Idea: Faster Metadata Downloads With Yum and Git”

  1. You are saying the right thing, except it has already been done for Debian/apt packaging…

    Debian never downloads all the "metadata" in one package; instead it downloads a ton of small diffs (or just one) covering the changes from the last update to the latest version.

    I suppose 'smart' (the package manager) already does it…

    So yum is really an outdated app.

  2. Axet,

    apt might do that for metadata, but it doesn't do it by default for the actual package data, which yum does.  So calling it outdated is biased.
