[IMC-Tech] Re: mirroring issues
Michael deBeer
madebeer at igc.org
Wed, 3 Jan 2001 04:35:12 -0800 (PST)
On Tue, 2 Jan 2001, Matthew Arnison wrote:
> i looked into altaway, and it's roughly $650 a year for something that i
> feel is much less than we currently have with loudeye.
I agree, loudeye is a hell of a good partner.
> * i think about 100 GB and growing of storage
If indymedia is already at 100 GB, when will the current 200 GB RAID array
fill up? Do you have a breakdown of size per media type (gifs would be
easier to distribute elsewhere than realaudio)? How much 'expires' and how
much of it is permanent? Is there a way to generate an average 'profile'
of a city.indymedia.org site's disk usage over time, to forecast the disk
usage for the next year?
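One rough way to do that forecast, assuming someone can collect a few dated
disk-usage readings: fit a straight line through them and see when it crosses
the array size. This is only a sketch. The function name and the sample
format are made up for illustration, and real growth may well not be linear.

```python
from datetime import date, timedelta

def forecast_full_date(samples, capacity_gb):
    """Estimate when disk usage reaches capacity_gb.

    samples: list of (date, used_gb) readings on at least two distinct dates.
    Assumes linear growth (least-squares fit). Returns None if usage is flat
    or shrinking, since the line then never reaches capacity.
    """
    t0 = samples[0][0]
    xs = [(d - t0).days for d, _ in samples]   # days since first reading
    ys = [gb for _, gb in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                          # GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x        # fitted usage at t0
    days_until_full = (capacity_gb - intercept) / slope
    return t0 + timedelta(days=round(days_until_full))
```

Running this once per city site, and once per media type, would also give
the per-type breakdown asked about above.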
I can think of four possibilities:
* enough stuff expires so that you won't go above 200 GB till after 2001
* loudeye will give you RAID arrays as you need them, 400-600 GB, whatever
* find another big group to give you another server to put overflow
material on. Maybe approach Exodus and Sun, and ask them to host a free
300 GB RAID array.
* figure out distributed storage for older stuff - building on the mysql
table you have planned which has the name of each content item.
Have the index server store a list of URLs where each content item can
be found. Maybe freenet or maybe a homebrew of distributed mirroring.
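To make the last option a bit more concrete, here is a minimal sketch of an
index that maps each content item name to the list of URLs where a copy
lives. It uses Python's built-in sqlite3 as a stand-in for the planned mysql
table. The table name, column names and function names are all assumptions
for illustration, not the actual schema.

```python
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the index. One row per (item, mirror URL) pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mirror_urls (
               item_name TEXT NOT NULL,
               url       TEXT NOT NULL,
               PRIMARY KEY (item_name, url)
           )"""
    )
    return conn

def add_location(conn, item_name, url):
    """Record that a copy of item_name can be fetched from url."""
    # INSERT OR IGNORE keeps repeated announcements from the same mirror
    # from creating duplicate rows.
    conn.execute(
        "INSERT OR IGNORE INTO mirror_urls (item_name, url) VALUES (?, ?)",
        (item_name, url),
    )
    conn.commit()

def locations(conn, item_name):
    """Return every known URL for item_name (empty list if none)."""
    rows = conn.execute(
        "SELECT url FROM mirror_urls WHERE item_name = ? ORDER BY url",
        (item_name,),
    )
    return [row[0] for row in rows]
```

The index server would then answer a request for an older item by picking
one of the URLs from locations() and redirecting to it.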
> so u can see maybe why i am willing to do some work on our mirroring
> software to suit loudeye if necessary.
Yes.
If for storage space reasons we do need to split some of the work between
loudeye and a distributed network of other servers, we'd want to make sure
it is clear which section of the network is loudeye and which is a
consortium of rabble-mirrors ;) Maybe have older stories have the URL:
rabble.indymedia.org or
archives.indymedia.org or
archives.prague.indymedia.org
> i think rsync would still not solve certain problems, due to the way
> we want to have media mirrored as soon as possible after it is
> published. while i agree rsync is excellent, and much better than ftp,
> i think rsync is designed for slower mirroring, such as daily.
That is true. rsync is better for sync-ing directories, for mirror sites
grabbing all the latest files at their leisure, not for file-upload on
demand.
> > Comments on the current scheme: As a way of dealing with thousands of
> > files per directory, perhaps do an MD5 hash of the filename, and use that
> > as the directory. This would only have to be computed once, and the
>
> sounds interesting, but i'm not sure i understand this. wouldn't the md5
> hash be different for each filename?
The hash would be different for most filenames. Some files might have the
same hash.
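Here is one way to read the MD5 suggestion, as a sketch: hash the filename
and use only the first couple of hex characters of the digest as the
directory name. Full digests differ for practically every filename, but with
a two-character prefix there are only 256 possible directories, so many
files share each one and they spread out roughly evenly. The function names
and the two-character prefix length are my assumptions, not part of the
original proposal.

```python
import hashlib

def hashed_dir(filename, prefix_len=2):
    """Directory name derived from the MD5 of the filename.

    Only the first prefix_len hex characters of the digest are used, so the
    number of directories is capped at 16 ** prefix_len.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return digest[:prefix_len]

def hashed_path(filename, prefix_len=2):
    """Relative path of the form '<hash prefix>/<filename>'."""
    return f"{hashed_dir(filename, prefix_len)}/{filename}"
```

The directory can be recomputed from the filename at any time, so nothing
extra has to be stored to find a file again.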
> also i think it's better to have URLs
> that are short and easy to type, from a usability standpoint (i know
> people shouldn't have to type them in, but sometimes people end up needing
> to for some reason or other).
True.
I think the directory system should not break, no matter how many files
are put into it. MD5-hash directory names would be expandable, but would
create ugly directory names. I think someone else suggested directories
for each month, which would work. Also, if each item has a numeric id in
the database, the directories could be based on the numeric ids, so item
number 2002 would go into directory /2/ and item 8003 would go to
directory /8/. If there is a reason for storing different media types in
different directories, the /2/ proposal could be joined with the
media-type proposal, so that a gif file numbered 3003 would go in /gif/3/
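
A small sketch of the numeric-id scheme, assuming the directory is the item
id divided by 1000 (rounded down). That matches the 2002 and 8003 examples
and caps each directory at 1000 items, though using the leading digit would
match those two examples as well. The function name and the bucket_size
parameter are made up for illustration.

```python
def id_dir(item_id, media_type=None, bucket_size=1000):
    """Directory for a content item, based on its numeric database id.

    With the default bucket_size, ids 2000-2999 share one directory,
    ids 3000-3999 share the next, and so on. If media_type is given,
    it becomes the parent directory.
    """
    if item_id < 0:
        raise ValueError("item_id must be non-negative")
    bucket = str(item_id // bucket_size)
    if media_type:
        return f"/{media_type}/{bucket}/"
    return f"/{bucket}/"
```

With this, id_dir(2002) gives /2/, id_dir(8003) gives /8/, and
id_dir(2002, "gif") gives /gif/2/. The URLs stay short and typeable, and no
directory grows past bucket_size entries.
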
Michael