The Register® — Biting the hand that feeds IT

Comments on: Genetic researchers fill 1TB a week

Duuurrr 

Posted Friday 3rd August 2007 15:17 GMT

Haven't they heard of .zip files (or RAR, or any other open source, free to use, non-government-controlled, license-free, multi-platform compression method)? Have to mention this to avoid the fanboys.

big whoop... 

Posted Friday 3rd August 2007 15:41 GMT

2 terabytes ... that's two hard disks.

Seagate, WD and IBM/Hitachi all have 1TB models.

This morning's ad at Fry's: 500GB drive for $99, limit 4 per customer. If 50 of those researchers hop over to their local Fry's stores, they'll have enough storage for the rest of the year.

Piffle... 

Posted Friday 3rd August 2007 18:20 GMT

CERN's Large Hadron Collider will generate 40 TB of data *per second*.

I wonder what coding they use... 

Posted Saturday 4th August 2007 00:01 GMT

Any chance their data looks like

<xml>
  <structure>
    <dna>
      <helix>
        <turn>
          <pair>
            <nucleotide>
              <adenine/>
              <thymine/>
            </nucleotide>
            <nucleotide>
              <cytosine/>
              <guanine/>
            </nucleotide>
          </pair>
          .
          .
          .
        </turn>
        .
        .
        .
      </helix>
    </dna>
  </structure>
</xml>

?... just for the ease of using a COTS XML parser?

RE: wonder what coding they use... 

Posted Saturday 4th August 2007 03:59 GMT

Just started studying their coding so I can write a parser for it.

The most common is one letter per base pair,

so your example would be (preamble stuff...) AC.

They use one byte, not 2 bits, because they need to cover the cases where a particular base could be either (A or C), or (A or C or G), or any base, or the base is unknown, etcetera.
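To make the one-byte-per-base point concrete, here is a rough sketch. It assumes the letters in question are the standard IUPAC nucleotide ambiguity codes (my assumption, the format isn't named above), and the names IUPAC, expand and bits_for are made up for illustration.

```python
# Sketch only: assumes the one-letter alphabet is the IUPAC nucleotide code set.
# Each letter maps to the set of concrete bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # any base / unknown
}


def expand(code):
    """Return the set of bases a single letter may represent."""
    return set(IUPAC[code.upper()])


def bits_for(symbols):
    """Smallest number of bits that can distinguish `symbols` values."""
    return max(1, (symbols - 1).bit_length())


if __name__ == "__main__":
    print(sorted(expand("M")))   # A or C
    print(sorted(expand("V")))   # A or C or G
    print(bits_for(4))           # plain A/C/G/T
    print(bits_for(len(IUPAC)))  # with the ambiguity letters included
```

With 15 letters, 2 bits are no longer enough; you would need at least 4, and a whole byte per base keeps the file readable as plain text.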

There is a file format that only handles fully specified single base pairs, and it uses 2 bits per base, with padding where needed to bring a sequence up to a 32-bit boundary so it runs faster on machines with 4-byte words.
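A minimal sketch of that kind of 2-bit packing, for anyone curious. The bit assignment (A=0, C=1, G=2, T=3), the big-endian word order and the function names are my own assumptions for illustration, not the layout of any particular real file format.

```python
import struct

# Hypothetical mapping; a real format may assign the 2-bit codes differently.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
BASES_PER_WORD = 16  # 16 bases * 2 bits = one 32-bit word


def pack(seq):
    """Pack an A/C/G/T string into big-endian 32-bit words, zero-padded."""
    words = []
    for i in range(0, len(seq), BASES_PER_WORD):
        chunk = seq[i:i + BASES_PER_WORD]
        word = 0
        for base in chunk:
            word = (word << 2) | CODE[base]
        # Pad a short final chunk so it still fills a whole 32-bit word.
        word <<= 2 * (BASES_PER_WORD - len(chunk))
        words.append(word)
    return struct.pack(f">{len(words)}I", *words)


def unpack(data, n_bases):
    """Reverse of pack(); n_bases is needed to drop the padding."""
    words = struct.unpack(f">{len(data) // 4}I", data)
    out = []
    for word in words:
        for shift in range(30, -2, -2):
            out.append(BASES[(word >> shift) & 3])
    return "".join(out[:n_bases])


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGTA"  # 17 bases -> two 32-bit words
    packed = pack(seq)
    print(len(packed), unpack(packed, len(seq)) == seq)
```

Note that the sequence length has to be stored somewhere alongside the packed words, since the padding bits are indistinguishable from real bases.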

Unfortunately they do all this in Perl (6 different idioms for every common concept), so it is easier to start writing utilities from scratch than to try to read the existing code. (Cue Perl fanboy flames.)

It's more than just picking up a bunch of cheapo drives 

Posted Saturday 4th August 2007 04:31 GMT

For research that is supposedly valuable, I wouldn't dare put the data on cheap generic drives. 1TB drives are a recent item, so I would be careful if I were to go out to the local Fry's/Best Buy/CompUSA. Regardless, the biggest issue I would have would be the physical space, power, and maintenance requirements.

If the data has to be accessed on the fly, so that .zip files are out of the question, they could still evaluate compressed partitions.

Some of these comments are excellent examples 

Posted Saturday 4th August 2007 12:57 GMT

of "consumer" coding fanboys. Big science doesn't use the simple tools code-droids are used to. There is another world of computing outside "Windows and -nix" that most people will never be exposed to.

Big science uses IT as a tool, it is not the end-all-be-all "world" for that industry. Big science is one of the few industries that hasn't fallen completely for the smokescreen that is IT.

RE: data storage 

Posted Sunday 5th August 2007 14:58 GMT

Just thinking, it would be interesting to know if off-site backups were being made and how quickly they could be brought online. (Another whole server farm, perhaps?)

RE: data storage 

Posted Wednesday 8th August 2007 12:43 GMT

If the 200Gb of data were generated over 1 week at a constant rate, they'd need around a 5Mbit/s connection to be able to back it up to a remotely hosted SAN. If it were generated just within working hours, you'd be looking at around 20Mbit/s.
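For anyone who wants to check the sums, a quick back-of-envelope sketch. It assumes "200Gb" means 200 decimal gigabytes and that working hours means five 8-hour days; both are my assumptions, as is the function name.

```python
def required_mbit_per_s(gigabytes, seconds):
    """Sustained rate in Mbit/s needed to move `gigabytes` in `seconds`."""
    bits = gigabytes * 1e9 * 8  # decimal gigabytes -> bits
    return bits / seconds / 1e6


FULL_WEEK = 7 * 24 * 3600     # data generated around the clock
WORKING_WEEK = 5 * 8 * 3600   # assumed: five 8-hour days

if __name__ == "__main__":
    print(f"{required_mbit_per_s(200, FULL_WEEK):.1f} Mbit/s")     # 2.6 Mbit/s
    print(f"{required_mbit_per_s(200, WORKING_WEEK):.1f} Mbit/s")  # 11.1 Mbit/s
```

Under those assumptions the raw rates come out a little under 3 and a little over 11 Mbit/s, so the 5 and 20 Mbit/s figures leave roughly a factor of two for protocol overhead and bursts.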

It doesn't seem that unviable... bearing in mind that "Big Science"(tm) has pretty deep pockets.

I'd guess that they'd have an alternative site for something like this - how would you explain to the big boss how you'd lost *everything* in the event of a major incident?
