People who might not suck: 20 most recent entries
Dot dot dot.
Canadian Economics Superhero Larry Smith has a new article out, addressing burning economic questions such as:
I wrote something for someone else that probably shouldn't be posted publicly (it concerns HR practices at my company), and it expended my writing energy for the night, so instead, we get this placeholder.
Happy 2010 everyone :)
As if being 25% down wasn't annoying enough, now half of the elevators in my building are out of commission. Empirically, it seems travel times are up by 5 minutes until they get fixed.
So much win:
So Kodak has built an appliance for letting complete strangers (a) browse your family photos, and (b) beam shock porn directly into your living room! GOD BLESS AMERICA! This all works because the appliances won't connect to (e.g.) Flickr directly, they only phone home to Kodak's server, which then proxies all of the requests. But at least they're using OAuth instead of making you type your Flickr password into Kodak's server. This is a little surprising, actually, given the tip-top job their security engineers did of designing the rest of the infrastructure of this product line. I guess I ought to add a WebCollage source to generate random Kodak MAC addresses for use as an image source!
With a flight at 8:20 p.m. and given the added security measures/restrictions, I thought leaving the house at 5:00 p.m. would probably suffice, but wanted to leave at 4:00-ish to be prudent (especially with rush hour traffic) - we could get an early dinner along the way.
A couple of good stories I came across today:
"I have the Internet in my pocket!
DNA Lounge update, wherein we bend over and take it.
I just spent a few days of my Christmas vacation writing a new program, bup. bup is a program that backs things up. It's short for "backup." Can you believe that nobody else has named an open source program "bup" after all this time? Me neither. It also has almost no other meanings. Despite its unassuming name, bup is pretty cool. To give you an idea of just how cool it is, I wrote you this poem:
Bup is teh awesome

Hmm. Did that help? Maybe prose is more useful after all.

Reasons bup is awesome

bup has a few advantages over other backup software:
(The README actually has a more detailed example.)

Try making a remote backup:

    tar -cvf - /etc | bup split -r myserver: -n my-etc -vv

Try restoring your backup:

    bup join -r myserver: my-etc | tar -tf -

(On myserver) look at how much disk space your backup took:

    du -s ~/.bup

Make another backup (yes, that's exactly the same command):

    tar -cvf - /etc | bup split -r myserver: -n my-etc -vv

Look how little extra space your second backup used on top of the first:

    du -s ~/.bup

Restore your *first* backup over again (the ~1 is git notation for "one older than the most recent"):

    bup join -r myserver: my-etc~1 | tar -tf -

What's next?

I have lots of plans for this lovely program, in the event that I actually get time to implement them. But if you think it's cool, please feel free to git clone it, hack away, and send some patches! Read the README for a list of some deficiencies in the current release. I'm sure there are also more deficiencies that I don't know about, of course.

(Previous poetry-related adventures.)

Update (2010/01/05): Commentary at ycombinator news and reddit programming. To answer the most common question: it's different from most of those other apps you mention because:
a) bup backs up really huge files rather than silently ignoring them or running out of memory;
b) bup is a backend, not a GUI, while most of those apps are GUIs (which could use bup as a backend if they wanted);
c) bup stores its backups in big packfiles, rather than a one-file-per-file model, and thus can be much faster (but 0.01 isn't optimized yet).

Update 2 (2010/01/05): By popular demand (well, nonzero demand from the populace, anyway), I've created a mailing list. You can subscribe by sending an email to bup-list+subscribe@googlegroups.com (note the weird + character in the email address).
Extremely short music reviews: Lady Gaga edition.

Extremely short furniture reviews: IKEA edition.

TERJE folding chair: Superb value for money; I actually find them comfortable. The colour is too orange, though; a paler birch or ash finish would suit better.

NORDEN gate-leg table: I don't know if it's my discount edition, but quality control was extremely poor, with half a dozen pilot holes (mostly in the drawers) misaligned and therefore not only useless, but actively destructive to properly-placed ones. Alignment of the leaves is also out, as they don't pop on to the pegs on the gate legs for stability like they're supposed to. Love the concept, but the execution could be better.

LILLBERG coffee table: Do not attempt assembly without a partner and some kind of drill/electric screwdriver. There are six two-inch screws that secure cross-members to the solid birch laminate tabletop, and I drove them all by hand. There were pilot holes, yes, but I woke up the next morning at 4AM with both forearms burning. Now assembled, it's a good table, though.

Toaster get! I went to the Yonge and Bloor Bay and picked up a Cuisinart two-slicer for a mere $42. I can now make...TOAST! While there, I decided to patronize the "restaurant". How was it? Godforsaken. Once a feature and jewel of any department store worth its salt, it's the cafeteria tucked away in the back of the highest level. It seems that it was recently redecorated by someone who was really trying to make a go at it (I was able to sit on a (p?)leather sofa at a coffee table), but the only other patrons appeared to be two staff on an afternoon break. Surprisingly serene, though.

After four and a half years with my venerable Samsung x426, I have a new phone! A smartphone, no less! It is the Huawei U7519, AKA the T-Mobile Tap. Supposedly the most popular phone in China, it runs some kind of Java-based OS. I have Opera Mini and the GMail app installed already, but I'm on the lookout for more.
Oh yes, I'm with WIND Mobile now. Up yers, Rogers. Service is cheap! $50 per month includes (truly) unlimited data! Although I am having trouble getting it to tether to my Ubuntu-running netbook, that's included, too. My number is unchanged, so I can retain my precious 416 digits and stay out of the second-class 647. WIND is super-flexible, so if it turns out this isn't for me, changing things around will be a snap, because there's no contract! It does mean the latest and greatest phones come at full price, though...from them. Apparently they're totally cool with you transferring your SIM to any AWS-capable device, which means a T-Mobile branded Android phone may be in my future if I find myself smartphoning often. For now, the Tap strikes me as an excellent entry-level device to see if this kind of thing is for me. So far:
- I wish it had a slide-out keyboard
- I wish T9 worked for punctuation, too
- Screen response is not terrible
- Call quality is perfectly acceptable
- Performance is slightly pokey but not awful

Phew! And that's my backlog cleared for now. Work tomorrow. Waaaaah. No more vacation until March at PAX time, although there is Family Day in February...
So let's say you've got a database with 100k rows of 1k bytes each. That comes to about 100 megs, which is a pretty small database by modern standards. Now let's say you want to store the dumps of that database in a version control system of some sort. 100 megs is a pretty huge file by the standards of version control software.

Even if you've only changed one row, some VCS programs will upload the entire new version to the server, then do the checksumming on the server side. (I'm not sure of the exact case with svn, but I'm sure it will re-upload the whole file if you check it into a newly-created branch or as a new file, even if some other branch already has a similar file.)

Alternatively, git will be reasonably efficient on the wire, but only after it slurps up mind-boggling amounts of RAM trying to create a multi-level xdelta of various revisions of the file (and to do that, it needs to load multiple revisions into memory at once). It also needs you to have the complete history of all prior backups on the computer doing the upload, which is kind of silly.

Neither of those alternatives is really very good. What's a better system?

Well, rsync is a system that works pretty well for syncing small changes to giant files. It uses a rolling checksum to figure out which chunks of the giant file need to be transferred, then sends only those chunks. Like magic, this works even if the sender doesn't have the old version of the file.

Unfortunately, rsync isn't really perfect for our purposes either. First of all, it isn't really a version control system. If you want to store multiple revisions of the file, you have to make multiple copies, which is wasteful, or xdelta them, which is tedious (and potentially slow to reassemble, and makes it hard to prune intermediate versions), or check them into git, which will still melt down because your files are too big. Plus rsync really can't handle file renames properly - at all.
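For readers who haven't met a rolling checksum before, here is a minimal sketch of the idea, modelled on the weak checksum described in the rsync technical report. The names `weak_checksum` and `roll` are made up for this illustration, and the real rsync code differs in its details. The point is only that sliding the window forward by one byte costs a couple of additions, not a full recomputation over the block.

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2**16


def weak_checksum(block):
    """Compute the two-part weak checksum of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get larger weights: n for the first byte, 1 for the last.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one position: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog" * 4
    n = 16
    a, b = weak_checksum(data[:n])
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
        # The rolled value always matches a from-scratch recomputation.
        assert (a, b) == weak_checksum(data[k:k + n])
    print("rolled checksum matched at every offset")
```

Because each step is constant time, the checksum can be evaluated at every byte offset of a huge file, which is what lets matching blocks be found even after data has shifted.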
Okay, what about another idea: let's split the file into chunks, and check each of those blocks into git separately. Then git's delta compression won't have too much to chew on at a time, and we only have to send modified blocks...

Yes! Now we're getting somewhere. Just one catch: what happens if some bytes get inserted or removed in the middle of a file? Remember, this is a database dump: it's plaintext. If you're splitting the file into equal-sized chunks, every chunk boundary after the changed data will be different, so every chunk will have changed.

This sounds similar to the rsync+gzip problem. rsync really sucks by default on .tar.gz files, because if a single byte changes, every compressed byte after that will be different. To solve this problem, they introduced gzip --rsyncable, which uses a clever algorithm to "resync" the gzip bytestream every so often. And it works! tar.gz files compressed with --rsyncable change only a little if the uncompressed data changes only a little, so rsync goes fast.

But how do they do it? Here's how it works: gzip keeps a rolling checksum of the last, say, 32 bytes of the input file. (I haven't actually looked at what window size gzip uses.) If the last n bits of that checksum are all 0, which happens, on average, every 2^n bytes or so, then toss out the gzip dictionary and restart the compression as if that were the beginning of the file. Using this method, a chunk ends every time we see a conforming 32-byte sequence, no matter what bytes came before it.

So here's my trick: instead of doing this algorithm in gzip, I just do it myself in a standalone program. Then I write each chunk to a file, and create an index file that simply lists the filenames of the required chunks (in order). Naturally, I name each chunk after its SHA1 hash, so we get deduplication for free. (If we create the same chunk twice, it'll have the same name, so it doesn't cost us any space.)
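To make the trick above concrete, here is a rough sketch, under some loud assumptions: the rolling checksum is just a plain sum of the last 32 bytes (a stand-in, not whatever gzip or the real splitter uses), a dict stands in for the directory of chunk files, and a Python list stands in for the index file. `WINDOW`, `BITS`, and all the function names are hypothetical. It compares equal-sized chunking with content-defined chunking when a few bytes are inserted in the middle of a file.

```python
import hashlib
import random

WINDOW = 32  # rolling-checksum window (the post's "say, 32 bytes")
BITS = 6     # split when the low 6 bits are zero: ~2**6 = 64-byte chunks


def split_rolling(data, window=WINDOW, bits=BITS):
    """Split at boundaries chosen by the content of the last `window` bytes."""
    mask = (1 << bits) - 1
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte                      # byte entering the window
        if i >= window:
            rolling -= data[i - window]      # byte leaving the window
        if i + 1 >= window and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks


def split_fixed(data, size=1 << BITS):
    """Naive equal-sized chunking, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(data, chunk_store, splitter):
    """Save chunks under their SHA-1 names; return the ordered index."""
    index = []
    for chunk in splitter(data):
        name = hashlib.sha1(chunk).hexdigest()
        chunk_store[name] = chunk            # same content, same name: stored once
        index.append(name)
    return index


def restore(index, chunk_store):
    """Reassemble a file from its index."""
    return b"".join(chunk_store[name] for name in index)


def new_chunks_after_insert(splitter):
    """Count the extra chunks needed to store a file with a small insertion."""
    rng = random.Random(0)
    v1 = bytes(rng.getrandbits(8) for _ in range(20000))
    v2 = v1[:10000] + b"inserted row\n" + v1[10000:]
    chunk_store = {}
    store(v1, chunk_store, splitter)
    before = len(chunk_store)
    index2 = store(v2, chunk_store, splitter)
    assert restore(index2, chunk_store) == v2
    return len(chunk_store) - before


if __name__ == "__main__":
    print("fixed-size chunks added:  ", new_chunks_after_insert(split_fixed))
    print("content-defined chunks added:", new_chunks_after_insert(split_rolling))
```

With equal-sized chunks, everything after the insertion shifts and has to be stored again. With content-defined boundaries, only the chunks near the insertion are new, because every boundary further along is decided by the same 32-byte windows as before.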
...and to be honest, I got a little lazy when it came to creating the chunks, so I just piped them straight to git hash-object --stdin -w, which stores and compresses the objects and prints out the resulting hash codes.

An extremely preliminary experimental proof-of-concept implementation of this file splitting algorithm is on github. It works! My implementation is horrendously slow, but it will be easy to speed up; I just wrote it as naively as possible while I was waiting for the laundry to finish.

Future Work

For our purposes at EQL Data, it would be extremely cool to have the chunking algorithm split based only on primary key text, not the rest of the row. We'd also name each file based on the first primary key in the file. That way, typical chunks will tend to have the same set of rows in them, and git's normal xdelta stuff (now dealing with a bunch of small files instead of one huge one) would be super-efficient.

It would also be entertaining to add this sort of chunking directly into git, so that it could handle huge files without barfing. That would require some changes to the git object store and maybe the protocol, though, so it's not to be taken lightly.

And while we're dreaming, this technique would also be hugely beneficial to a distributed filesystem that only wants to download some revisions, rather than all of them. git's current delta compression works great if you always want the complete history, but that's not so fantastic if your full history is a terabyte and one commit is 100 GB. A distributed filesystem is going to have to be able to handle sparse histories, and this chunking could help.

Prior Art

I came up with this scheme myself, obviously heavily influenced by git and rsync. Naturally, once I knew the right keywords to search for, it turned out that the same chunking algorithm has already been done: A Low-Bandwidth Network Filesystem. (The filesystem itself looks like a bit of a dead end. But they chunk the files the same way I did and save themselves a lot of bandwidth by doing so.)
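One detail worth knowing if you pipe chunks to git hash-object as described above: the hash it prints is not the bare SHA-1 of the chunk. In a SHA-1 repository, git hashes a short header ("blob", the byte length, and a NUL) followed by the data, and stores the zlib-compressed result. The sketch below reproduces just the naming part in Python; `git_blob_id` and `loose_object_bytes` are illustrative names, and nothing here writes into a real .git directory.

```python
import hashlib
import zlib


def git_blob_id(data):
    """Return the object id git assigns to `data` stored as a blob (SHA-1 repos)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_object_bytes(data):
    """Return the zlib-compressed bytes git would keep for a loose blob."""
    return zlib.compress(b"blob %d\0" % len(data) + data)


if __name__ == "__main__":
    chunk = b"one chunk of a database dump\n"
    print(git_blob_id(chunk))
    # The compressed object round-trips back to header + data.
    assert zlib.decompress(loose_object_bytes(chunk)).endswith(chunk)
```

The deduplication argument is unchanged: identical chunks still get identical names, they are just git's names, not plain SHA-1 sums of the content.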
There is a certain thematic continuity to the first few.
The male duck's penis is spiral-shaped: like a corkscrew, it twists in a counter-clockwise direction so that sperm will target the oviduct on the female's left-hand side. In almost all birds only the left ovary is functional, but in a 2007 study, Brennan and colleagues noticed that in ducks the female's vagina twists in the opposite direction. Previously, previously, previously.
There are 15,740 social media gurus on Twitter. 445 social media gurus
2009 was a pretty awesome year in many ways. Got engaged, got married, to name a couple of significant ones. :) I also lost two grandparents, and had a really rough time at my job. But mostly the year was good.
So, I had a great time at the New Year's Eve party at