Paul Haahr’s Blog

Almost never updated, but nevertheless…

Four years off the air

Posted on December 30, 2024 by Paul

As much as I admire blogging, I just don’t do it very often. When this site got hacked (again) four years ago, I upgraded it and, in trying to make it a little more secure, did something which broke it. And then I left it in that state for years, never finding the energy to debug what I’d done. I finally spent the time to undo the damage and everything seems to be working again. Along the way, I finally upgraded to https. (Yay for Let’s Encrypt!)

Does this mean I’ll blog more? Probably not. Should I delete some of the old entries here out of embarrassment? Probably, but I doubt I will.

Wordle Tweets (Part 2)

Posted on February 22, 2022 by Paul

[As with the previous post, this thread was originally posted on Twitter, but since I’ve stopped using that platform, I’m retroactively publishing it here.]

A short thread on *ordle games and parallel computation. /1

Going from Wordle…

… to Dordle …

… to Quordle …

… is a nice analogy for SIMD (single instruction multiple data) parallel programming. /5

Vector processing is the best known form of SIMD, where your data changes, but all operations are done in parallel. And that’s effectively what the higher n variants of Wordle look like. /6

Each “operation” in this SIMD analogy is a guess and, because you use fewer guesses per word, you can see the benefit of parallelization, but only because there are more words to reveal. /7

Just looking at Wordle, Dordle, and Quordle, they each allow n+5 guesses for an n word puzzle. However, as n goes up, I think you could reduce the constant factor; with an n>100 or so, I suspect n+2 or n+3 guesses would be sufficient for all games. /8
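The guess budget above is simple arithmetic. Here is a minimal sketch of it; the function name and the `slack` parameter are my own illustration, and the n+2 figure remains a conjecture rather than something this code checks.

```python
def guesses_per_word(n_words: int, slack: int = 5) -> float:
    """Allowed guesses divided by words to find, for a puzzle that allows n + slack guesses."""
    return (n_words + slack) / n_words

# Wordle, Dordle, and Quordle all use slack = 5.
for name, n in [("Wordle", 1), ("Dordle", 2), ("Quordle", 4)]:
    print(f"{name}: {n + 5} guesses, {guesses_per_word(n):.2f} per word")

# The conjectured budget for a very large puzzle: n + 2 guesses for n = 100 words.
print(f"n=100, slack=2: {guesses_per_word(100, slack=2):.2f} per word")
```

The per-word figure falls from 6.00 to 3.50 to 2.25 as n grows, which is the parallelization benefit the thread describes.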

That is, make two initial guesses (e.g., RAISE and COUNT as I do). At that point, the process of elimination should enable correctly identifying a few of the unknown words. Filling in the letters and positions for those words will give enough information for a few more words… /9

And so on until you’ve guessed everything. /10 (fin)

Wordle Tweets (Part 1)

Posted on January 21, 2022 by Paul

[I wrote this originally as a thread on Twitter, but since I’ve stopped using that platform, I’m retroactively publishing it here, slightly reformatted.]

A thread about Wordle, letter frequencies, and how the choice of corpus matters for textual analysis.

I’ve only been playing Wordle for about two weeks, so I can’t claim any expertise, but it tickled some old interests of mine and then I got a little obsessed. /1

Wordle’s rules are similar to those of one of my favorite games as a kid, Mastermind, except Wordle requires that guesses and answers be words rather than colored pegs. This is a big difference, which I’ll say more about below! /2

It’s clear that the starting guess in Wordle is important and there are articles talking about picking good ones. (E.g., The Best Starting Words to Win at Wordle.) /3

The general idea is you want to use common letters for two reasons: one, they’re more likely to be in the answer, and two, if you learn they aren’t in the answer, you can eliminate a large swath of words. /4

That letter frequency analysis works in Wordle is one reason it’s different from Mastermind: there’s no a priori distribution of color pegs which can lead to a useful strategy. /5

Back in third or fourth grade, I got a copy of Herbert Zim’s Codes & Secret Writing from the Scholastic Book Club and its list of most common letters – ETOANRISH, in order – is cemented in my brain. (I have a weaker sense that DLU come next.) /6

Based on that, when I started playing, my first pair of words would be ORATE and CHINS. They get all of Zim’s first nine, with three vowels in the first word and a “C” thrown in. Not too bad. /7

I switched for a few games to HATER/BISON for reasons that don’t make much sense (“B” is rarer than “C”), but missed having the three vowels in the first word. /8

Then I got a little empirical and asked myself “Is Herbert Zim’s 1966 frequency table still right?” Language doesn’t change very quickly, but this is 2022 and it’s easy to check for yourself. /9

I found a free sample corpus of news articles on the web that could be downloaded, did a little scripting and, for that corpus, the first half of the alphabet by frequency is ETAOINSRHLDCU. /10
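The "little scripting" for a letter ranking might look roughly like the sketch below. This is an assumed reconstruction, not the script actually used; `letter_ranking` and the sample sentence are made up for illustration, and ties are broken alphabetically so the output is stable.

```python
from collections import Counter

def letter_ranking(text: str, top: int = 13) -> str:
    """Return the `top` most frequent letters A-Z in `text`, most common first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ranked[:top])

sample = "the cat sat on the mat"
print(letter_ranking(sample, top=3))  # TAE
```

On a real corpus you would pass in the full text of the articles instead of the sample sentence.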

(By the way, Peter Norvig, a friend, colleague, and mentor of mine across two jobs and my boss in my early years at Google, did a similar analysis based on Google Books with similar results.) /11

ETAOINSRHLDCU is pretty similar to Zim’s order – with I and S moving up a little and R moving down. Nothing too interesting yet. /12

But then I started thinking about Wordle specifically and the frequency of letters in general text is not that interesting for the game, since Wordle only uses five-letter words. /13

If we restrict to five-letter words in the same corpus, we get a very different distribution: EARTSOIHLNDUC. Vowels (except U) are relatively more popular and T and N drop a lot. /14

Why T and N? Well, here are the most common ten words in this article corpus, with counts:

96786 the
44855 to
44855 of
41891 and
35282 a
32805 in
17630 that
16638 is
16080 for
13759 on

/15

None of those words are five letters, but the distribution of letters in those words, because they’re so common – they’re the “head” of a “long-tailed” (or Zipfian) distribution – strongly affects the frequencies of letters in English text. /16

To give one example, looking at the full article corpus but dropping just the word “the,” the most common letters become EATOINSRLDHCU: without “the,” T drops from 2nd to 3rd and H from 9th to 11th. /17
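Both experiments (restricting to five-letter words, and dropping "the") come down to filtering words before counting letters. A hedged sketch follows; the `ranking` function, its `keep` parameter, and the sample sentence are hypothetical, and the tokenizer simply splits on anything that is not a letter.

```python
import re
from collections import Counter

def ranking(text: str, top: int = 13, keep=lambda word: True) -> str:
    """Rank letters by frequency, counting only the words for which keep(word) is true."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        if keep(word):
            counts.update(word)  # adds one count per letter occurrence
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ranked[:top]).upper()

sample = "the end of the road is in the north"
print(ranking(sample, top=3))                              # EHT
print(ranking(sample, top=3, keep=lambda w: w != "the"))   # NOD
print(ranking(sample, top=3, keep=lambda w: len(w) == 5))  # HNO
```

Even in this toy sentence, removing one very common word reshuffles the top of the ranking.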

Definite articles like “the” (or “le”/“la” in French) are very common words. The effect in English is probably larger than other languages because it’s non-gendered, so there’s only one such word. (“a”/“an” are split, but not due to gender.) /18

One anecdote about the outsized effect of “the” on English… /19

In its early days, Google didn’t index “the”: it was omitted from both docs and queries. Circa 2000 – before my time – they started indexing it but, so as not to reduce index size, that meant adding ~25 machines to each thousand-machine serving cluster for the extra data. /20

Getting back to Wordle, all these issues of word frequency actually make the letter frequency tables much less relevant, because Wordle is drawing from some list of five-letter words where the frequency of those words (probably) doesn’t matter. /21

Why not? My assumption is that Josh Wardle is choosing words that are common enough that people don’t say “Is that really a word?” but doesn’t care that COAST is twice as frequent as METAL in a corpus of news articles. Either COAST or METAL is a fine Wordle word. /22

This means computing letter frequencies on general text – even if restricted to just the five-letter words – will probably mislead you if you’re trying to guess words in Wordle. /23

These are the ten most frequent five-letter words in the corpus I’m using:

4994 their
3679 which
3609 there
3430 about
3218 would
2698 after
2292 other
2155 first
2065 years
1874 could

Lots of THs! /24

So, instead of using a corpus of articles, a simple wordlist with no frequency information should give more useful insights. I’m using the /usr/share/dict/words that comes with MacOS and dropping words that contain capital letters (proper nouns) or other symbols. /25

There are 8497 five-letter words on that list meeting my criteria. But, many of the words wouldn’t be fun if they were the answer in Wordle: AALII, TOYON, CHORT, LENIS, SERUT, etc. If I had to guess, at least 50% would be too rare for the game. /26

But, which 50%? I thought about intersecting the list with words that passed some frequency threshold in the news corpus, but this is just an approximate exercise, so I went with the list as is, TOYONs and all. /27

The most common half of the alphabet in that list is AEROISTLNUYCD. That’s quite different from Zim’s list. A is more frequent than E! T and H are way down in popularity! (H drops to #14.) /28
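A sketch of the word-list version of the count, under two assumptions of mine: each unique word contributes once, and a repeated letter within a word is counted every time it appears. The function names and the tiny sample list are illustrative only.

```python
from collections import Counter

def wordle_candidates(lines):
    """Keep five-letter entries made only of lowercase a-z (drops proper nouns and symbols)."""
    kept = []
    for line in lines:
        word = line.strip()
        if len(word) == 5 and word.isascii() and word.isalpha() and word.islower():
            kept.append(word)
    return kept

def ranking_over_unique_words(words, top=13):
    """Rank letters with every distinct word weighted equally, ignoring how common it is."""
    counts = Counter()
    for word in set(words):
        counts.update(word)
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ranked[:top]).upper()

sample = ["Aaron", "arose", "until", "co-op", "raise", "whale's", "toyon", "zebra\n"]
candidates = wordle_candidates(sample)
print(candidates)                                    # ['arose', 'until', 'raise', 'toyon', 'zebra']
print(ranking_over_unique_words(candidates, top=4))  # AEOR

# With a real list, something like:
#   with open("/usr/share/dict/words") as f:
#       candidates = wordle_candidates(f)
```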

Based on this list, I revised my opening Wordle guess to AROSE. Unless I have some insights based on the results of my first guess, I follow that with UNTIL (or UNLIT). /29

Can one do better? Well, machines can, especially if they know the word list (or a reasonable superset). How? By exhaustively keeping track of which words are impossible once the results are known from previous guesses. /30

(Spoiler alert: what follows are examples from a few days ago. By the time you’re reading this, if you’re playing Wordle, you’ll have safely seen them. But maybe you’re somehow saving Wordle games for later?) /31

For example, if today’s word were POLAR and you guessed AROSE, the YELLOW/YELLOW/YELLOW/GREY/GREY result only matches 117 words from my dictionary. /32

Similarly, if today’s word were SHIRE and you guessed AROSE, you’d get a GREY/YELLOW/GREY/YELLOW/GREEN that would match only 25 words on that list. /33
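The bookkeeping behind those counts can be sketched as follows. This assumes the usual Wordle treatment of repeated letters (greens are assigned first, then yellows only while unmatched copies remain); the function names, the G/Y/- encoding, and the four-word list are mine.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Wordle feedback as a string of G (green), Y (yellow), and - (grey)."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and tally answer letters that were not matched in place.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: a non-green letter is yellow only while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def consistent(words, guess, feedback):
    """Words that would have produced exactly this feedback for this guess."""
    return [w for w in words if score(guess, w) == feedback]

print(score("arose", "polar"))  # YYY--
print(score("arose", "shire"))  # -Y-YG
print(consistent(["polar", "molar", "solar", "shire"], "arose", "YYY--"))  # ['polar', 'molar']
```

Running `consistent` over a full word list after each guess is what produces counts like the 117 and 25 above; the exact numbers depend on the list used.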

For a human, picking what word would distinguish best among those 117 or 25 words is a hard task, but a computer can just try all the words and see what reduces the number of possibilities most for the next round. It turns out, the space of words is very sparse. /33 [Oops! Two 33s!]
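One way to "try all the words" is to group the remaining candidates by the feedback each guess would produce and prefer the guess whose largest group is smallest. This is a sketch of that minimax idea, not necessarily the exact search I ran; the names and the toy word lists are illustrative, and `score` assumes the usual handling of repeated letters.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Wordle feedback as a string of G (green), Y (yellow), and - (grey)."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def worst_case(guess, candidates):
    """Size of the largest group of candidates that share one feedback pattern."""
    buckets = Counter(score(guess, answer) for answer in candidates)
    return max(buckets.values())

def best_guess(guesses, candidates):
    """Guess with the smallest worst case; ties are broken alphabetically."""
    return min(guesses, key=lambda g: (worst_case(g, candidates), g))

candidates = ["batch", "catch", "hatch", "match"]
for g in ["batch", "chasm", "combs"]:
    print(g, worst_case(g, candidates))  # batch 3, chasm 2, combs 1
print(best_guess(["batch", "chasm", "combs"], candidates))  # combs
```

Note that COMBS is not itself a candidate here, yet it separates all four possibilities, while guessing a candidate such as BATCH can leave three.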

This gets at another difference between Wordle and Mastermind. In Mastermind, any combination of colored pegs is acceptable. In Wordle, it’s only (roughly, at most) 8497 words out of 26^5 = 11,881,376 possible combinations of five letters, or roughly 0.07%. /34
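The sparsity figure is quick to check. In this small sketch the function name is mine and 8497 is the count from my word list, not Wordle's actual list.

```python
def word_density(n_words: int, alphabet: int = 26, length: int = 5) -> float:
    """Fraction of all letter strings of the given length that are in the word list."""
    return n_words / alphabet ** length

print(26 ** 5)                      # 11881376
print(f"{word_density(8497):.2%}")  # 0.07%
```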

Based on some automated exploration, I think RAISE may be the “optimal” initial guess if the word list I’m using is close enough to the actual one used in Wordle. But, there’s not as “nice” a generic followup guess for it as AROSE/UNTIL; COUNT or MOUNT come close. /35

At worst, after guessing RAISE/COUNT, there would only be 98 possible word choices, for my word list. With MOUNT as the second word, it’s 90. HOTLY would reduce the possible set to 83 words and POUTY or PYLON, 89. /36

Since I switched to RAISE/COUNT as my default guesses, I’ve been solving the daily Wordles in three or four guesses, which feels a little better than before. But it may just be luck of the draw, in terms of the word of the day. /37

Summary:

While Wordle has similar rules to Mastermind, using words makes it very different.
Letter frequencies based on a list of unique words versus real text differ a lot.
Only ~0.07% of five-letter combinations are words.
AROSE/UNTIL or RAISE/COUNT.

/38 (fin)

Jorn Haahr, 1935-2019

Posted on December 31, 2019 by Paul

My father, Jorn Haahr, died this month. He was a kind, gentle, smart man, who I learned so much from. If you ask people about him, after hearing adjectives like “kind” or “generous,” I think they’d tell you about how he was able to fix anything or how he’d go out of his way for you. He was a wonderful, loving father, who always showed his love in his actions.

For example, being willing to pick up his teenage or twenty-something son with a ride from anywhere, at any time. And being very matter of fact when I screwed up, in big or little ways, always focusing on the practical question of “What do you do next?” Dad was not one for beating yourself up about a mistake.

My father was born in Skive, Denmark, the youngest of three brothers, and grew up during Denmark’s occupation in World War II. He met my mother when she was studying in Copenhagen and he came to the U.S. to be with her. He worked as a power systems engineer, first in Boston, then in New York. He and my mother provided a happy, encouraging, loving family for my sisters and me.

I got to spend a lot of unstructured time with my father in my late teens, what can be a tough age for fathers and sons, with summer jobs in or near his office. When I worked in his office, I could see that colleagues valued him for being the same always-competent, always-calm person that he was at home. We usually commuted together, taking a bus from Riverdale to Lower Manhattan, giving us a chance to talk casually, more like peers than we ever had been. It was a different environment to be with my Dad in and helped set an easygoing, accepting tone for our adult relationship.

(Susan, now my wife, would later sometimes commute with my Dad by train and have similar unstructured times with him. It’s a very scary thought to have your Dad and your girlfriend talking without being there, but I know she always appreciated him from that time.)

When I first moved out to California – and was quite a mess – my Dad ended up visiting me four times in a not-quite year. Now, he’d been working in a remote office for a Palo Alto company for a few years at that point and I don’t think he’d ever visited them before, but, in that time, he found reasons to make a trip to the head office roughly once a quarter. And we got to do things together – adventures like walking across the Golden Gate Bridge and getting lost on the way to Sausalito – in a setting where he let me lead.

My father’s last several years were very hard, caused by a surgery that led to a series of medical catastrophes. This man who had been vital and energetic until he was seventy-eight was left largely incapacitated. As much as the actual end makes me sad, I’m equally saddened by how much he – and we all – lost earlier.

What stands out about his personality to me is his humility and his innocent goodness. I realized that my Dad will always be the biggest part of my conscience. It is his voice in my head that I’m arguing with when I’m bending the rules or embarrassed about something I’ve done. And remembering his skeptical look of “Is that really what you want to do?” keeps me honest.

Dad, I love you and I miss you. And I always want to be the person you’d have wanted your son to be.

Three Moby Dicks of the Internet Age

Posted on January 8, 2018 by Paul

A couple of years ago, I developed a fascination with Moby Dick, thanks to three creative versions of the book that couldn’t have existed without the Internet. What I knew about the novel before then was just what you learn from American pop culture – white whale, Call me Ishmael, obsessed one-legged captain, etc. And I’d seen the Gregory Peck movie in a hotel room twentysomething years ago. But that was it – the book loomed as an edifice I had no interest in climbing.

The novel had not been in my consciousness for many years, when I saw the debut article from Clickhole go by in my Twitter feed. Seeing the headline “The Time I Spent On A Commercial Whaling Ship Totally Changed My Perspective On The World” made me think “They didn’t, did they?” And, sure enough, they did:

The Time I Spent On A Commercial Whaling Ship Totally Changed My Perspective On The World

This clickbait article is the entire text of the novel. For whatever reason, perhaps just because I found the idea so funny, I started reading it. And what I discovered was an approachable, entertaining voice that I enjoyed. The book moved to the “read this one day” category. (Alas, that day never comes for many books.)

A while later, I was working on an emoji-related project. (These things happen at Google.) After we launched, I was looking for gifts for the team and I came across Emoji Dick; or 🐳. Emoji Dick is a crowdsourced and Kickstarter-funded translation of the novel into, well, emoji. For example, the famous first line is rendered as “☎️👨🏻⛵️🐳👌.” You can argue with the translations – and good luck to anyone trying to read the book in emoji only – but it’s an impressive effort. And I thought that the book, along with some pillows, would be the perfect way to say thank you to the team.

Around the same time, I came across a mention of the Moby Dick Big Read, a 2011 podcast of all 136 chapters of Moby Dick. Each chapter is read by a different actor, writer, or scholar and each is accompanied by a piece of art from a different artist, all made available for free. So I started listening. And I was hooked.

The individual readers vary a lot in quality, but the best give riveting performances. I’d like to call out four of my favorites:

Tilda Swinton’s Loomings (Chapter 1) immediately drew me in. Her voice is haunting and philosophical, almost eternal, and the egotism of Ishmael is right at the surface.

Simon Callow gives The Sermon (Chapter 9) as a thundering, fire and brimstone sermon.

John Cleave reads The Quarter-Deck (Chapter 36), where we first meet Ahab and the s**t gets real. Both the text and the reading are absolutely gripping.

And Will Self reads The Whiteness of the Whale (Chapter 42) with wide open vowels that seem to harken back from across the centuries.

The Moby Dick Big Read is a truly wonderful contribution to the world; if you’re looking for a long-but-compelling audiobook, I highly recommend it. And, of course, Moby Dick really is a book of astounding depth and humor. Describing it as “The Great American Novel” seems entirely deserved.

While none of these homages to Moby Dick could have existed without the internet, they also couldn’t have existed if Moby Dick were still in copyright. The public domain is a valuable space, called out by the US Constitution’s limitation of copyright and patent to “limited times.” Yet copyright has not expired on any works in the US since 1978, meaning that while remix culture can play with Moby Dick, works like The Great Gatsby or 1984 remain mostly out of bounds.

Though I’ve now listened to Moby Dick, I still haven’t read it. I want a little more time to pass since listening to the Big Read before I take that on. Maybe this year, maybe next, I’ll buy a nice hardbound edition and read the actual book. I’m looking forward to it.

Thanks to Fred Benenson for Emoji Dick, Angela Cockayne and Philip Hoare for the Big Read, and whichever unnamed prankster at Clickhole came up with the idea of turning Moby Dick into clickbait. And, of course, to Herman Melville. You’ve all given me much pleasure.

On Presidents’ Day, Appreciating Barack Obama

Posted on February 20, 2017 by Paul

No other President – no other politician – in my lifetime has meant as much to me as Barack Obama. While I think policy is important and I agreed wholeheartedly with his agenda, it is more than that. And while, as Kevin Drum writes, Obama was very effective in office, being pleased with what he accomplished is not a sufficient reason either. Nor is the historic nature of his presidency.

Part of my connection to Obama is simply part of being the same generation – I could identify with him in a way I haven’t with many other politicians. But, in the end, it comes from respecting his approach and style. Obama’s ability to be the responsible adult, to approach the world rationally, to deal with crises without overreaction, and to treat the public intelligently is what I want in a civic leader. He’s the first President I’ve known that made me think “I want to act like him.”

The anger and hatred Obama generated in parts of America (and very few other places in the world) still astonishes, enrages, and saddens me. I realize that roughly half of America opposes modern liberalism, but the personal vitriol against a leader who was so smart and dignified in office will go down in history as a huge mistake, a resurgence of the worst of America.

I’m not beyond acknowledging Obama’s flaws. Primary among them for me was his separation from the rest of the political system. That’s not an issue of his avoiding glad-handing on the Washington circuit, but his inability to bring electoral victories for his party when he was not on the ballot. A more successful version of Obama would have left a much stronger party behind. Yes, that blames the Democrats’ deficiencies on Obama, but an effective party leader can build a deep bench and Obama did not do that.

On Presidents’ Day, I need to acknowledge how much Obama and his presidency have meant to me. I do not expect to find a politician who I can feel that way about again, simply because it is so unlikely for another successful politician to bring together the same set of skills. But this was a special eight years, an era of optimism and promise.

“Thank you, Barack Obama.”

The Election

Posted on November 11, 2016 by Paul

I’m still reeling from the election of Donald Trump. I’m saddened, disillusioned, angry, and, most of all, scared. I don’t think it will lead to direct harm for me or my family, at least immediately, but I fear for America and the world. That sounds like hyperbole, but elections have consequences and this one looks all bad to me.

It’s hard to draw too many big conclusions from such a close election – especially one so close that the popular vote likely disagreed with the electoral college – but there are two which come to mind.

First, the divisions in this country – between urban and rural, between feminism and traditional views of women, between the embrace and rejection of diversity – are both starker and more evenly balanced than I had ever thought. That I can’t imagine anyone actually thinking Trump would be a good President shows how far on one side of the divide I am. Of course, the country has been very divided before, but the worst previous period of division, the Civil War, is not a hopeful example of healing. (That today’s fault lines still largely follow those of the Civil War is not surprising.)

Second, celebrity and charisma are probably more important than political scientists have ever acknowledged. Jesse Ventura and Arnold Schwarzenegger were harbingers of the power of celebrity in elections – and, of course, Ronald Reagan started his career with his celebrity, before working his way up. But Trump’s rapid rise from a TV show to President-elect shows how powerful celebrity can be.

Charisma and celebrity are tightly intertwined. I don’t see Trump’s charisma. Every video of his rallies made me wince; I saw narcissism, vacuous promises, and incitement of hatred. But anyone who can carry a successful TV show for a decade clearly is attractive to a large number of people. And his rallies inspired throngs. It may be the charisma of a demagogue, but it is charisma.

Thinking about presidential elections, you probably have to go back to 1972 to find one where the less charismatic candidate won. The political scientists and insiders who believed that policy matters, that money matters, that Get Out The Vote matters, that endorsements matter were wrong, at least in a presidential election. At most, those can be proxies. Emotional connection to a large group of voters matters; charisma may be the most direct way for that to happen.

In the days leading up to the election, I was a mixture of complacent and panicked. I thought the complacency was rational, given both the polling and my belief that voters couldn’t really fall for Trump, and the panic was irrational, based more on the fear of a Trump presidency than its likelihood. I was wrong – panic was rational, complacency was irrational.

What now? First, family and friends. My whole community seems to be despairing. We need to strengthen and support each other.

But, also, I need to find ways to make the world a better place. I’m privileged in that my job lets me feel like I am doing good things – and I believe that I am. But it’s not sufficient now. I don’t know what else it will be, but I need to do more.

Mourning for the crew members of the Aqua Amazon

Posted on July 21, 2016 by Paul

The Aqua Amazon, the night we boarded.

My family and I recently returned from a three week trip to Peru, where a highlight of the trip had been an Amazon cruise on a ship called the Aqua Amazon. It was an amazing experience, filled with wildlife and scenery the likes of which I hadn’t seen before.

This past Saturday, there was an explosion while the ship was refueling. It sank and, according to news reports, eight crew members were lost and more were seriously injured.

I got to know some of these people a little and I’m shocked and sorrowed to hear of this tragic accident. This was a friendly, caring, talented group. I’m overcome with grief for them. My thoughts are with the injured and the families of the victims.

I’m excited to have voted for Hillary Clinton

Posted on June 7, 2016 by Paul

Vote for Hillary Clinton

Today is the California Primary. Usually, a presidential nominating contest is long over by the time California votes. And, in most ways, it is already over this year, too. But both candidates are campaigning as if California matters, so I voted that way.

I’m very excited to have voted for Hillary Clinton. She’s running as an unashamed pragmatic liberal, which is how I identify myself. She’s very savvy about how to make government work. From all perspectives I can see, she’d be an effective leader and would take the country in good directions on climate change, healthcare, the economy, foreign policy, and social justice.

On the other hand, I find Bernie Sanders very appealing, too. My beliefs on economic policy and foreign intervention are probably closer to his than to Clinton’s. And, if we were looking at a Senate with 65 Democratic votes and a 55% majority Democratic Congress, I could see voting for Sanders. But his agenda seems impossible to advance with a closely held legislature, let alone the Republican majorities we have today, and I don’t see him being effective in those circumstances. To use a term that’s usually pejorative, I want to elect someone from “The Establishment” now.

Is Hillary perfect from my perspective? Of course not. I think she’s too hawkish on foreign policy, as epitomized by her vote for the Iraq war, which was my most significant policy reason for not supporting her in 2008. And I have grave reservations about electing the spouse of a former president – America should not have dynastic habits. But, Hillary Clinton is smart, well qualified, and hard working enough to justify her election.

And, finally, it is important to remember that electing a woman to the American Presidency would be historic. When America didn’t allow women the vote for more than a hundred years and more than fifty Presidential elections have gone by with no women nominated by either major party, even her nomination is an important accomplishment to recognize.

The question I asked Justice Scalia

Posted on February 14, 2016 by Paul

When I was an undergrad, I took a Constitutional Interpretation class taught by Walter Murphy. For a guest lecture, Professor Murphy brought in Associate Justice Antonin Scalia, who had only joined the Supreme Court a couple of years earlier, to talk about originalism, his legal theory. Being taught by a Supreme Court Justice was, of course, a special occasion. Even more so by virtue of it being Scalia, who was already famous (or infamous) and controversial. And, to top it off, Justice Scalia would take questions from the class after his lecture.

To a liberal like me, the opportunity to ask Scalia a question was too good to pass up. And I knew exactly what I wanted to ask about. I spent a little time doing research so I could formulate the question well. In the end, I asked something like “The Constitution never mentions corporations or talks about giving the rights of persons to non-persons. Yet, the court ruled in Santa Clara County v. Southern Pacific Railroad Co. (1886) that corporations have the rights of persons. Doesn’t this go against the original meaning of the text?”

The Justice’s response was terse. I can’t claim to remember the exact wording, but it’s stuck with me as “That’s settled law. Move on.”

While I thought the “settled law” response was a little arbitrary, in a nation that values stare decisis and precedents, it makes sense. The question, of course, is how you decide that something is “settled law” and, therefore, should not be tampered with; or, in contrast, that a precedent so violates the original meaning of the Constitution that it must be overturned.

And therein lies, for me, Scalia’s hypocrisy. Where was the reverence for settled law in Heller or Citizens United? And why the respect for precedent in Obergefell v. Hodges?

In the end, I’m sure that Antonin Scalia – who criticized the opinion in Atkins v. Virginia for “rest[ing] so obviously upon nothing but the personal views of its members” – believed that he kept his political and legal beliefs separate. But the conclusions he reached about whether precedents were “settled law” or not appeared to coincide quite closely with his political views.