<p>Matt Jadud · mjadud@bates.edu · https://jadud.com/ · 2023-01-01T20:20:42+00:00</p>
<h1>infrastructure in a box</h1>
<p>2022-01-17T00:00:00+00:00 · https://jadud.com//p/infra-in-a-box</p>
<p>Some people might call it “digital infrastructure.” Maybe it is “information architecture.” Perhaps it is “tools to organize with.” Regardless, the question is “how do you manage comms and complexity in the digital world?”</p>
<p>I’m going to walk through some common pain points in organizing people around tasks, and offer a set of tools that help address those pain points. I’m taking an opinionated view in this case:</p>
<ul>
<li>I favor <strong>open tools</strong> over closed tools. This means that the software is free/open source, suggesting I won’t be stranded if the company producing the tool goes away, or have my data locked away from me.</li>
<li>I favor <strong>self-hosting</strong> over commercial offerings. The tools you use online are like the place you live: do you want to rent, or own? Each has its tradeoffs, but when I “own,” at least I have fewer concerns about a crappy landlord.</li>
</ul>
<h3 id="table-of-contents">table of contents</h3>
<ul>
<li><a href="#scheduling-things">scheduling things</a></li>
<li><a href="#announcing-things">announcing things</a>
<ul>
<li><a href="#broadcast">broadcast</a></li>
<li><a href="#the-problem-with-email-for-announcements">the problem with email for announcements</a></li>
<li><a href="#using-forums-to-clean-up-announcements">using forums to clean up announcements</a>
<ul>
<li><a href="#separate-channels">separate channels</a></li>
<li><a href="#invite-only">invite only</a></li>
<li><a href="#limit-read-access">limit read access</a></li>
<li><a href="#limit-posting">limit posting</a></li>
<li><a href="#controlled-subscriptions">controlled subscriptions</a></li>
<li><a href="#digests">digests</a></li>
<li><a href="#summary">summary</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#coauthoring-documents">coauthoring documents</a></li>
<li><a href="#making-lists-tracking-data-organizing-stuffs">making lists, tracking data, organizing stuffs</a></li>
<li><a href="#asking-for-input">asking for input</a></li>
<li><a href="#comms-tracking">comms tracking</a></li>
<li><a href="#getting-things-done-kanban">getting things done (kanban)</a></li>
<li><a href="#-and-the-kitchen-sink">… and the kitchen sink</a>
<ul>
<li><a href="#url-click-tracking">url click tracking</a></li>
<li><a href="#wiki">wiki</a></li>
<li><a href="#dropbox">dropbox?</a></li>
</ul>
</li>
<li><a href="#wrapping-up">wrapping up</a></li>
</ul>
<h2 id="scheduling-things">scheduling things</h2>
<p>Scheduling things is hard. How do you find a time when six people can all meet? <a href="https://doodle.com/en/">Doodle works</a>. I haven’t found a free/open alternative that is easy to set up yet. (<a href="https://cal.com/">cal.com</a> is a contender.)</p>
<p>So, when you want to find a time for a bunch of people to meet, use <a href="https://doodle.com">doodle.com</a> or <a href="https://cal.com">cal.com</a>.</p>
<h2 id="announcing-things">announcing things</h2>
<p>A big part of organizing is just <em>announcing things</em>. “We’re meeting to discuss X at time Y with agenda Z.”</p>
<p>This is hard. You’d think it would be easy, but it isn’t.</p>
<p>There are a few different kinds of announcements. Each depends on what kind of communication you want as a result.</p>
<h3 id="broadcast">broadcast</h3>
<p>If you want to announce things, and are <em>not worried about responses</em>, then you have a couple of tools available.</p>
<ul>
<li><strong>A mailing list</strong>. It does not matter if you do this by putting everyone’s name in a <code class="language-plaintext highlighter-rouge">BCC</code> field, or if you use a fancy tool for doing mailings. When you broadcast information via email, let’s agree to call it a mailing list.</li>
<li><strong>A blog</strong>. A blog is good for long-form pieces, content with images, or anywhere that you want to have your words have space to breathe and some notional permanence on the web. It can be coupled with a mailing list (the list can provide the TL;DR, and drive traffic to the blog), or it can stand alone. (All of these can be coupled, so I’ll stop saying that now…)</li>
<li><strong>Social media</strong>. It’s harder to restrict replies with social media; some tools can, some can’t. I consider all of them evil in one way or another, so I would say that if social media is your <em>primary</em> means of communicating your message, then you do not have control over your message, because you <em>absolutely do not</em> have control over your medium.</li>
<li><strong>Text</strong>. You can automate texts. You could treat them like really short posts, or perhaps use them to drive people to links, or remind them about meetings. This is a bit tricky, but it is eminently doable.</li>
<li><strong>Phone calls</strong>. Yes, phone calls can be automated, too. You could, perhaps, offer a subscription service where a bot calls someone and reads the message you have. :shrug: I’m not sure what this does for you, but it is possible.</li>
<li><strong>Podcasts</strong>. These are expensive, mostly in time. You do need a decent mic, and they get expensive because they <em>take so much time</em>. If you want a 20 minute message, you need to record, edit, and ship a 20 minute message. The same message that took (at least) 20m to record will also take 20m to listen to. But, in some cases, they might be a good broadcast medium.</li>
<li><strong>Vlog</strong>. Yes, you can record things and post them to YouTube. It’s… a commitment.</li>
</ul>
<p>For each of these, there are tools that can be used to help manage the process. I’m not going to go into any depth with any of them at this time. They are, however, the digital modalities I know of (off the top of my head) for broadcast media.</p>
<h3 id="the-problem-with-email-for-announcements">the problem with email for announcements</h3>
<p>It’s one thing to announce things. However, day-to-day, we often have to announce something, and <em>then</em> we need to circle around and handle people’s responses. In practice, you wish things looked like this:</p>
<ol>
<li>You correctly include everyone in your message.</li>
<li>You announce a meeting.</li>
<li>Everyone shows up.</li>
</ol>
<p>Unfortunately, the workflow for announcing things via email tends to look like this <em>in reality</em>:</p>
<ol>
<li>You try and remember to CC all the people who should know about the thing you’re announcing. (I’ll call this <em>the thing</em>.)</li>
<li>You need to include a subject that is descriptive, because this is an email. So, you do something like <code class="language-plaintext highlighter-rouge">[*the thing*] Next meeting agenda</code>.</li>
<li>You write the email. You try and put something nice at the top, and a link to the agenda, and a reminder about the zoom link. Perhaps you had a calendar link in there, too?</li>
<li>You hit send.</li>
<li>You forgot someone.</li>
<li>You send it again, but send it to everyone.</li>
<li>Someone asks (hitting “reply all”) if the meeting changed.</li>
<li>You write that person to tell them “no, you just needed to add someone to the event.”</li>
<li>You linked to the agenda, but forgot to set the sharing settings correctly. Three people hit “reply all” to say they can’t see the document.</li>
<li>You let everyone know the link is fixed, and repeat the info from the first message.</li>
<li>You made a copy-paste error on the zoom link the second time you sent it.</li>
<li>At the time of the meeting, you’re getting emails because the zoom link is broken.</li>
<li>You send a note with a correct zoom link to everyone 5 minutes into the meeting, but you’re missing two people either way, one because they got confused by the links, and another because the original message went into their spam folder.</li>
</ol>
<p>Should I go on?</p>
<p>There are a few problems here. Actually, there’s only <strong>one problem</strong>, and that is <strong>email</strong>. If you think about this from an “information architecture” perspective, your comms look like this:</p>
<pre style="font-size:small">
┌────────┐
│        │
│ agenda ├────────────────────┐
│        │                    │                 ┌────────────────────┐
└────────┘                    │                 │                    │
                              │           ┌─────► blackberry holdout │
                              │           │     │                    │
┌───────────┐                 │           │     └────────────────────┘
│           │                 │           │
│ zoom link ├──────┐      ┌───▼────┐      │     ┌────────────────────┐
│           │      │      │        │      │     │                    │
└───────────┘      └┬─────► email  ├──────┼─────► chromebook person  │
                    │     │        │      │     │                    │
                    │     └───▲────┘      │     └────────────────────┘
┌───────────────┐   │         │           │
│               │   │         │           │     ┌──────────────────────────────┐
│ date and time ├───┘         │           │     │                              │
│               │             │           └─────► not so good with email person│
└───────────────┘             │                 │                              │
                              │                 └──────────────────────────────┘
                              │
┌─────────┐                   │
│         │                   │
│ people  ├───────────────────┘
│         │
└─────────┘
</pre>
<p>Yep. That’s right. You’ve crammed all the <em>different conceptual communications</em> channels about <em>the thing</em> into one channel (email), and then you sent it to a whole bunch of people. They have no way of replying back to a given “channel,” because it has all been <em>smooshed</em>. So, if someone replies about the date and time, it can confuse the issue on who is involved… or what the zoom link is…</p>
<p>So, you need to split apart your information architecture. You need to decouple the question of <em>who</em> from <em>what</em>, and even then, you need different kinds of “whatness” for different kinds of activities. (If that makes sense, you’re doing great.)</p>
<h3 id="using-forums-to-clean-up-announcements">using forums to clean up announcements</h3>
<p>A discussion forum is a good example of a tool that you can use for cleaning up your communication channels. Why?</p>
<ol>
<li><strong>separate channels</strong>. You can create separate “forums” or “channels” for different kinds of communications.</li>
<li><strong>invite only</strong>. You can invite people to take part, thus controlling who has access.</li>
<li><strong>limit read access</strong>. You can limit people’s access within the forum, so some people can see everything, but others might only see what they need to see.</li>
<li><strong>limit posting</strong>. You can give people permission to post in some places but not others.</li>
<li><strong>controlled subscriptions</strong>. People can subscribe to some or all of the forums.</li>
<li><strong>digests</strong>. People can get a digest of all of the activity in the forum (to which they have access) every day (or week, or…).</li>
</ol>
<p><img src="/images/posts/inabox/nodebb.png" alt="nodebb" /></p>
<p>What do these things do for us? One at a time.</p>
<h4 id="separate-channels">separate channels</h4>
<p>If you have a forum called “Meeting agendas,” and the only things you post in that channel are meeting agendas, then everyone knows where to look for… wait for it… meeting agendas. Every time you have a meeting, you post a note in the agendas channel. Assuming everyone subscribes to that channel, they’ll get an email when you post (if they want it), and if you made a mistake… you just fix the post. If someone loses the email, they know they can go to the forum, look up the most recent agenda post, and have all of the correct information ready-to-go.</p>
<h4 id="invite-only">invite only</h4>
<p>You can control who has access to your forum. You can be the sole source of control, or you can allow anyone to invite friends… but you can be the gatekeeper on approving those invites. The level of control is much greater than firing an email off into the void.</p>
<h4 id="limit-read-access">limit read access</h4>
<p>If you have a “management team forum” for comms amongst your leadership team, you can create a group of users who have read/post access in that channel. For everyone else… they won’t even see that the forum exists.</p>
<h4 id="limit-posting">limit posting</h4>
<p>Some channels might be a “chat,” and everyone can post. Others might be for a subset of people to post to, but for everyone to read (e.g. for posting minutes from meetings observed in a community). Some channels might be for only one or two people to post to, but for everyone to read (e.g. agendas). Forums let you control who can read a channel as well as who can post. Some of this is also convention, and there may be some curation required (or “channel gardening”), but it’s not too hard.</p>
<h4 id="controlled-subscriptions">controlled subscriptions</h4>
<p>Some people like to have everything show up in their email all the time. Some people like a single note at the end of the day. Users can choose. Most forum software will allow you to interact with it via email. The nice thing is that it then manages this for you, so that users can choose to do this on a case-by-case basis.</p>
<h4 id="digests">digests</h4>
<p>What I just said. A person who gets one update per day is receiving a digest of the day’s activity on the forum. I personally like to receive digests, because they get “pushed” to me. I can then go to the forum to read things in more detail if I need to.</p>
<h4 id="summary">summary</h4>
<p>We now have a different information architecture. Now, the agenda can be posted by a team lead to the agenda/meeting announcement channel, and everyone can read them. (But, not everyone can edit. They can’t “reply all,” etc. Only one person (or a specific group of people) can post agendas.) Meeting notes might come from a larger set of people, and they get posted into a team’s “meeting notes” area. Organization leads have a space where, if they want to post/discuss things, there is a single place for those comms… and those comms can only be read/replied to by other members of the leadership team.</p>
<pre style="font-size:small">
                                               forum
                       CAN POST                                CAN READ
┌────────┐                               ┌────────────────┐
│        │                               │                │
│ agenda ├─────────────team lead─────────► agenda channel ├────►everyone
│        │                               │                │
└────────┘                               └────────────────┘
┌───────────────┐                        ┌───────────────┐
│               │                        │               │
│ meeting notes ├──────team member───────► public notes  ├─────►everyone
│               │                        │               │
└───────────────┘                        └───────────────┘
┌───────────────────┐                    ┌─────────────────┐
│                   │                    │                 │
│ leadership stuff  ├──team leads────────► planning, notes ├───►team leads
│                   │                    │                 │
└───────────────────┘                    └─────────────────┘
</pre>
<p>If you choose a tool that collapses your information architecture into a single communications channel, then you will always have communications hell. If you choose a tool that gives you multiple channels, you can use those channels to manage and compartmentalize your communications. Your comms needs may ultimately be different, or a forum may not work for some other reason. However, what is true is that if you can get your community used to interacting with a forum as opposed to a mailing list (or, worse, individually sent and maintained mailing groups), then you’re in for a win.</p>
<h2 id="coauthoring-documents">coauthoring documents</h2>
<p>It can be easy to co-author documents, as long as you’re focused on the content, and not the formatting.</p>
<ol>
<li>If everyone has Google accounts, you can use GDocs. However, this may be mixing work and personal life in a way that is uncomfortable for some.</li>
<li>If you’re just trying to generate text (but not format it/“make it pretty”), there are some options.</li>
</ol>
<p>One option is a tool called <a href="https://comments.etherpad.com/">etherpad</a>. I’ve linked to a demo version of etherpad; click on it. If you copy-paste the URL to that page, and share it with a group of people, <em>everyone can write at the same time</em>. It is just like GDocs in that regard. When you’re done, you’ll have to download the text, and paste it into Word or GDocs to “make it pretty,” but etherpad works great for:</p>
<p><img src="/images/posts/inabox/etherpad.png" alt="etherpad" /></p>
<ol>
<li><strong>Taking notes as a group</strong>. If you’re in a meeting, and multiple people are taking notes, then etherpad lets you quickly create and share a notes doc.</li>
<li><strong>Taking notes for an interview</strong>. If you’re trying to capture someone’s words as part of a testimony, or are working on an interview, you could create this doc, and both the interviewer(s) and interviewee can see what is being written.</li>
<li><strong>Generating copy</strong>. If you’re trying to write an article, then this is a great way to get all of the distractions of formatting out of the way. Words, sentences, and paragraphs. Everything else can come later.</li>
</ol>
<p>This can be a bit painful, though: if you forget to download the text when you’re done… well, you’ll lose the text. Google Docs doesn’t have this problem. But, I’d rather not mix my Google account with every single (public, potentially political) project I engage in. As a result, having a place for people to collaboratively write without requiring accounts is <em>great</em>.</p>
<p>(The nice thing about etherpad is that every URL is unique, and short lived. Typically, after 24 hours of not being visited, a URL is “flushed.” As a result, it “disappears into the ether.” Very handy.)</p>
<h2 id="making-lists-tracking-data-organizing-stuffs">making lists, tracking data, organizing stuffs</h2>
<p>If you use Google Sheets (or Excel) to manage lists of things, then you’ll love <a href="https://baserow.io/">Baserow</a>.</p>
<p>Baserow is… well, it’s like <a href="https://www.airtable.com/">Airtable</a>. What is Airtable? You’re just going to have to browse around the sites for these two tools a bit. They’re both… kinda like what would happen if Google Sheets grew up and got some really useful tools for organizing and categorizing information. It’s like… a database in your browser.</p>
<p><img src="/images/posts/inabox/baserow.png" alt="baserow" /></p>
<h2 id="asking-for-input">asking for input</h2>
<p>Use a Google Form. So often, I have asked for input, and then realized I now have 20 emails in my inbox. Better to post a link to a form, and ask people to fill it out. If you get your community used to <em>really short</em> forms, then they won’t view them as an imposition, and when you ask for input, you <em>might</em> get good response rates. (For small groups, like a leadership team, you really should be getting good response rates… your mileage may vary).</p>
<p>There are not many tools that are better/faster/easier to use at this point. When you’re done with the form, you can export the data you captured as a CSV (comma separated values) file, and take it anywhere. (CSV files are plain text, meaning you can easily then load them into Baserow, or Excel, or whatever.)</p>
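<p>As a minimal sketch of what “take it anywhere” can look like, the Python below tallies the answers in one column of an exported CSV. The column name and the sample rows are made up for illustration; a real export will have whatever headers your form questions have.</p>

```python
import csv
import io
from collections import Counter


def tally(csv_text, column):
    """Count how many times each answer appears in one column of a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader if row.get(column))


# Hypothetical export: one header row, then one row per response.
SAMPLE = """Timestamp,Which day works best?
2022-01-10 09:00,Tuesday
2022-01-10 09:05,Thursday
2022-01-10 09:30,Tuesday
"""

if __name__ == "__main__":
    for answer, count in tally(SAMPLE, "Which day works best?").most_common():
        print(answer, count)
```

<p>The same file would load into Baserow or Excel unchanged; the script is only there to show that nothing about the data is locked to the form tool.</p>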
<h2 id="comms-tracking">comms tracking</h2>
<p>Contact relations management. Or, whatever you call it. Customer? I don’t know. “Keeping track of people.”</p>
<p>Some people do this with a spreadsheet. You can. If you do, consider Baserow (above).</p>
<p>If you want a tool dedicated to tracking when you spoke to people, and about what… use a CRM.</p>
<p><a href="https://www.monicahq.com/">MonicaCRM</a> is a tool intended for relatively small use-cases. However, for small organizations, it may be more than enough. (“Relatively small” may still mean “10K or more people.”) <a href="https://www.espocrm.com/">EspoCRM</a> is another open tool in this space, but a bit more business-y.</p>
<p><img src="/images/posts/inabox/monicacrm.png" alt="monicacrm" /></p>
<p>In both cases, they’re about tracking conversations, so that if you have multiple people involved in (say) making calls to a volunteer base, or … whatever you communicate with people about, these tools help a team keep track of who did what in terms of comms.</p>
<h2 id="getting-things-done-kanban">getting things done (kanban)</h2>
<p>In the agile software methodology, we work in two-week sprints. At the start of every sprint, you look at your work from the previous sprint, and “groom the backlog.” This determines what still needs to be done, what should be broken into smaller tasks, and what should be put in the icebox for consideration at some (indeterminate) time later. Then, tasks are divvied up amongst team members. New tasks (as they’re discovered) are put in the inbox throughout the sprint, and then worked into the pipeline at the start of the next sprint.</p>
<p>This methodology can be adapted to single-week sprints, or longer processes. (However, going out to a “month” is silly… the point is to keep tasks small and sprints achievable in scope.) And, a kanban board is often desirable for this kind of work.</p>
<p><a href="https://trello.com">Trello</a> is the most famous tool in this category. It is freely usable, and a good choice. Or, you can host your own, and use <a href="https://wekan.github.io/">wekan</a>. You can label things (perhaps with the category of work), you can assign tasks to people, you can set deadlines… it’s amazing.</p>
<p><img src="/images/posts/inabox/wekan-markdown.png" alt="wekan" /></p>
<p>Like a spreadsheet, there are many ways to use a kanban board. You can have one board per project, and instead of working in sprints, it is just a way to keep track of things in flight, things done, and who is doing them. Or… well, you’d have to play with one to see.</p>
<h2 id="-and-the-kitchen-sink">… and the kitchen sink</h2>
<p>After you handle the “must-haves,” these are the “nice to haves.”</p>
<h3 id="url-click-tracking">url click tracking</h3>
<p>Do you want to know when someone clicked on a URL?</p>
<p>Tools like <a href="https://kutt.it/">kutt</a> let you paste in a URL, and then it gives you a “short form” of that URL. (“bit.ly” is a commercial service like this.) If you have a document you’re sharing publicly, and you’d like to know how many hits it gets… kutt is your friend.</p>
<h3 id="wiki">wiki</h3>
<p>Do you want to host your own version of Wikipedia? No? No one does, so this is unsurprising. But, for some use cases, wikis are still useful. They let many people maintain content, with full history. Potentially useful for internal documentation, but there are probably better tools.</p>
<h3 id="dropbox">dropbox?</h3>
<p>Do you want to run your own version of Dropbox? Like, a private version, so that only you and your org have access to your files? That’s called <a href="https://nextcloud.com/">Nextcloud</a>.</p>
<p>At this point… if you are relying on a tool like Nextcloud, you need good backups… because you’re putting your business in the hands of a piece of software.</p>
<h2 id="wrapping-up">wrapping up</h2>
<p>That’s a lot of information, presented quickly, and possibly not clearly. However, the point is that there are a lot of tools out there that can be used to organize the work of the organizer. Yes, some people you work with (in your org, amongst your volunteers) may say “but I’m not very good with computers.” This is true. Some people aren’t. But, if you only give them email, and expect them to get <em>good</em> at managing their email… they’re doomed, and so are you.</p>
<p>If you instead say “here’s a tool that does <em>just one thing</em>, and it does it <em>pretty well</em>,” then you can help them learn to use those tools. And, if you’re using common, open source tools… there’s a good chance that there are already explainer videos out there that you can point to on YouTube. Yes, your colleagues and community may have to learn some new things, but the side-effect is that everything <em>should</em> get a bit easier in the long run.</p>
<h1>musing on sensor systems</h1>
<p>2022-01-08T00:00:00+00:00 · https://jadud.com//p/custompi</p>
<p>There are some sensors that are deployed into the world <em>unto themselves</em>. Even so, they can be regarded as a system: they have power management, storage, sensors, and an enclosure that must all work together effectively in order to accomplish their task.</p>
<p>As sensors grow more complex, they might draw continuous power, dropping batteries. They might not store data locally, but instead on a server. Through all of this, though, the sensor is part of a <em>system</em>, and that’s the critical realization of the last day or two of musing.</p>
<h2 id="sensors-as-systems">sensors as systems</h2>
<p>Imagine a sensor that monitors the presence of wifi devices. It does not do this to track individuals, or monetize anything, but instead to answer the question “how many people came and went from a place during a day?” The sensor has an ephemeral quality to it; it might “know” that the same device appeared more than once in a day, but it would not recognize the same device tomorrow. Nor does it care to record anything that might invade an individual’s privacy; it is enough to know they were present for 37 minutes between 1PM and 2PM (for example).</p>
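<p>One way the “ephemeral” quality could be had, sketched here as an assumption rather than a description of any existing sensor: hash each device address with a random salt that is thrown away when the date changes. Within a day the same device maps to the same token; the next day it does not. The class name and the 16-character token length are illustrative choices.</p>

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyPseudonymizer:
    """Turn device addresses into tokens that are only stable within one day."""

    def __init__(self):
        self._day = None
        self._salt = b""

    def token(self, mac_address, today=None):
        today = today or date.today()
        if today != self._day:
            # New day: discard the old salt so yesterday's tokens
            # can no longer be linked to today's.
            self._day = today
            self._salt = secrets.token_bytes(32)
        digest = hmac.new(self._salt, mac_address.lower().encode(), hashlib.sha256)
        return digest.hexdigest()[:16]
```

<p>Because the salt only ever lives in memory, a reboot also forgets it, which errs on the side of privacy: the sensor would count a returning device as new rather than remember it.</p>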
<p>I want that sensor to report its data; there are going to be hundreds, if not thousands, of them in the world, and it is too much to imagine collecting the data from the sensors individually. We’ll assume they have power and network, so they can POST their data to a server somewhere. So far, so good.</p>
<p>The big questions that bother me, though, are questions of <em>trust</em> and <em>deployment</em>. How do we know a sensor is a <em>trustworthy</em> sensor (and not one established by a vagabond seeking to inject <em>bad data</em> into our network), and how do we go from <em>zero</em> sensors to <em>thousands</em>… all while maintaining trust?</p>
<p>This is where the heading <strong>sensors as systems</strong> comes from. They are often referred to as <em>sensor networks</em>, although that term is sometimes used to denote a set of sensors that talk to each other. In this case, the sensors do not talk to each other, so I’ll use the term <em>sensor system</em> instead. The sensors cannot be thought of as independent of the server they talk to; they are not two separate pieces, but instead two pieces of a kind; they are part of the same <em>system</em>. To conceptualize them separately makes things harder, not easier, to design and build.</p>
<h2 id="the-topology">the topology</h2>
<p>Sensors send data to servers.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                         ┌──────┐
                     ┌───┤sensor│
                     │   └──────┘
                     │
                     │
                     │
                     │
                 ┌───▼────┐
                 │        │      ┌──────┐
                 │ server ◄──────┤sensor│
                 │        │      └──────┘
                 └───▲────┘
┌──────┐             │
│sensor├─────────────┘
└──────┘
</code></pre></div></div>
<p>That’s it. We assume servers are always present, always on, and have names on the network that can be resolved by DNS.</p>
<p>However, that’s a topology of the network when it is functioning and live. How does it come to be?</p>
<h3 id="the-server-births-sensors">the server births sensors</h3>
<p>How do we set this up?</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>owner agent server
────────────────────────────►
request sensor
◄────────────────────────────
alert sensor req
◄────────────────────────────
download image
───────────────► X
report data
────────────────────────────►
approve sensor req
────────────────────────────►
report data
</code></pre></div></div>
<p>An agent sets up a sensor configuration on the server and downloads an image. This is where the sensor is told where it will live and its “sequence ID,” and is given a unique API key so it can submit data to the server. However, once set up, it will fail to submit data to the server; the API key is not yet approved.</p>
<p>The server owner receives a notice that a new sensor has been configured, and can use the information provided to decide if it is legitimate. If so, the server owner approves the sensor, and from that point forward, data will be successfully submitted and stored.</p>
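<p>The request / approve / report sequence can be sketched as a tiny in-memory model. This is only an illustration of the flow described above: the class and method names are hypothetical, and a real server would persist this state and sit behind HTTP.</p>

```python
import secrets


class SensorRegistry:
    """Toy in-memory model of the request / approve / report flow."""

    def __init__(self):
        self._sensors = {}   # api_key -> {"location": str, "approved": bool}
        self.readings = []   # accepted (api_key, payload) pairs

    def request_sensor(self, location):
        """Agent step: register a sensor and get the key baked into its image."""
        api_key = secrets.token_urlsafe(32)
        self._sensors[api_key] = {"location": location, "approved": False}
        return api_key

    def pending(self):
        """Owner step: list sensors still waiting for a decision."""
        return [(key, s["location"])
                for key, s in self._sensors.items() if not s["approved"]]

    def approve(self, api_key):
        """Owner step: start trusting this sensor's key."""
        self._sensors[api_key]["approved"] = True

    def report(self, api_key, payload):
        """Sensor step: True only if the key is known and approved."""
        sensor = self._sensors.get(api_key)
        if sensor is None or not sensor["approved"]:
            return False
        self.readings.append((api_key, payload))
        return True
```

<p>The point of the design is that a sensor can be built and powered on before anyone has vetted it; its reports simply bounce until the owner says yes.</p>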
<p>In this way, we establish the chain of trust between sensors and the server. The server creates the sensor image; when it creates the image, it can embed a secret that only that particular sensor knows. However, we do not yet <em>trust</em> that sensor, because the server owner has not acknowledged it.</p>
<p>How do we establish trust between the server owner and the creators of sensors? The “old fashioned” way might be to require sensor creators to provide a phone number, and the server owner <em>calls</em> those people. Another might be that we know the sensor creators will be members of a known set of individuals. Therefore, we could share a “secret” with them: a passphrase, perhaps, that is used at time of sensor creation. This is not a “password” in the strictest sense, but instead something that indicates that the sensor creator has knowledge that is not “common.” It might be a single word (e.g. “bulldogs”), or it might be a phrase (“The celery stalks at midnight”). If the sensor creator can enter that “shared secret” at sensor creation time, that is enough for the server owner to accept or reject the sensor with a high degree of confidence that the creator can be trusted.</p>
<p>This may sound “low security,” but we are wondering how to establish a computational chain of trust within a known community. The particular use-case is such that a server owner will know <em>a priori</em> all the possible creators of sensors. A low-security “passphrase,” shared within that (high-trust, small) community, is enough to throttle or otherwise filter bogus requests for sensor creation.</p>
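<p>A minimal sketch of checking such a passphrase, assuming (my choice, not something stated above) that case and stray whitespace should be forgiven, since the goal is to filter out strangers rather than to punish typos. The function names are hypothetical.</p>

```python
import hmac
import unicodedata


def normalize(phrase):
    """Ignore case, extra spaces, and Unicode look-alikes when comparing."""
    return " ".join(unicodedata.normalize("NFKC", phrase).casefold().split())


def passphrase_ok(submitted, expected):
    """True if the two phrases match after normalization."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        normalize(submitted).encode("utf-8"),
        normalize(expected).encode("utf-8"),
    )


if __name__ == "__main__":
    print(passphrase_ok("the celery stalks at MIDNIGHT", "The celery stalks at midnight"))
```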
<h2 id="small-images-netinst">small images: netinst</h2>
<p>To build a sensor, we need to bootstrap a piece of hardware from a chunk of plastic and silicon to a functioning computer. This involves installing an OS on the device. But… where do we get the OS? Perhaps the server should provide a customized image, such that every sensor gets a unique bootstrap.</p>
<p>This is almost what openBalena does to set up a sensor network. openBalena establishes a server, and from it, you can download a “seed image” for the edge nodes in the network. The seed is a binary image to be flashed onto the HW, and there is only <em>one</em> seed image. However, we don’t want to create full binary images for these sensors (yet). Instead, we want to use an existing open platform to bootstrap the sensors, and make sure those sensors are configured securely and running our sensor software. And, we probably do need to generate a unique “seed” image for each sensor for our particular use-case.</p>
<p>We can do this by letting each sensor bootstrap itself from a tiny netinstall image into a full device. There’s a <a href="https://github.com/FooDeas/raspberrypi-ua-netinst">nifty Raspberry Pi net install package on Github</a>. It works like this:</p>
<ol>
<li>You grab a FAT32-formatted SD card. (This is the typical filesystem on cards larger than 2GB and up to 32GB; bigger cards usually ship formatted as exFAT.)</li>
<li>You copy the files from the .zip onto the card.</li>
<li>Plug the Pi into a wired network with DHCP.</li>
<li>Stick the card in the RPi, and it bootstraps and proceeds to rebuild itself in-place.</li>
</ol>
<p>When you’re done, you have a full Raspberry Pi running the most recent version of Raspbian, configured and customized to your liking.</p>
<h3 id="customization">customization</h3>
<p>What’s fun about the netinstall is that it has multiple ways to customize the installation.</p>
<p>First, you can set a few parameters in an <code class="language-plaintext highlighter-rouge">installer-config.txt</code>. These parameters include things like the base set of packages to pull (<code class="language-plaintext highlighter-rouge">minimal</code>, <code class="language-plaintext highlighter-rouge">server</code>, etc.), the timezone, and the final action after setup (<code class="language-plaintext highlighter-rouge">reboot</code>, <code class="language-plaintext highlighter-rouge">poweroff</code>, etc.). There are more “advanced” parameters that can be configured, including a set of files that are available at configuration time (e.g. additional config files, binaries, and so on), a <code class="language-plaintext highlighter-rouge">post-install</code> script, and (particularly interestingly) a URL for an <code class="language-plaintext highlighter-rouge">online_config</code> that will be executed after the <code class="language-plaintext highlighter-rouge">installer-config</code> is executed.</p>
<p>The combination of these means that we should, for example, be able to do the following:</p>
<ol>
<li>Add an additional repository to drive the installation of additional packages at time of device setup.</li>
<li>Place the API key (or, perhaps, <a href="https://jwt.io/introduction">signed JWT token</a>?) in the filesystem at a known location for later discovery by sensing software.</li>
<li>Execute arbitrary code/scripts post-install, allowing for additional customization or lockdown.</li>
</ol>
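<p>As a rough sketch of how a server might stamp out a per-sensor configuration: the snippet below renders a small <code class="language-plaintext highlighter-rouge">installer-config.txt</code>-style file plus a token file. The key names (<code class="language-plaintext highlighter-rouge">preset</code>, <code class="language-plaintext highlighter-rouge">timezone</code>, <code class="language-plaintext highlighter-rouge">final_action</code>, <code class="language-plaintext highlighter-rouge">online_config</code>) are my assumption about what the netinstall project accepts and should be checked against its documentation. The server URL, the token file name, and the helper functions are hypothetical.</p>

```python
import secrets


def render_installer_config(settings):
    """Render key=value lines, one per setting, in a stable (sorted) order."""
    return "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))


def make_sensor_bundle(sensor_id, server_url):
    """Build the per-sensor files as a dict of {file name: contents}."""
    # Stand-in for an API key or signed JWT issued by the server.
    token = secrets.token_urlsafe(32)
    settings = {
        "preset": "server",                # assumed key name
        "timezone": "America/New_York",    # assumed key name
        "final_action": "reboot",          # assumed key name
        # Hypothetical endpoint serving the rest of the configuration.
        "online_config": f"{server_url}/sensors/{sensor_id}/config",
    }
    return {
        "installer-config.txt": render_installer_config(settings),
        "sensor-token.txt": token + "\n",  # hypothetical "known location"
    }


bundle = make_sensor_bundle("sensor-001", "https://sensors.example.org")
print(bundle["installer-config.txt"])
```

<p>Because each bundle is only a couple of small text files layered over the shared netinstall files, the unique part of every “seed” is tiny.</p>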
<p><strong>The entire bundle is approximately 64MB</strong>. A full Raspbian image can be 10-20x larger. If 500 users tried to create sensors “at once,” we would only be storing 32GB of data. Bootstrap images can be removed after download, meaning we should not have to worry about disk capacity (modulo some moderate DDoS protections).</p>
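<p>The storage estimate is simple multiplication. A quick check, using decimal gigabytes and treating the 64MB and 10-20x figures above as approximate:</p>

```python
BUNDLE_MB = 64   # approximate size of one netinstall bundle
USERS = 500      # sensors being created "at once"

bundle_total_gb = BUNDLE_MB * USERS / 1000   # decimal gigabytes

# A full image is described as 10-20x larger than the bundle.
full_image_low_gb = bundle_total_gb * 10
full_image_high_gb = bundle_total_gb * 20

print(bundle_total_gb, full_image_low_gb, full_image_high_gb)
```

<p>That is 32GB for the netinstall bundles, against something like 320-640GB if full images were stored instead.</p>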
<h3 id="automation">automation</h3>
<p>This can also be moved <em>off of the server</em>. The server owner can get an API key that allows them to run this entire process offline. In this way, if the server owner wants to (say) create 50 card images, a small script could be provided that:</p>
<ol>
<li>Creates the custom data from a CSV file,</li>
<li>Retrieves an API key (or signed token, or whatever) from the server,</li>
<li>Copies it to a uSD card, and</li>
<li>Saves the bundle of metadata to the server.</li>
</ol>
<p>In this way, the server is not storing images, but just tracking the relevant API keys and “enabling” or otherwise “approving” the sensors in a bulk sequence of actions. The uSD cards can then be created as quickly as a server owner can insert and remove cards from their computer.</p>
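<p>A minimal sketch of what that small script could look like. Everything here is an assumption made for illustration: the CSV columns, the file names written to the card, and <code class="language-plaintext highlighter-rouge">request_api_key</code>, which stands in for a real call to the server. A temporary directory plays the part of the mounted uSD card.</p>

```python
import csv
import io
import secrets
import tempfile
from pathlib import Path


def request_api_key(sensor_name):
    """Hypothetical stand-in for asking the server for a key or signed token."""
    return secrets.token_hex(16)


def provision_card(row, card_root):
    """Write one sensor's config and key into a (mounted) card directory."""
    key = request_api_key(row["name"])
    card_root.mkdir(parents=True, exist_ok=True)
    (card_root / "sensor.cfg").write_text(
        f"name={row['name']}\nlocation={row['location']}\n"
    )
    (card_root / "api-key").write_text(key + "\n")
    # Metadata for the server to track; only a short key prefix is kept here.
    return {"name": row["name"], "location": row["location"], "key_id": key[:8]}


def provision_all(csv_text, mount_parent):
    """Provision one card directory per CSV row and collect the metadata."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [provision_card(row, mount_parent / row["name"]) for row in rows]


sensors_csv = "name,location\nsensor-001,library\nsensor-002,gym\n"
with tempfile.TemporaryDirectory() as tmp:
    metadata = provision_all(sensors_csv, Path(tmp))
    written = sorted(p.name for p in Path(tmp).iterdir())
print(written)
print([m["name"] for m in metadata])
```

<p>The last step, saving the metadata to the server, is left out; in this sketch the list returned by <code class="language-plaintext highlighter-rouge">provision_all</code> is what would be uploaded.</p>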
<p>Because it is lightweight, it could even be that a Raspberry Pi is used as the “sensor creation” device.</p>
<p><strong>QUESTION</strong>: Should a small binary application be created that… creates the uSD cards? In this way, end-users do not go to a web page and download a zip (that must be decompressed and put on a uSD card), but instead download a small application that lets them enter their configuration data, and it writes it directly to a uSD card? Would this eliminate potential points of failure?</p>
<h2 id="aside-a-network-of-servers">aside: a network of servers</h2>
<p>It should be possible for an individual to set up a server easily, and once set up, establish their own sensor network. Given our use-case, each server might provide an easy way to provide its data via an open API. However, we also imagine these servers as part of a <em>federation</em> of data collection servers.</p>
<p>Because we imagine our servers as collecting and providing open, public data, we should have a “mirror” API endpoint. This endpoint might allow an upstream agent to request data from the server, and in doing so, create a centralized resource containing all of the data from all of the live servers in the network. It’s a small thing, and requires no new trust networks, but it is a component that might need to be considered as part of the whole.</p>
<h2 id="conclusion">conclusion</h2>
<p>The sensor network needs to be thought of as a whole. Sensors are not things that are built and configured independently of the server, but things that are created <em>by the server</em>; in this way, their provenance and connection to the server are unquestionable.</p>
<h1>archiving email</h1>
<p><small>2022-01-06T00:00:00+00:00 <a href="https://jadud.com//p/archiving">https://jadud.com//p/archiving</a></small></p>
<p>You have a Goog account, but you’re running out of space. You have two options.</p>
<ol>
<li>Pay Google money, and they’ll give you more space.</li>
<li>Get rid of shit.</li>
</ol>
<p>Unfortunately, your email is your life. <em>Kinda</em>. Or, it is your life in that it is everything from critical documents to dumb shit you don’t know how to unsubscribe from. In this regard, we’re going to have to take a series of steps to clean and organize.</p>
<h2 id="backup-first">backup first</h2>
<p>First, we’re going to back things up. Hopefully, you have space for this. In theory, you have space on your laptop for this exercise (probably 10-20GB), or you have an external drive with space for the cleanup operations. If you don’t, ask questions of someone who can help you out.</p>
<p>Assuming you’re on the computer where you normally check your email, log into Google Takeout.</p>
<p>https://takeout.google.com/</p>
<h2 id="making-your-selections">making your selections</h2>
<p>You’ll be presented with a pretty simple interface. It is a list of <em>all</em> Google services you can archive. We don’t want that (yet). First, we <em>only</em> want email. To do that, we’re going to click the button (it is almost impossible to tell it is a button) that says “deselect all.” It is near the top of the list.</p>
<p>Then, scroll down until you see Mail (roughly 3/4 of the way down the list), and select it.</p>
<p>Scroll the rest of the way to the bottom of the page, and click “next step.”</p>
<h2 id="frequency-and-type">frequency and type</h2>
<p>You’re now going to select “export once” (we don’t want regular/monthly exports), select <code class="language-plaintext highlighter-rouge">.zip</code> and <code class="language-plaintext highlighter-rouge">2GB</code> files. (This is the default, so you should have to take no action.)</p>
<h2 id="create-export">create export</h2>
<p>Finally, click the button that says “create export.”</p>
<h2 id="wait">wait</h2>
<p>Wait.</p>
<h2 id="check-your-mail-download">check your mail, download</h2>
<p>You’ll get an email from Google sometime later. It will take you to a page where you can download one or more files. Those files will be all of your mail, including text and attachments. However, they will be bundled up into a sequence of zip files. Pull down each zip file in turn, and make sure that none of the downloads fail. You’ll probably want to set your laptop up somewhere, leave it plugged in, and just click “download” on each link in sequence, allowing each download to complete.</p>
<h2 id="backup">backup</h2>
<p>Once you have downloaded all of those zip files, we need a place to put them. For a start, you could move those files onto an external drive or USB stick, so they’re in two places – one copy on your laptop, and one on another storage medium. We don’t want to delete either of these (yet), because…</p>
<h2 id="archive">archive</h2>
<p>We’re going to move them to a remote server that, itself, is backed up in multiple locations… but that is the topic of another post. To get access to that archive server, we need some new digital accounts, which is why we needed a password manager. Why? Because we want the password manager to keep track of those passwords for us. So, it is all of a piece. One step at a time…</p>
<h1>safely managing passwords</h1>
<p><small>2021-09-25T00:00:00+00:00 <a href="https://jadud.com//p/managing-passwords">https://jadud.com//p/managing-passwords</a></small></p>
<p><em>The first in a series of posts to support family in moving to safer information management practices.</em></p>
<p>Identity theft is <em>hard</em> to recover from. Banks, governments... they really don't care if someone gets hold of enough information about you to create credit cards and bank accounts in your name. If that happens, you are kinda on your own, and (if things went poorly), you're broke.</p>
<p>How risky is this? If you are my parents, you’ve already been caught in (at least) 3 different data breaches. (Six, for my dad.) I discovered this by using the website <a href="https://haveibeenpwned.com/">https://haveibeenpwned.com/</a>. I entered their email addresses, and <a href="https://haveibeenpwned.com/">haveibeenpwned</a> tells me what breaches their email addresses appear in.</p>
<p><i>Nifty</i>.</p>
<p>Does that mean passwords were lost? No... maybe... well... in some cases, yes. This is only a concern if you reuse passwords, or generally have insecure passwords. How do you know if your password is insecure? One way would be to go to the <a href="https://www.uic.edu/apps/strong-password/">UIC Password Strength Tester</a> and try out a few of your passwords. (<b>NOTE</b>: NEVER ENTER YOUR PASSWORD INTO A RANDOM SITE. This is one I trust, but still, you have been warned.) If UIC says your password is anything less than <i>very strong</i>, I'd say you have a problem.</p>
<p>Passwords I rely on (and remember) have 30-40 characters because they are a combination of memorable words, phrases, numbers, and symbols, in a mix of lower and uppercase. However, I have given up on remembering most of my passwords, and instead use a <b>password manager</b>. This is a tool that remembers my passwords for me. This way, I remember one <i>really good</i> password, and I let it 1) generate and 2) remember lots of <i>insanely good</i> passwords on my behalf.</p>
<h2 id="using-bitwarden">using bitwarden</h2>
<p>Let’s look at using Bitwarden, an open source password manager.</p>
<p>First, go to <a href="https://bitwarden.com/">bitwarden.com</a>.</p>
<p>Click “Get Started,” and create an account. (Or, you may have been sent an invite via email. If so, use that.)</p>
<p>For the master password, <strong>you need to pick something secure</strong>. That means:</p>
<ol>
<li>Never previously used.</li>
<li>Long.</li>
<li>Mix of numbers, letters, and symbols.</li>
</ol>
<p>To start, you could go to <a href="https://diceware.dmuth.org/">https://diceware.dmuth.org/</a>. This gives you an example of what a passphrase looks like. You might pick five words of your own, or use a couple of runs from this to pick a collection of words.</p>
<p>There you go. You have a difficult-to-hack, nearly impossible to guess, reasonably secure password. Write it down, keep it safe, and use that as your master password for Bitwarden. You will, over time, memorize this password. Why? Because, if you are doing things correctly, this will be the <em>only</em> password that you keep in your head.</p>
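<p>For the curious, here is roughly what a diceware-style generator does under the hood. This is a sketch, not the code behind the site linked above: the eight-word list is a made-up stand-in (a real diceware list has 7,776 words), and the function names are mine.</p>

```python
import math
import secrets

# Tiny illustrative list; a real diceware word list has 7776 entries.
WORDS = ["maple", "harbor", "violin", "cobalt",
         "lantern", "orchard", "pebble", "quartz"]


def passphrase(words, count=5, sep="-"):
    """Pick `count` words uniformly at random using a cryptographic RNG."""
    return sep.join(secrets.choice(words) for _ in range(count))


def entropy_bits(list_size, count):
    """Bits of entropy for `count` independent picks from `list_size` words."""
    return count * math.log2(list_size)


print(passphrase(WORDS))
print(round(entropy_bits(7776, 5), 1))
```

<p>Five words from a full-size list works out to about 64.6 bits of entropy, which is why a handful of random words beats a short password with a few symbols sprinkled in.</p>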
<h2 id="just-in-case">just in case…</h2>
<p>It may not need to be emphasized, but:</p>
<ol>
<li>Don’t use this password anywhere else. If you use a password in multiple places, then that means there are multiple places that, when hacked, can lose your password for you.</li>
<li>Write it down, if you have to. However, if you’re going to do that, <em>be consistent about it</em>. What does that mean? Don’t write it on a ducking sticky note and lose it. Pick a small notebook, and treat that notebook like it is the holiest of holies. Never lose the goddamn thing. It’s your backup brain.</li>
<li>Don’t lose this password. Don’t forget this password. Don’t forget what notebook you wrote it in. This is about to become the key to your entire digital life. Your passwords are your only defense against losing your phone, bank accounts, retirement… you name it, you can lose it to a good scammer.</li>
</ol>
<p>So. ‘Nuff said.</p>
<h2 id="one-final-note">one final note…</h2>
<p>You can, if you want, choose not to use a password manager. It is a <em>reasonable</em> choice. But, if you make that choice, you really, <em>really</em> need to develop a good religious practice around maintaining a password notebook or similar. It can’t be half-assed or ducked up in some way.</p>
<p>There are three reasons to use a password manager:</p>
<ol>
<li>It can log into websites automatically for you.</li>
<li>It can generate really random, secure passwords for you.</li>
<li>Next of kin.</li>
</ol>
<p>With a password manager, you can designate next of kin. This way, if something happens to you, your spouse/family is not screwed. In my case, there are digital systems all over the place (bank accounts, photo archives, etc.) that only I can access. It is very, very hard to convince these places to update/cancel an account <em>once you are dead</em>. Sometimes, they don’t or won’t. It is a part of our digital society that we have not yet addressed.</p>
<p>I have moved many things into my password manager so that my spouse can, if I get hit by a bus, actually take care of closing out credit cards, managing bank accounts, and so on.</p>
<p>You could just use the diceware password generator, and put all your passwords in a single notebook. This has a few problems:</p>
<ol>
<li>Fire.</li>
<li>Theft.</li>
<li>Forgetfulness.</li>
</ol>
<p>If you don’t keep that notebook up to date, then everyone is screwed. If it burns, you’re screwed. If it is stolen, you’re screwed again. So, while you can just use something like diceware to generate passwords for everything, it probably is better to have <em>one</em> password for the password manager, and then we’ll use tools built into it to generate all your <em>other</em> passwords. (Which is the subject of a later post as well…)</p>
<h1 id="for-next-time">for next time</h1>
<p>At this point, you have a Bitwarden account, but you’re not sure what to do with it. That will be the topic of the next post.</p>
<h1>masking and economics</h1>
<p><small>2021-08-25T00:00:00+00:00 <a href="https://jadud.com//p/economic-impact">https://jadud.com//p/economic-impact</a></small></p>
<p>How much economic harm is the school committee in Lewiston, Maine willing to accept on behalf of the people they represent?</p>
<p>From <a href="https://censusreporter.org/profiles/97000US2307320-lewiston-me/">recent census data</a>, our town has a median age of 40, and a median household income of $44K. Put another way, more than half of the households in Lewiston have an annual income of less than $50K. Further, our poverty rate is around 18%, and the <i>per capita</i> income in Lewiston is close to $25K; this means that the distribution of income in Lewiston skews low. <em>We’re not a rich town</em>.</p>
<iframe id="cr-embed-97000US2307320-economics-income-household_distribution" class="census-reporter-embed" src="https://s3.amazonaws.com/embed.censusreporter.org/1.0/iframe.html?geoID=97000US2307320&chartDataID=economics-income-household_distribution&dataYear=2019&releaseID=ACS_2019_5-year&chartType=histogram&chartHeight=200&chartQualifier=&chartTitle=Household+income&initialSort=&statType=scaled-percentage" frameborder="0" width="100%" height="300" style="margin: 1em; max-width: 720px;"></iframe>
<script src="https://s3.amazonaws.com/embed.censusreporter.org/1.0/js/embed.chart.make.js"></script>
<p>There is substantial economic, public policy, and mental health research that explores Covid-19’s impact on families; that impact is non-trivial <sup id="fnref:e1" role="doc-noteref"><a href="#fn:e1" class="footnote" rel="footnote">1</a></sup><sup>,</sup><sup id="fnref:e2" role="doc-noteref"><a href="#fn:e2" class="footnote" rel="footnote">2</a></sup>. Our community has experienced, and will continue to experience, many of these impacts as the pandemic continues.</p>
<p>In simple terms, though, <i>what does Covid-19 cost</i>? The average uninsured (or out-of-network) bill for a hospital stay related to Covid can easily run between $50,000 and $70,000; insured and in network, it is a much more “reasonable” $38K <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">3</a></sup><sup>,</sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup>. Medicare fee-for-service costs may be as low as $24K in some cases<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">5</a></sup>, but this is dependent on the level of care required; ventilation and ICU stays cost much, much more. While it is true that waiver programs and private insurance can bring the cost of treatment down to as little as $1300<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">6</a></sup>, that assumes limited, out-patient treatment, not a hospital stay.</p>
<p>A household income of $25K suggests one parent/guardian working 40 hours/week, 52 weeks per year, for $12/hour. A household income of $40K suggests one parent/guardian working 40 hours/week for 52 weeks/year for $20/hour. We can also get to $40K with two adults working for $12-15/hour for fewer than 52 weeks/year. That is, there may be illness or holiday time that drops the number of weeks worked.</p>
<p>If we quarantine a class of 25 students, that is 25 families that must deal with a child at home for two weeks. <b>This will cost Lewiston $25K - $40K in lost wages</b>, assuming someone must miss work to take care of a child, or pay someone to watch their children. Some families may have friends or a family member who can help (a grandparent or similar). Unfortunately, desperation may lead some families to leave young children at home alone, hoping for the best.</p>
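<p>The arithmetic behind these figures, written out. The only inputs are the ones stated above (40-hour weeks, $12 and $20 hourly wages, 25 families, a two-week quarantine); the simplifying assumption is that every family loses one full-time earner’s wages for the whole quarantine.</p>

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52
FAMILIES = 25
QUARANTINE_WEEKS = 2


def annual_income(hourly_rate):
    """Gross income for one full-time earner working all 52 weeks."""
    return hourly_rate * HOURS_PER_WEEK * WEEKS_PER_YEAR


def classroom_lost_wages(hourly_rate):
    """Wages lost if every family loses one earner for the quarantine."""
    return FAMILIES * QUARANTINE_WEEKS * HOURS_PER_WEEK * hourly_rate


print(annual_income(12), annual_income(20))
print(classroom_lost_wages(12), classroom_lost_wages(20))
```

<p>At $12/hour that is $24,960 a year and $24,000 in lost wages per quarantined classroom; at $20/hour it is $41,600 a year and $40,000 per classroom.</p>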
<p>Quarantining has <strong>compound effects</strong>. If a child is quarantined and, it turns out, Covid-positive, they may then infect a parent or guardian. For both the child and the parent, there are potential health care costs. If they are lucky, this may be no more than $1,000 to $2,000. At $12/hour, a $1,000 health bill is about <strong>two full weeks of pay</strong>. If the case is more serious, a hospital stay can rapidly end up costing many tens of thousands of dollars. It is possible that people may have to choose between employment and caring for their family; lost jobs are difficult to recover from, and leave a family in a place where a child’s education is interrupted and there are new stressors around income and food security. (Or, if you prefer, <strong>income insecurity is food insecurity</strong>.)</p>
<p>In short, the school committee is not just making a decision about whether or not children in Lewiston Public Schools should or should not wear masks. They are making an economic decision for families and for our community. If everyone is healthy, there is no impact. Every classroom quarantined costs our community, in the first instance, thirty thousand dollars. However, many of those families may end up needing medical care, meaning that “economic impact number” could rapidly double (if minimal medical intervention is needed); <strong>a <em>single</em> hospitalization in a <em>single</em> quarantined classroom lifts the total economic impact of <em>just one classroom quarantining</em> to $120K or more</strong>.</p>
<p>None of this estimation considers the long-term health impacts (what is known as “long Covid”) or the possibility of job loss as a result of illness (if you are employed <em>at will</em>, you can be fired at any time, without reason). Nor does it consider the possibility that a <em>severe</em> outbreak might lead us to a place where we are doing hybrid learning; the social and economic impacts of that outcome are <em>massive</em>.</p>
<p>There is substantial, peer-reviewed research that illustrates the power of masks to limit the spread of Covid-19 <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">7</a></sup><sup>,</sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">8</a></sup><sup>,</sup><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">9</a></sup><sup>,</sup><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup><sup>,</sup><sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup>. Outbreaks in LPS endanger the health and lives of our children, stand in the way of a safe year <em>in the classroom</em>, and can cause substantial economic harm to our community. The Superintendent has recommended we use every tool we have to start, and if we do well, reconsider. I encourage the committee to support him in this decision, and think seriously about the impact your decisions can have on the city of Lewiston.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:e1" role="doc-endnote">
<p><a href="https://www.urban.org/research/publication/covid-19-pandemic-straining-families-abilities-afford-basic-needs">The COVID-19 Pandemic Is Straining Families’ Abilities to Afford Basic Needs</a>, Urban Institute, April 2020 <a href="#fnref:e1" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:e2" role="doc-endnote">
<p><a href="https://bfi.uchicago.edu/wp-content/uploads/2020/10/BFI_WP_2020143.pdf">Impact of the COVID-19 Crisis on Family Dynamics in Economically Vulnerable Households</a>, Becker Friedman Institute, October 2020 <a href="#fnref:e2" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p><a href="https://www.fairhealth.org/article/costs-for-a-hospital-stay-for-covid-19#:~:text=We%20found%20the%20average%20charge,out%2Dof%2Dnetwork%20benefit">FAIR Health</a>, <a href="https://s3.amazonaws.com/media2.fairhealth.org/brief/asset/COVID-19%20-%20The%20Projected%20Economic%20Impact%20of%20the%20COVID-19%20Pandemic%20on%20the%20US%20Healthcare%20System.pdf">full brief</a>, March 2020 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://www.healthcarefinancenews.com/news/average-cost-hospital-care-covid-19-ranges-51000-78000-based-age">Healthcare Finance News</a>, Nov 2020 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://www.healthsystemtracker.org/brief/unvaccinated-covid-patients-cost-the-u-s-health-system-billions-of-dollars/">Unvaccinated COVID-19 hospitalizations cost the U.S. health system billions of dollars</a>, August 2021 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://www.healthsystemtracker.org/brief/potential-costs-of-coronavirus-treatment-for-people-with-employer-coverage/">Potential costs of COVID-19 treatment for people with employer coverage</a>, March 2020 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="https://science.sciencemag.org/content/372/6549/1439">Face masks effectively limit the probability of SARS-CoV-2 transmission</a>, Science, May 2021 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p><a href="https://www.pnas.org/content/118/4/e2014564118">An evidence review of face masks against COVID-19</a>, Proceedings of the National Academy of Sciences of the United States of America, July 2020 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p><a href="https://www.nature.com/articles/s41598-021-84679-8">Masks and distancing during COVID‐19: a causal framework for imputing value to public‐health interventions</a>, Nature, March 2021 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p><a href="https://royalsociety.org/news/2020/09/set-c-covid-r-rate/">Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK: methods of estimation, data sources, causes of heterogeneity, and use as a guide in policy formulation</a>, The Royal Society, September 2020 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p><a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0237691">Mask or no mask for COVID-19: A public health and market study</a>, PLOS One, August 2020 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩︎</a></p>
</li>
</ol>
</div>
<h1>masking and public health</h1>
<p><small>2021-08-21T00:00:00+00:00 <a href="https://jadud.com//p/masking">https://jadud.com//p/masking</a></small></p>
<p><img src="/images/posts/masking-matrix.png" /></p>
<p><small><a href="https://docs.google.com/presentation/d/1-zAIo1D9S_1_HWMoAEA-3fwjh5cr8j72T6Yw5mt8EAM/edit?usp=sharing">GDoc slide version</a>.</small></p>
<p>Lewiston Public Schools has decided masks should be optional. The Superintendent, in his letter dated August 2nd, 2021, said he did not feel he was empowered to make health decisions for families. The school committee has supported this decision. However, <b>the choice to allow choice is a public health decision</b>. The Superintendent and school committee chose a path at odds with federal and state recommendations for school reopening. The CDC and Department of Education (at the federal and state level) recommend that we open with masks, with distancing, with improved ventilation in our schools, and screening measures (e.g. pooled testing).</p>
<p>The spread of Covid-19 (symptomatic and asymptomatic) will be <b>most disruptive</b> if a low percentage of students and staff wear masks (because it is optional), and if we test infrequently (or if many people opt out of testing). This combination of choices leads to a world where we have more, not less, disruption due to Covid-19. <b>The Superintendent and school committee may be committing us to this path, which could yield significant disruption to our students' education and the greatest (economic) harm to our community</b>.</p>
<p>The spread of Covid-19 will be <b>least disruptive</b> if families and children get vaccinated, wear masks, and we run frequent pooled tests of as many staff and students as possible. Because the Superintendent and school committee have voted to make masks optional (a public health choice favoring faster/greater spread of Covid-19, greater disruption to student learning, and greater economic harm to our community), we cannot currently achieve this outcome.</p>
<p><b>I am not a lawyer</b>. Lewiston Public Schools is currently <em>choosing</em> to disregard the recommendations of the state and federal government regarding how to return to school safely. They have done so without any rationale rooted in law, logic, or evidence. When Covid-19 begins to spread amongst staff and students, will this open the school to individual and class action lawsuits on behalf of sick students, families, and staff? There are schools across the nation who chose similarly, and they are now experiencing thousands of students in quarantine, with children hospitalized or experiencing "long Covid" (given the more virulent nature of the Delta strain). <b>Will the Superintendent's choice to disregard health recommendations be demonstrable in a court of law as negligent or willfully harmful to the students of LPS?</b></p>
<p>I do not know. Finding out would require that 1) the Superintendent and school committee continue ignoring public health guidance; 2) Covid-19 spreads through our school population (I hope this does not happen); and 3) one or more lawsuits are brought forward naming the Superintendent, the school administration, and the school committee as defendants. This would come at a cost to us all, and would suggest many poor choices were made before arriving here.</p>
<p>I would rather we don't find out. </p>
<p> </p>
<p><small>(Even if masks are required, the law leaves room for religious and medical opt-outs. It is <a href="https://www.poynter.org/reporting-editing/2020/can-the-government-legally-force-you-to-wear-a-mask/">highly unlikely</a> (I am not a lawyer) that requiring masks would be considered by any reasonable court to be <a href="https://clsbluesky.law.columbia.edu/2020/10/29/to-mask-or-not-to-mask-its-not-a-constitutional-question/">a violation of Constitutional rights</a>.)</small></p>
<h1>python: marshmallow fluff...</h1>
<p><small>2020-03-19T00:00:00+00:00 <a href="https://jadud.com//p/so-much-fluff">https://jadud.com//p/so-much-fluff</a></small></p>
<p>So, I still like my metaprogramming tricks. It was fun. I learned things.</p>
<p>But, I went to <a href="https://pypi.org/">PyPI</a>, and discovered they have a very nice search feature. <a href="https://pypi.org/search/?q=marshmallow">I searched for marshmallow</a>. I found… 263 projects referencing marshmallow. It’s a unique enough word that I’m going to <em>guess</em> that they all interact with the <code class="language-plaintext highlighter-rouge">marshmallow</code> library in some way.</p>
<p>A lot of them do what I was exploring. For example, <a href="https://github.com/sv-tools/marshmallow-objects">marshmallow-objects</a> does <em>exactly</em> what I was doing, but better.</p>
<p>(Well, mostly. Kinda.)</p>
<p>Actually, it is different. You still have to define Python classes… but, you can subclass a marshmallow model that gives you serialization/deserialization without having to write a separate schema. It wouldn’t let me dynamically generate the classes from a YAML file (that’s a neat trick, I think), but it might be fine to write the class as code. I mean, it’s easier to test the class, whereas the dynamic trickery is just that…</p>
<p>So. Lesson learned. Or, if you prefer, a lesson I’ve always known, and taught my students many times: <em>do a search first</em>. Someone else has probably done it.</p>
<p>As the old joke goes:</p>
<p><img src="https://imgs.xkcd.com/comics/python.png" /></p>
<h1>python: metaprogramming marshmallow</h1>
<p><small>2020-03-19T00:00:00+00:00 <a href="https://jadud.com//p/metapython">https://jadud.com//p/metapython</a></small></p>
<h2 id="tldr">tl;dr</h2>
<p>I used Python’s metaprogramming features to auto-generate Marshmallow schemas that correspond to <code class="language-plaintext highlighter-rouge">attrs</code>-derived data classes.</p>
<p>If you like the thought of thinking about metaprogramming as much as I do, you’ll groove on this post.</p>
<h2 id="a-theme-of-metaprogramming">a theme of metaprogramming…</h2>
<p><em>Oddly, related as a piece to my explorations of <code class="language-plaintext highlighter-rouge">tbl</code> in Python, as well as looking at GraphQL, but still its own post…</em></p>
<p>It is hard to extend Python’s syntax, but that doesn’t mean you can’t engage in some dynamic metaprogramming in the language. While it isn’t always the first tool you should reach for, it can be nice for <strong>reducing boilerplate</strong>.</p>
<p>For example, I am staring down a bunch of JSON-y things. They come-and-go from the front-end to the back-end:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="err">email:</span><span class="w"> </span><span class="s2">"vaderd@empire.com"</span><span class="p">,</span><span class="w">
</span><span class="err">token:</span><span class="w"> </span><span class="s2">"89425abc-69f9-11ea-b973-a244a7b51496"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Let’s pretend that the front-end is <a href="https://reactjs.org/">React</a>, the storage layer is <a href="https://www.mongodb.com/">MongoDB</a>, and the middleware is <a href="https://palletsprojects.com/p/flask/">Flask</a> (a Python web framework).</p>
<p><img src="/images/posts/react-flask-mongo.png" alt="react<->flask<->mongo" /></p>
<p>At the Flask layer, there’s a lot of work that needs to be done: the JSON comes in, and in the first instance, it comes in as a dictionary. This is not very nice. By “not very nice,” I mean “dictionaries convey no notion of types or the regularity of their contents, and therefore provide us with no notion of safety.” What I’d like is for the data coming from the front-end to be strongly typed and well described, the middleware to be aware of those types, and the database to help enforce them as well. (I’m thinking GraphQL starts to do things like this… almost.)</p>
<p>BUT, we have a RESTful web application sharing data in webby, untyped ways. This inspired me to do some digging. First, I found Flask-RESTful, which is a nice library. It lets you define a class, set up <code class="language-plaintext highlighter-rouge">get</code>, <code class="language-plaintext highlighter-rouge">put</code>, <code class="language-plaintext highlighter-rouge">post</code>, and other methods on endpoints, and register them with the app. Leaving a bunch of bits out, this looks like:</p>
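<p>To make the “no notion of safety” complaint concrete, here is a small sketch of the difference between passing the raw dictionary around and turning it into a typed object at the boundary. It uses the standard library’s <code class="language-plaintext highlighter-rouge">dataclasses</code> purely as a stand-in so that it runs without extra packages; it is not the <code class="language-plaintext highlighter-rouge">attrs</code>-plus-marshmallow approach this post is about, and <code class="language-plaintext highlighter-rouge">TokenPayload</code> and <code class="language-plaintext highlighter-rouge">from_payload</code> are names invented for the example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPayload:
    email: str
    token: str

    def __post_init__(self):
        # Dataclasses do not enforce annotations, so check the types by hand.
        for name in ("email", "token"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")


def from_payload(payload):
    """Turn an untyped dict into a TokenPayload, failing loudly on bad shape."""
    return TokenPayload(email=payload["email"], token=payload["token"])


good = from_payload({
    "email": "vaderd@empire.com",
    "token": "89425abc-69f9-11ea-b973-a244a7b51496",
})
print(good.email)
```

<p>A dictionary with a missing key or a number where a string belongs now fails at the edge of the application (with a <code class="language-plaintext highlighter-rouge">KeyError</code> or <code class="language-plaintext highlighter-rouge">TypeError</code>) instead of somewhere deep in the middleware.</p>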
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">uuid</span>
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>

<span class="kn">from</span> <span class="nn">flask_restful</span> <span class="kn">import</span> <span class="n">Resource</span><span class="p">,</span> <span class="n">Api</span>
<span class="kn">import</span> <span class="nn">db.models</span> <span class="k">as</span> <span class="n">M</span>
<span class="kn">import</span> <span class="nn">db.db</span> <span class="k">as</span> <span class="n">DB</span>
<span class="k">class</span> <span class="nc">Tokens</span><span class="p">(</span><span class="n">Resource</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">post</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">email</span><span class="p">):</span>
<span class="c1"># Create a UUID string
</span> <span class="n">tok</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="p">.</span><span class="n">uuid1</span><span class="p">())</span>
<span class="c1"># Create a TimedToken object, with a current timestamp
</span> <span class="n">t</span> <span class="o">=</span> <span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="n">email</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">tok</span><span class="p">,</span> <span class="n">created_at</span><span class="o">=</span><span class="n">time</span><span class="p">())</span>
<span class="c1"># Grab the correct collection in Mongo for tokens
</span> <span class="n">collection</span> <span class="o">=</span> <span class="n">DB</span><span class="p">.</span><span class="n">get_collection</span><span class="p">(</span><span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">.</span><span class="n">collection</span><span class="p">)</span>
<span class="c1"># Save the token into Mongo by dumping the token through marshmallow
</span> <span class="n">as_json</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">dump</span><span class="p">()</span>
<span class="n">collection</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">as_json</span><span class="p">)</span>
<span class="c1"># Return the token as JSON to the client
</span> <span class="k">return</span> <span class="n">as_json</span>
<span class="n">mapping</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">[</span><span class="n">Tokens</span><span class="p">,</span> <span class="s">"/token/<string:email>"</span><span class="p">]</span>
<span class="p">]</span>
<span class="k">def</span> <span class="nf">add_api</span><span class="p">(</span><span class="n">api</span><span class="p">):</span>
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">mapping</span><span class="p">:</span>
<span class="n">api</span><span class="p">.</span><span class="n">add_resource</span><span class="p">(</span><span class="n">m</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">m</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
<p>That lives in a module called <code class="language-plaintext highlighter-rouge">api.api</code>, and at the top level of the app I have:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">flask_restful</span> <span class="kn">import</span> <span class="n">Api</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="kn">import</span> <span class="nn">hydra</span>
<span class="kn">from</span> <span class="nn">api.api</span> <span class="kn">import</span> <span class="n">add_api</span>
<span class="kn">import</span> <span class="nn">db.models</span> <span class="k">as</span> <span class="n">M</span>
<span class="kn">import</span> <span class="nn">db.db</span> <span class="k">as</span> <span class="n">DB</span>

<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="o">@</span><span class="n">hydra</span><span class="p">.</span><span class="n">main</span><span class="p">(</span><span class="n">config_path</span><span class="o">=</span><span class="s">"config.yaml"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">cfg</span><span class="p">):</span>
    <span class="c1"># Dynamically define classes from the YAML config.
</span>    <span class="n">M</span><span class="p">.</span><span class="n">create_classes</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>
    <span class="c1"># Set the Mongo params from the config.
</span>    <span class="n">DB</span><span class="p">.</span><span class="n">set_params</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">host</span><span class="p">,</span> <span class="n">cfg</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">port</span><span class="p">,</span> <span class="n">cfg</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">database</span><span class="p">)</span>
    <span class="c1"># Add the REST API to the app.
</span>    <span class="n">A</span> <span class="o">=</span> <span class="n">Api</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
    <span class="n">add_api</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
</code></pre></div></div>
<p>This is a lot to take in, but I’m actually trying to get to the good bit. The top level has an <code class="language-plaintext highlighter-rouge">init</code> function that reads in a configuration file (more on that later), and uses that to build a whole bunch of classes <em>dynamically at run time</em>. (This is the cool bit.) Those are instantiated in the <code class="language-plaintext highlighter-rouge">models</code> submodule of <code class="language-plaintext highlighter-rouge">db</code>, and they get used throughout the application.</p>
<p>Looking back at the first code block, it’s possible to see some of those uses. For example, I’m creating a timed token (i.e., a random string associated with a user that will ultimately have a finite lifetime).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">t</span> <span class="o">=</span> <span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="n">email</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">tok</span><span class="p">,</span> <span class="n">created_at</span><span class="o">=</span><span class="n">time</span><span class="p">())</span>
</code></pre></div></div>
<p>This class takes three parameters: <code class="language-plaintext highlighter-rouge">email</code>, <code class="language-plaintext highlighter-rouge">token</code>, and <code class="language-plaintext highlighter-rouge">created_at</code>. The whole purpose of the class is that I want it to serve as a <code class="language-plaintext highlighter-rouge">struct</code> (in Racket or C) or <code class="language-plaintext highlighter-rouge">record</code> (in… Pascal?). In Python, <code class="language-plaintext highlighter-rouge">namedtuple</code>s, <code class="language-plaintext highlighter-rouge">dataclass</code>es, and classes decorated with <code class="language-plaintext highlighter-rouge">attrs</code> are all examples of what I’m aiming for.</p>
<p>But… <strong>BUT</strong>… I also want easy marshalling to-and-from JSON. The front-end speaks it, and Mongo speaks it… but, while I’m in the middle, I need to interact with it. I would like it to be <em>typed</em> (in as much as Python is typed) while I am working with it in the middleware. And, I’d rather not do the conversions myself. (Why would I write code if I wanted to do all the hard stuff by hand?)</p>
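<p>For contrast, this is roughly what doing the conversions by hand looks like. It is a sketch using only the standard library, with a <code class="language-plaintext highlighter-rouge">dataclass</code> standing in for the record type; the helper names <code class="language-plaintext highlighter-rouge">token_to_json</code> and <code class="language-plaintext highlighter-rouge">token_from_json</code> are made up for this example. Every field name ends up written three times, which is exactly the kind of bookkeeping worth automating.</p>

```python
import json
from dataclasses import dataclass


@dataclass
class TimedToken:
    email: str
    token: str
    created_at: float


def token_to_json(t):
    # Every field name is spelled out by hand on the way out...
    return json.dumps(
        {"email": t.email, "token": t.token, "created_at": t.created_at}
    )


def token_from_json(text):
    # ...and spelled out again, with manual type coercion, on the way in.
    d = json.loads(text)
    return TimedToken(
        email=str(d["email"]),
        token=str(d["token"]),
        created_at=float(d["created_at"]),
    )


original = TimedToken("someone@example.com", "abc", 1.5)
round_tripped = token_from_json(token_to_json(original))
```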
<p>To solve this, enter <a href="https://marshmallow.readthedocs.io/en/stable/">marshmallow</a>. This Python library lets you define schemas for classes, and in doing so, leverage machinery to marshal JSON structures to-and-from those classes. For example, my <code class="language-plaintext highlighter-rouge">TimedToken</code> class looks like this (er, used to look like this):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">attr</span>

<span class="o">@</span><span class="n">attr</span><span class="p">.</span><span class="n">s</span>
<span class="k">class</span> <span class="nc">TimedToken</span><span class="p">:</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">attr</span><span class="p">.</span><span class="n">ib</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
    <span class="n">token</span> <span class="o">=</span> <span class="n">attr</span><span class="p">.</span><span class="n">ib</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">attr</span><span class="p">.</span><span class="n">ib</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
</code></pre></div></div>
<p>To marshal this to-and-from JSON, I can use marshmallow. I need to create a schema first:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">marshmallow</span> <span class="kn">import</span> <span class="n">Schema</span><span class="p">,</span> <span class="n">fields</span>

<span class="k">class</span> <span class="nc">TimedTokenSchema</span><span class="p">(</span><span class="n">Schema</span><span class="p">):</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">fields</span><span class="p">.</span><span class="n">Str</span><span class="p">()</span>
    <span class="n">token</span> <span class="o">=</span> <span class="n">fields</span><span class="p">.</span><span class="n">Str</span><span class="p">()</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">fields</span><span class="p">.</span><span class="n">Number</span><span class="p">()</span>
</code></pre></div></div>
<p>Once I have a schema, I can do things like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a_token</span> <span class="o">=</span> <span class="n">TimedToken</span><span class="p">(...)</span>
<span class="n">schema</span> <span class="o">=</span> <span class="n">TimedTokenSchema</span><span class="p">()</span>
<span class="n">as_json</span> <span class="o">=</span> <span class="n">schema</span><span class="p">.</span><span class="n">dump</span><span class="p">(</span><span class="n">a_token</span><span class="p">)</span>
</code></pre></div></div>
<p>The machinery inside of marshmallow will take an object of type <code class="language-plaintext highlighter-rouge">TimedToken</code> and a schema describing it (<code class="language-plaintext highlighter-rouge">TimedTokenSchema</code>), and use the schema to walk through the <code class="language-plaintext highlighter-rouge">TimedToken</code> object to convert it to a JSON-ready dictionary (and, back, if you want).</p>
<p>This is cool.</p>
<p>But, it’s not automatic. And, for every data structure I want to create in my app, I need to write a schema. This is duplicating code. If I change a structure, I need to remember to change the corresponding schema. <em>That isn’t going to happen</em>. What’s actually going to happen is that I’ll forget something, and everything will break.</p>
<h2 id="enter-metaprogramming">enter metaprogramming!</h2>
<p>I wanted to be able to declare my data structures as YAML, and then have Python generate both the <code class="language-plaintext highlighter-rouge">attrs</code>-based class as well as the <code class="language-plaintext highlighter-rouge">marshmallow</code>-based schema. Is that so much to ask? No, I don’t think it is.</p>
<p>Using Facebook’s <a href="https://hydra.cc/">Hydra</a>, I created a config file. The important bit (for this discussion) looks like this:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">models</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">TimedToken</span>
    <span class="na">fields</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">email</span>
      <span class="pi">-</span> <span class="s">token</span>
      <span class="pi">-</span> <span class="s">created_at</span>
    <span class="na">types</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">String</span>
      <span class="pi">-</span> <span class="s">UUID</span>
      <span class="pi">-</span> <span class="s">Number</span>
</code></pre></div></div>
<p>Then, the fun bit is the function <code class="language-plaintext highlighter-rouge">create_classes</code>. It takes a config that includes the <code class="language-plaintext highlighter-rouge">models</code> key, and does the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">create_classes</span><span class="p">(</span><span class="n">cfg</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">cfg</span><span class="p">.</span><span class="n">models</span><span class="p">:</span>
        <span class="n">make_classes</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">fields</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">types</span><span class="p">)</span>
</code></pre></div></div>
<p>OK… so, <code class="language-plaintext highlighter-rouge">make_classes</code> must do the interesting work.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">make_classes</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">):</span>
    <span class="c1"># Dynamically generate the marshmallow schema
</span>    <span class="n">schema</span> <span class="o">=</span> <span class="n">make_schema</span><span class="p">(</span><span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">)</span>
    <span class="c1"># Generate a base class, and wrap it with the attr.s decorator.
</span>    <span class="n">base</span> <span class="o">=</span> <span class="n">attr</span><span class="p">.</span><span class="n">s</span><span class="p">(</span><span class="n">make_base</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">,</span> <span class="n">schema</span><span class="p">))</span>
    <span class="c1"># Insert the class into the namespace.
</span>    <span class="nb">globals</span><span class="p">()[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">base</span>
</code></pre></div></div>
<p>This is probably <strong>really bad</strong>. But, it’s fun, so I’ll keep going.</p>
<p>I pass in the name of the class as a string (<code class="language-plaintext highlighter-rouge">"TimedToken"</code>), and then I pass in the fields as a list of strings, and their types as a list of strings. (These are given in the YAML, above.) The last line here is where the evil happens. The function <code class="language-plaintext highlighter-rouge">globals()</code> returns the dictionary representing the current namespace. I proceed to write into that namespace; specifically, I insert a new class of the name <code class="language-plaintext highlighter-rouge">TimedToken</code> (in this example). (I <em>hope</em> the use of <code class="language-plaintext highlighter-rouge">globals()</code> is restricted to the <em>module</em>, and not the entire <em>application</em>… I have some more reading/experimenting to do in that regard. It <em>seems</em> like it is the module…)</p>
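<p>On the module-versus-application question: inside a function, Python’s <code class="language-plaintext highlighter-rouge">globals()</code> returns the namespace of the module where that function is <em>defined</em>, not the namespace of whoever calls it. The sketch below checks this with a throwaway module built from a string; the module name <code class="language-plaintext highlighter-rouge">models</code> and the <code class="language-plaintext highlighter-rouge">register</code> function are stand-ins invented for the experiment.</p>

```python
import types

# Source for a tiny module whose function writes into its own globals().
source = '''
def register(name, value):
    globals()[name] = value
'''

# Build the module object and run the source inside its namespace.
models = types.ModuleType("models")
exec(source, models.__dict__)

# Call the function from *this* namespace.
models.register("TimedToken", type("TimedToken", (), {}))

# The name landed in the module that defines register()...
in_module = hasattr(models, "TimedToken")
# ...and did not leak into the caller's namespace.
leaked = "TimedToken" in globals()
```

<p>So a class inserted this way is reachable as an attribute of that one module, which matches the hunch in the paragraph above.</p>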
<p>Backing up, I’ll start with <code class="language-plaintext highlighter-rouge">make_schema()</code>. It takes the fields and types, and does the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">make_schema</span><span class="p">(</span><span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">):</span>
    <span class="c1"># Create an empty dictionary
</span>    <span class="n">d</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="c1"># Walk the fields and types together (using zip)
</span>    <span class="k">for</span> <span class="n">f</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">):</span>
        <span class="c1"># Convert each type into the appropriate fields.X from marshmallow
</span>        <span class="c1"># and insert it into the dictionary
</span>        <span class="n">d</span><span class="p">[</span><span class="n">f</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_field_type</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
    <span class="c1"># Use marshmallow's functionality to create a schema from a dictionary
</span>    <span class="k">return</span> <span class="n">Schema</span><span class="p">.</span><span class="n">from_dict</span><span class="p">(</span><span class="n">d</span><span class="p">)</span>
</code></pre></div></div>
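<p>One thing to watch with the parallel <code class="language-plaintext highlighter-rouge">fields</code> and <code class="language-plaintext highlighter-rouge">types</code> lists: <code class="language-plaintext highlighter-rouge">zip()</code> stops at the shorter list, so a missing entry in the YAML silently drops a field instead of raising an error. A small guard along these lines could run before the schema is built; <code class="language-plaintext highlighter-rouge">check_model_spec</code> is a hypothetical helper, not part of the application.</p>

```python
def check_model_spec(name, fs, ts):
    """Raise ValueError if the field and type lists differ in length."""
    if len(fs) != len(ts):
        raise ValueError(
            "model {!r}: {} fields but {} types".format(name, len(fs), len(ts))
        )


# zip() stops at the shorter list, so the third field silently disappears.
paired = list(zip(["email", "token", "created_at"], ["String", "UUID"]))

# The guard turns that silent truncation into a loud failure.
try:
    check_model_spec("TimedToken", ["email", "token", "created_at"], ["String", "UUID"])
    message = None
except ValueError as err:
    message = str(err)
```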
<p><code class="language-plaintext highlighter-rouge">get_field_type()</code> is pretty simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_field_type</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="s">"Integer"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fields</span><span class="p">.</span><span class="n">Integer</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="s">"Float"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fields</span><span class="p">.</span><span class="n">Float</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="s">"String"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fields</span><span class="p">.</span><span class="n">String</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="s">"UUID"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fields</span><span class="p">.</span><span class="n">UUID</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="s">"Number"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fields</span><span class="p">.</span><span class="n">Number</span><span class="p">()</span>
</code></pre></div></div>
<p>(No, there’s no error handling yet. Not even a default case… <em>sigh</em>.)</p>
<p>The <code class="language-plaintext highlighter-rouge">make_schema</code> function literally returns a <code class="language-plaintext highlighter-rouge">class</code> that I can use to convert objects that match the layout of the dictionary that I built. That’s great… but what good is a <code class="language-plaintext highlighter-rouge">TimedTokenSchema</code> if I don’t have a <code class="language-plaintext highlighter-rouge">TimedToken</code> class in the first place? Hm…</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">attr</span><span class="p">.</span><span class="n">s</span>
<span class="k">class</span> <span class="nc">Base</span><span class="p">:</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">make_base</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">,</span> <span class="n">schema</span><span class="p">):</span>
    <span class="c1"># Create an empty class called `name` that inherits from Base
</span>    <span class="n">cls</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="p">(</span><span class="n">Base</span><span class="p">,),</span> <span class="p">{})</span>
    <span class="c1"># Attach the schema class, a dump() method, and the Mongo collection name
</span>    <span class="nb">setattr</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="s">"schema"</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
    <span class="nb">setattr</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="s">"dump"</span><span class="p">,</span> <span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="bp">self</span><span class="p">.</span><span class="n">schema</span><span class="p">().</span><span class="n">dump</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span>
    <span class="nb">setattr</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="s">"collection"</span><span class="p">,</span> <span class="s">"{}s"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">name</span><span class="p">.</span><span class="n">lower</span><span class="p">()))</span>
    <span class="c1"># Add one (untyped) attrs attribute per field; ts is not used yet
</span>    <span class="k">for</span> <span class="n">f</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">fs</span><span class="p">,</span> <span class="n">ts</span><span class="p">):</span>
        <span class="nb">setattr</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">attr</span><span class="p">.</span><span class="n">ib</span><span class="p">())</span>
    <span class="k">return</span> <span class="n">cls</span>
</code></pre></div></div>
<p>The function <code class="language-plaintext highlighter-rouge">make_base()</code> does some heavy lifting for me. First, it uses the <code class="language-plaintext highlighter-rouge">type()</code> function in Python to dynamically generate a class. In this case, it will create a class with the name <code class="language-plaintext highlighter-rouge">TimedToken</code>, it will use <code class="language-plaintext highlighter-rouge">Base</code> as a superclass, and it will attach no attributes at time of creation. (I actually do not want to overwrite anything, because <code class="language-plaintext highlighter-rouge">attrs</code> does a lot of invisible work.)</p>
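<p>In isolation, the three-argument form of <code class="language-plaintext highlighter-rouge">type()</code> takes a class name, a tuple of base classes, and a dictionary of attributes. Here is a minimal standard-library sketch; this <code class="language-plaintext highlighter-rouge">Base</code> class and its <code class="language-plaintext highlighter-rouge">describe</code> method are invented for the example.</p>

```python
class Base:
    def describe(self):
        # type(self) is whatever class the instance was created from.
        return "I am a {}".format(type(self).__name__)


# Equivalent to writing:  class TimedToken(Base): pass
TimedToken = type("TimedToken", (Base,), {})

tok = TimedToken()
description = tok.describe()
```

<p>The resulting class behaves like any hand-written one: it has a <code class="language-plaintext highlighter-rouge">__name__</code>, it inherits from <code class="language-plaintext highlighter-rouge">Base</code>, and it can be instantiated.</p>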
<p>The function <code class="language-plaintext highlighter-rouge">setattr</code> is, used casually, probably a bad thing. It literally reaches into a class (not an <em>object</em>, but a <em>class</em>) and attaches attributes to the class. If you’re not used to metaprogramming, this is like… writing the code for the class on-the-fly.</p>
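<p>A small standard-library sketch of what “attaching attributes to the class” means in practice. The <code class="language-plaintext highlighter-rouge">Token</code> class and the <code class="language-plaintext highlighter-rouge">shout</code> method are made up for illustration. Because the attributes live on the class, even an instance created <em>before</em> the <code class="language-plaintext highlighter-rouge">setattr</code> calls can see them.</p>

```python
class Token:
    pass


# Create an instance before the class has any extra attributes.
early = Token()

# Attach a plain value and a method to the class itself.
setattr(Token, "collection", "tokens")
setattr(Token, "shout", lambda self: self.collection.upper())

# The existing instance picks both up through normal attribute lookup.
shouted = early.shout()
```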
<p>I attach three attributes:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">schema</code> is a field that will hold a marshmallow <code class="language-plaintext highlighter-rouge">Schema</code> class. (Because, in Python, classes are objects too! Wait…) If you look back, you can see that I pass it in after creating it in <code class="language-plaintext highlighter-rouge">make_classes()</code>.</li>
<li><code class="language-plaintext highlighter-rouge">dump</code>, which is a method that takes no arguments beyond <code class="language-plaintext highlighter-rouge">self</code>. It takes that reference to <code class="language-plaintext highlighter-rouge">self</code> (because this class will get instantiated as an object), instantiates the <code class="language-plaintext highlighter-rouge">schema</code> that I’ve stored, and then invokes <code class="language-plaintext highlighter-rouge">dump()</code> on… itself. This feels metacircular, but fortunately marshmallow knows to only look for fields that are in the schema. Therefore, we don’t get an infinite traversal here.</li>
<li><code class="language-plaintext highlighter-rouge">collection</code>, which is so I can map directly into Mongo. I take the name of the class, lowercase it, and add an ‘s’. So, <code class="language-plaintext highlighter-rouge">TimedToken</code> becomes <code class="language-plaintext highlighter-rouge">timedtokens</code> as a collection name. I like the idea of the object knowing where it should be stored, so I don’t have to think about it.</li>
</ul>
<p>Once I have these things set up, I walk the fields, and add them to the class. For each, I attach a (currently) untyped <code class="language-plaintext highlighter-rouge">attr.ib()</code> to the class under the field’s name. This way, the <code class="language-plaintext highlighter-rouge">TimedToken</code> class will act like a proper <code class="language-plaintext highlighter-rouge">attrs</code> class.</p>
<p>Finally, I return this class, which then gets attached (back in <code class="language-plaintext highlighter-rouge">make_classes()</code>) to the <code class="language-plaintext highlighter-rouge">globals()</code> namespace.</p>
<h2 id="what">what?</h2>
<p>If you like the thought of thinking about metaprogramming as much as I do, you’re excited at this point. If you’re wondering why I would do this… well, I’ll go back to my REST handler for TimedTokens:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">uuid</span>
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>

<span class="kn">from</span> <span class="nn">flask_restful</span> <span class="kn">import</span> <span class="n">Resource</span><span class="p">,</span> <span class="n">Api</span>
<span class="kn">import</span> <span class="nn">db.models</span> <span class="k">as</span> <span class="n">M</span>
<span class="kn">import</span> <span class="nn">db.db</span> <span class="k">as</span> <span class="n">DB</span>

<span class="k">class</span> <span class="nc">Tokens</span><span class="p">(</span><span class="n">Resource</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">post</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">email</span><span class="p">):</span>
        <span class="c1"># Create a UUID string
</span>        <span class="n">tok</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="p">.</span><span class="n">uuid1</span><span class="p">())</span>
        <span class="c1"># Create a TimedToken object, with a current timestamp
</span>        <span class="n">t</span> <span class="o">=</span> <span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="n">email</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">tok</span><span class="p">,</span> <span class="n">created_at</span><span class="o">=</span><span class="n">time</span><span class="p">())</span>
        <span class="c1"># Grab the correct collection in Mongo for tokens
</span>        <span class="n">collection</span> <span class="o">=</span> <span class="n">DB</span><span class="p">.</span><span class="n">get_collection</span><span class="p">(</span><span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">.</span><span class="n">collection</span><span class="p">)</span>
        <span class="c1"># Save the token into Mongo by dumping the token through marshmallow
</span>        <span class="n">as_json</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">dump</span><span class="p">()</span>
        <span class="n">collection</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">as_json</span><span class="p">)</span>
        <span class="c1"># Return the token as JSON to the client
</span>        <span class="k">return</span> <span class="n">as_json</span>

<span class="n">mapping</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">[</span><span class="n">Tokens</span><span class="p">,</span> <span class="s">"/token/&lt;string:email&gt;"</span><span class="p">]</span>
<span class="p">]</span>

<span class="k">def</span> <span class="nf">add_api</span><span class="p">(</span><span class="n">api</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">mapping</span><span class="p">:</span>
        <span class="n">api</span><span class="p">.</span><span class="n">add_resource</span><span class="p">(</span><span class="n">m</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">m</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
<p>The function <code class="language-plaintext highlighter-rouge">create_classes(cfg)</code> is in the <code class="language-plaintext highlighter-rouge">db.models</code> module. I import that as <code class="language-plaintext highlighter-rouge">M</code>. Because I created classes in this module at the point that Flask was initialized, I now have a whole bunch of dynamically generated classes floating around in there. Those classes were generated <em>from a YAML file</em>, and can be used anywhere in the application.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">models</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">TimedToken</span>
    <span class="na">fields</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">email</span>
      <span class="pi">-</span> <span class="s">token</span>
      <span class="pi">-</span> <span class="s">created_at</span>
    <span class="na">types</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">String</span>
      <span class="pi">-</span> <span class="s">UUID</span>
      <span class="pi">-</span> <span class="s">Number</span>
</code></pre></div></div>
<p>To add a new class to my application, I add it to the YAML file, and restart Flask. This will call <code class="language-plaintext highlighter-rouge">create_classes</code> as part of the init, and the new class will be generated in the <code class="language-plaintext highlighter-rouge">db.models</code> module. I can then use those classes just as if I had written them out, by hand, duplicating the effort of defining both the <code class="language-plaintext highlighter-rouge">attrs</code> class and the marshmallow <code class="language-plaintext highlighter-rouge">Schema</code> class.</p>
<p>In my REST handler, this is where the dynamically generated code comes into play:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1"># Create a TimedToken object, with a current timestamp
</span> <span class="n">t</span> <span class="o">=</span> <span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="n">email</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">tok</span><span class="p">,</span> <span class="n">created_at</span><span class="o">=</span><span class="n">time</span><span class="p">())</span>
<span class="c1"># Grab the correct collection in Mongo for tokens
</span> <span class="n">collection</span> <span class="o">=</span> <span class="n">DB</span><span class="p">.</span><span class="n">get_collection</span><span class="p">(</span><span class="n">M</span><span class="p">.</span><span class="n">TimedToken</span><span class="p">.</span><span class="n">collection</span><span class="p">)</span>
<span class="c1"># Save the token into Mongo by dumping the token through marshmallow
</span> <span class="n">as_json</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">dump</span><span class="p">()</span>
 <span class="n">collection</span><span class="p">.</span><span class="n">insert_one</span><span class="p">(</span><span class="n">as_json</span><span class="p">)</span>
<span class="c1"># Return the token as JSON to the client
</span> <span class="k">return</span> <span class="n">as_json</span>
</code></pre></div></div>
<p>I create the object. Then, I use the <code class="language-plaintext highlighter-rouge">collection</code> attribute to ask for a database connection to the collection that holds objects of this type (this is like a table in relational databases). Next, I convert the object to JSON by invoking the <code class="language-plaintext highlighter-rouge">.dump()</code> method, which was added dynamically. In fact, it is using a Schema class that was created dynamically as well, and then embedded in the enclosing object for later use. Finally, I insert this JSON into the Mongo database, and return it to the client, because both Mongo and the client speak JSON natively.</p>
<p>The result is that I’ve metaprogrammed my way around <code class="language-plaintext highlighter-rouge">attrs</code> and <code class="language-plaintext highlighter-rouge">marshmallow</code> to create a dynamic middleware layer that can marshal to-and-from JSON. In doing this, I’ve saved myself a large amount of boilerplate, and I have a single point of control/failure for all of my class definitions, which is external to the code itself. (I think I still need to add the marshalling <em>from</em> JSON, but that won’t be hard.)</p>
<h2 id="what-will-you-do-with-this-matt">what will you do with this, matt?</h2>
<p>Personally, I haven’t found anything on the net that eliminates the boilerplate in marshmallow. In the world of open source, I’d say this is an “itch” that I scratched. It might be an itch other people have.</p>
<p>Perhaps my next post will be about packaging code for <code class="language-plaintext highlighter-rouge">pip</code>?</p>
gql: resolvers2020-03-17T00:00:00+00:00https://jadud.com//p/gql-02<p>The circumstances under which I have time are frightening, but I have time to do some learning and programming. So, there we go.</p>
<p>In the previous post, I suggested a GraphQL syntax for Racket that would look like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">define-type</span> <span class="nv">Starship</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">define-enum</span> <span class="nv">Episode</span>
<span class="nv">NEWHOPE</span> <span class="nv">EMPIRE</span> <span class="nv">JEDI</span><span class="p">)</span>
<span class="p">(</span><span class="nf">define-type</span> <span class="nv">Human</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)</span>
<span class="p">(</span><span class="nf">appearsIn</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Episode</span><span class="p">))</span>
<span class="p">(</span><span class="nf">starship</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Starship</span><span class="p">))</span>
<span class="p">))</span>
<span class="p">(</span><span class="nf">define-type</span> <span class="nv">Query</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">human</span> <span class="p">([</span><span class="nf">id</span> <span class="nv">ID!</span><span class="p">])</span> <span class="nv">-></span> <span class="nv">Human</span><span class="p">)</span>
<span class="p">))</span>
</code></pre></div></div>
<p>Before I think about a syntax for resolvers, I need two things: 1) I need some data, and 2) I need to write some functions that do the resolving by hand, so I can see what patterns emerge, as those become either functional or syntactic abstractions.</p>
<p>I’ll take them in reverse order.</p>
<h2 id="a-resolver-for-appearsin">a resolver for ‘appearsIn’</h2>
<p>I’m working from <a href="https://graphql.org/learn/execution/">this page</a> on the GraphQL site, which suggests that a resolver for <code class="language-plaintext highlighter-rouge">appearsIn</code> might look like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">Human</span><span class="p">:</span> <span class="p">{</span>
<span class="nx">appearsIn</span><span class="p">(</span><span class="nx">obj</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">appearsIn</span> <span class="c1">// returns [ 4, 5, 6 ]</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>At first glance, it might be that a resolver for <code class="language-plaintext highlighter-rouge">appearsIn</code> could be:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">appearsIn</span> <span class="nv">o</span> <span class="nv">a</span> <span class="nv">c</span> <span class="nv">i</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">Q</span> <span class="p">(</span><span class="nf">select</span> <span class="nv">episode</span>
<span class="nt">#:from</span> <span class="nv">people_episode</span>
<span class="nt">#:where</span> <span class="p">(</span><span class="nb">=</span> <span class="nv">human_id</span> <span class="o">,</span><span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">o</span> <span class="ss">'id</span><span class="p">))))</span>
<span class="c1">;; If you want to see the SQL generated...</span>
<span class="c1">;; (printf "Q: ~a~n" Q)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">result</span> <span class="p">(</span><span class="nf">query</span> <span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">c</span> <span class="ss">'conn</span><span class="p">)</span> <span class="nv">Q</span><span class="p">))</span>
<span class="c1">;; This gives me a set of rows from the people_episode table.</span>
<span class="c1">;; Then, I want to turn that into a list of enum elements.</span>
<span class="p">(</span><span class="nf">for/list</span> <span class="p">([</span><span class="nf">row</span> <span class="p">(</span><span class="nf">rows-result-rows</span> <span class="nv">result</span><span class="p">)])</span>
<span class="p">(</span><span class="nf">ndx->enum</span> <span class="nv">Episode</span> <span class="p">(</span><span class="nb">vector-ref</span> <span class="nv">row</span> <span class="mi">0</span><span class="p">))))</span>
</code></pre></div></div>
<p>In use, it looks like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">></span> <span class="p">(</span><span class="nf">appearsIn</span> <span class="p">(</span><span class="nf">make-hash</span> <span class="o">'</span><span class="p">((</span><span class="nf">id</span> <span class="o">.</span> <span class="mi">1</span><span class="p">)))</span>
<span class="nv">NULL</span>
<span class="p">(</span><span class="nf">make-hash</span> <span class="o">`</span><span class="p">((</span><span class="nf">conn</span> <span class="o">.</span> <span class="o">,</span><span class="nv">conn</span><span class="p">)))</span>
<span class="nv">NULL</span><span class="p">)</span>
<span class="nv">Q:</span> <span class="o">#</span><span class="nv"><sql-statement:</span> <span class="nv">SELECT</span> <span class="nv">episode</span> <span class="nv">FROM</span> <span class="nv">people_episode</span> <span class="nv">WHERE</span> <span class="p">(</span><span class="nf">human_id</span> <span class="nv">=</span> <span class="nv">?</span><span class="p">)</span> <span class="mi">1</span><span class="nv">></span>
<span class="o">'</span><span class="p">(</span><span class="nf">NEWHOPE</span> <span class="nv">EMPIRE</span> <span class="nv">JEDI</span><span class="p">)</span>
<span class="nv">></span> <span class="p">(</span><span class="nf">appearsIn</span> <span class="p">(</span><span class="nf">make-hash</span> <span class="o">'</span><span class="p">((</span><span class="nf">id</span> <span class="o">.</span> <span class="mi">2</span><span class="p">)))</span>
<span class="nv">NULL</span>
<span class="p">(</span><span class="nf">make-hash</span> <span class="o">`</span><span class="p">((</span><span class="nf">conn</span> <span class="o">.</span> <span class="o">,</span><span class="nv">conn</span><span class="p">)))</span>
<span class="nv">NULL</span><span class="p">)</span>
<span class="nv">Q:</span> <span class="o">#</span><span class="nv"><sql-statement:</span> <span class="nv">SELECT</span> <span class="nv">episode</span> <span class="nv">FROM</span> <span class="nv">people_episode</span> <span class="nv">WHERE</span> <span class="p">(</span><span class="nf">hum</span>
</code></pre></div></div>
<p>You can see that I uncommented the <code class="language-plaintext highlighter-rouge">printf</code> statement to see the SQL being generated. This says that Luke was in all three movies, and the Ewok was only in the third movie. I forget the Ewok’s name… <em>Nugget</em>? <em>Wikkit</em>? I have no memory at this point…</p>
<p>However, my suspicion is that this is <strong>wrong</strong>. Specifically, I think I want to resolve from the top down, and do things in a <em>lazy</em> manner (from a “lazy evaluation” perspective). The resolver for a <code class="language-plaintext highlighter-rouge">human</code> should return a minimal object, with promises on all of its fields. Then, when it is time to do something with the fields, those promises are forced. The reason is that not every query uses every field, so there is no sense in doing the work if the user doesn’t want the data.</p>
<p>The example above might be the code that could be promised, but it isn’t actually the resolver. The resolver should be a trivial resolver that is automatically forced when needed.</p>
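<p>Here is a rough sketch of that lazy approach, in Python rather than Racket. Everything in it is hypothetical: the <code class="language-plaintext highlighter-rouge">Promise</code> class, the <code class="language-plaintext highlighter-rouge">fetch_*</code> helpers (which stand in for database queries), and the <code class="language-plaintext highlighter-rouge">select</code> function that plays the role of query execution. The <code class="language-plaintext highlighter-rouge">calls</code> list exists only to show which work actually ran.</p>

```python
class Promise:
    """A delayed computation that runs at most once, when forced."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
            self._thunk = None  # drop the closure once it has run
        return self._value


calls = []  # records which field computations actually ran


def fetch_name(human_id):
    # Stand-in for a database lookup.
    calls.append("name")
    return {1: "Luke"}.get(human_id)


def fetch_appears_in(human_id):
    # Stand-in for the people_episode query shown above.
    calls.append("appearsIn")
    return ["NEWHOPE", "EMPIRE", "JEDI"] if human_id == 1 else []


def resolve_human(human_id):
    """Return a minimal object whose fields are all unforced promises."""
    return {
        "name": Promise(lambda: fetch_name(human_id)),
        "appearsIn": Promise(lambda: fetch_appears_in(human_id)),
    }


def select(obj, requested_fields):
    """Force only the promises for the fields the query asked for."""
    return {field: obj[field].force() for field in requested_fields}


human = resolve_human(1)
result = select(human, ["name"])
print(result, calls)
```

<p>Because the query only asked for <code class="language-plaintext highlighter-rouge">name</code>, the <code class="language-plaintext highlighter-rouge">appearsIn</code> lookup never runs; <code class="language-plaintext highlighter-rouge">calls</code> contains just <code class="language-plaintext highlighter-rouge">"name"</code>.</p>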
<p>A good, short exploration, helping me see that GraphQL implementations really are about lazy performance in the face of large data.</p>
<h2 id="the-data">the data</h2>
<p>To continue exploring, I set up a small SQLite database. I wrote it to a temporary file, but you might want to do it differently. For now, it’s hardcoded to <code class="language-plaintext highlighter-rouge">/tmp</code> on a Linux machine.</p>
<p>The reason for using an SQL database as the “backend” was to force me to think about GraphQL resolvers. That is, if I had just stored things in memory as hash tables for this exploration, there would be less work to do. By storing the data in an SQL table, I’m forced to think about how GraphQL queries are processed, and how they are actualized by resolvers (or something lower down the chain) in the event that you’re operating over a traditional/relational dataset, instead of a no-SQL dataset.</p>
<p>The code is in a GitHub Gist. I need to move it into a repository at this point, as what started as a 1-hour exploration turned into something that I might continue poking at.</p>
<script src="https://gist.github.com/jadudm/fa1d7355546621c70350a889bd67465e.js"></script>
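<p>The gist holds the actual setup. As a rough stand-in for readers who want the shape of it, this Python sketch builds a <code class="language-plaintext highlighter-rouge">people_episode</code> table and resolves <code class="language-plaintext highlighter-rouge">appearsIn</code> against it. The column types, the in-memory database (instead of a file in <code class="language-plaintext highlighter-rouge">/tmp</code>), and storing episodes as integer indexes are all assumptions on my part, as is the sample data beyond “human 1 is in all three movies, human 2 only in the third.”</p>

```python
import sqlite3

# Episode names by index; storing the index in the table is an assumption.
EPISODES = ["NEWHOPE", "EMPIRE", "JEDI"]

# In-memory database for the sketch (the post uses a file under /tmp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_episode (human_id INTEGER, episode INTEGER)")
conn.executemany(
    "INSERT INTO people_episode (human_id, episode) VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 2), (2, 2)],
)


def appears_in(obj, args, context, info):
    """Resolver taking the same four arguments as the Racket version."""
    rows = context["conn"].execute(
        "SELECT episode FROM people_episode WHERE human_id = ? ORDER BY episode",
        (obj["id"],),
    ).fetchall()
    # Turn each stored index into its enum name.
    return [EPISODES[row[0]] for row in rows]


context = {"conn": conn}
print(appears_in({"id": 1}, None, context, None))
print(appears_in({"id": 2}, None, context, None))
```

<p>The resolver has to translate between the relational rows and the shape GraphQL expects, which is exactly the work an in-memory hash table would have hidden.</p>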
gql: what is graphql?2020-03-16T00:00:00+00:00https://jadud.com//p/gql-01<p>I’m familiar with REST services. I’m not familiar with GraphQL.</p>
<p>My first thought (which might be wrong on multiple levels) is that GraphQL is a way of defining a service by having a single endpoint. Queries are declarations of the structure you want returned. Servers are functional mappings from types to values.</p>
<p>Racket doesn’t have a GraphQL implementation. In particular, I’m interested in a server implementation. (A client implementation looks like you squirt JSON documents at a server, and get JSON documents back. Although, the query language actually has a syntax, so it might not be strictly JSON…)</p>
<p>I’ll start with <a href="https://graphql.org/learn/execution/">the GraphQL site’s explanation on execution</a>. They use Star Wars examples, which is good, because I have seen Star Wars.</p>
<p>The authors of GraphQL say that you cannot execute a query without a type system, and then they present an example type system. I think this might also be a <em>schema</em>. The top level of the schema in GraphQL is the <em>Query</em>, which apparently all servers must expose. These are the entrypoints to any query that a client might make. So, in the example, it would seem you can query <code class="language-plaintext highlighter-rouge">human</code>s.</p>
<p>I’ve suggested an s-expression based syntax here.</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">define-type</span> <span class="nv">Query</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">human</span> <span class="p">(</span><span class="nf">id</span> <span class="nv">ID</span> <span class="nt">#:not-nullable</span><span class="p">)</span> <span class="nv">-></span> <span class="nv">Human</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<p>The syntax given says that a query for a human must include an <code class="language-plaintext highlighter-rouge">id</code>. So, given this schema, the following query would return no results (or, perhaps, an error?):</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">query</span> <span class="p">{</span>
<span class="nx">human</span> <span class="p">{</span>
<span class="nx">name</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>By contrast, a query that supplies an <code class="language-plaintext highlighter-rouge">id</code>, like this one,</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">query</span> <span class="p">{</span>
<span class="nx">human</span> <span class="p">(</span><span class="nx">id</span><span class="p">:</span> <span class="mi">1002</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">name</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>would succeed.</p>
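<p>As far as I can tell from the GraphQL specification, leaving out a required (non-null) argument is a validation error, so the first query is rejected before anything executes. A minimal sketch of that check, in Python, might look like the following. The dict layout of <code class="language-plaintext highlighter-rouge">QUERY_TYPE</code> and the function name <code class="language-plaintext highlighter-rouge">missing_required_args</code> are invented for illustration.</p>

```python
# Hypothetical in-memory form of the Query type defined above.
QUERY_TYPE = {
    "human": {
        "args": {"id": {"type": "ID", "nullable": False}},
        "returns": "Human",
    },
}


def missing_required_args(schema, field, supplied):
    """Return the names of non-nullable arguments the query did not supply."""
    declared = schema[field]["args"]
    return [
        name
        for name, spec in declared.items()
        if not spec["nullable"] and supplied.get(name) is None
    ]


print(missing_required_args(QUERY_TYPE, "human", {}))
print(missing_required_args(QUERY_TYPE, "human", {"id": 1002}))
```

<p>The first call reports that <code class="language-plaintext highlighter-rouge">id</code> is missing; the second returns an empty list, so the query can proceed to execution.</p>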
<p>The other types might look like</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">define-type</span> <span class="nv">Human</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)</span>
<span class="p">(</span><span class="nf">appearsIn</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Episode</span><span class="p">))</span>
<span class="p">(</span><span class="nf">starship</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Starship</span><span class="p">))</span>
 <span class="p">))</span>
</code></pre></div></div>
<p>It may be that there are other ways to do this, but for the moment, I’m going to turn this into an “object.” In JavaScript land, this means a “hash table.”</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">(</span><span class="k">define-syntax</span> <span class="p">(</span><span class="nf">define-type</span> <span class="nv">stx</span><span class="p">)</span>
<span class="p">(</span><span class="nf">define-syntax-class</span> <span class="nv">field-defn</span>
<span class="nt">#:description</span> <span class="s">"field declaration"</span>
<span class="p">(</span><span class="nf">pattern</span> <span class="p">(</span><span class="nf">field:id</span>
<span class="p">(</span><span class="nf">~optional</span> <span class="nv">args</span> <span class="nt">#:defaults</span> <span class="p">([</span><span class="nf">args</span> <span class="o">#'</span><span class="p">()]))</span>
<span class="nv">-></span> <span class="nv">ret-type</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">syntax-parse</span> <span class="nv">stx</span>
<span class="p">[(</span><span class="nf">_</span> <span class="nv">type</span> <span class="p">(</span><span class="nf">fields</span> <span class="nv">f:field-defn</span> <span class="o">...</span><span class="p">))</span>
<span class="p">(</span><span class="k">with-syntax</span> <span class="p">([</span><span class="nf">field-hashes</span>
<span class="o">#`</span><span class="p">(</span><span class="k">let</span> <span class="p">()</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">f*</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nf">f</span><span class="o">.</span><span class="nv">field</span> <span class="o">...</span><span class="p">)))</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">a*</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nf">f</span><span class="o">.</span><span class="nv">args</span> <span class="o">...</span><span class="p">)))</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">rt*</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nf">f</span><span class="o">.</span><span class="nv">ret-type</span> <span class="o">...</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">for/list</span> <span class="p">([</span><span class="nf">_f</span> <span class="nv">f*</span><span class="p">]</span>
<span class="p">[</span><span class="nf">_a</span> <span class="nv">a*</span><span class="p">]</span>
<span class="p">[</span><span class="nf">_rt</span> <span class="nv">rt*</span><span class="p">])</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">fh</span> <span class="p">(</span><span class="nf">make-hash</span><span class="p">))</span>
<span class="p">(</span><span class="nf">hash-set!</span> <span class="nv">fh</span> <span class="ss">'field</span> <span class="nv">_f</span><span class="p">)</span>
<span class="p">(</span><span class="nf">hash-set!</span> <span class="nv">fh</span> <span class="ss">'args</span> <span class="nv">_a</span><span class="p">)</span>
<span class="p">(</span><span class="nf">hash-set!</span> <span class="nv">fh</span> <span class="ss">'return-type</span> <span class="nv">_rt</span><span class="p">)</span>
<span class="nv">fh</span><span class="p">))])</span>
<span class="o">#`</span><span class="p">(</span><span class="k">define</span> <span class="nv">type</span>
<span class="p">(</span><span class="k">let</span> <span class="p">()</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">h</span> <span class="p">(</span><span class="nf">make-hash</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">_fh</span> <span class="nv">field-hashes</span><span class="p">)</span>
<span class="p">(</span><span class="nf">hash-set!</span> <span class="nv">h</span> <span class="ss">'fields</span> <span class="nv">_fh</span><span class="p">)</span>
<span class="nv">h</span><span class="p">))</span>
<span class="p">)])</span>
<span class="p">)</span>
</code></pre></div></div>
<p>This is a <code class="language-plaintext highlighter-rouge">syntax-parse</code> macro. It allows me to define a new syntactic form in Racket, and generate code based on my new syntax. In this case, I’m going to generate a hash table from this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">define-type</span> <span class="nv">Human</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)</span>
<span class="p">(</span><span class="nf">appearsIn</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Episode</span><span class="p">))</span>
<span class="p">(</span><span class="nf">starship</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Starship</span><span class="p">))</span>
<span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<p>that looks like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ss">'#hash</span><span class="p">((</span><span class="nf">fields</span>
<span class="o">.</span>
<span class="p">(</span><span class="nf">#hash</span><span class="p">((</span><span class="nf">args</span> <span class="o">.</span> <span class="p">((</span><span class="nf">id</span> <span class="nv">ID!</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">field</span> <span class="o">.</span> <span class="nv">name</span><span class="p">)</span>
<span class="p">(</span><span class="nf">return-type</span> <span class="o">.</span> <span class="nv">String</span><span class="p">))</span>
<span class="o">#</span><span class="nv">hash</span><span class="p">((</span><span class="nf">args</span> <span class="o">.</span> <span class="p">())</span>
<span class="p">(</span><span class="nf">field</span> <span class="o">.</span> <span class="nv">appearsIn</span><span class="p">)</span>
<span class="p">(</span><span class="nf">return-type</span> <span class="o">.</span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Episode</span><span class="p">)))</span>
<span class="o">#</span><span class="nv">hash</span><span class="p">((</span><span class="nf">args</span> <span class="o">.</span> <span class="p">())</span>
<span class="p">(</span><span class="nf">field</span> <span class="o">.</span> <span class="nv">starship</span><span class="p">)</span>
<span class="p">(</span><span class="nf">return-type</span> <span class="o">.</span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Starship</span><span class="p">))))))</span>
</code></pre></div></div>
<h2 id="where-this-gets-me">where this gets me</h2>
<p>From my reading, a GraphQL server is, in the first instance, an interpreter of queries. It carries out that interpretation against a backdrop of the type definitions, using the information in the schema to make sure the results coming back are correct. (And, perhaps, guiding the execution.)</p>
<p>A complete schema (or set of types) in this language I’m making up looks like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">define-type</span> <span class="nv">Starship</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">define-enum</span> <span class="nv">Episode</span>
<span class="nv">NEWHOPE</span> <span class="nv">EMPIRE</span> <span class="nv">JEDI</span><span class="p">)</span>
<span class="p">(</span><span class="nf">define-type</span> <span class="nv">Human</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">name</span> <span class="nv">-></span> <span class="nv">String</span><span class="p">)</span>
<span class="p">(</span><span class="nf">appearsIn</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Episode</span><span class="p">))</span>
<span class="p">(</span><span class="nf">starship</span> <span class="nv">-></span> <span class="p">(</span><span class="nf">listof</span> <span class="nv">Starship</span><span class="p">))</span>
<span class="p">))</span>
<span class="p">(</span><span class="nf">define-type</span> <span class="nv">Query</span>
<span class="p">(</span><span class="nf">fields</span>
<span class="p">(</span><span class="nf">human</span> <span class="p">([</span><span class="nf">id</span> <span class="nv">ID!</span><span class="p">])</span> <span class="nv">-></span> <span class="nv">Human</span><span class="p">)</span>
<span class="p">))</span>
</code></pre></div></div>
<p>This renders out as a set of hash tables. The new form, <code class="language-plaintext highlighter-rouge">define-enum</code>, looks like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">define-syntax</span> <span class="p">(</span><span class="nf">define-enum</span> <span class="nv">stx</span><span class="p">)</span>
<span class="p">(</span><span class="nf">syntax-parse</span> <span class="nv">stx</span>
<span class="p">[(</span><span class="nf">_</span> <span class="nv">type</span> <span class="nv">fields:id</span> <span class="o">...</span><span class="p">)</span>
<span class="o">#`</span><span class="p">(</span><span class="k">define</span> <span class="nv">type</span>
<span class="p">(</span><span class="k">let</span> <span class="p">([</span><span class="nf">h</span> <span class="p">(</span><span class="nf">make-hash</span><span class="p">)])</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">_f</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nf">fields</span> <span class="o">...</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">for</span> <span class="p">([</span><span class="nf">f</span> <span class="nv">_f</span><span class="p">]</span>
<span class="p">[</span><span class="nf">ndx</span> <span class="p">(</span><span class="nf">range</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">_f</span><span class="p">))])</span>
<span class="p">(</span><span class="nf">hash-set!</span> <span class="nv">h</span> <span class="nv">ndx</span> <span class="nv">f</span><span class="p">))</span>
<span class="nv">h</span><span class="p">))]))</span>
</code></pre></div></div>
<p>and produces a hash table like this:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ss">'#hash</span><span class="p">((</span><span class="nf">0</span> <span class="o">.</span> <span class="nv">NEWHOPE</span><span class="p">)</span> <span class="p">(</span><span class="nf">1</span> <span class="o">.</span> <span class="nv">EMPIRE</span><span class="p">)</span> <span class="p">(</span><span class="nf">2</span> <span class="o">.</span> <span class="nv">JEDI</span><span class="p">))</span>
</code></pre></div></div>
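<p>For comparison, the same index-to-name table is a few lines of plain Python. This is only a sketch of the data structure, not a port of the macro; the names <code class="language-plaintext highlighter-rouge">define_enum</code>, <code class="language-plaintext highlighter-rouge">ndx_to_enum</code>, and <code class="language-plaintext highlighter-rouge">enum_to_ndx</code> are hypothetical.</p>

```python
def define_enum(*names):
    """Build a table mapping each position to its enum name."""
    return {ndx: name for ndx, name in enumerate(names)}


def ndx_to_enum(enum, ndx):
    """Look up the enum name stored at a given index."""
    return enum[ndx]


def enum_to_ndx(enum, name):
    """Reverse lookup; raises KeyError for a name the enum does not contain."""
    for ndx, candidate in enum.items():
        if candidate == name:
            return ndx
    raise KeyError(name)


Episode = define_enum("NEWHOPE", "EMPIRE", "JEDI")
print(Episode)
print(ndx_to_enum(Episode, 2))
```

<p>The reverse lookup fails loudly on a misspelled name, which is one cheap way to catch typos at the point of use.</p>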
<h2 id="why-macros">why macros?</h2>
<p>I want as few macros as possible in this system. That seems to be a good rule when developing in Racket. It might be nice if I … leveraged the actual language a bit more. For example, at the moment, if I spell “Episode” as “Eipsode”, nothing is caught in Racket. It would be nice if things would fail immediately.</p>
<p>But, what this does (as a step… I’m just exploring here…) is get me to a space where I can write <em>functions</em> to do all of my work, using these definitions to power the functions. I can write a function that consumes a “type” and a hash table, and check if it is the right “type”. For example:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">->boolean</span> <span class="nv">o</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">o</span> <span class="nv">true</span> <span class="nv">false</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">is-type?</span> <span class="nv">typeh</span> <span class="nv">h</span><span class="p">)</span>
<span class="p">(</span><span class="nf">->boolean</span>
<span class="p">(</span><span class="k">and</span>
<span class="p">(</span><span class="nb">hash-has-key?</span> <span class="nv">h</span> <span class="ss">'type</span><span class="p">)</span>
<span class="p">(</span><span class="nb">hash-has-key?</span> <span class="nv">typeh</span> <span class="ss">'type</span><span class="p">)</span>
<span class="p">(</span><span class="nb">hash-has-key?</span> <span class="nv">typeh</span> <span class="ss">'fields</span><span class="p">)</span>
<span class="p">(</span><span class="nb">equal?</span> <span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">typeh</span> <span class="ss">'type</span><span class="p">)</span>
<span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">h</span> <span class="ss">'type</span><span class="p">))</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">([</span><span class="nf">fields-in-h</span> <span class="p">(</span><span class="nf">hash-keys</span> <span class="nv">h</span><span class="p">)]</span>
<span class="p">[</span><span class="nf">field-names-in-type</span>
<span class="p">(</span><span class="nb">cons</span> <span class="ss">'type</span>
<span class="p">(</span><span class="nb">map</span> <span class="p">(</span><span class="k">λ</span> <span class="p">(</span><span class="nf">h</span><span class="p">)</span> <span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">h</span> <span class="ss">'field</span><span class="p">))</span>
<span class="p">(</span><span class="nf">hash-ref</span> <span class="nv">typeh</span> <span class="ss">'fields</span><span class="p">)))])</span>
<span class="p">(</span><span class="nb">andmap</span> <span class="p">(</span><span class="k">λ</span> <span class="p">(</span><span class="nf">k</span><span class="p">)</span>
<span class="p">(</span><span class="nb">member</span> <span class="nv">k</span> <span class="nv">field-names-in-type</span><span class="p">))</span>
<span class="nv">fields-in-h</span><span class="p">)))))</span>
</code></pre></div></div>
<p>When run on some test cases:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">></span> <span class="p">(</span><span class="nf">is-type?</span> <span class="nv">Starship</span> <span class="p">(</span><span class="nf">make-hash</span> <span class="o">'</span><span class="p">((</span><span class="nf">type</span> <span class="o">.</span> <span class="nv">Starship</span><span class="p">)</span>
<span class="p">(</span><span class="nf">name</span> <span class="o">.</span> <span class="s">"Aluminum Falcon"</span><span class="p">))))</span>
<span class="no">#t</span>
<span class="nv">></span> <span class="p">(</span><span class="nf">is-type?</span> <span class="nv">Starship</span> <span class="p">(</span><span class="nf">make-hash</span> <span class="o">'</span><span class="p">((</span><span class="nf">type</span> <span class="o">.</span> <span class="nv">Starship</span><span class="p">))))</span>
<span class="no">#t</span>
<span class="nv">></span> <span class="p">(</span><span class="nf">is-type?</span> <span class="nv">Starship</span> <span class="p">(</span><span class="nf">make-hash</span><span class="p">))</span>
<span class="no">#f</span>
<span class="nv">></span>
</code></pre></div></div>
<p>The function isn’t complete, but it demonstrates a point. Given a “type” (which is a hash table) and another hash table, we can ask “is that hash table of the given type?” To be of a given type, it has to have a key called <code class="language-plaintext highlighter-rouge">type</code>, the value of that key must be the same as the value stored in the <code class="language-plaintext highlighter-rouge">typeh</code> hash, and all of the fields in the hash table in question must be in the <code class="language-plaintext highlighter-rouge">typeh</code> hash. (Because fields are nullable, I assume this means that a Starship does not have to have a name, but it can still be a starship. Some fields are not nullable, and really, this is where I need to do more reading on GraphQL and its schema language.)</p>
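<p>For readers more at home in Python, the same check can be sketched with dictionaries. This is only an illustration under assumptions: the name <code class="language-plaintext highlighter-rouge">is_type</code>, the dictionary layout, and the sample <code class="language-plaintext highlighter-rouge">starship</code> value are hypothetical stand-ins for the Racket hash tables above, not code from this project.</p>

```python
def is_type(typeh, h):
    """Return True if the dict `h` is an instance of the type described by `typeh`.

    Assumed (hypothetical) layout, mirroring the Racket hashes above:
      typeh = {"type": <name>, "fields": [{"field": <field-name>}, ...]}
      h     = {"type": <name>, <field-name>: <value>, ...}
    """
    # Both tables need a 'type' key, and the type needs a field list.
    if "type" not in h or "type" not in typeh or "fields" not in typeh:
        return False
    # The type names have to match.
    if typeh["type"] != h["type"]:
        return False
    # Every key in h must be 'type' or a field named by the type.
    allowed = {"type"} | {f["field"] for f in typeh["fields"]}
    return all(key in allowed for key in h)


# A hypothetical type with a single (nullable) field called "name".
starship = {"type": "Starship", "fields": [{"field": "name"}]}

print(is_type(starship, {"type": "Starship", "name": "Aluminum Falcon"}))  # True
print(is_type(starship, {"type": "Starship"}))                             # True
print(is_type(starship, {}))                                               # False
```

<p>As in the Racket version, a missing field is allowed but an unknown field is not, so a hash with an extra key the type does not name is rejected.</p>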
<p>So, put simply, <em>a few macros lift me from syntax to data</em>, and then I can write functions to process that data any way I want. And, I’d rather be writing functions than macros, because they’re… more obvious. The macros are a “little language,” and the functions are what will give meaning to that language.</p>
<h3 id="things-not-handled">things not handled…</h3>
<p>There’s a ton of things not handled in this example. For example, I don’t actually handle types right. (The <code class="language-plaintext highlighter-rouge">(listof x)</code> is not handled at all, for example…) But, this was me doing a quick dive to see where it would lead me.</p>
<h2 id="will-this-continue">will this continue?</h2>
<p>I don’t know. It might be nice to see if I can stand up a small GraphQL server. At least as a “proof of concept.” I still have my <code class="language-plaintext highlighter-rouge">tbl</code> exploration to continue, so I don’t want to get too distracted, but it seemed like something interesting to explore for a bit while I’m holed up at home.</p>
jekyll: static websites with mustaches2020-03-13T00:00:00+00:00https://jadud.com//p/jekyll-01<div class="alert">
<small>
This is the first post for <a href="https://opensource.com/">opensource.com</a> in a series on <a href="https://jekyllrb.com/">Jekyll, a static website generator</a>.
</small>
</div>
<p>In this post, I’ll start you down the road to building sites with Jekyll, a set of tools and modules that leverage templates, variables, and data in powerful ways for the construction of static websites.</p>
<p>In the next post, I’ll tell you a bit of a story about why I like using Jekyll. However, I know you want to dive in, and don’t want to listen to me telling stories. But, my time will come…</p>
<h2 id="installation">installation</h2>
<p>First, you need to install Jekyll. I can’t help you here. You can <a href="https://jekyllrb.com/docs/installation/">follow Jekyll’s documentation</a> for your operating system. If you do a <a href="https://www.youtube.com/results?search_query=jekyll+install">YouTube search</a> for “jekyll install,” you’ll find a bazillion videos that will walk you through this.</p>
<p>Go ahead. I’ll wait.</p>
<h2 id="getting-started">getting started</h2>
<p>No one I know “just builds a website.” I mean, no one wakes up in the morning and says “I NEED TO BUILD A WEBSITE.” We can argue the details, but I will be right. Now, that said, roughly 73% of the world’s population wakes up every day and says <em>I wish I could build a website about late 1990’s cat memes.</em> I have talked to people from all over this crazy world, and that is one fundamental constant… even for college students born in the year 2000. <strong>Late 90’s cat memes.</strong> They’re a thing.</p>
<p>I’ll use asciinema throughout this post series to demonstrate things. Here, I’m going to write the landing page for my new cat memes website. I’m going to write it in Markdown. If you’re not familiar with Markdown, this is a good time. <a href="https://opensource.com/article/19/9/introduction-markdown">Juan Islas</a> wrote an opensource.com article on it back in September of 2019, so you might go check that out. Come back when you’re ready.</p>
<p>You’ll also see that there is some “frontmatter.” This is actually some <a href="https://rollout.io/blog/yaml-tutorial-everything-you-need-get-started/">YAML</a> at the front, which lets me define variables. The only variable I’m defining is a page title. More on that in a moment.</p>
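<p>To make the idea of frontmatter concrete, here is a minimal Python sketch of how a tool might split a page into its variables and its body. It is not how Jekyll does it (Jekyll is written in Ruby and uses a real YAML parser); the function name <code class="language-plaintext highlighter-rouge">split_front_matter</code> is made up for this illustration, and it only understands flat <code class="language-plaintext highlighter-rouge">key: value</code> lines.</p>

```python
def split_front_matter(text):
    """Split a page into (variables, body).

    Toy illustration only: handles flat `key: value` pairs between two
    `---` lines, which is a tiny subset of what real YAML allows.
    """
    lines = text.splitlines()
    # No opening delimiter means no frontmatter at all.
    if not lines or lines[0].strip() != "---":
        return {}, text
    variables = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the page body.
            return variables, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            variables[key.strip()] = value.strip()
    # The closing delimiter never showed up; treat the page as plain text.
    return {}, text


page_source = """---
title: 90's Cat Memes
---
The beginnings of a legend. Or... legends revisited.
"""

variables, body = split_front_matter(page_source)
print(variables["title"])  # 90's Cat Memes
print(body)                # The beginnings of a legend. Or... legends revisited.
```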
<script id="asciicast-jqbaP37HXUJlO3JHfUVJos0Ju" src="https://asciinema.org/a/jqbaP37HXUJlO3JHfUVJos0Ju.js" async=""></script>
<p>In this terminalcast, you’ll see I ran into an error. The problem was I was running Jekyll in another terminal so that I could preview <em>this</em> post. So, now you’ve seen what happens if you run Jekyll twice… by running <code class="language-plaintext highlighter-rouge">jekyll s</code> (or <code class="language-plaintext highlighter-rouge">jekyll serve</code>), you are running a webserver on your local machine that only you can see. But, you can’t run two at the same time… which I tried to do. I went off, and behind the emerald curtain, I stopped the one for this post, and started the one for my new cat meme site.</p>
<p>The resulting website is, as you can see, <em>amazing</em>.</p>
<p><img src="https://jadud.com//images/posts/oscom-jekyll/20200314-minimal_site.png" alt="a minimal website" /></p>
<h2 id="markdown">markdown</h2>
<p>I like working with Jekyll because it will render my Markdown to HTML. I would rather not write HTML if I can avoid it, especially when I want to focus on content. My first page is a Markdown file with almost no Markdown in it, but as the site grows, I’ll take more advantage of this markup language. Markdown language? Anyway… moving on…</p>
<h2 id="templates">templates</h2>
<p>The content for this site is going to come later; I am going to want to drive this site with data, and that’s not part of step one. Before I get there, I’m going to focus on creating a common look for the site, and making sure that every page will consistently render as validating HTML. This involves one of my favorite features of Jekyll: <strong>templates</strong>.</p>
<p>Right now, I have a single page. While it is… <em>minimalist</em>, it could use a touch of styling. In fact, it could actually use something to make it valid HTML. Right now, Jekyll takes my page (<code class="language-plaintext highlighter-rouge">index.md</code>) and turns it into <code class="language-plaintext highlighter-rouge">index.html</code>, which contains the following:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><p></span>The beginnings of a legend. Or… legends revisited.<span class="nt"></p></span>
<span class="nt"><p></span>Insert content here.<span class="nt"></p></span>
</code></pre></div></div>
<p>This is HTML, but not valid HTML. We can’t have that. At the least, there needs to be an <code class="language-plaintext highlighter-rouge">html</code> tag, and a <code class="language-plaintext highlighter-rouge">head</code> tag, and a <code class="language-plaintext highlighter-rouge">body</code> tag… a whole bunch of stuff.</p>
<p>I’m going to create a new directory called <code class="language-plaintext highlighter-rouge">_layouts</code>. In that folder, I’ll place a file called <code class="language-plaintext highlighter-rouge">default.html</code>. Watch some terminal TV:</p>
<script id="asciicast-cj3EKdEq3Q7qlvogWLBgBMmxE" src="https://asciinema.org/a/cj3EKdEq3Q7qlvogWLBgBMmxE.js" async=""></script>
<p>Here, I’ve created a valid HTML5 “template.” This is the shell that will be wrapped around the pages that I write. Now, if I build the site (either by saying <code class="language-plaintext highlighter-rouge">jekyll build</code>, or if I want to view the site live, <code class="language-plaintext highlighter-rouge">jekyll serve</code>), the contents of <code class="language-plaintext highlighter-rouge">index.html</code> will look like this:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp"><!doctype html></span>
<span class="nt"><html></span>
<span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">></span>
<span class="nt"><title></span> 90's Cat Memes <span class="nt"></title></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
<span class="nt"><p></span>The beginnings of a legend. Or… legends revisited.<span class="nt"></p></span>
<span class="nt"><p></span>Insert content here.<span class="nt"></p></span>
<span class="nt"></body></span>
<span class="nt"></html></span>
</code></pre></div></div>
<p>Oooh.</p>
<p>What happened?</p>
<p>Jekyll took my content, which was in <code class="language-plaintext highlighter-rouge">index.md</code>:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">title</span><span class="pi">:</span> <span class="s">90's Cat Memes</span>
<span class="na">layout</span><span class="pi">:</span> <span class="s">default</span>
<span class="nn">---</span>
The beginnings of a legend. Or... legends revisited.
Insert content here.
</code></pre></div></div>
<p>Jekyll “pushed” this content through the template, <code class="language-plaintext highlighter-rouge">_layouts/default.html</code>:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp"><!doctype html></span>
<span class="nt"><html></span>
<span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">></span>
<span class="nt"><title></span> {{ page.title }} <span class="nt"></title></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
{{ content }}
<span class="nt"></body></span>
<span class="nt"></html></span>
</code></pre></div></div>
<p>Jekyll looked for <strong>template variables</strong>. These are indicated via the <em>mustaches</em>, or double-curly-brackets (e.g. <code class="language-plaintext highlighter-rouge">{{</code> and <code class="language-plaintext highlighter-rouge">}}</code>). The first variable Jekyll encountered was called <code class="language-plaintext highlighter-rouge">page.title</code>. To resolve it, Jekyll looks at the frontmatter of the Markdown file (which is YAML) and tries to find a <code class="language-plaintext highlighter-rouge">title</code> variable. In the case of my index page, <code class="language-plaintext highlighter-rouge">title</code> is “90’s Cat Memes”. Then, Jekyll finds the variable <code class="language-plaintext highlighter-rouge">content</code>, which is set to the rendered body of the page it is processing.</p>
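<p>The substitution step can be sketched in a few lines of Python. This is a toy, not Jekyll’s actual template engine (Jekyll uses Liquid, which does far more than variable lookup); the <code class="language-plaintext highlighter-rouge">render</code> function and the flat dictionary of variables are assumptions made for the illustration.</p>

```python
import re

# Matches {{ name }} or {{ dotted.name }}, with optional spaces inside.
MUSTACHE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, variables):
    """Replace each {{ name }} in `template` with variables[name].

    Toy illustration: unknown variables become the empty string.
    """
    def lookup(match):
        return str(variables.get(match.group(1), ""))
    return MUSTACHE.sub(lookup, template)


template = "<title> {{ page.title }} </title>\n<body>\n{{ content }}\n</body>"

html = render(template, {
    "page.title": "90's Cat Memes",              # from the frontmatter
    "content": "<p>Insert content here.</p>",    # the rendered page body
})
print(html)
```

<p>The template is written once, and every page that names it as a layout gets pushed through the same substitution.</p>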
<p>When Jekyll is done, it emits <code class="language-plaintext highlighter-rouge">index.html</code> in the <code class="language-plaintext highlighter-rouge">_site</code> directory. This file is the combination of the template and the content. The contents of the final <code class="language-plaintext highlighter-rouge">index.html</code> look like this:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp"><!doctype html></span>
<span class="nt"><html></span>
<span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">></span>
<span class="nt"><title></span> 90's Cat Memes <span class="nt"></title></span>
<span class="nt"></head></span>
<span class="nt"><body></span>
<span class="nt"><p></span>The beginnings of a legend. Or… legends revisited.<span class="nt"></p></span>
<span class="nt"><p></span>Insert content here.<span class="nt"></p></span>
<span class="nt"></body></span>
<span class="nt"></html></span>
</code></pre></div></div>
<p>The resulting site is no more impressive than it was before (I mean, it was <em>amazing</em> to begin with, so the bar is already set high), but it is now a valid HTML5 webpage. And, for every new page I write in Markdown, I can reuse that template, and be guaranteed a validating result.</p>
<h2 id="more-variables-data-hosting">more variables, data, hosting</h2>
<p>There’s a lot more to Jekyll. This was <em>step one</em>. I’m not going to cover everything in the first post, but you’re on your way. In the next post, I’ll spend more time with variables and the organization of your site data in Jekyll. And, I’ll start pointing to resources online that can carry you further.</p>
<h2 id="get-the-code">get the code</h2>
<p>This code is available on GitHub in the repository <a href="">opensourcecom-catmemes</a>. The code that was written for this post can be referenced via commit 8ff8202. That is, if you browse or clone <a href="https://github.com/jadudm/opensourcecom-catmemes/tree/8ff82024e6b3e7e50c13d488e08df03580e3a3d2">this URL</a>, you will get the code as it existed when this post was written. The code is licensed CC0, so you can use it any way you want.</p>
tbl: a slice of cake2020-03-10T00:00:00+00:00https://jadud.com//p/slice-of-cake<p>A colleague of mine, <a href="https://dmarsee.carbonmade.com/">Dave Marsee</a>, talks about building a “slice of cake.” When building out a prototype, think about the minimum vertical that you need to demonstrate your ideas, and do that. Now, I’ve already “built one to throw away” (in another language) with regards to <code class="language-plaintext highlighter-rouge">tbl</code>, but a slice of cake is still a good idea here.</p>
<p>What I know about <code class="language-plaintext highlighter-rouge">tbl</code> is that I want it to be an abstraction for working with data. There are fundamental operations I want to be able to do on a <code class="language-plaintext highlighter-rouge">tbl</code>, and my goal is for those to be <em>conceptual</em> in nature. That is, I want a programmer to be able to say “give me all of the data in this set where a person’s age is greater than 18.” I don’t, for the moment, want them to be thinking about arrays, or SQL, or anything else… just the idea of <em>filtering data</em>.</p>
<p>And, in going back to some of my <a href="https://bitbucket.org/jadudm/tbl/src/master/_scribblings/designing-data/dwd.scrbl">thoughts on this prior</a>, I’m reminded that I imagined this as a very <em>functional</em> library. I imagined having:</p>
<ul>
<li>a <code class="language-plaintext highlighter-rouge">tbl</code> structure</li>
<li>functions that consume <code class="language-plaintext highlighter-rouge">tbl</code>s and produce <code class="language-plaintext highlighter-rouge">tbl</code>s</li>
<li>functions that consume <code class="language-plaintext highlighter-rouge">tbl</code>s and produce lists</li>
<li>functions that consume <code class="language-plaintext highlighter-rouge">tbl</code>s and produce values</li>
</ul>
<p>I’m already, in my first exploration, heading down an OO path, where the programmer creates a <code class="language-plaintext highlighter-rouge">tbl</code> structure, and then manipulates it. Hence, I need Dave’s “slice of cake.” I’ve set up my project directory, I’ve written some tests, but now I need to do a deep dive down one implementation pathway, and see where it takes me.</p>
<p>I could design this out from the top down, but that really isn’t the purpose of the exercise. Instead, I want to do a series of structured explorations, and in doing so, demonstrate how to explore software development in an “agile” manner. It means developing stories, prototypes, and tests, and being willing to walk ideas back when they don’t work. (There’s more to it than that, but this is a small exploration for now, and I think that story about agile will suffice.)</p>
<h2 id="all-about-the-lobsters">all about the lobsters</h2>
<p>I live in Maine at the moment, so my running example is going to be about lobsters. It’s either that, or tourists from Massachusetts… so lobsters it is.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tbl</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/..."</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="n">pets_url</span><span class="p">)</span>
<span class="n">a_tbl</span><span class="p">.</span><span class="n">show_columns</span><span class="p">()</span>
</code></pre></div></div>
<p>First, I want to imagine my code differently. I think I’d like it to look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tbl</span> <span class="kn">import</span> <span class="n">tbl</span><span class="p">,</span> <span class="n">show_columns</span><span class="p">,</span> <span class="n">keep_rows</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/..."</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">(</span><span class="n">pets_url</span><span class="p">)</span>
<span class="n">show_columns</span><span class="p">(</span><span class="n">tbl</span><span class="p">)</span>
<span class="n">new_tbl</span> <span class="o">=</span> <span class="n">keep_rows</span><span class="p">(</span><span class="n">tbl</span><span class="p">,</span> <span class="n">weight</span> <span class="o">></span> <span class="mf">0.8</span><span class="p">)</span>
</code></pre></div></div>
<p>I’ve done a few things here:</p>
<ol>
<li>I don’t want keyword parameters for common cases. Keyword parameters have “weird” scoping semantics in different languages (e.g. Python, R), and if they aren’t necessary for the common case, then they shouldn’t be used.</li>
<li>I’d like to have an expression syntax for filtering queries, so that the idea of keeping rows where “weight > 0.8” is easy. A “say what you mean” (or SWYM?) principle of sorts is at work here. I was able to do this in the Racket version with macros, but the Python case is a bit more subtle.</li>
</ol>
<p><em>an hour later</em></p>
<h3 id="i-really-want-macros">i really want macros</h3>
<p>This is the side effect of spending 20 years of working in a language like Racket: I want to re-invent my language from within my language. This is a classic case where I’d love to be able to design a small query language to simplify the expressions that novices could write, and then walk the AST and “do the right thing” for them. A Scheme- or LISP-based language would make this easy-peasy.</p>
<p>But, that’s a bit trickier to do in Python (not impossible… just tricky). I could do this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">T</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="n">pets_url</span><span class="p">)</span>
<span class="n">show_tbl</span><span class="p">(</span><span class="n">T</span><span class="p">)</span>
<span class="n">newT</span> <span class="o">=</span> <span class="n">T</span> <span class="o">|</span><span class="n">keep_rows_where</span><span class="o">|</span> <span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">weight</span> <span class="o">|</span><span class="n">gt</span><span class="o">|</span> <span class="mf">0.8</span><span class="p">)</span>
<span class="n">show_tbl</span><span class="p">(</span><span class="n">newT</span><span class="p">)</span>
</code></pre></div></div>
<p>because I can override the <code class="language-plaintext highlighter-rouge">|</code> operator. In doing so, I can introduce infix “operators” that are really just functions being applied left-to-right. This has some serious pitfalls waiting in the wings, but for the moment, I’m going to play with it in the “slice of cake.”</p>
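<p>Here is a minimal sketch of how such a wrapper can work. The post does not show its own <code class="language-plaintext highlighter-rouge">Infix</code> class, so this version is an assumption based on a widely circulated Python recipe; the <code class="language-plaintext highlighter-rouge">plus</code> function is a made-up example.</p>

```python
from collections import namedtuple


class Infix:
    """Wrap a two-argument function so it can be written as  a |f| b.

    Python parses  a |f| b  as  (a | f) | b:
      1. `a | f` falls through to f.__ror__(a), which remembers `a`.
      2. `... | b` calls __or__(b) on that result, which applies the function.
    """

    def __init__(self, function):
        self.function = function

    def __ror__(self, lhs):
        # Step 1: capture the left-hand side in a new, partially applied Infix.
        return Infix(lambda rhs: self.function(lhs, rhs))

    def __or__(self, rhs):
        # Step 2: supply the right-hand side and run the function.
        return self.function(rhs)


@Infix
def plus(a, b):
    return a + b


# An "operator" that builds a data structure instead of computing a value.
GT = namedtuple("GT", ["lhs", "rhs"])


@Infix
def gt(lhs, rhs):
    return GT(lhs, rhs)


print(1 |plus| 2)        # 3
print("weight" |gt| 0.8)  # GT(lhs='weight', rhs=0.8)
```

<p>One of the pitfalls is visible already: step 1 only happens if the left-hand value does not handle <code class="language-plaintext highlighter-rouge">|</code> itself, and because <code class="language-plaintext highlighter-rouge">|</code> has its own precedence, nested expressions need explicit parentheses.</p>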
<h2 id="a-test">a test</h2>
<p>The work is going to be done in the left-most operator. In this case, it is <code class="language-plaintext highlighter-rouge">|keep_rows_where|</code>. It will make sure that the LHS is a <code class="language-plaintext highlighter-rouge">tbl</code>, and the RHS is an expression of some sort. I’m essentially going to be building a little language and interpreter. The language will be made up of “infix operators” that return data structures, and then functions like <code class="language-plaintext highlighter-rouge">keep_rows_where</code> will interpret those structures.</p>
<p>The first implementation will have no error checking. We’re going for a “slice of cake.”</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tbl</span> <span class="kn">import</span> <span class="n">tbl</span><span class="p">,</span> <span class="n">show_columns</span><span class="p">,</span> <span class="n">show_tbl</span>
<span class="kn">from</span> <span class="nn">tbl.queries</span> <span class="kn">import</span> <span class="n">gt</span><span class="p">,</span> <span class="n">keep_rows_where</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"http://bit.ly/2IzVqoV"</span>
<span class="k">print</span><span class="p">(</span><span class="s">"The original table:"</span><span class="p">)</span>
<span class="n">T</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="n">pets_url</span><span class="p">)</span>
<span class="n">show_tbl</span><span class="p">(</span><span class="n">T</span><span class="p">)</span>
<span class="k">print</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"The new table:"</span><span class="p">)</span>
<span class="n">newT</span> <span class="o">=</span> <span class="n">T</span> <span class="o">|</span><span class="n">keep_rows_where</span><span class="o">|</span> <span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">weight</span> <span class="o">|</span><span class="n">gt</span><span class="o">|</span> <span class="mf">0.8</span><span class="p">)</span>
<span class="n">show_tbl</span><span class="p">(</span><span class="n">newT</span><span class="p">)</span>
</code></pre></div></div>
<p>This works. It works because I implemented the function <code class="language-plaintext highlighter-rouge">gt</code> as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">GT</span> <span class="o">=</span> <span class="n">NT</span><span class="p">(</span><span class="s">"GT"</span><span class="p">,</span> <span class="p">[</span><span class="s">"lhs"</span><span class="p">,</span> <span class="s">"rhs"</span><span class="p">])</span>
<span class="o">@</span><span class="n">Infix</span>
<span class="k">def</span> <span class="nf">gt</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">):</span>
<span class="k">return</span> <span class="n">GT</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">)</span>
</code></pre></div></div>
<p>All the function does is build a GT data structure, which has two fields: a left-hand side, and a right-hand side. I’m doing no checking. It just shoves data into the fields of the structure.</p>
<p>The function <code class="language-plaintext highlighter-rouge">keep_rows_where</code> looks like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">Infix</span>
<span class="k">def</span> <span class="nf">keep_rows_where</span> <span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">):</span>
<span class="n">T</span> <span class="o">=</span> <span class="n">lhs</span>
<span class="c1"># The LHS needs to be a tbl, the RHS
</span> <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="ow">is</span> <span class="n">tbl</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"The left-hand side of |keep_rows_where| must be a tbl."</span><span class="p">)</span>
<span class="n">newT</span> <span class="o">=</span> <span class="n">copy</span><span class="p">.</span><span class="n">deepcopy</span><span class="p">(</span><span class="n">T</span><span class="p">)</span>
<span class="c1"># Get the row index based on the field,
</span> <span class="c1"># which will be a Column() structure.
</span> <span class="n">col_index</span> <span class="o">=</span> <span class="n">T</span><span class="p">.</span><span class="n">_get_column_index</span><span class="p">(</span><span class="n">rhs</span><span class="p">.</span><span class="n">lhs</span><span class="p">)</span>
<span class="n">new_rows</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">T</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
<span class="c1"># FIXME: I want to import values appropriately, and I want
</span> <span class="c1"># to have checking integrated somewhere up the chain so
</span> <span class="c1"># that 'gt' comparisons don't happen on the
</span> <span class="c1"># wrong types of data.
</span> <span class="k">if</span> <span class="nb">float</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="n">col_index</span><span class="p">])</span> <span class="o">></span> <span class="nb">float</span><span class="p">(</span><span class="n">rhs</span><span class="p">.</span><span class="n">rhs</span><span class="p">):</span>
<span class="n">new_rows</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="n">newT</span><span class="p">.</span><span class="n">_set_rows</span><span class="p">(</span><span class="n">new_rows</span><span class="p">)</span>
<span class="k">return</span> <span class="n">newT</span>
</code></pre></div></div>
<p>Essentially, I’m writing functions to build up and/or interpret abstract syntax trees. <code class="language-plaintext highlighter-rouge">gt</code> is a function that builds a single node of an AST, which represents the “greater than” operator. The <code class="language-plaintext highlighter-rouge">keep_rows_where</code> function takes a <code class="language-plaintext highlighter-rouge">tbl</code> and an AST, and interprets the AST in light of the <code class="language-plaintext highlighter-rouge">tbl</code>.</p>
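<p>The “build a node, then interpret it” idea can be shown on its own, without the <code class="language-plaintext highlighter-rouge">tbl</code> machinery. In this sketch the table is just a list of dictionaries, and <code class="language-plaintext highlighter-rouge">evaluate</code> is a hypothetical helper; neither is how the library actually stores or walks its data.</p>

```python
from collections import namedtuple

# One kind of AST node: "the value in column `lhs` is greater than `rhs`".
GT = namedtuple("GT", ["lhs", "rhs"])


def evaluate(node, row):
    """Interpret one AST node against one row (a dict of column -> value)."""
    if isinstance(node, GT):
        # Like the original, coerce both sides to float before comparing.
        return float(row[node.lhs]) > float(node.rhs)
    raise TypeError(f"don't know how to interpret {node!r}")


def keep_rows_where(rows, node):
    """Return a new list holding only the rows for which the AST is true."""
    return [row for row in rows if evaluate(node, row)]


# The same lobsters as in the post, as plain strings straight from a CSV.
lobsters = [
    {"Name": "Bart", "Weight": "0.75", "Color": "Muddy"},
    {"Name": "Flick", "Weight": "1", "Color": "Muddy"},
    {"Name": "Bubbles", "Weight": "1.2", "Color": "Blue"},
    {"Name": "Crabby", "Weight": "0.5", "Color": "Muddy"},
]

heavy = keep_rows_where(lobsters, GT("Weight", 0.8))
print([row["Name"] for row in heavy])  # ['Flick', 'Bubbles']
```

<p>Adding another operator means adding another node type and another branch in the interpreter, which is where checking the types of the operands could eventually live.</p>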
<h2 id="summary">summary</h2>
<p>This slice of cake, when executed, outputs the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The original table:
Name Weight Color
Bart 0.75 Muddy
Flick 1 Muddy
Bubbles 1.2 Blue
Crabby 0.5 Muddy
The new table:
Name Weight Color
Flick 1 Muddy
Bubbles 1.2 Blue
</code></pre></div></div>
<p>The new table is generated functionally via this expression:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">newT</span> <span class="o">=</span> <span class="n">T</span> <span class="o">|</span><span class="n">keep_rows_where</span><span class="o">|</span> <span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">weight</span> <span class="o">|</span><span class="n">gt</span><span class="o">|</span> <span class="mf">0.8</span><span class="p">)</span>
</code></pre></div></div>
<p>Do I like this? I don’t know. Have I thoroughly explained this slice of cake? No. I have spent many, many years writing linkers, interpreters, compilers, and transpilers for all manner of weird systems, and the idea of “creating a language” on-the-fly to solve a problem is natural to me. If I want to continue down this road, I have to think about the potential implications for novices. What kinds of errors might they discover with this approach? How can I support them in their learning? Will I need multiple approaches… that is, will this work for simple things, but not complex things?</p>
<p>The answer to the last question is almost always “yes.” Developing simple things that scale up in complexity gracefully is difficult, but arguably easier than developing a complex thing that also (just happens) to express simple things, simply. So, there is hope, but it will take more explorations.</p>
<p>Today, I explored in the “expressions” branch. I don’t know if I’ll keep this exploration, but for now, it’s in the repository.</p>
tbl: testing round one2020-03-09T00:00:00+00:00https://jadud.com//p/tbl-testing<p>In the previous post, I rearranged the structure of the code to align it more closely with what we might expect for a Python package that can be installed via <code class="language-plaintext highlighter-rouge">pip</code>. It is never too early to begin arranging the structure of a project appropriately, and it is never too early to begin <strong>testing</strong>.</p>
<p>I have a clear idea of what I expect this project to be, because I’ve written it before, and written tests, documentation, and a draft paper on it. However, coming soon, I’ll need to get those ideas expressed here in something resembling a coherent design. For now, though, I’ll continue exploring a bit. But, I’ll do it properly.</p>
<p>My “driver” code right now looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tbl</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pub?gid=0&single=true&output=csv"</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="n">pets_url</span><span class="p">)</span>
<span class="n">a_tbl</span><span class="p">.</span><span class="n">show_columns</span><span class="p">()</span>
</code></pre></div></div>
<p>And, when executed, it outputs this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name : None
Weight : None
Color : None
</code></pre></div></div>
<p>That’s fine, because those are the contents of the header row of my spreadsheet. But, saying “it looks right” is no way to test software. Although I haven’t articulated a complete design, one thing I know my library will need to be able to do is read a CSV file from a URL and convert it into a <code class="language-plaintext highlighter-rouge">tbl</code> (a structure that is yet to be fully described).</p>
<p>So, for my next trick, I’ll put some light testing in place. Even though the structures might change, and this might require me to rewrite some tests, there are two important benefits to committing to testing early. First, I can continue exploring with confidence that the code I’ve written so far is working the way I expect. Second, even if I change the structures over which I’m writing the tests, it should remain true that the tests themselves are “good.” Or, put another way, I may have to rewrite individual tests, but the testing framework around them will remain constant regardless of how the structures change. This will again provide confidence in the library in the face of refactoring, and it will speed up both development and the writing of future tests.</p>
<p>(Some googling suggests that <a href="https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure">there may be more than one way to do it</a>, and that I may have some additional refactoring to do. But, I’ll proceed with documentation from <a href="https://pytest.readthedocs.io/en/2.7.3/index.html">pytest</a> for now, knowing that a first step that is reasonable is better than no step at all.)</p>
<h2 id="my-friend-pytest">my friend pytest</h2>
<p>There are many testing frameworks in many languages. I’m going to use pytest here because it is lightweight, and I’ll take speed over complexity early in any programming endeavor. And, in truth, I may never need anything more complex than pytest, because it really is quite capable.</p>
<h3 id="first-problem-not-importable">first problem: not importable</h3>
<p>Uh-oh.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(venv) jadudm@lego:~/git/pytbl$ pip install -e .
Directory '.' is not installable. File 'setup.py' not found.
</code></pre></div></div>
<p>It looks like I need a <code class="language-plaintext highlighter-rouge">setup.py</code>. My initial setup looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">setuptools</span> <span class="kn">import</span> <span class="n">setup</span>
<span class="n">setup</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'tbl'</span><span class="p">,</span>
<span class="n">version</span><span class="o">=</span><span class="s">'0.1'</span><span class="p">,</span>
<span class="n">description</span><span class="o">=</span><span class="s">'A tabular way to think about data.'</span><span class="p">,</span>
<span class="n">url</span><span class="o">=</span><span class="s">'http://github.com/jadudm/pytbl'</span><span class="p">,</span>
<span class="n">author</span><span class="o">=</span><span class="s">'Matt Jadud'</span><span class="p">,</span>
<span class="n">author_email</span><span class="o">=</span><span class="s">'matt@jadud.com'</span><span class="p">,</span>
<span class="n">license</span><span class="o">=</span><span class="s">'MIT'</span><span class="p">,</span>
<span class="n">packages</span><span class="o">=</span><span class="p">[</span><span class="s">'tbl'</span><span class="p">],</span>
<span class="n">zip_safe</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<p>And, now:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(venv) jadudm@lego:~/git/pytbl$ pip install -e .
Obtaining file:///home/jadudm/git/pytbl
Installing collected packages: tbl
Running setup.py develop for tbl
Successfully installed tbl
</code></pre></div></div>
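<p>That <code class="language-plaintext highlighter-rouge">pip install -e .</code> command performs an “editable” install: rather than copying the package into the venv, it places a link there that points back at my project directory. This way, I can keep developing the code and running tests, and I will always be testing against the “live” code.</p>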
<p>I can now run <code class="language-plaintext highlighter-rouge">python3 -m pytest</code>, and get:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(venv) jadudm@lego:~/git/pytbl$ python3 -m pytest
======================================= test session starts ========================================
platform linux -- Python 3.6.9, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /home/jadudm/git/pytbl
collected 0 items
====================================== no tests ran in 1.03s =======================================
</code></pre></div></div>
<p>This is good.</p>
<h2 id="testing-one-function">testing one function</h2>
<p>At this point, I want to test the import of the CSV file. There are a lot of tests I could write at this point, because (really) my first function is almost too complex.</p>
<p>What if the programmer using <code class="language-plaintext highlighter-rouge">tbl</code>…</p>
<ul>
<li>gives me a bad URL?</li>
<li>says there is a header, but there isn’t?</li>
<li>says there isn’t a header, but there is?</li>
<li>gives me a spreadsheet with a header and no data?</li>
<li>gives me a URL to more data than I can hold in memory?</li>
<li>gives me a URL to more data than I can store on disk?</li>
<li>gives me a URL to something that is not a CSV/spreadsheet?</li>
<li>gives me a spreadsheet with data, and a header, and says it has a header?</li>
</ul>
<p>The last one is actually the easy/ideal case. The others are failure cases, some of which might be difficult to catch early. But, anywhere you give a programmer the ability to pull data in—especially over the network—you have to begin thinking in a <em>really</em> paranoid way. And, when dealing with novice programmers, they might be taking random stabs at things, or (more likely) really trying hard to figure things out, but this will still be in the space of “desperate guessing” in some cases.</p>
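<p>To make one of those cases concrete, here is a minimal sketch of how “a header and no data” might be told apart from the ideal case, using only the standard library. The function name <code class="language-plaintext highlighter-rouge">describe_rows</code> is hypothetical and not part of <code class="language-plaintext highlighter-rouge">tbl</code>, and the data row is made up for illustration. Note what the sketch cannot do: it simply assumes the first row is a header, which is exactly why the “says there is a header, but there isn’t” cases are harder.</p>

```python
import csv
import io


def describe_rows(csv_text):
    """Classify CSV text, assuming (hypothetically) the first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return "empty"
    if len(rows) == 1:
        return "header only"
    return "header plus {} data row(s)".format(len(rows) - 1)


# The header matches the pets spreadsheet; the data row is invented.
header_only = describe_rows("Name,Weight,Color\n")
with_data = describe_rows("Name,Weight,Color\nBailey,12,brown\n")
nothing = describe_rows("")
```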
<p>So, time to write some tests.</p>
<h3 id="a-bad-url">a bad URL</h3>
<p>What is a “bad” URL? In this case, we’ll call it a URL that does not point to a CSV file, or (worse) is simply not a URL. This could look like the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"lobster"</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="bp">True</span><span class="p">])</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="s">"lobster"</span><span class="p">])</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="s">"https://lobster.org/northaven.csv"</span><span class="p">])</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"http"</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https"</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster"</span><span class="p">)</span>
<span class="c1"># Technically, this is a good URL, but we have no idea if
# it serves up a CSV file.
</span><span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/"</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/northhaven"</span><span class="p">)</span>
<span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/northhaven.txt"</span><span class="p">)</span>
</code></pre></div></div>
<p>This has begun to suggest what we’re going to consider a “good” URL. This may not be obvious, but I’m going to bet money that <em>validating URLs is hard</em>. There are whole specs on how to format a URL/URI, so… why would I want to try and write this myself? A bit of googling confirms that Python has what I want: a <a href="http://bit.ly/2vVxzx4">validation</a> package for URLs (and other stuff). I found it through a Stack Overflow thread; had I followed the thread’s first recommendation, I would have ended up implementing my own validator. <em>Not a good idea</em>.</p>
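<p>As a rough illustration of why rolling my own is risky, here is a sketch of the kind of naive check I might have written with the standard library’s <code class="language-plaintext highlighter-rouge">urllib.parse</code>. The function name <code class="language-plaintext highlighter-rouge">naive_looks_like_url</code> is hypothetical. It only asks whether there is an http(s) scheme and a host, so it happily accepts <code class="language-plaintext highlighter-rouge">"https://lobster"</code>, which the list above treats as bad.</p>

```python
from urllib.parse import urlparse


def naive_looks_like_url(candidate):
    """Hypothetical hand-rolled check: an http(s) scheme plus a non-empty host."""
    if not isinstance(candidate, str):
        return False
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and parts.netloc != ""


examples = ["lobster", "http", "https://lobster", "https://lobster.org/northhaven.txt"]
results = {example: naive_looks_like_url(example) for example in examples}
```

<p>The first two examples are rejected, but the last two are both accepted, even though neither is something I want to hand to a CSV reader.</p>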
<p>I’m going to have to do the checking inside of the call to the <code class="language-plaintext highlighter-rouge">tbl</code> constructor, but I’ll farm it out to a helper function. I’ve created a module called <code class="language-plaintext highlighter-rouge">validation.py</code> that will contain all of my validation code, so that the class doesn’t get too heavy. (Is this good OOP? Probably not.)</p>
<p>My first validation function looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">validators</span> <span class="k">as</span> <span class="n">val</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span> <span class="k">as</span> <span class="n">NT</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="n">OK</span> <span class="o">=</span> <span class="n">NT</span><span class="p">(</span><span class="s">"OK"</span><span class="p">,</span> <span class="p">[])</span>
<span class="n">KO</span> <span class="o">=</span> <span class="n">NT</span><span class="p">(</span><span class="s">"KO"</span><span class="p">,</span> <span class="p">[</span><span class="s">"code"</span><span class="p">,</span> <span class="s">"message"</span><span class="p">])</span>
<span class="c1"># Error Codes
</span><span class="n">BAD_URL</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">DOES_NOT_END_IN_CSV</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">URL_NOT_A_STRING</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">def</span> <span class="nf">_check_from_sheet</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">has_header</span><span class="p">):</span>
    <span class="c1"># These will "fail fast."
</span>    <span class="c1"># Make sure it is a string.
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">KO</span><span class="p">(</span><span class="n">URL_NOT_A_STRING</span><span class="p">,</span> <span class="s">"The URL you passed does not look like a string: {}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">url</span><span class="p">))</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">val</span><span class="p">.</span><span class="n">url</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">KO</span><span class="p">(</span><span class="n">BAD_URL</span><span class="p">,</span> <span class="s">"The URL '{}' appears to be invalid."</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">url</span><span class="p">))</span>
    <span class="c1"># Should the URL end in CSV? Am I guaranteed that a Google Sheets
</span>    <span class="c1"># CSV URL will end this way? This might get tricky.
</span>    <span class="c1"># If it is a sheets URL, and the letters "csv" appear in the URL, it will be OK.
</span>    <span class="k">if</span> <span class="p">(</span><span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="s">"docs.google.com"</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span>
        <span class="ow">and</span> <span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="s">"spreadsheets"</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span>
        <span class="ow">and</span> <span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="s">"csv"</span><span class="p">,</span> <span class="n">url</span><span class="p">)):</span>
        <span class="k">return</span> <span class="n">OK</span><span class="p">()</span>
    <span class="c1"># If it isn't a sheets URL, then perhaps it is a valid URL that
</span>    <span class="c1"># just points to a CSV. Therefore, it should end in '.csv'.
</span>    <span class="c1"># The dot is escaped so it matches a literal '.', not any character.
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="s">"\\.csv$"</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span> <span class="ow">or</span> <span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="s">"\\.CSV$"</span><span class="p">,</span> <span class="n">url</span><span class="p">)):</span>
        <span class="k">return</span> <span class="n">KO</span><span class="p">(</span><span class="n">DOES_NOT_END_IN_CSV</span><span class="p">,</span> <span class="s">"The file you linked to does not end in '.csv'."</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">OK</span><span class="p">()</span>
</code></pre></div></div>
<p>I’ve created two unique types – OK and KO – and started defining some error codes. I don’t know how I’ll use them yet, but I do like the idea of being able to ask if something is <code class="language-plaintext highlighter-rouge">validation.OK()</code>. Now, I need to see if I can write test code for all of the above examples, and get back responses that I expect.</p>
<p>This has turned into the following test file:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tbl</span>
<span class="kn">from</span> <span class="nn">tbl</span> <span class="kn">import</span> <span class="n">validation</span> <span class="k">as</span> <span class="n">V</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pub?gid=0&single=true&output=csv"</span>
<span class="k">def</span> <span class="nf">test_bool_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_int_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_url_str_not_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"lobster"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_url_list_bool</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="bp">True</span><span class="p">])</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_url_list_int</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_url_list_str</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="s">"lobster"</span><span class="p">])</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_list_good_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="p">[</span><span class="s">"https://lobster.org/northaven.csv"</span><span class="p">])</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_protocol</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"http"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_protocol_s</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_partial_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="c1"># Technically, this is a good URL, but we have no idea if
# it serves up a CSV file.
</span><span class="k">def</span> <span class="nf">test_good_url_not_csv</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_good_url_not_csv2</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/northhaven"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_good_url_txt</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://lobster.org/northhaven.txt"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_goog_url_incomplete</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pub?gid=0&single=true"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">KO</span>
<span class="k">def</span> <span class="nf">test_complete_goog_url</span><span class="p">():</span>
    <span class="n">a_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl</span><span class="p">(</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pub?gid=0&single=true&output=csv"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">a_tbl</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="ow">is</span> <span class="n">V</span><span class="p">.</span><span class="n">OK</span>
</code></pre></div></div>
<p>It feels repetitious. In fact, I now realize that I could create some tables/lists of input data, and do all of this testing in a loop. However, I’ll leave this for the moment: I now have good tests over the possible inputs a user might throw my way, and that makes me happy.</p>
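<p>For the record, the table-driven version might look something like the sketch below. To keep it self-contained, it does not call <code class="language-plaintext highlighter-rouge">tbl</code> at all: <code class="language-plaintext highlighter-rouge">looks_like_csv_url</code> is a hypothetical, much cruder stand-in for the real validation, and <code class="language-plaintext highlighter-rouge">run_cases</code> is likewise an invented helper. (pytest also offers <code class="language-plaintext highlighter-rouge">pytest.mark.parametrize</code> for this pattern, which would report each row as its own test.)</p>

```python
def looks_like_csv_url(url):
    """Hypothetical stand-in for the real validation: an http(s) string ending in .csv."""
    return (isinstance(url, str)
            and url.lower().startswith(("http://", "https://"))
            and url.lower().endswith(".csv"))


# Each row pairs an input with the answer I expect the check to give.
CASES = [
    (True, False),
    (1, False),
    ("lobster", False),
    (["lobster"], False),
    ("https", False),
    ("https://lobster.org/", False),
    ("https://lobster.org/northhaven.txt", False),
    ("https://lobster.org/northhaven.csv", True),
]


def run_cases(check, cases):
    """Return the (input, expected, actual) triples where the check disagrees."""
    failures = []
    for value, expected in cases:
        actual = check(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures


failures = run_cases(looks_like_csv_url, CASES)
```

<p>Adding a new case is then one more row in the table, rather than three more lines of nearly identical code.</p>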
<h3 id="error-codes-or-exceptions">error codes, or exceptions?</h3>
<p>What should this library do if a user provides a bad URL? Should <code class="language-plaintext highlighter-rouge">a_tbl</code> be an object that is in a bad state, but the object knows it, and therefore won’t do bad things? Or, should the object throw exceptions, causing the user’s code to crash out?</p>
<p>This question has answers that are less obvious than I would like. There are different schools of thought on language/class design around the topic of exceptions. Python favors exceptions; Go prefers returning error values. This will require some thought and a bit more reading, as I prefer the latter, but wonder if it is more important to be Pythonic.</p>
<p>And, regardless of whether it is “Pythonic,” the question really is “what would be most usable to a novice programmer working with data?”</p>
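<p>To see the two styles side by side, here is a small sketch built on the same <code class="language-plaintext highlighter-rouge">OK</code>/<code class="language-plaintext highlighter-rouge">KO</code> shapes as the validation module. Everything else is an assumption made for illustration: <code class="language-plaintext highlighter-rouge">BadURLError</code>, <code class="language-plaintext highlighter-rouge">check_returning_status</code>, and <code class="language-plaintext highlighter-rouge">check_raising</code> do not exist in <code class="language-plaintext highlighter-rouge">tbl</code>.</p>

```python
from collections import namedtuple

# Same shapes as the OK and KO types in the validation module.
OK = namedtuple("OK", [])
KO = namedtuple("KO", ["code", "message"])
URL_NOT_A_STRING = 2


class BadURLError(ValueError):
    """Hypothetical exception type; nothing like it exists in tbl yet."""


def check_returning_status(url):
    """Status style: always return a value and let the caller inspect it."""
    if not isinstance(url, str):
        return KO(URL_NOT_A_STRING,
                  "The URL you passed does not look like a string: {}".format(url))
    return OK()


def check_raising(url):
    """Exception style: refuse to continue when the input is bad."""
    status = check_returning_status(url)
    if isinstance(status, KO):
        raise BadURLError(status.message)
    return url


# Status style: the caller has to remember to look at the result.
status = check_returning_status(1)
status_is_ko = isinstance(status, KO)

# A wrinkle: OK has no fields, so OK() is an empty tuple, and empty tuples
# are falsy. "if status:" would treat success as failure; use isinstance.
ok_is_truthy = bool(check_returning_status("https://lobster.org/northhaven.csv"))

# Exception style: the caller cannot ignore the problem by accident.
try:
    check_raising(1)
    raised_message = None
except BadURLError as err:
    raised_message = str(err)
```

<p>For a novice, the trade-off is roughly this: the status style never crashes but is easy to forget to check, while the exception style is impossible to ignore but produces a traceback that has to be read.</p>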
<h2 id="still-not-done-testing">still not done testing…</h2>
<p>And, if you’re still reading, you’ll realize that I’m not done testing. That is, I had imagined more ways the user might try and abuse my library than I actually tested for. So far, I’ve only handled the “bad URL” condition. What if the CSV they hand me is malformed? That’s another whole round of validation that has to come after I check if the CSV URL is even valid. Then, I have to check if I can fetch the URL, and if it is a reasonable size, and …</p>
<p>For tomorrow. For tonight, I’ve made progress.</p>
tbl: structuring the project2020-03-09T00:00:00+00:00https://jadud.com//p/tbl-structure<p>It helps, early, to structure a project well.</p>
<p>Having written a version of <code class="language-plaintext highlighter-rouge">tbl</code> in another language once before, and now revisiting the design and implementation in Python, I know I should think about how the project is structured from the start.</p>
<p>I find <a href="https://docs.python-guide.org/writing/structure/">The Hitchhiker’s Guide to Python</a> to be a wholly remarkable book, like its namesake (<em>The Hitchhiker’s Guide to the Galaxy</em>). As a result, I’ll restructure the code around their recommended format for a Python module at this point. <code class="language-plaintext highlighter-rouge">tbl</code> will become a module that I want to <code class="language-plaintext highlighter-rouge">pip install</code>, so it makes sense to clean it up now.</p>
<h2 id="the-layout">the layout</h2>
<p>First, I’m going to move some things around. The project directory looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drwxr-xr-x 9 jadudm jadudm 4096 Mar 8 20:10 .
drwxr-xr-x 7 jadudm jadudm 4096 Mar 8 20:07 ..
drwxr-xr-x 2 jadudm jadudm 4096 Mar 8 20:08 docs
drwxr-xr-x 8 jadudm jadudm 4096 Mar 9 08:47 .git
-rw-r--r-- 1 jadudm jadudm 25 Mar 8 15:03 .gitignore
-rw-r--r-- 1 jadudm jadudm 1093 Mar 8 14:42 LICENSE
-rw-r--r-- 1 jadudm jadudm 239 Mar 8 20:42 lobsters.py
-rw-r--r-- 1 jadudm jadudm 79 Mar 8 20:11 Makefile
drwxr-xr-x 2 jadudm jadudm 4096 Mar 8 15:01 __pycache__
-rw-r--r-- 1 jadudm jadudm 1762 Mar 9 08:47 README.md
-rw-r--r-- 1 jadudm jadudm 15 Mar 8 14:45 requirements.txt
drwxr-xr-x 3 jadudm jadudm 4096 Mar 9 08:07 tbl
drwxr-xr-x 2 jadudm jadudm 4096 Mar 8 20:08 tests
drwxr-xr-x 6 jadudm jadudm 4096 Mar 8 13:53 venv
drwxr-xr-x 3 jadudm jadudm 4096 Mar 9 08:12 .vscode
</code></pre></div></div>
<p>Because I want this to become a library that I can <code class="language-plaintext highlighter-rouge">pip install</code>, I’ve taken a few necessary steps in that direction. First, I’ve created a subdirectory called <code class="language-plaintext highlighter-rouge">tbl</code>, and in that directory, I moved the file previously called <code class="language-plaintext highlighter-rouge">main.py</code> and called it <code class="language-plaintext highlighter-rouge">__init__.py</code>. The secret here is that, in Python, any directory containing a file called <code class="language-plaintext highlighter-rouge">__init__.py</code> is considered a <em>module</em> (strictly speaking, a <em>package</em>: a module that can contain other modules). Modules are the fundamental unit of organization for libraries of code, so this is a clear and necessary step.</p>
<p>Running <code class="language-plaintext highlighter-rouge">ls -al tbl</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(venv) jadudm@lego:~/git/pytbl$ ls -al tbl/
total 20
drwxr-xr-x 3 jadudm jadudm 4096 Mar 9 08:07 .
drwxr-xr-x 9 jadudm jadudm 4096 Mar 8 20:10 ..
-rw-r--r-- 1 jadudm jadudm 2077 Mar 9 08:19 __init__.py
drwxr-xr-x 2 jadudm jadudm 4096 Mar 9 08:14 __pycache__
-rw-r--r-- 1 jadudm jadudm 406 Mar 9 08:14 util.py
</code></pre></div></div>
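<p>To see the <code class="language-plaintext highlighter-rouge">__init__.py</code> rule in action, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The package name <code class="language-plaintext highlighter-rouge">demo_tbl</code>, the <code class="language-plaintext highlighter-rouge">VERSION</code> constant, and the <code class="language-plaintext highlighter-rouge">shout</code> helper are all made up for illustration; they are not part of the real <code class="language-plaintext highlighter-rouge">tbl</code> code.</p>

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path, name: str = "demo_tbl") -> Path:
    """Create root/name/ with an __init__.py and a util.py, like the layout above."""
    pkg = root / name
    pkg.mkdir()
    # __init__.py is what marks the directory as an importable package.
    (pkg / "__init__.py").write_text("from . import util\nVERSION = '0.0.1'\n")
    # A hypothetical helper module living next to __init__.py.
    (pkg / "util.py").write_text("def shout(s):\n    return s.upper()\n")
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)        # let the import system see the new directory
    importlib.invalidate_caches()
    demo = importlib.import_module("demo_tbl")
    result = (demo.VERSION, demo.util.shout("lobster"))
    sys.path.remove(tmp)           # tidy up the search path again

print(result)
```

<p>The names defined in <code class="language-plaintext highlighter-rouge">__init__.py</code> become attributes of the imported package, which is why moving <code class="language-plaintext highlighter-rouge">main.py</code> to <code class="language-plaintext highlighter-rouge">tbl/__init__.py</code> is enough to make <code class="language-plaintext highlighter-rouge">import tbl</code> work from the project root.</p>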
<p>I also, in the last commit, created a small utility library. I’ll blog about that later.</p>
<p>At the top level, there are directories for <code class="language-plaintext highlighter-rouge">tests</code> and <code class="language-plaintext highlighter-rouge">docs</code>, which I’ll begin filling in soon.</p>
<p>The <code class="language-plaintext highlighter-rouge">.gitignore</code> is an important file; it says which files and directories I never want to put under version control. For example, my <a href="http://bit.ly/2v6zyON">venv</a> is something I never want to see in the repository… it’s a local working environment for my Python interpreter, so that when I install libraries to support the use of <code class="language-plaintext highlighter-rouge">tbl</code>, I don’t install them globally… instead, they get installed in the <code class="language-plaintext highlighter-rouge">venv</code> directory. (This, too, is probably a good subject for another post… or, at least, a few more links.)</p>
<p>The <code class="language-plaintext highlighter-rouge">requirements.txt</code> says which libraries are needed to support <code class="language-plaintext highlighter-rouge">tbl</code>. Right now, I have the <a href="https://github.com/facebookresearch/hydra">hydra</a> library from Facebook (I think I’m going to need it later) and the <a href="https://requests.readthedocs.io/en/master/">requests</a> library, which makes working with content over the ‘net a lot easier.</p>
<p>It turns out (for those following along) <em>that the structure of code is often as important as, if not more so than, the code itself</em>. If I don’t place files in the right places, with the right names, then my code is not, and cannot become, a Python library. Similarly, if I want to write an application in Java for Android… some files have to be named specific ways, and be placed in particular places in order for them to be assembled into an app. This is a critical, but sometimes invisible, part of writing code that is too often glossed over when students are getting started programming.</p>
<h2 id="structure-complete">structure, complete</h2>
<p>This is a first step in shifting the structure of the project around. There will be more, but for now it brings <code class="language-plaintext highlighter-rouge">tbl</code> one step closer to being installable as a Python package via <code class="language-plaintext highlighter-rouge">pip</code>.</p>
tbl: abstractions and imports2020-03-08T00:00:00+00:00https://jadud.com//p/tbl-import<p>There is a debate in the data science community (and, in particular, in the R community) as to whether one should learn libraries or a core language when working with data. For R programmers, it is a question of learning the dplyr family of libraries vs. working directly in the language without those tools. This is, from what I can gather, a sometimes divisive argument.</p>
<p>As an educator and a developer, I’ve come to appreciate the power of a good abstraction and tools that support that abstraction. I want tools that help me map the way I think about a problem directly into code. Or, I want tools that will shape the way I think about problems, so that I can more concisely express solutions using those tools. Here, “tools” means “libraries” or “programming languages.”</p>
<p>My approach to working on <code class="language-plaintext highlighter-rouge">tbl</code> has therefore been to think about how to make it easy for beginners to work with interesting data. “Interesting” might mean it is personally meaningful, and possibly a small amount of data. “Interesting” might mean it is large and complex data… but, important to the developer. This means <code class="language-plaintext highlighter-rouge">tbl</code> needs to support data that is both small and big, and it needs to be easy for a developer to get started.</p>
<h2 id="imports-the-tbl">Imports: The <code class="language-plaintext highlighter-rouge">tbl</code></h2>
<p>I want my beginner to be thinking about tabular data. So, I want a <code class="language-plaintext highlighter-rouge">tbl</code> to make it easy to turn a spreadsheet into something that they can do meaningful work with. In this way, the first abstraction that a programmer sees with <code class="language-plaintext highlighter-rouge">tbl</code> is the spreadsheet, and they can map that abstraction directly into the library. A <code class="language-plaintext highlighter-rouge">tbl</code> is, in the first instance, a spreadsheet.</p>
<p><img src="/images/posts/20200308-blue-lobster.jpg" align="right" width="20%" alt="Blue lobster photo by David Clode on Unsplash." />
Here, for example, is the <a href="https://docs.google.com/spreadsheets/d/1aCjhAepc2Ms-eIr97hPb2FvzEKmqWK1w6mH8MnVUtRs/edit?usp=sharing">spreadsheet</a> I use to keep track of my pet lobsters.</p>
<div style="width: 100%;">
<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pubhtml/sheet?headers=false&gid=0&range=A1:C5" style="display:block;margin: 0 auto;"></iframe>
</div>
<p>In the case that I have small, but interesting data, it would be nice if I could have a GUI for manipulating/entering that data, and could quickly pull it into a program that I’m writing without having to go through lots of hoops. <strong>If I want a good GUI for manipulating tabular data, I should use a spreadsheet!</strong> As it happens, not only can I use Google Sheets for this, but Sheets will let me publish my data to the web for embedding, and Sheets also makes it easy for me to pull the CSV directly. But, I don’t want a programmer to know that there’s a CSV file waiting for them… I just want them to be able to import the data.</p>
<p>Something that might look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tbl</span>
<span class="n">pets_url</span> <span class="o">=</span> <span class="s">"https://docs.google.com/spreadsheets/d/e/2PACX-1vSK2rd47ogfI2CpQi2L6HDo9AOEhnhqBN4zR4kLPUO28vBzmlc8XQWrvTfBYCU0ePf478yxcNKdOy5m/pub?gid=0&single=true&output=csv"</span>
<span class="n">pets_tbl</span> <span class="o">=</span> <span class="n">tbl</span><span class="p">.</span><span class="n">tbl_from_sheet</span><span class="p">(</span><span class="n">pets_url</span><span class="p">)</span>
</code></pre></div></div>
<p>To test this out, I’ll drop some code in <code class="language-plaintext highlighter-rouge">lobsters.py</code> and <code class="language-plaintext highlighter-rouge">tbl.py</code>.</p>
<script src="https://gist.github.com/jadudm/c1e4f42ff3b1abb58f1875e13a646cf1.js"></script>
<p>When I run</p>
<p><code class="language-plaintext highlighter-rouge">python lobsters.py</code></p>
<p>I get the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(venv) jadudm@lego:~/git/pytbl$ python lobsters.py
['Name', 'Weight', 'Color']
['Bart', '0.75', 'Muddy']
['Flick', '1', 'Muddy']
['Bubbles', '1.2', 'Blue']
['Crabby', '0.5', 'Muddy']
</code></pre></div></div>
<p>Now, this doesn’t get us all the way, but it takes the first step: I’ve created a data table in Google Sheets, and I can pull it in via the Requests library as a CSV document that is parseable and iterable. So far, so good.</p>
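<p>The embedded gist holds the actual code. For readers who cannot load it, the fetch-and-parse step could be sketched roughly as follows, assuming the Requests library and the standard <code class="language-plaintext highlighter-rouge">csv</code> module. The helper name <code class="language-plaintext highlighter-rouge">rows_from_csv_text</code> is my own invention for this sketch, and the body of <code class="language-plaintext highlighter-rouge">tbl_from_sheet</code> is a guess at one reasonable implementation, not necessarily what is in the repository.</p>

```python
import csv
import io


def rows_from_csv_text(text):
    """Parse CSV text into a list of rows, each row a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def tbl_from_sheet(url):
    """Fetch a published Google Sheets CSV URL and return its rows."""
    # Imported here so the parsing helper above works without Requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on a bad URL or a server error
    return rows_from_csv_text(response.text)


# Offline demonstration using a small sample in the same shape as the sheet.
sample = "Name,Weight,Color\nBart,0.75,Muddy\nBubbles,1.2,Blue\n"
for row in rows_from_csv_text(sample):
    print(row)
```

<p>Keeping the parsing separate from the network call means the parsing can be tested without touching the network, which will matter once the validation work begins.</p>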
<h2 id="abstractions">Abstractions</h2>
<p>The next step is to design the abstraction for a <code class="language-plaintext highlighter-rouge">tbl</code>.</p>
<p>So that no one post gets too long, this will be the subject of tomorrow’s explorations. The goal here will be to avoid creating an <a href="https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/">abstraction that is overly leaky</a>, to use Joel Spolsky’s terminology. I’m going to want a way to work with this data that:</p>
<ul>
<li>Can store the data locally or remotely</li>
<li>Can work with centralized and decentralized data</li>
<li>Can leverage multiple concrete representations, invisibly</li>
<li>Can operate on the data conceptually rather than syntactically</li>
<li>Can support programmers at multiple levels of experience and expertise</li>
</ul>
<p>These are going to be a complex set of requirements, and I’ll miss the first time. (This is actually the <em>second time</em> I’ve explored this idea; I’ve already done a deep dive in the programming language Racket, so in truth, I’ve got some ideas in my back pocket.)</p>
<p>For the exercise that this is, I’ll probably do the following:</p>
<ul>
<li>Explore SQLite for local and CockroachDB for remote/distributed data</li>
<li>Use an ORM (SQLAlchemy, Peewee, or similar) to manage those relationships</li>
<li>Use R, Python (pandas), Pyret, and other data languages/frameworks as inspiration</li>
<li>Choose some authentic use-cases to drive the development (e.g. perhaps interface into some of my own research data to drive both the research and the development of <code class="language-plaintext highlighter-rouge">tbl</code> forward)</li>
</ul>
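<p>As a taste of the SQLite direction, here is a minimal sketch that loads the lobster rows into an in-memory database using Python’s built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> module and then asks a question of the data. The function name <code class="language-plaintext highlighter-rouge">load_rows</code> and the table name <code class="language-plaintext highlighter-rouge">pets</code> are placeholders for illustration, every column is stored as text, and the header names are trusted rather than sanitized. None of this is a committed design for <code class="language-plaintext highlighter-rouge">tbl</code>.</p>

```python
import sqlite3

rows = [
    ["Name", "Weight", "Color"],
    ["Bart", "0.75", "Muddy"],
    ["Flick", "1", "Muddy"],
    ["Bubbles", "1.2", "Blue"],
    ["Crabby", "0.5", "Muddy"],
]


def load_rows(conn, table, rows):
    """Create `table` from a header row plus data rows (all columns stored as TEXT)."""
    header, data = rows[0], rows[1:]
    # Identifiers cannot be bound as parameters, so they are quoted inline;
    # the values themselves are bound safely with ? placeholders.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


conn = sqlite3.connect(":memory:")
load_rows(conn, "pets", rows)
blue = conn.execute(
    'SELECT "Name" FROM pets WHERE "Color" = ?', ("Blue",)
).fetchall()
print(blue)
```

<p>Even this small sketch shows where the abstraction could leak: the query is SQL, not something a beginner would recognize as “find the blue lobsters in my spreadsheet.”</p>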
<h2 id="get-the-code">Get the code!</h2>
<p>It’s early days, but you can get the code. This work will be open (as all of my work is, whenever possible) on GitHub. I’ll call the project <a href="https://github.com/jadudm/pytbl">pytbl</a>.</p>
tbl: thinking about data2020-03-07T00:00:00+00:00https://jadud.com//p/tbl<p>For the past six months, I have been working in a space where all of my intellectual output was owned by the company I worked for. As a result, there were projects that simply had to sit. That time has passed, and I have two that I want to revisit: the teaching and learning of data science in the broader context of computing, and my own explorations regarding the principles and practices that tooling can embody when it comes to working with data. I might even sneak some IoT/embedded systems in, but there are only so many hours in the day.</p>
<p>I’ll probably sneak some articles about hardware and firmware design in here as well, because that’s part of the data chain, so-to-speak.</p>
<h2 id="teaching-and-learning-of-data">teaching and learning of data</h2>
<p>Last spring and summer, I was thinking hard about the teaching and learning of <em>data</em>.</p>
<p><a href="https://photos.google.com/share/AF1QipM4nF5IbEJk0q4EiMI6V1XxYRkkyKpoLCOqjEEnpjwtkOJL7kb4ahZOEUF65Xq5Ow?key=RlNDSUNSZjUxc1dqQ0lfWjFiT2hsYkI0RURodWpn&source=ctrlq.org"><img src="https://lh3.googleusercontent.com/7_4aQQPN_0_ppLSk-nH8JxGJIX8wjsRk4MAP84SBg--IJ0HXZwXA0BWHawnrHf1JgzHhmfeGYsD31wD_rDTrVWSe0ghN6lnzin9WlNo6TizymBqPjmIIVhtlkQtFvTqOq9ICXMdvQik=w2400" /></a></p>
<p>My colleague at Fulbright University Vietnam, <a href="https://fulbright.edu.vn/our-team/sebastian-dziallas/">Sebastian Dziallas</a>, and I began laying out a two-course sequence that would introduce students to human-centered principles of collecting, working with, and questioning data in deep and meaningful ways. Who asks the questions? Who collects the data? How is it collected? What biases do we bring to the analysis? How do we report our findings, and to whom? What hardware and software is needed to support this learning in active and meaningful ways?</p>
<p>This is one space that I will begin documenting and unpacking here. Sebastian and I spent a year discussing related topics prior, and put in weeks of intense work on this during the summer. It has been unpacked (in part) in notes and documents, but should be unpacked more fully before the memory fades completely.</p>
<h2 id="embodied-ideas-in-tooling">embodied ideas in tooling</h2>
<p>One red thread to my time at Bates was thinking hard about how you introduce programming and the analysis of data to students from across the full breadth of the liberal arts. Computation has a place in every discipline, but <em>how</em> and <em>why</em> it is employed varies greatly. Artists might work with real-time data as part of performance, while social scientists generate their data through survey and interview, while natural scientists might use experiment or simulation to develop the data that informs their analysis. The context to each of these matters, the computational tools are not strictly the same, and the metalearning is drastically different in each case. What, then, become the driving <em>principles</em> that might unify these kinds of inquiry, and how can those principles be exemplified in the teaching and tooling that we bring to our students?</p>
<p>To explore this, I began work on <code>tbl</code>, a library of code in Racket that explores these concepts.</p>
<p>Now that I am once again free to author open code and write about my ideas without them explicitly being owned by others, I will be revisiting this work here over the coming weeks.</p>
Debugging I2C2019-03-09T00:00:00+00:00https://jadud.com//p/i2c<p><img src="/images/blog/sensing/i2cdriver.jpeg" align="right" width="400px" /></p>
<p>The <a href="https://i2cdriver.com/">I2CDriver</a> is a new product from James Bowman of <a href="https://www.excamera.com/sphinx/index.html">Excamera Labs</a>. It is pictured at right. I’m going to get around to talking about it shortly.</p>
<p>Our controller board has two components on it: a processor and a clock. The processor talks to the clock using a two-wire protocol called <a href="https://i2c.info/">I2C</a>. I2C allows a controller to talk to peripherals using a low-speed serial protocol. More importantly, it allows for multiple peripherals to be on the same bus. Here, “bus” means “everyone is talking on the same wires,” but each peripheral has a unique address, so that they only listen for (and respond to) their own messages.</p>
<p>On our controller board, we can do things like set the clock by sending I2C messages to the clock chip. Or, we can plug additional layers into our sensor stack, and use I2C to talk to other peripherals on other layers. Most recently, we have been working on a board that has only one component on it: a barometric pressure and temperature sensor.</p>
<p><img src="/images/blog/sensing/ms5803.jpg" align="right" width="300px" /></p>
<p>The sensor is small; while it looks huge (at right), it is actually only around 6mm in diameter. It has 8 connections underneath it, and two of them are for a processor to talk to it using I2C.</p>
<p>We have tried, repeatedly, to build boards that use this sensor. We have assembled three, and each one failed to work. We put some code on our controller, ran it, and <em>nothing worked</em>. We have one board with the same sensor from Sparkfun, and when we run our code, it <em>works perfectly</em>. So, we knew something was wrong with our board.</p>
<p>One of our first steps was to reflow the board. We got out our trusty air rework station, held the part in place, and carefully heated the pads. The solder reflowed under the pads, further securing the sensor to the board we designed. We tested continuity—making sure that the sensor is actually connected, electrically, to the controller—and everything tested out.</p>
<p>Once we did this, we plugged in our I2CDriver. This allowed us to send individual I2C messages to the sensor. Based on our reading of the datasheet, we should probably send a reset when the sensor first powers up. So, we did.</p>
<pre>i2ccl /dev/cu.usbserial w 0x76 0x1E</pre>
<p>This command uses the i2ccl program (which James Bowman provides with his I2CDriver) to write the message 0x1E (which is hex, and represents the pattern 00011110 in binary) to the I2C device that has address 0x76 (which our sensor does).</p>
<p><strong>IT WORKED!</strong></p>
<p>This was exciting. We had a sensor that worked! It hadn’t worked before! Clearly, it was our soldering!</p>
<p>So, we sent another command. This command told the sensor to measure the temperature using its built-in analog-to-digital converter (ADC).</p>
<pre>i2ccl /dev/cu.usbserial w 0x76 0x48</pre>
<p><strong>IT WORKED!</strong></p>
<p>This was awesome. We’ve been trying to get these sensors to work for weeks. We were feeling awesome. So, we then told the sensor to set its read pointer to memory location 0x00, which is where the result of the temperature read is stored on the sensor. We had to do this in order to read the temperature value out.</p>
<pre>i2ccl /dev/cu.usbserial w 0x76 0x00</pre>
<p>This caused i2ccl to crash. (I need to figure out why, and submit a bug report.)</p>
<p>We tried this multiple times, and every time, it caused it to crash. The same sequence worked fine on the board from Sparkfun, but it did not work on our board.</p>
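<p>For anyone following along without an I2CDriver, the three writes above can be spelled out as the bytes that go on the wire. This is only a sketch in Python: the command values and the address come from the commands above, while the helper names are my own. The one fact it leans on from the I2C protocol itself is that the first byte of a transfer is the 7-bit address shifted left by one, with the lowest bit saying whether the transfer is a read (1) or a write (0).</p>

```python
SENSOR_ADDRESS = 0x76  # 7-bit I2C address used in the commands above

# Command bytes, named after how they are described in the post.
COMMANDS = {
    "reset": 0x1E,
    "start conversion": 0x48,
    "set read pointer": 0x00,
}


def address_byte(address, read=False):
    """First byte on the wire: the 7-bit address shifted left, plus the R/W bit."""
    return (address << 1) | (1 if read else 0)


def as_bits(value):
    """Format a byte as eight binary digits."""
    return format(value, "08b")


for name, command in COMMANDS.items():
    frame = [address_byte(SENSOR_ADDRESS), command]
    print(name, [f"0x{b:02X}" for b in frame], as_bits(command))
```

<p>Writing the bytes out like this is also a handy way to match what you typed against what a logic analyzer shows on the bus.</p>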
<p>After roughly 1.5 hours of investigation, Maddie and I had to move on. Later that day, however, I decided to stare at the PCB layout again. Our boards have two GND pins, and two 3.3V connections; this is to simplify (arguably) our layout on individual layers of the sensor.</p>
<p>We had connected GND to GND, but we had not connected 3.3V to 3.3V. Why? Because we knew they were connected on the controller board.</p>
<p>But, we were testing without the controller board.</p>
<p>So, as a result, our sensor was not getting power.</p>
<p>However, when we plugged in the I2CDriver, our sensor was getting <em>just enough power</em> from the I2C lines themselves that it was able to turn on and respond to some low-power commands (like reset). However, asking it to take a measurement put it into a mode where it required roughly 1.5mA. This is a <em>very small</em> amount of current… but, it was more than could be drawn “parasitically” from the I2C pins. And, therefore, it was probably enough to kill the processor that is inside of the sensor. As a result, our sensor “crashed” whenever we asked it to take a reading, or worse, it “crashed part way,” which would put it in a state of sending garbage back to the I2CDriver after we attempted a read.</p>
<p>To be honest, I don’t know whether our sensor crashed “all the way” when we asked it to do a temperature reading without enough current available, or if it ended up in some half-way, indeterminate state… my guess is, based on how it failed, that we managed to only <em>partially crash</em> the sensor. I suppose we might say that the sensor was only <em>mostly dead</em>.</p>
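<p>A back-of-the-envelope calculation makes the shortfall plausible. I did not measure any of this, so treat it as a sketch under stated assumptions: the two I2C lines are pulled up to 3.3V through 4.7k resistors (a common value, but an assumption here), and the most current a pull-up can ever supply is what flows when its far end sits at 0V. In reality the sensor needs some voltage across it to run at all, so the true figure would be lower still.</p>

```python
SUPPLY_VOLTS = 3.3   # bus voltage used on our boards
PULLUP_OHMS = 4700   # ASSUMPTION: a typical I2C pull-up value, not a measured one
LINES = 2            # the two I2C wires, data and clock
NEEDED_MA = 1.5      # approximate draw while taking a measurement


def max_parasitic_ma(volts, ohms, lines):
    """Upper bound on current through the pull-ups, in milliamps (Ohm's law)."""
    return lines * (volts / ohms) * 1000


available = max_parasitic_ma(SUPPLY_VOLTS, PULLUP_OHMS, LINES)
print(f"at most {available:.2f} mA available, {NEEDED_MA} mA needed")
```

<p>Under those assumptions, even the most generous estimate falls just short of what a measurement needs, which is consistent with a sensor that wakes up and resets but falls over when asked to do real work.</p>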
<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/SamgviMdxes" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</center>
<p> </p>
<p>Either way, it took a fair bit of debugging to find the error in our board. In the sensor stack, it wouldn’t have been a problem. On the breadboard, while testing, it was a huge problem.</p>
<p>The fix? Connect 3.3V to 3.3V on the layers of the sensor, and all will be fine in the stack and in testing.</p>
<hr />
<p>The takeaways of this story:</p>
<ol>
<li>
<p><strong>Debugging hardware is really hard</strong>. You have limited tools to debug what is going on, and when it comes to digital communications, you need a logic analyzer of some sort. We have both a <a href="https://www.saleae.com/">Saleae Logic 8 Pro</a> in the lab, and now the <a href="https://i2cdriver.com/">I2CDriver</a>. We have used both in our work on these sensors, and they’re both invaluable.</p>
</li>
<li>
<p><strong>Debugging hardware takes patience</strong>. You have to read documentation carefully, test and probe systematically, and question every single thing about your design and build. There is nothing that can be taken as a given.</p>
</li>
<li>
<p><strong>Debugging hardware is a game of constant learning</strong>. I have been learning for the past 10 years the ways that hardware can fail, the mistakes you can make in a board design, and how to use the tools I have to debug problems when they arise. I will have to keep learning, because I expect I will keep making new mistakes.</p>
</li>
</ol>
<p>I was, however, very glad to have the I2CDriver. At just under $25, <strong>it is a no-brainer tool to have on the bench</strong>. I probably could have debugged this with the Saleae, but being able to use the I2CDriver to send commands one-at-a-time was what let us figure out exactly where the failure was. I’m tempted to get a second for the lab, just so I can leave one connected up to a test/dev machine at all times.</p>
<p>Our next step is to get our battery charging circuit working. I have no idea what will be wrong with it, or how we will debug it. But, no doubt, something will be wrong…</p>
On Evidence in Learning Programming2018-11-15T00:00:00+00:00https://jadud.com//p/onevidence<p>As we build a new digital and computational program (Digital and Computational Studies, or DCS) at Bates, I wrestle with a simple question: <i>what role should evidence play in the design and development of DCS?</i> Specifically, today, my reflection will ultimately focus on the <em>language</em> and <em>tools</em> we use to introduce students to computing. However, I will begin with a short digression regarding teaching in the classroom, because most people will be familiar with the experience of “taking in a class,” whether it is at the primary, secondary, or post-secondary level.</p>
<h3>Evidence-Based Practices In the Classroom</h3>
<p>Next fall I will be teaching a <b>CURE</b>: a <b>C</b>ourse-based <b>U</b>ndergraduate <b>R</b>esearch <b>E</b>xperience. This is a course that is structured around students engaging in research inquiry: we will attempt to answer a question the answer to which is unknown (and, ideally, of interest to people other than just us). There is <a href="https://serc.carleton.edu/curenet/pedagogy.html">evidence in the literature</a> regarding the value of research experiences for undergraduates, which is why these kinds of experiences are being pushed into the student’s curricular experiences. There are also <a href="https://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_177288.pdf">questions we still do not know the answer to</a> regarding the efficacy of CUREs as an instructional vehicle. (That makes sense; there is a great deal we do <em>not</em> know about teaching and learning, so to say there are unknowns or things yet to learn anywhere in the space of teaching and learning is, in truth, not meant to be a stone thrown.)</p>
<p>There are, however, many things I need to keep track of in the classroom if I value evidence in my practice as an educator. For example, my classroom practice—how I interact with my students—is a critical space for me to focus. Just a few examples:</p>
<ul>
<li>When I ask questions to the class, I should make sure I count (<em>one Mississippi, two Mississippi...</em> to roughly 10 seconds), and give students time to think.</li>
<li>Or, perhaps I have my students think, then pair up and discuss, and then share out. This lets them explore ideas in a small group before hazarding the (to some, intimidating) sharing of ideas in a large group.</li>
<li>I should randomize my selection of students using an external aid—perhaps a deck of cards with their names on the cards—so that I don't make a habit of calling on only women in the class, or men, and so on.
</li>
</ul>
<p>(As an aside, regarding the last bullet… I had a colleague who learned she only called on students on the left-hand side of the class… and she only learned that after years of teaching because she allowed her classroom to be videoed. It was an ingrained habit that was invisible to her, and clearly left the right-hand side of the room out of every conversation she facilitated in her class.)</p>
<p>This list of evidence-based practices is actually small; there’s 20-30 I could keep track of, and at this point in my career, I use many of them on a regular basis. There are still practices I don’t use on a reflexive basis, and that’s something to continue working on. (And, I don’t even track their use in a way that I could consider evidence if I was going to write up a report on the work that I do in the classroom.) In other words, I am aware of places in my practice where my engagement with students can still be improved based on evidence-based practices, and the amount of work I would need to do in order to communicate that evidence to others.</p>
<h3>Evidence-Based Practices in Teaching Programming</h3>
<p>All of this, however, leads up to a space that has been invariably difficult in every program and department I have ever taught in: <b>the choice of the programming language we use to teach novice programmers</b>. In truth, it is more complex than “just the language.” We need to consider:</p>
<ul>
<li>the tools we use to program in that language</li>
<li>the computing environment that exists around those tools (be it UNIX, Windows, Mac, or the WWW)</li>
<li>the text(s) that support the learning of those tools.</li>
<li>the resources available in the community-at-large (video, weblogs, etc.) from learners and practitioners to support ongoing exploration and multiple perspectives</li>
<li>support for transitions to/from the language and tools</li>
<li>support from colleagues across campus in the foundational choices being made outside of their department(s)</li>
</ul>
<p>The list can get very long. My point here is that these are complex tools that involve complex ideas at every level. That complexity is a cross product of tools, languages, environment, support resources, and the socio-cultural context of the institution, meaning the complexity is (in no small part) a result of the <em>system</em> of considerations that need to be made, and not just any one dimension. It is often the case that our literature regarding novice programmers fails to peel apart this complexity, or worse, fails to engage in good scholarship, and instead appeals to authority as a rationale for our actions when it comes to teaching novices.</p>
<h4>On Authority and Evidence</h4>
<p>When it comes to discussing these languages and tools, it is commonplace for computer scientists (and any practitioners who work in computational spaces) to appeal to <em>authority</em> when making decisions about how (and what, and why) to teach programming. That authority <em>might</em> be in the literature, but more often than not, it is personal authority (years of experience), or a limited set of experiences with a particular environment or book (but no evidentiary inquiry), or (perhaps the most dangerous) an appeal to the current marketplace: what is “popular” right now with employers in the post-graduate marketplace, as opposed to what tools are best for introducing students to the learning of computing and programming.</p>
<p>My colleague Mark Guzdial, recently moved to Michigan State from Georgia Tech, <a href="https://cacm.acm.org/blogs/blog-cacm/229965-moving-computing-education-past-argument-from-authority-stuart-reges-and-women-who-code/fulltext">wrote a piece for the Communications of the ACM</a> that began to explore the idea of authority and evidence in the teaching of programming. His article was essentially exploring two themes.</p>
<ol>
<li>One theme of Mark's article was to rebut the myopic and sexist perspectives in an article that was making the rounds at the time. It is important that Mark engaged in this rebuttal, but I don't want to give more oxygen to the small-minded and ridiculous belief that women, simply because they are women, cannot excel in computing. There has never been any evidence of this, nor will there be. This is an important theme unto itself, and I agree 100% with Mark that there is nothing in the learning sciences literature that even remotely suggests any biological/physiological difference between human beings when it comes to learning programming, but it is not the thread of my argument here.</li>
<li>The second theme of Mark's article was the preoccupation of computing, as a discipline, to <em>appeal to authority</em>. I want to explore this further.</li>
</ol>
<p>This came up just yesterday (November 14th, 2018) on a disciplinary mailing list; in particular, the question was asked:</p>
<blockquote>
I’m teaching an intro to programming class this coming spring for students with zero background in coding. I plan to use Python to ease them into the basic programming concepts (not sure about the IDE yet), and then transition to Visual Basic to give them access to a nice GUI builder and also the ability to use some of these skill for possible scripting in MS Office or other automation tasks. The second language also serves to demonstrate how much of the knowledge learned in one language can transfer to another.
<br />
...
<br />
Finally, if anyone would be willing to share their syllabus, or project ideas that were highly engaging and fun for students in a similar course I would be very appreciative. Right now I'm thinking data manipulation/analysis type tasks mostly for Python, while VB and the GUI might be nice for some small utility or db type programs perhaps - open to suggestions.
</blockquote>
<p><em>There’s so much to unpack in this question</em>. I won’t do it justice, but I’ll try and summarize the key issues.</p>
<ol>
<li><b>Language Choice</b>. What rationale does the asker have for using Python? What evidence is there to support its use in the classroom? They do go on to mention that the rest of the curriculum is taught in Java...</li>
<li><b>Tools</b>. The asker has no idea what tools they will use for teaching Python... yet, <em>tools matter a great deal</em> when learning to program. We'll come back to this.</li>
<li><b>Multiple Languages</b>. What rationale does the asker have for using two (very) different programming languages in a 15-week span of time?</li>
<li><b>Motivation</b>. The asker suggests that they want "fun" projects. What does the asker mean by "fun?" How does this relate to their goals and outcomes for the course, and (more broadly) for their department and institution?
</li>
</ol>
<p>There is more to unpack in those two paragraphs, but this is a starting point that gets to the core issues and challenges I see in using evidence-based practices in the first teaching of programming at the college level. Mark responded to the thread (referencing his previous CACM article), and reminded us of some important points (which I paraphrase/expand on here):</p>
<ul>
<li><b>The language matters</b>. It shapes how students think about what they are doing, there <b>are</b> languages that are easier to learn than others (because they were <em>designed</em>, <em>intentionally</em>, for learners), and we can study this (and have).</li>
<li><b>The UNIX command-line is not simple</b>. It was developed by experts for experts. There are many HCI design principles that are <b>not</b> at work in the UNIX command line. It is effectively a language unto itself, and therefore should be treated as a complex learning space just like the act of programming itself.</li>
<li><b>Professional programming environments are too complex</b>. Environments like <a href="https://www.rstudio.com/">RStudio</a>, a popular choice (or nearly <em>the only choice</em>) for writing R scripts for data analysis, were designed by and for experts. (Actually, it is unclear whether the people who developed R were expert software developers with any knowledge of usability. They may have been biologists who learned to write code.)</li>
<li><b>There are programming environments <em>designed</em> for novices</b>. There are environments like <a href="https://bluej.org/">BlueJ</a> and <a href="https://racket-lang.org/">Dr. Racket</a>, <a href="https://www.microsoft.com/en-us/makecode">MakeCode</a>, and <a href="https://scratch.mit.edu/">Scratch</a>, and <a href="http://appinventor.mit.edu/explore/">App Inventor</a> (to name a few) that are designed, top-to-bottom, with the beginner in mind. We have good research about (some/most) of these environments, and we have empirical evidence they make a difference in the learning our students engage in, the ability for our students to retain that learning, and their desire to keep on taking courses with us and continue learning more.</li>
</ul>
<p>We can dive deep into any of these dimensions, but I want to continue to pause on the original question posed on the SIGCSE mailing list: what language do I choose? In particular, I’m going to reflect briefly on the kinds of pressures we often feel as educators in an institutional context when it comes to these kinds of decisions.</p>
<h4>Language Choice: Pressures</h4>
<p>The rationales for language choice are often motivated by pressures from colleagues, students, and the marketplace. I want to consider each of these briefly.</p>
<p>The <b>marketplace</b> is fickle: every few years, something new is “hot,” and “the thing to learn.” Currently, the flavor-of-the-week might be Google’s Go, which is intended to be a concurrent answer to systems programming languages like C. Or, perhaps it isn’t a language, but instead “machine learning,” suggesting that it is important to know how to use TensorFlow (a library for doing machine learning work), or some other tool that was just released last week that I haven’t heard about yet. Either way, the marketplace has nothing to do with the teaching of people who have never written code before; it is the space of experts who spend 40+ hours/week on their task, and have the time to master complex, and sometimes rapidly changing, tools.</p>
<p>While I have a great deal of respect for my <b>students</b>, the few who have strong opinions about what language we should be using probably have had minimal experience using the tools they profess would be best. Or, they have read a blog about the most recent Thing to appear in the marketplace, and therefore they believe it is critical for us to learn. Students do <em>not</em> walk into Calculus and insist we use some new notation; they expect the Leibniz notation to be used (if they have any expectations at all), and that’s that. But they walk into courses involving programming full of ideas. That’s wonderful, but it isn’t evidence.</p>
<p><b>Colleagues</b> know the tools they know. They’re generally overworked, and rarely have interest in learning new tools. From their perspective—especially if your course is a “feeder” to their courses—it would be best if your course taught the tools they are using. It does not matter if your institution has faculty using multiple tools… any one colleague will want your students to learn the tool they use. The choice of tool that your colleague uses is rarely evidence based, but instead is what their research group used, or what they learned as an undergraduate, or what the marketplace is currently centered on within their discipline.</p>
<p>At Bates, we use STATA in Economics (and some R), R in Politics, SPSS is used in Psychology, Python and Matlab in Mathematics and Physics (and probably some C/C++), and Isadora and Max/MSP (amongst other programmatic tools for multimedia work) in Art/Music/Dance. No one is casually prepared to retool their teaching or research, but it is probably the case that most faculty would prefer that, if there is going to be an introduction to computation and programming, that it would prepare students for <em>their</em> particular flavor of computation and programming. The fact that these are radically different contexts, with radically different tools being used is generally secondary in the thinking of any one faculty member or department.</p>
<p>If it was so simple as to make an evidence-based choice, I would likely ground students’ experiences in a block-based environment in a first course, and have two courses that further introduced them to the structured approach to programming that is epitomized in <a href="https://htdp.org/">How to Design Programs</a>, which anchors the (evidence-based) Bootstrap curriculum (for middle-school learners) and a design-centric approach to software construction at the college level. However, these choices (when made in a department or on campus) tend to be political and negotiated, and it isn’t clear that the notion of research and evidence <em>necessarily</em> is enough to convince colleagues that the tools and environments they know might not be the right tools and environments for their students when <em>their students are taking their first steps on a journey that the faculty took so long ago, they’ve forgotten what it was like.</em></p>
<h4>Language Choice: Evidence</h4>
<p>Weintrop and Wilensky recently <a href="https://ccl.northwestern.edu/2017/a3_weintrop_wilensky.pdf">published a marvelous study of 4000+ students and their first learning of programming using block-based languages</a>. Their question was the following:</p>
<blockquote>
How does block-based programming compare to text-based programming in highschool introductory computer science classes with respect to learning outcomes, attitudes, and interest in the field of computer science?
</blockquote>
<p>The paper is worth a read. The essence of their results is that students gained more confidence pre/post with block-based environments, demonstrated greater learning gains on content using block-based environments, enjoyed themselves more (block-based), and were <b>substantially</b> more interested in taking further computing courses.</p>
<p>There are few other programming languages and environments that have a body of research around them that is coherent and evidentiary. <a href="https://bluej.org">BlueJ</a> has scholarship around its objects-first approach, including the very coherent <a href="https://dl.acm.org/citation.cfm?id=1513597">STREAM process</a> that Michael Caspersen and Michael Kölling have published (which effectively represents a culmination, though not a stopping point, of this line of work). A great deal of research undergirds the development of the Racket programming language, its associated (free) text <a href="https://htdp.org/">How To Design Programs</a>, and the tower of languages that are provided to support learners (from the Beginner language, to the Intermediate language, and so on), each of which was designed, based on evidence from use, to support learners from the syntax and structure through to the kinds of errors they can experience. Kathi Fisler’s studies of the Rainfall problem (<a href="https://web.cs.wpi.edu/~kfisler/Pubs/icer14-rainfall/icer14.pdf">The Recurring Rainfall Problem</a>, <a href="https://cs.brown.edu/~kfisler/Pubs/icer17-rainfall.pdf">Sometimes Rainfall Accumulates</a>) capture the current state of inquiry around this ecosystem of language and environment that has seen continuous use, development, and study for over 20 years. (Arguably, because Racket is a close design descendant of Scheme, we have been studying these tools and their use with students since the late 1960’s.)</p>
<h3>In Closing: What To Do?</h3>
<p>Sally Fincher et al. looked at how we, as educators, change our practice. In their paper <a href="https://carpentries.github.io/instructor-training/files/papers/fincher-stories-change-2012.pdf">Stories of Change: How Educators Change their Practice</a>, they asked 99 educators (mostly computer science educators or closely related) to address the following question:</p>
<blockquote>
Can you think of a time when something—an event, an article, a conversation, a reflection, an idea, a meeting, a plan—caused you to make a change in your teaching? What was it? What happened?
</blockquote>
<p>The work led them to the following result:</p>
<blockquote>
Of the 99 change stories analyzed, only three demonstrate an active search for new practices or materials on the part of teachers, and published materials were consulted in just eight of the stories. Most of the changes occurred locally, without input from outside sources, or involved only personal interaction with other educators.
</blockquote>
<p>Bringing this all the way back from the global to the local, I would claim Fincher’s article should give us pause as we develop a new computational program at Bates. The article raises difficult questions regarding the role of evidence in the design and development of courses, our choices of tools and languages in teaching computing, and how we engage across disciplinary boundaries while doing so.</p>
<p>Perhaps, through intentional design, and a willingness to commit to new learning on the part of ourselves and our colleagues (an expensive proposition in time), we might decide that evidence matters. However, we might also decide that the evidence is not “good enough,” in which case we will help ourselves feel comfortable doing what we “know best,” because we decide the evidence is not of sufficient quality or rigor. In other words, it is easy for all of us to make the comfortable choice of privileging our own knowledge and expertise, making a kind of “internal appeal to authority” when faced with change or the unknown.</p>
<p>I believe the most dangerous reason to make choices is <em>because we are in a hurry</em>. If we rush, we are unlikely to actually explore and discuss evidence-based practices in computing, and will instead “just teach Python and R,” because it is a safe set of choices in the current climate, both on campus and in the marketplace. (These languages are, after all, the languages of machine learning and data science!) But neither of these tools has a rich base of evidentiary research in the novice programming context, and both lack infrastructure to scaffold the learner well. We <em>could</em> build that infrastructure, and develop the associated research… but that, itself, is a monumental undertaking.</p>
<p>In short, as a computer scientist and computing education researcher who cares deeply about understanding the what, the why, and the how of my teaching… I’m uncertain what is the best course of action when it comes to engaging in what feels very much like a campus-wide (or certainly multiple-department) dialogue around the teaching and learning of programming. How (and, even, if) to advance the state of evidence, and weather the attendant questioning and attacks, is hard.</p>
<p>The question is, in short, <b>should evidence play a role in language and tool choice as we design a new digital and computational program at Bates?</b> I feel like I know how <em>I</em> would want to engage with that question, but that is different than what the <em>department</em> or even <em>community</em> might want to engage its time and energies.</p>
Motivation for a New Embedded Formfactor2018-06-05T00:00:00+00:00https://jadud.com//p/motivation<p>I want to revisit a fundamental design premise explicitly for the purpose of soliciting feedback. If you’ve got some background in this space, please drop me a note (mjadud at bates dot edu or @jadudm on Twitter). My students and I would welcome your input.</p>
<hr />
<p>I want a low-cost environmental sensing solution. My design criteria/constraints:</p>
<ul>
<li><strong>BASE COST</strong>. It should be possible for me to put a bare-bones sensor together (eg. a controller and enclosure) for less than $10.</li>
<li><strong>ENCLOSURE</strong>. The enclosure should be submersible, and able to be built easily with COTS components that can be sourced at a typical US hardware store.</li>
<li><strong>MODULAR</strong>. It should be possible for me to build a sensor by adding what I need, and nothing more. For example, if I need WiFi, I should be able to add it; it should not be present “by default” (thus adding cost, complexity). This goes for all aspects of the design: sensing, power, timekeeping, storage, and so on.</li>
<li><strong>ABSTRACTED</strong>. Modular layers should be abstracted in hardware and software for ease of use. For example, every hardware component in the sensor stack should implement a common “goToSleep()” command; it should not be different on a per-layer basis, nor should it be something that the programmer needs to look up for different kinds of components. Common interfaces should be common, even if they turn out to be a “no-op” for some classes of hardware.</li>
<li><strong>BATTERY-FIRST</strong>. Everything should be designed with the thought that we will be powering our board with a budget of 1000-2000mAh of battery capacity, and that we want to last at least 6mo to 1yr on that budget for most applications. At the least, we want to last a summer season.</li>
<li><strong>OPEN</strong>. The hardware and software should be free and open.</li>
</ul>
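<p>To make the battery-first budget concrete, here is a rough back-of-the-envelope sketch in Python (host-side arithmetic, not firmware). It assumes an ideal battery with no self-discharge and no regulator losses, and the function name is my own invention; all it does is turn the capacity and lifetime figures above into an average current draw.</p>

```python
HOURS_PER_DAY = 24


def average_current_ua(capacity_mah: float, lifetime_days: float) -> float:
    """Average draw, in microamps, that empties capacity_mah in lifetime_days.

    Idealized: ignores self-discharge, regulator losses, and temperature.
    """
    hours = lifetime_days * HOURS_PER_DAY
    return capacity_mah / hours * 1000.0  # convert mA to uA


if __name__ == "__main__":
    for capacity_mah in (1000, 2000):
        for days in (182.5, 365):  # roughly six months and one year
            budget = average_current_ua(capacity_mah, days)
            print(f"{capacity_mah} mAh over {days} days: {budget:.0f} uA average")
```

<p>Under those assumptions, 1000mAh stretched over a full year is an average of roughly 114µA for the entire stack, which is why every layer needs a real sleep mode.</p>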
<p>The current thinking is that we will design against our enclosure choice, which is 2” PVC. It is cheap, commonly available everywhere (it is in most every home and building in the USA), and it is easy to work with (glue, drill, cut, etc.). Smaller-diameter tubing makes design hard, and larger gets expensive. An enclosure like the mockup below costs around $4.</p>
<p><a href="https://photos.google.com/share/AF1QipOye6UXBSKhEt8hYvMuyWIQmWBxfl5CIJWFtMNsoIqHpU8kxBqXxvg1_eQgd4Wa9g?key=X1FJTzM3U1lPMHYwVS0tS2g5bnFQQU5VSGd0RE5n"><img src="https://lh3.googleusercontent.com/Kg40G6fgSONbiGiVKGjJ59dU_0imJdscVtNcElnRA71EI5gBPqy87p21UHcrWAMWXbYl38Xhho3Gy4Wi-Qhs0HxOTkph3GZnyLxvMBrXSLeCMLrRxDbLHB_nAITnCXrRgXpDfUVakfQ=w2400" /></a></p>
<p>This design choice restricts our selection of electronics hardware.</p>
<ul>
<li><strong>Adafruit Feathers</strong>. The feather is a 2” board, which will not fit in a 2” diameter cylinder. While it could be rotated 90°, this limits the height of the stack. I also worry about power consumption/deep-sleep on Feather boards in the general case.</li>
<li><strong>Sparkfun Qwiic</strong>. Sparkfun’s design only specifies the cable parameters, and nothing about the physical design of components. Mounting the free-form modules in an enclosure becomes difficult. I<sup>2</sup>C as a standard is promising, but boards (generally speaking) lack the modularity and power management features that we require.</li>
<li><strong>Seeed Grove</strong>. Nothing about Grove is designed with these parameters in mind; they are all modules of different sizes, and the interconnects vary. We do not win on power management and modularity.</li>
<li><strong>Pi Hats</strong>. There’s nothing in the Raspberry Pi space that is appropriate for this kind of work, given the power consumption on the Pi.</li>
</ul>
<p>The physical board is currently shaping up to be a 1.8” wide octagon (allowing space for cabling around the board in a 2” diameter cylinder), and two headers spaced 1.4” apart. Header 1 gives us VREG, GND, “wake”, SCL, SDA, and VBATT. Header 2 gives us VREG, GND, SS, MOSI, MISO, and SCK. We get both unregulated and regulated power on each layer (VREG and VBATT), I<sup>2</sup>C, and SPI. The “wake” pin will be pulled low (or high, TBA), and toggling the line will signal to the entire stack that it should begin wakeup. (This behavior is tentative at the moment.)</p>
<p><a href="https://photos.google.com/share/AF1QipPb1Bu36fEzdRkeKORYlA-lGl0bSLqjYBYpuaIAK9Zevi50UJ4oXMJwoG8rLoVBiA?key=Q3BpM3U0QUZTaEN3UV9KV2swVFBhckhWUHVSRld3"><img src="https://lh3.googleusercontent.com/ZkmtvR1K5cgVFt47M90_gY_MZeEmldEtsJvRmQbL9qLupsUCzCHDP4B09bdIyt4eC1Us3ZQD_A82AMN76zWiof3yS50aSsuAX3488o0S2n4ZnGsLOa-5UBq7inxXqsqpVoH4BmlbKmc=w2400" /></a></p>
<p>(As I look at the board, I worry about plugging this in backwards. I might want to offset one of the headers by a single 0.1” step, so that it is impossible to plug them into each-other “wrong,” but they remain breadboard compatible.)</p>
<p>My concern is that I’ve failed to ask myself enough questions, and that I’m reinventing wheels that don’t need to be reinvented. That said, I don’t believe there’s a platform that I can pull “off the shelf” and use robustly/reliably, semester-after-semester and year-after-year, with my students on a wide variety of (as-of-yet unspecified) environmental sensing projects.</p>
<h2 id="next-steps">Next Steps</h2>
<p>I have three students working with me this summer on this project; we’ll be blogging more about it as we proceed, and we’ll be building prototypes with our <a href="https://www.voltera.io/">Voltera V-One</a>. Hopefully, we’re on a sane path…</p>
Thoughts on Sensor Design2018-05-02T00:00:00+00:00https://jadud.com//p/design<p>I’m launching a summer of environmental sensor work, and I want to make sure I’m not reinventing a wheel. Given that I don’t have any close collaborators at this point who are deep into sensor design and development, I’m asking the cloud for feedback and critique.</p>
<!--description-->
<h2 id="the-things-were-sensing-temperature-depth">The Things We’re Sensing: Temperature, Depth</h2>
<p><strong>The fundamental sensing questions involve water: temperature, salinity, and depth</strong>.</p>
<p>There are multiple locations that we might want to investigate, and we might have different sensing desires at different locations; for example, in Casco Bay, we might not want salinity, but we absolutely need depth and temperature, while in the salt marshes of the Morse Mountain Conservation Area, we <em>need</em> salinity, because we’re tracking the influx of tides.</p>
<h2 id="our-budget-target-20-40">Our Budget Target: $20-40</h2>
<p>We’re also sensitive to cost: we’d like to have more, rather than fewer, sensors. This means that $50 enclosures are not OK. A $5 enclosure, if possible, would be lovely. Commercial-off-the-shelf salinity and temperature sensors can run anywhere from $1000 - $2500, but they’re not reusable… so, strictly speaking, our budget targets are ridiculously high. But, if we want to have broad sensor coverage, or engage in community-engaged environmental sensing projects, we need to get down into the tens-not-hundreds-of-dollars range for our platform.</p>
<h2 id="our-scale-10s-of-sensors">Our Scale: 10s of Sensors</h2>
<p>This suggests, ultimately, the scale. We’d like to be working in the many-10s-of-sensors range. To scale to hundreds of sensors, we’ll have to think about both the design of our electronics being amenable to automated fabrication, as well as having enclosures produced rather than hand-made.</p>
<h2 id="sensor-life-6-months">Sensor Life: 6 months</h2>
<p>Our sensors should last at least 6 months without intervention. This suggests we’re deploying after thaw and before freeze.</p>
<p>It would be cute if they could be recharged without being opened. It isn’t clear this is critical at this time. If it is, we’ll consider sticking a Qi charging receiver inside of the sensor. That, however, suggests a battery recharging wafer as well…</p>
<p>Once retrieved, we don’t want them to be thrown away; so, it should be possible to recharge them. Whether this means it is disassembled and the batteries are charged manually, or there are external charging points, or wireless charging… these are all possibilities. Ultimately, this will be determined in part by scale: for our initial testing, we will likely have 1-5 sensors. For scaling, we’ll need to think about having a way to recharge the sensors without opening them up. (This suggests other design constraints/considerations as well…)</p>
<h2 id="data-small-100s-of-kb-local-wifi">Data: Small (100s of KB), Local, WiFi</h2>
<p>In our first target, we’d rather like real-time-ish data. However, the “ish” part means that our sensor will be underwater, and only come up once every two days. Therefore, we need local storage, and we need to squirt over the radio when we’re brought to the surface. In an ideal world, we detect an extended shake (as we’re pulled up from the floor of the bay), and use that as our “wake event” to indicate that we should begin firing up the radio and looking for a base station.</p>
<p>We think we can use a COTS WiFi to cellular bridge module on the boat. Once the sensor is up, it will be above-surface for ample time (tens of minutes) to find the base station and send its current datastore.</p>
<p>We expect to be recording temperature and pressure on a roughly 10-minute cycle. This is a small amount of data; we can leverage either flash/FRAM technologies for storage, or we can use something bigger (uSD), which would allow us to have a filesystem interface (but with a much larger power consumption when we wake up the uSD card to write our data). However, it would be easier to recover the data at the end of a season if we discover that we had radio issues at any point (or multiple points) during the season if it is stored on a removable medium.</p>
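<p>As a sanity check on “small,” here is a rough Python estimate of a season’s worth of data. The 8-byte record layout (a 32-bit timestamp plus 16-bit temperature and pressure values) is purely an assumption for illustration; we have not settled on a storage format.</p>

```python
import struct

# Hypothetical record layout, little-endian with no padding:
#   uint32 Unix timestamp, int16 temperature (centi-degrees C), uint16 pressure (mbar)
RECORD = struct.Struct("<IhH")


def season_bytes(days: int, interval_minutes: int = 10) -> int:
    """Bytes needed to store one record every interval_minutes for `days` days."""
    readings_per_day = 24 * 60 // interval_minutes
    return days * readings_per_day * RECORD.size


if __name__ == "__main__":
    sample = RECORD.pack(1525219200, 1234, 1013)  # one packed reading
    print(f"one record: {len(sample)} bytes")
    print(f"two days between surfacings: {season_bytes(2)} bytes")
    print(f"six months (183 days): {season_bytes(183) / 1024:.0f} KiB")
```

<p>With that layout, six months of 10-minute readings is about 206 KiB, and the two days between surfacings is only a couple of kilobytes to push over the radio. That is comfortably within reach of a flash or FRAM part, so the uSD decision is more about recoverability than capacity.</p>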
<h1 id="sensor-design">Sensor Design</h1>
<p>We are intentionally designing our sensor electronics against our enclosure design. That is, the enclosure choices are driving design choices in the electronics.</p>
<h2 id="enclosure-design-2-pvc">Enclosure Design: 2” PVC</h2>
<p><a href="https://photos.google.com/share/AF1QipOye6UXBSKhEt8hYvMuyWIQmWBxfl5CIJWFtMNsoIqHpU8kxBqXxvg1_eQgd4Wa9g?key=X1FJTzM3U1lPMHYwVS0tS2g5bnFQQU5VSGd0RE5n"><img src="https://lh3.googleusercontent.com/Kg40G6fgSONbiGiVKGjJ59dU_0imJdscVtNcElnRA71EI5gBPqy87p21UHcrWAMWXbYl38Xhho3Gy4Wi-Qhs0HxOTkph3GZnyLxvMBrXSLeCMLrRxDbLHB_nAITnCXrRgXpDfUVakfQ=w2400" /></a></p>
<p>We are envisioning our enclosure design to be based on 2” PVC. It is easily obtained, easy to assemble, easy to machine/manipulate, and easy to make waterproof. In a perfect world, we’d design against 1.5” or 1” PVC, but for our first show, we’ll start with something “large enough” to be sloppy in our electronic design, but still have sensors “small enough” to be manageable.</p>
<h2 id="electronic-design-custom-328p-based">Electronic Design: Custom 328P-based</h2>
<p>We would like to use COTS components. For example, it would be nice to use Adafruit Feathers, or Sparkfun Qwiic components. However, none of these boards are designed for extreme low power consumption; they fundamentally assume a hobbyist who is exploring programming embedded systems. <em>This is not a criticism of these products</em>, but it is not clear that (unmodified) we can say “purchase the 328P Feather and stack it up with…” as a sensor solution. I have to look closely at the Feathers, and do some testing to see “how low they can go” before I claim we cannot use them, but my concern is there is more going on on the board.</p>
<p>(Part of “verbalizing” this process is so someone might say “but, Matt, you’re wrong…”)</p>
<p><a href="https://photos.google.com/share/AF1QipNWfK9lrCksCYi4OQ-HeRBpM1tqh8bT7xT3nqOiDFoWHvjuB4a5UefhVaK7oHIr9Q?key=OGdzVWc2bDl3b0pVdEV0Qmx5UDVFb21sYUN4d3BB"><img src="https://lh3.googleusercontent.com/laZwwFWdx0dEj4a44p3oevdojFgOPajK46YOEqvvdWrlkbpD0wjYDMeQ7qEvoeu6VT6BbELc3RA51rbYkfMkBOMY9v-IQN3MYbsMs7Pfln1TX-g1ITNt7OGX83G3XiPz52krpIOeHNQ=w2400" /></a></p>
<p>The idea is to have a stackable set of boards that are just under the 2” inside diameter of PVC. (This drawing is based on the OD, which is something like 2.375”, but it’s close.) I’m proposing the following:</p>
<h3 id="a-main-board">A Main Board</h3>
<ul>
<li><strong>CPU</strong>. The main board has a 328P. The 328P will run at 3.3V/8MHz, reducing part count and power consumption.</li>
<li><strong>Bus Header</strong>. A 2x5 pin header will provide stackability, and through that header we will run VBATT, VREG, GND, I2C (TWI), and SPI.</li>
<li><strong>ISP</strong>. A six-pad (pogo-compatible) connection for the AVR ISP.</li>
<li><strong>USB-Serial</strong>. A six-pin (90˚ male) header for USB to serial.</li>
<li><strong>Status</strong>. A status LED.</li>
</ul>
<p>This would be the “command” layer in a sensor stack. It has no functionality other than to have a CPU. For configurability, every other wafer in the stack has a single function as well, and stacks via the bus header. The rationale is that every single board will either have 1) an I2C-based device, 2) a SPI-based device, or 3) another 328P configured as an I2C listener device. (I’m going to reject the language of “master” and “slave”, and instead use “speaker” and “listener.”)</p>
<p><strong>Note</strong>: The header might rather be breadboard compatible, instead of a 2x5? For example, a 1x5 row on the “top” and “bottom” of the board? Each side would want… VBATT, VREG, GND… and then control lines. On one side, SPI, on the other, I2C? Or, duplicate everything on both sides?</p>
<h3 id="a-clock-board">A Clock Board</h3>
<ul>
<li><strong>Clock</strong>. A DS3231M. This has a lower part count than the non-M variant; specifically, it does not require a crystal.</li>
<li><strong>Backup Battery</strong>. A CR2032 Battery.</li>
</ul>
<p>Because the 3231 is already an I2C device, we know that, when plugged in, we can tell it to go into its deep sleep mode from the main board.</p>
<p>It can run off the VREG line in the bus header.</p>
<p><strong>Note</strong>: This raises a question… should every board be populated so that it <em>might</em> have a regulator? Should that be something that is a “given” in the board design for a wafer, but we don’t choose until we actually populate a board and are laying out a sensing problem? For example, you might discover you <em>must</em> put the clock on VBATT with a regulator, because you simply have too much going on in your sensor stack to run everything off a single regulator. Or, perhaps (if there are two stacks, for breadboard compat.) we put two regulators on the distribution board, and the user can choose, with a jumper, whether one or both are active. That way, power from one side and power from another… No. There’s no easy way to choose which VREG you get your power from. Each wafer will be designed and implemented, and we don’t want to have jumpers everywhere…</p>
<p><strong>Note</strong>: How does the DS3231 wake the controller? We might have to pass a “clock wakeup” GPIO line through the header?</p>
<h3 id="an-analog-sensor-board">An Analog Sensor Board</h3>
<ul>
<li><strong>Sensor Connectors</strong>. Three-Pin Connector(s)</li>
<li><strong>Listener Config</strong>. Address jumpers</li>
<li><strong>Listener</strong>. ATMega 328P</li>
<li><strong>ISP</strong>. ISP pads</li>
<li><strong>Power</strong>. Voltage regulator</li>
</ul>
<p>Because we want to be able to turn off any analog sensors connected to the stack, we drop a 328 onto this layer. It is configured to listen to the command layer, so that a sensor developer simply plugs in this layer, can issue an “AnalogSensors.wakeUp(),” do a reading via “AnalogSensors.read(1),” and then tell the sensors to go back to sleep (“AnalogSensors.goToSleep()”). The 328P’s datasheet lists 40mA per I/O pin, but as an absolute maximum rather than an operating target; our sensors are in the 10-20mA range, so we should be able to source the current for a sensor directly from the processor, though that needs checking against the drive characteristics at 3.3V and the total current limit for the package.</p>
<p>Although code must be developed for this layer, it is primarily implementing an I2C “API” that the command layer will use. Once stabilized, we should never have to modify the analog sensor layer’s firmware, and instead can safely flash it to the board and leave it for the rest of time. If we need configurability (eg. wakeup time on a sensor, etc.), then we can expand, over time, the complexity of the firmware so we have defaults as well as reconfigurability built into the API.</p>
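<p>To show what “common interfaces should be common” might look like from the command layer’s side, here is a small Python model. It is not firmware: the class names, the snake_case methods (standing in for wakeUp(), read(), and goToSleep()), and the <code>PassiveLayer</code> example are all hypothetical, and the I2C traffic is replaced by plain function calls.</p>

```python
from typing import Callable, Dict, Iterable


class Layer:
    """One wafer in the stack. Every layer answers the same power commands."""

    def __init__(self, address: int) -> None:
        self.address = address  # I2C address, as set by the board's jumpers
        self.awake = False

    def wake_up(self) -> None:
        self.awake = True

    def go_to_sleep(self) -> None:
        self.awake = False


class PassiveLayer(Layer):
    """Hypothetical layer with nothing to power down: sleeping is a no-op."""

    def go_to_sleep(self) -> None:
        pass  # still safe to call, it just does nothing


class AnalogSensors(Layer):
    def __init__(self, address: int, channels: Dict[int, Callable[[], int]]) -> None:
        super().__init__(address)
        self._channels = channels  # channel number -> function returning a raw reading

    def read(self, channel: int) -> int:
        if not self.awake:
            raise RuntimeError("layer is asleep; call wake_up() first")
        return self._channels[channel]()


def sleep_stack(layers: Iterable[Layer]) -> None:
    """Put the whole stack to sleep without caring what each layer is."""
    for layer in layers:
        layer.go_to_sleep()
```

<p>The point of the sketch is the last function: the command layer can loop over the stack and call the same method on everything, and a layer that has nothing to shut down simply ignores it.</p>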
<p>Any sensing application that does not need analog sensors can, then, omit this layer of the stack.</p>
<p>This layer has a voltage regulator that can be optionally chosen via (solder) jumper. This may need to be a common option on many boards; a given stack may be able to run off a single regulator, or we may need multiple regulators in the stack, because peak current for the entire stack might exceed what a single regulator can provide. (I’m thinking about possible high-current radio situations, for example.)</p>
<h3 id="a-usd-storage-board">A uSD Storage Board</h3>
<ul>
<li><strong>Listener</strong>. ATMega 328P</li>
<li><strong>Listener Config</strong>. Address jumpers</li>
<li><strong>Storage</strong>. uSD slot</li>
<li><strong>Power</strong>. Voltage Regulator</li>
</ul>
<p>The SparkFun OpenLog is a nice device, but pricy. Also, it stores data over serial; we need an I2C-based device to fit our stack. Ideally, we modify the OpenLog firmware, so that we can have a storage layer where we can wake up, store some data, and put the whole layer to sleep. Because the 328P is power miserly in deep sleep, we can use it (again) as an “API interface” for a common storage protocol for all storage layers.</p>
<p>By placing a 328P on this layer, we eliminate a great deal of code from the command layer. The command layer says “Storage.storeNext([array…])” or similar, and the driver handles squirting everything over I2C to the storage layer. If we develop the API reasonably, then the command layer can be ignorant of whether we are storing to a uSD or to a flash chip or FRAM. In short, we should be able to store sequential sensor readings easily, without worrying the developer about the particular medium they are storing to.</p>
<p>These abstractions will, ultimately, be limiting. But, they will be <em>flexible</em> abstractions. We can always improve the API running on the storage layers. For example, should a developer be able to issue a single command to “Storage.storeNextWithTimestamp([array…])”, or should the programmer be responsible for first getting a timestamp, combining it into a structure for storage, and then sending that to the storage layer? While not yet designed/decided, it is nice to know that these abstractions <em>can be built</em>, and the ultimate goal (of being able to quickly, reliably, program low-power environmental sensors) can be achieved.</p>
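<p>Here is one way the storage abstraction could be modeled, again in Python rather than firmware. Everything in it is an assumption for illustration: the <code>Storage</code> class, the comma-separated record format, and the idea that each medium (uSD, flash, FRAM) is hidden behind a single “write one record” function.</p>

```python
import time
from typing import Callable, Sequence


class Storage:
    """Medium-agnostic storage layer: callers only ever see store_next()."""

    def __init__(self, write_record: Callable[[bytes], None]) -> None:
        # write_record is the only medium-specific part (uSD, flash, FRAM, ...).
        self._write_record = write_record

    def store_next(self, values: Sequence[int]) -> None:
        """Append one reading; the caller supplies any timestamp itself."""
        record = ",".join(str(v) for v in values).encode("ascii")
        self._write_record(record)

    def store_next_with_timestamp(
        self, values: Sequence[int], now: Callable[[], float] = time.time
    ) -> None:
        """Convenience wrapper: prepend a timestamp, then store as usual."""
        self.store_next([int(now()), *values])


if __name__ == "__main__":
    medium = []  # stand-in for a real storage medium
    storage = Storage(medium.append)
    storage.store_next([1525219200, 1234, 1013])
    storage.store_next_with_timestamp([1234, 1013])
    print(medium)
```

<p>Seen this way, the timestamp question is less of a fork than it first appears: the timestamped call can be layered on top of the plain one, so we can ship the simple API first and add the convenience later without changing any medium-specific code.</p>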
<p><strong>Note</strong>: The voltage regulator should be one that has an ENABLE line. This way, the 328P can be used to enable/disable power to the uSD card.</p>
<h2 id="a-radio-board">A Radio Board</h2>
<ul>
<li><strong>Listener</strong>. ATMega 328P</li>
<li><strong>Listener Config</strong>. Address jumpers</li>
<li><strong>Radio</strong>. An ESP8266.</li>
<li><strong>Power</strong>. Voltage Regulator</li>
</ul>
<p>We can go one of two ways on this board: it can have a listener and an ESP8266, or it can have just an ESP8266. The ESP8266 will drop into a reasonably low-power mode, but it is not as miserly as the 328P. As a result, we may want the intermediary, where the “wakeUp()” command will first wake the 328P, and it will then power up the ESP8266.</p>
<p>In terms of software design, we would implement the same API (perhaps) on the 328P as a storage device. That way, storing data and sending data look exactly the same. The “smarts” for handling retry/etc. live on the local listener. The 328P, therefore, looks like every other listener: an I2C protocol implementation, and communication with the ESP8266 can be carried out over serial. This then looks like any Arduino sketch that uses an ESP8266 board, and allows us to use a stock firmware on the radio, as opposed to writing a custom controller in Lua or Python on the 8266. (If we later want to redesign this board, we can… but, it might be easier to start this way.)</p>
<p>If we must, we can use an ESP8266-01 module (with the six-pin header), or we can use a 12-E/F (and surface mount it). In other words, we could design this layer incrementally… one where we can prototype with a component we can plug in, and then evolve the design to one that we solder directly onto the board. There may be some question of using an external antenna… which would need to be inside the enclosure, but it could be done regardless.</p>
<p>I am worried about WiFi through the PVC in the field, but again… that’s what testing is for. This could become a LoRa radio, using a Nordic part. However, by using WiFi, we can have a base station that provides WiFi to cellular bridging as a COTS purchase. So… we’re really in a position where we want WiFi to “just work.”</p>
<p><strong>Note</strong>: When we want to send data, how do we do it? Does the radio retrieve everything since the last send? Does the storage layer have a “get everything since last transmission” API call? Or, does the controller have to shuffle everything from the storage, to the radio, and keep track of these things? In short, how smart is the stack? It would be nice to be able to say “Radio.transmitNewData()”, and it would handle talking to the storage layer (we would hand it an I2C address at setup), get the data, and send it. This suggests the storage layer has “Storage.getNewData()”, and it knows which datapoints are considered “new,” because it maintains an internal pointer that is updated every time this is invoked. (We should also be able to “getNewDataPointer()” and similar.) Ideally, though, we can either squirt data directly at the radio from the controller, or the controller can hand off the issues of getting all of the data and sending it, so that the control code looks simple, and the complexity is implemented (once, and correctly) in the interface to all Radio and Storage layers.</p>
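<p>One way the “smart stack” version might look, again as a Python model rather than firmware. The class and method names mirror the calls imagined above but are hypothetical, and the <code>send</code> callable is an assumed stand-in for whatever actually pushes bytes out over WiFi.</p>

```python
from typing import Any, Callable, List


class Storage:
    """Stand-in storage layer that remembers what has already been handed out."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._new_pointer = 0      # index of the first record not yet handed out

    def store(self, record: Any) -> None:
        self._records.append(record)

    def get_new_data_pointer(self) -> int:
        return self._new_pointer

    def get_new_data(self) -> List[Any]:
        # Return everything past the pointer, then advance the pointer.
        batch = self._records[self._new_pointer:]
        self._new_pointer = len(self._records)
        return batch


class Radio:
    """Stand-in radio layer that pulls new data from storage by itself."""

    def __init__(self, storage: Storage, send: Callable[[Any], None]) -> None:
        self._storage = storage    # on hardware this would be an I2C address
        self._send = send

    def transmit_new_data(self) -> int:
        batch = self._storage.get_new_data()
        for record in batch:
            self._send(record)
        return len(batch)          # number of records sent


sent: List[Any] = []
storage = Storage()
radio = Radio(storage, sent.append)
storage.store({"t": 1, "temp": 21.5})
storage.store({"t": 2, "temp": 21.7})
radio.transmit_new_data()          # the controller's whole job is this one call
```

<p>Note one open question this sketch does not answer: the pointer advances as soon as data is read, so a failed transmission would skip those records. A real design would probably advance the pointer only after the radio confirms the send.</p>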
<h3 id="other-boards">Other Boards</h3>
<p>Using the above boards as examples, we also imagine boards that might have flash or FRAM for storage, boards where we connect I2C-based sensors, a power distribution board (where we have a voltage regulator and battery connections), and… so far, that might be it.</p>
<h2 id="benefits">Benefits</h2>
<p>For this project, it allows students to cut their teeth on circuit design one wafer at a time. Instead of trying to design a single board with everything on it, we design multiple boards that each do just one thing. This gives students the opportunity to design or revise wafers in (conceptual) isolation, and provides an ongoing source of projects. If we decide to scale down to 1.5” or 1” PVC, then we have a whole redesign process… but, we can do it piecewise.</p>
<p>The stackable/wafer approach is also nice from a programming perspective; any board that has a local listener will be written to receive I2C messages, and do things in response. This makes the state machines for each wafer simpler to write: sleep, wake, process message, go back to sleep.</p>
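<p>That loop is small enough to model directly. The sketch below is a Python simulation with hypothetical names (<code>Wafer</code>, <code>on_i2c_message</code>, and the command numbers are all made up for illustration); on the real boards this would be an interrupt-driven Arduino sketch.</p>

```python
from enum import Enum
from typing import Callable, Dict, Optional


class State(Enum):
    ASLEEP = "asleep"
    AWAKE = "awake"


class Wafer:
    """Generic listener: sleep, wake, process one message, go back to sleep."""

    def __init__(self, handlers: Dict[int, Callable[[bytes], bytes]]) -> None:
        self.state = State.ASLEEP
        self._handlers = handlers   # command byte -> function that handles it

    def on_i2c_message(self, command: int, payload: bytes = b"") -> Optional[bytes]:
        # An incoming message is what wakes the wafer.
        self.state = State.AWAKE
        try:
            handler = self._handlers.get(command)
            if handler is None:
                return None         # unknown commands are ignored
            return handler(payload)
        finally:
            # Whatever happened, the wafer ends up asleep again.
            self.state = State.ASLEEP


# A toy wafer with a single command (0x01) that echoes its payload reversed.
echo_wafer = Wafer({0x01: lambda payload: payload[::-1]})
reply = echo_wafer.on_i2c_message(0x01, b"abc")
```

<p>Because every wafer shares this skeleton, only the handler table differs from board to board.</p>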
<p>The individual wafers have no intelligence in the context of the overall sensor, which means we can develop and test each wafer’s API, have confidence in that wafer, and then integrate it into the sandwich stack. “Unit testing” of individual wafers is a huge win for this kind of application.</p>
<p>The controller is also interacting over a common protocol, and we can create small OO wrappers, so that we have a common API across all objects. Every board, for example, should have a “goToSleep()” method. We then develop the driver code so that where there are differences, we simply <em>don’t care</em>. That is, we call “Clock.goToSleep()” to put the DS3231 to sleep, even if that is actually a different set of I2C commands than if we put a storage layer to sleep; in either case, we invoke the “goToSleep()” method.</p>
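<p>A minimal sketch of such wrappers, modeled in Python. The <code>FakeBus</code>, <code>Board</code>, <code>Clock</code>, and <code>Storage</code> classes are hypothetical, the storage address is made up, and the bytes written are placeholders rather than real DS3231 or storage commands; only the DS3231’s usual I2C address (0x68) is taken from the real part.</p>

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class FakeBus:
    """Records I2C writes instead of touching hardware."""

    def __init__(self) -> None:
        self.writes: List[Tuple[int, bytes]] = []

    def write(self, address: int, data: Sequence[int]) -> None:
        self.writes.append((address, bytes(data)))


class Board(ABC):
    """Common interface every wafer wrapper exposes to the controller."""

    def __init__(self, bus: FakeBus, address: int) -> None:
        self.bus = bus
        self.address = address

    @abstractmethod
    def go_to_sleep(self) -> None:
        ...


class Clock(Board):
    def go_to_sleep(self) -> None:
        # Placeholder bytes; the real DS3231 sequence would go here.
        self.bus.write(self.address, [0x0E, 0x00])


class Storage(Board):
    def go_to_sleep(self) -> None:
        # Placeholder for a hypothetical "sleep" command on our own listener.
        self.bus.write(self.address, [0x01])


bus = FakeBus()
boards: List[Board] = [Clock(bus, 0x68), Storage(bus, 0x10)]
for board in boards:
    board.go_to_sleep()   # same call, different bytes on the wire
```

<p>The controller loop never needs to know which board it is talking to, which is the “we simply don’t care” property described above.</p>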
<p>It is also designed for the enclosure. This is a problem with many COTS boards: they are designed as hobbyist boards that are breadboardable, but they are not designed for low power/extended battery usage, and they are not designed for a particular enclosure system. Here, we settle on PVC, and design against the constraints of a cylindrical enclosure.</p>
<p>We can reuse code from similar projects (Rocketscream’s low-power library, and the OpenLog firmware, which may serve as a starting point for our storage layer), and we stay in the Arduino ecosystem with the lowest-power chip in that ecosystem.</p>
<h2 id="related-systems">Related Systems</h2>
<p>It would be nice to be able to buy Adafruit Feathers, design against the Feather “standard,” and be done. However, Feathers are not designed for low power usage. So, we could pick the standard, but we would be in a position of designing all new boards.</p>
<p>That said… Feathers are open source. Therefore, we could take (say) the 328P board, rip off anything we don’t want, and then use it.</p>
<p>But, we would need to 3D print a harness that held the Feathers at an angle to fit in the enclosure, or run them along the central axis… and, then, we would have limited-or-no stackability. This eliminates the benefits of the Feather as a form factor.</p>
<p>Qwiic (from SparkFun) has no way to control power on each board. There’s no form-factor standard. There’s no clear way to mount them in a given enclosure. If we use PVC, we will end up with a mess of boards wired to each other, which feels… messy. It lacks support for SPI, which may (for some sensors) be critical to our applications. There is no provision for GPIO signaling if we absolutely need it.</p>
<p>Grove has similar drawbacks to Qwiic, but lacks the commitment to a single protocol.</p>
<h2 id="drawbacks">Drawbacks</h2>
<p>We have to design everything.</p>
<p>We have to develop the API, and write all of the code.</p>
<p>We aren’t starting with a CircuitPython-compatible CPU. It would be really nice to design with the SAMD21 or SAMD51, but… with limited experience, and unknown library support, this could lead us into a space where we’re doing more embedded software engineering with students new to embedded design than we like. We can always transition over time; because we are designing against an I2C API, we can (for example) replace the controller board with a CircuitPython-ready CPU, and program it in Python, and still have the same abstractions.</p>