Markus Östberg

Escape the City Buzz: A Cozy Catskills Getaway

Sat, 05 Apr 2025 19:00:00 +0000

Feeling the relentless pace of city life? That craving for fresh air, quiet woods, and a sky full of stars – perhaps without sacrificing a comfy bed and modern conveniences – is something we know well. We recently chased that feeling with a weekend escape to Postcard Cabins Eastern Catskills, and it turned out to be the perfect antidote to the urban grind.

Day One: Trading Concrete Jungles for Woodland Whispers

Leaving the familiar chaos of New York City behind, we pointed our car north towards the Catskills. The roughly 2.5-hour drive to Catskill felt like shedding layers of city stress with every mile. Its proximity makes it an incredibly accessible retreat for a quick nature fix. Before fully settling into our woodland haven, a quick 7-minute zip to the nearby Hannaford had our groceries sorted for the weekend.

With supplies gathered, our first evening unfolded exactly as hoped: simple, peaceful, and outdoors. We fired up the grill over the cabin’s fire pit, the scent of woodsmoke replacing traffic fumes, enjoying dinner under the vast, darkening sky. It was an instant decompression.

Day Two: Cabin Calm and Cascading Falls

Day two was all about embracing the stillness we came for. We let the morning unfurl slowly within our cozy cabin – a space genuinely designed for recharging. Sunlight streamed through the large window, offering front-row seats to the local wildlife theatre. We watched birds flit between branches and squirrels scamper playfully, a simple, grounding connection to nature enjoyed over morning coffee, right from our bed.

Later, adventure beckoned. A scenic 28-minute drive brought us to the legendary Kaaterskill Falls. Setting off from Laurel House Road, we tackled the 1.7-mile out-and-back trail. It’s generally considered moderately challenging; while well-maintained steps have been added recently, expect some steep and rocky sections (good footwear is recommended!). The payoff? Absolutely breathtaking views of the 260-foot Kaaterskill Falls, New York’s highest cascading waterfall. The sheer scale and the roar of the water were invigorating.

For dinner, we ventured to Casa Susanna in nearby Leeds (part of Camptown lodging), drawn by its reputation for modern Mexican cuisine inspired by family history and the incredible bounty of the Hudson Valley. Their commitment to local farmers, ranchers, and fishermen felt perfectly aligned with the region’s natural spirit, and the meal was fantastic. Back at the cabin, the day wound down with a quintessential campfire activity: melting marshmallows for gooey, delicious s’mores by the fire pit.

Departure Day: Hudson Charm and Lingering Calm

Our final morning involved packing up – a straightforward process thanks to the cabin’s minimalist, “everything you need, nothing you don’t” approach. We weren’t quite ready to dive back into city life, so we made a detour to the delightful town of Hudson, NY.

For brunch, Le Perche on bustling Warren Street proved to be an excellent choice. We can personally vouch for the incredible Croque Madame and the perfectly zesty Spicy Chicken Sandwich. While they might be known for baked goods, their savory options are definitely worth the stop! Afterwards, a leisurely stroll down Warren Street, popping into unique boutiques and galleries, offered a final, perfect taste of Hudson Valley charm before the drive back to Brooklyn.

Overall Reflections: The Glamping Experience

Our stay at Postcard Cabins Eastern Catskills delivered a solid dose of “modern glamping.” It strikes a great balance, offering comforts like an en-suite bathroom (a huge plus!) and reliable AC/Heat, alongside outdoor essentials like a kitchenette and that lovely fire pit. It’s ideal if you want a nature immersion without fully “roughing it.” The cabins are smartly designed, cozy, and make for a comfortable base camp.

A Few Things to Keep in Mind:

While we thoroughly enjoyed our stay, here are a few practical points based on our experience and general feedback:

Community Vibe: While nestled in woodlands, the cabins are situated relatively near each other. Expect to see and possibly hear your neighbors – it’s more of a shared nature retreat than total seclusion.
Hot Water Schedule: The hot water system provides about five minutes of shower time before needing roughly 30 minutes to regenerate. Just something to plan around, especially if multiple people are staying.
Connectivity: As expected in a more remote spot, cell service can be patchy (perhaps a welcome nudge to disconnect!). However, the Wi-Fi worked reliably when we needed it.
Priced Provisions: While convenient, some items stocked in the cabin (like tea bags, coffee and firewood) come at an extra cost, operating like a hotel minibar. To give you an idea, a single tea bag was priced at $2.50, so factor that in if you plan to use these extras.

Final Thoughts

Ultimately, our weekend at Postcard Cabins was precisely the recharge we craved. It offered a seamless blend of quiet nature, comfortable lodging, and easy access to stunning local attractions like Kaaterskill Falls and the vibrant town of Hudson. It was a welcome reminder that a truly refreshing escape doesn’t have to be far-flung or complicated. We left with lungs full of fresh Catskills air, feeling reconnected and ready to face the city again – until the next escape calls!

Adding Search to GitHub Pages Hosted Sites

Sun, 16 Feb 2025 17:00:00 +0000

Adding search to a blog hosted on GitHub Pages can present some unique challenges. Because GitHub Pages doesn’t support server-side code, traditional server-side search engines are off the table. This means we need to explore alternative approaches, primarily client-side solutions or leveraging third-party search services. In this post, we’ll delve into how to implement a basic yet effective client-side search engine using the powerful combination of Jekyll and JavaScript. The implementation will involve pre-processing the blog content during the Jekyll build process to create a search index, enabling fast and efficient searching directly within the user’s browser.

Why Client-Side Search?

While third-party search solutions are available, client-side search offers some compelling advantages, especially for smaller blogs. First and foremost, it avoids any external dependencies, keeping your site architecture simple and eliminating reliance on external services. This also gives you complete control over the search functionality and how it’s implemented, allowing for customization and fine-tuning to perfectly match your blog’s needs. And, of course, a significant benefit is that it’s cost-effective, avoiding any recurring costs associated with third-party search services.

Building the Search Index with Jekyll

To minimize the processing burden on the client’s browser and ensure a responsive search experience, we’ll pre-process the blog content and generate a search index. This crucial step happens during the Jekyll build process.

The search index will take the form of a JSON array, where each element represents a blog post. Each post entry will contain the title, a collection of keywords and their frequencies within the post, the URL of the post, and a short, descriptive excerpt. This structure allows us to quickly search for terms and retrieve relevant information.

[
    {
        title: "Build your own Search Engine 101",
        keywords: {
            search: 26,
            engine: 6,
            document: 24,
            example: 3,
            containing: 3,
            word: 4,
            lookup: 2,
            key: 1,
            indexed: 1,
            tokens: 4,
            tokenizer: 4,
            frequency: 7,
            // ... more keywords
        },
        url: "/projects/2015/01/31/build-your-own-search-engine-101.html",
        excerpt: "For many of us, there is something magical with a search engine. You type in a few well selected words and out comes, hopefully, exactly what we are looking for. So how does it work? In this post I’ll go through some of the basic building blocks needed for a basic search engine."
    },
    // ... more blog posts
]

The index contains a list of all blog posts with the title, keywords, url and an excerpt of the content. The keywords are generated by tokenizing the content of the blog post and counting the frequency of each word. The search index is generated using a Jekyll template file that loops over all the posts and generates the index.

Because of the limitations of Jekyll’s Liquid templating we have to skip more advanced features like stemming. We also have to be careful with the size of the index. The index for this blog is currently about 100KB in size, which is still small enough to be performant even on slower devices like phones. To keep the index small we use a list of stop words, common words like “the”, “and”, “is”, etc, that we do not include in the index.

To build this index with Jekyll we create a new search_index.json file in the root of the project.

search_index.json

---
layout: null
---
{%- assign stop_words = site.data.stop_words.words -%}
{%- assign split_chars = site.data.split_chars.chars -%}

[
  {% for page in site.posts %}
    {
      "title": "{{ page.title | escape }}",
      "keywords": {
        {%- assign sanitized_content = page.content | append: ' ' | append: page.title | strip_html | downcase | escape %} 
        {%- for char in split_chars %}
            {%- assign sanitized_content = sanitized_content | replace: char, ' ' %}
        {%- endfor %}

        {%- assign words = sanitized_content | split: ' ' %}
        {%- assign filtered_words = words | where_exp: "word", "word.size > 1" %}

        {%- assign stop_words_filtered_words = '' %}
        {%- for word in filtered_words %}
            {%- unless stop_words contains word %}
                {%- if stop_words_filtered_words == '' %}
                    {%- assign stop_words_filtered_words = word %}
                {%- else %}
                    {%- assign stop_words_filtered_words = stop_words_filtered_words | append: ',' | append: word %}
                {%- endif %}
            {%- endunless %}
        {%- endfor %}

        {%- assign filtered_words = stop_words_filtered_words | split: ',' %}

        {%- assign grouped_words = filtered_words | group_by_exp: "word", "word" %}
        {%- for group in grouped_words %}
            "{{ group.name }}": {{ group.items | size }}{% unless forloop.last %},{% endunless %}
        {%- endfor %}
      },
      "url": "{{ page.url | relative_url }}",
      "excerpt": "{{ page.excerpt | strip_html | escape | strip_newlines }}"
    }{% unless forloop.last %},{% endunless %}
  {%- endfor %}
]

Let’s break this template down. First, it grabs data from two files: stop_words.yml and split_chars.yml. These files, which live in your _data folder, contain lists of common words (like “the”, “a”, “is”) and characters used to split text (like periods, commas and other separators). These lists help keep the index small by ignoring unimportant words and make sure the text is broken down correctly.

The template then goes through all your blog posts, grabs the title and content, and cleans it up by making everything lowercase, removing HTML, and replacing splitting characters with spaces. Then, it gets rid of short words (one letter) and those common “stop words.” After that, it counts how often each word appears and finally creates the JSON for each blog post, including the title, the important words and their counts, the URL, and a short excerpt.

Here’s a sample of what _data/stop_words.yml and _data/split_chars.yml files might look like:

_data/stop_words.yml

words:
  - the
  - a
  - is
  - and
  - ... (add more words here)

_data/split_chars.yml

chars:
  - "."
  - ","
  - "!"
  - "?"
  - "-"
  - ... (add more characters here)

Search Page

The search page is a simple HTML page that contains a search input field and a list of search results. The search page uses the search index generated by Jekyll to search the content of the blog. The search page is implemented using another Jekyll template and a JavaScript file that loades search index using an AJAX request and then handles search requests.

search_index.md

---
layout: default
title: Search
permalink: /search/
---

<div id="search" data-baseurl="{{ site.baseurl }}">
  <input type="text" id="search-input" placeholder="Search articles...">
</div>

<script src="{{ site.baseurl }}/assets/js/search.js"></script>

This page has a search box and a div where the results will show up. It also includes the JavaScript file that does all the processing.

assets/js/search.js

document.addEventListener("DOMContentLoaded", function() {
    const searchInput = document.getElementById('search-input');
    const baseUrl = document.getElementById('search').dataset.baseurl;
    let resultsContainer;
    const maxResults = 10;

    fetch(`${baseUrl}/search_index.json`)
        .then(response => response.json())
        .then(searchIndex => {
            // Function to calculate TF-IDF score
            function calculateTfIdf(query, item) {
                const queryTerms = query.toLowerCase().split(' ');
                let score = 0;

                queryTerms.forEach(term => {
                    const termFrequency = Object.keys(item.keywords).reduce((acc, keyword) => {
                        if (keyword.startsWith(term)) {
                            acc += item.keywords[keyword];
                        }
                        return acc;
                    }, 0);
                    
                    const inverseDocumentFrequency = Math.log(searchIndex.length / searchIndex.filter(i => Object.keys(i.keywords).some(keyword => keyword.startsWith(term))).length);
                    score += termFrequency * inverseDocumentFrequency;
                });

                return score;
            }

            // Function to perform the search
            function performSearch(query) {
                query = query.toLowerCase();
                const results = searchIndex.map(item => ({
                    ...item,
                    score: calculateTfIdf(query, item)
                }))
                .filter(item => item.score > 0)
                .sort((a, b) => b.score - a.score)
                .slice(0, maxResults);

                displayResults(results);
            }

            // Function to display results
            function displayResults(results) {
                if (!resultsContainer) {
                    resultsContainer = document.createElement('div');
                    resultsContainer.id = 'search-results';
                    searchInput.parentNode.appendChild(resultsContainer);
                }
                resultsContainer.innerHTML = '';
                if (results.length === 0) {
                    resultsContainer.innerHTML = '<p>No results found.</p>';
                    return;
                }

                const ul = document.createElement('ul');
                results.forEach(item => {
                    const li = document.createElement('li');
                    const a = document.createElement('a');
                    a.href = item.url;
                    a.textContent = item.title;
                    li.appendChild(a);

                    const p = document.createElement('p');
                    p.textContent = item.excerpt;
                    li.appendChild(p);

                    ul.appendChild(li);
                });

                resultsContainer.appendChild(ul);
            }

            // Event listener for search input
            searchInput.addEventListener('input', function() {
                performSearch(this.value);
            });

            // Trigger initial search if search input has value
            if (searchInput.value.trim() !== '') {
                performSearch(searchInput.value);
            }
        })
        .catch(error => console.error('Error loading JSON:', error));
});

In order to better support matching of search terms we calculate the TF-IDF score for each document in the search index not by exact match, but with an startsWith check. This allows for partial matches to be included in the search results and allows the search to return results as soon as the user has typed even a partial word. The search results are then sorted by the TF-IDF score and the top 10 results are displayed to the user.

Due to the lack of stemming and other more advanced features the search is not as accurate as a full search engine, but it is good enough for a simple blog.

Conclusion

We’ve walked through the process of adding a client-side search engine to my GitHub Pages hosted blog. I’ve shown how I pre-processed the blog content and generated a search index using Jekyll, enabling fast and efficient searching directly in the user’s browser. This basic implementation serves my current needs well, providing a functional search capability. However, there’s always room for improvement and further refinement as the blog grows and evolves. I’m excited to explore more advanced features and enhancements to enhance the search experience.

My approach involved creating a JSON index of post titles, keywords, URLs, and excerpts. I chose to use Jekyll’s Liquid templating for index generation, leveraging its ability to iterate through my posts and extract the necessary information. While Liquid has some limitations, it was sufficient for my needs at this stage. I also implemented a simple JavaScript-based search function that uses the generated index to find relevant results. I opted for a simplified TF-IDF calculation and basic term matching to keep the implementation straightforward and easy to modify.

This approach works well for my blog’s current size and complexity. However, I recognize that it has room for improvement. For instance, the current keyword matching doesn’t handle variations in word forms or typos, which could lead to missed results. As this blog grows, this might become a more significant issue. Similarly, the simplified term frequency calculation could also be refined for better accuracy.

Looking ahead, I’m considering several enhancements. Exploring fuzzy matching and stemming could improve search recall and handle typographical errors. I might also investigate more sophisticated client-side search libraries, which often provide these features out of the box, as well as more advanced indexing and ranking algorithms. These libraries could offer a more robust and performant solution as the site scales.

In summary, this post has detailed my process of adding a client-side search engine to my GitHub Pages blog. This implementation serves my current needs and provides a solid foundation. I’m pleased with the results achieved, and I’m excited to explore future enhancements to better refine the search experience.

Kublet: A review of a failed Kickstarter promise

Sun, 09 Feb 2025 19:00:00 +0000

Remember Kublet? The tiny, powerful data tracker promising to bring essential information to your fingertips? It launched on Kickstarter in June 2023 with a compelling vision, raising over $150,000 from 800 backers eager to support its development. They envisioned a sleek device displaying everything from stock tickers to weather updates. Sadly, the reality fell far short of the vision, culminating in a project that, for many backers, feels effectively ghosted, leaving those investors with little more than disappointment. And Kublet’s story echoes a similar recent situation with another platform, Tidbyt, highlighting a potentially troubling trend in the niche hardware space.

The initial promise of Kublet was compelling: a minimalist device, customizable apps, and a vibrant community of developers. But the comments section of the Kickstarter page tells a different story. A story of defective units, a clunky and feature-poor app, and a growing sense of betrayal. Backers reported receiving Kublets that simply didn’t work. Others struggled with the app, a “skin and bones” offering that lacked basic functionality and frequently provided inaccurate data. Many promised apps never materialized, leaving users with a limited and often frustrating experience.

I backed the project soon after it’s launce in 2023, and I was excited to receive my 3 Kublets almost a year later in 2024. The device itself seemed decently well-designed and aesthetically pleasing, even though it did feel a bit cheep. The software was a different story. The Kublet apps where slow, buggy, and lacked many of the features promised in the campaign. The companion mobile app was empty, and the few available apps for the Kublet were poorly designed and often didn’t work. My Kublets quickly became a glorified paperweight, sitting on my desk and displaying outdated information most of the time.

Beyond the product itself, communication from the creators dwindled. Concerns raised in the comments were often met with silence, dismissive responses, or generic troubleshooting advice. Shipping issues plagued the project, with unexpected costs and address errors adding to the growing dissatisfaction. The promised developer community, meant to extend the Kublet’s capabilities, was hampered by inadequate tools and a lack of support.

The comments section of the projects Kickstarter page, once a hub of enthusiastic anticipation, transformed into a repository of complaints, frustrations, and accusations of a scam. Many backers feel they were sold a bill of goods, receiving a product that was both functionally lacking and far from the polished device showcased in the campaign.

This narrative bears a striking resemblance to recent events surrounding Tidbyt, a similar retro-style display platform. Tidbyt, while initially praised for its aesthetic and functionality, also saw its developers seemingly abandon their community over night. Similar to the Kublet situation, Tidbyt users reported a sudden decline in communication, a lack of updates, and a sense that their feedback was no longer valued. The Tidbyt developer community, once a vibrant hub of innovation, has also experienced a significant drop in activity.

These parallel experiences raise questions about the viability and sustainability of niche hardware projects, particularly those relying heavily on community development. Building and maintaining a thriving ecosystem of developers and users requires consistent effort, transparent communication, and a genuine commitment to addressing community needs. Both Kublet and Tidbyt demonstrate how easily this trust can be eroded when communication breaks down and promises go unfulfilled.

For both Kublet and Tidbyt users, the lack of ongoing support and communication feels like a betrayal. It leaves backers and early adopters with expensive devices that have limited functionality and diminishing potential. It also casts a shadow over future Kickstarter campaigns and raises concerns about the risks associated with backing niche hardware projects.

Months after the initial shipments, the Kublet Kickstarter page and Discord channels remains largely inactive. Updates have ceased, and the creators’ presence in the comments has become non-existent. The Tidbyt situation follows a similar trajectory. For many backers of both projects, they have become synonymous with broken promises and a creator team that appears to have vanished.

These projects serve as cautionary tales, reminding us that even the most promising Kickstarter campaigns can ultimately end in disappointment and silence. They leave behind frustrated users, piles of unusable hardware, and a lingering sense of what could have been.

Snowflake Key-Pair Authentication migration

Sat, 25 Jan 2025 00:00:00 +0000

Snowflake is implementing an important security enhancement by phasing out the use of usernames and passwords for authentication in November 2025. This change necessitates a transition to a more secure authentication method known as Key-Pair Authentication. This change was announced on the Snowflake Blog and is part of Snowflake’s ongoing commitment to enhancing security and protecting sensitive data.

Understanding the Transition

Traditional password-based authentication, while convenient, presents inherent security vulnerabilities. Key-Pair Authentication addresses these concerns by employing a more robust security model.

Key-Pair Authentication utilizes a public-key/private-key pair for authentication. The public key is registered within the Snowflake environment, while the private key remains securely stored by the user. This eliminates the transmission of passwords over the network, significantly reducing the risk of unauthorized access.

Implementing Key-Pair Authentication

The migration to Key-Pair Authentication involves several key steps outined in the Snowflake documentation:

Generating a Key Pair: Generate a unique pair of cryptographic keys – a private key and a corresponding public key.

Private Key:

openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8

This generates a private key PEM file named rsa_key.p8.

-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIE6T...
-----END ENCRYPTED PRIVATE KEY-----

Public Key:

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

Output:

-----BEGIN PUBLIC KEY-----
MIIBIj...
-----END PUBLIC KEY-----

Registering the Public Key: Register the public key within your Snowflake account.

ALTER USER example_user SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';

Updating Applications: Modify your applications to utilize the private key for authentication.

Depending on your Snowflake driver or library, the implementation may vary. For example, if you are using the JDBC driver with a Apache Tomcat server this sample Github repository might be of help: https://github.com/markusos/tomcat-snowflake-sample/

The relevant part of the tomcat/conf/context.xml file would look like this:

<Resource name="jdbc/snowflake"
        auth="Container"
        type="javax.sql.DataSource"
        driverClassName="net.snowflake.client.jdbc.SnowflakeDriver"
        url="jdbc:snowflake://${SNOWFLAKE_HOSTNAME}/?user=${SNOWFLAKE_USER}&amp;private_key_file=/tmp/rsa_key.p8&amp;private_key_pwd=${SNOWFLAKE_RSA_KEY_PASSWORD}&amp;db=${SNOWFLAKE_DATABASE}&amp;schema=${SNOWFLAKE_SCHEMA}&amp;warehouse=${SNOWFLAKE_WAREHOUSE}"
        maxTotal="20"
        maxIdle="10"
        maxWaitMillis="10000"/>

Note that this file is templated and the values are replaced using the envsubst command in the startup.sh script.

The raw connection string would look something like this:

jdbc:snowflake://myorganization-myaccount.snowflakecomputing.com/?user=demo_user&private_key_file=/tmp/rsa_key.p8&private_key_pwd=pwd&db=demo&schema=test&warehouse=demo_wh

This type of setup also works for applications like Pentaho Business Analytics since it is built on top of Apache Tomcat webserver.

Conclusion

To ensure a smooth and secure transition it is important to plan ahead. Begin the migration process well in advance of the November 2025 deadline to avoid any disruptions to your Snowflake operations.

Migrating to Key-Pair Authentication is a crucial step in enhancing the security of your Snowflake environment. By implementing this more robust authentication method, you can significantly mitigate the risk of unauthorized access to your valuable data.

Exploring AI coding tools

Mon, 20 Jan 2025 23:00:00 +0000

In recent years, AI coding tools for software engineers have rapidly improved. Tools like GitHub Copilot, which has been around for over three years, are becoming an integral part of many developers’ workflows. However, the question remains: Do these tools really provide value? While I’ve seen the rise of these technologies, it wasn’t until last year that I had the opportunity to use them in my day-to-day work. After experimenting with Google’s Gemini coding assistant at work, I decided to explore and compare a few different AI tools in my own time.

For my experiments, I chose GitHub Copilot (which is powered by OpenAI’s GPT-4o model), as well as a local model I had access to via LM Studio: Llama-3.2-3B-Instruct-4bit. While the Llama model is less powerful than Copilot’s underlying GPT-4o model, I thought it would be a useful point of comparison, especially since I was able to run it locally on my MacBook.

To make this comparison meaningful, I created a small Python CLI app to solve the classic Knight’s Tour problem — a well-known backtracking problem in mathematics and computer science. The goal is to move a knight around a chessboard, visiting each square exactly once. It’s a problem often used to evaluate algorithmic problem-solving and the capabilities of AI in programming tasks.

Basic Problem Solving

The first phase of my experiment was to test whether these AI tools could help me write a basic Python script to solve the Knight’s Tour problem. I started by providing both tools with a simple prompt to get them started.

GitHub Copilot: Copilot quickly generated a full solution, which, after a few rounds of prompting, resulted in code that worked without major issues. It was impressive how quickly it tackled the problem and provided a working solution.

Llama (via LM Studio): The local Llama model was able to give me some basic structure for the code but struggled to generate a complete, functional solution. While it suggested some useful steps, it failed to produce a working result on its own, dispite multiple rounds of prompting.

This initial comparison made it clear that while both tools could help get started, GitHub Copilot was far better at generating a comprehensive, working solution on the first try. However, there was still room for improvement in terms of code quality, as I’ll explain below.

Code Optimization and Performance

Once the basic solution was in place, I moved on to testing how well the tools could help improve the performance and structure of the code. Specifically, I wanted to see if they could suggest optimizations, such as using Warnsdorf’s Rule, which is a heuristic for solving the Knight’s Tour problem more efficiently by reducing the number of moves at each step.

GitHub Copilot: Copilot was quick to suggest using Warnsdorf’s Rule to optimize the solution. It even provided working code to implement the rule, significantly improving the performance of the initial solution. This was a notable strength of Copilot: its ability to incorporate optimization techniques with minimal prompting.

Llama (via LM Studio): The Llama model also suggested using Warnsdorf’s Rule, but it wasn’t able to provide working code for this implementation. The suggestions were useful in theory, but they didn’t translate well into a functioning solution. This gap highlights the limitations of running a smaller model, like Llama 3.2, especially for more complex tasks.

Takeaways: Code Quality and AI Limitations

Although tools like GitHub Copilot can generate full working solutions, there are still several challenges when it comes to code quality. Here are some important takeaways from my testing:

Code Readability: The solution generated by Copilot was functional, but it wasn’t particularly readable or maintainable. Variables were sometimes poorly named, and the structure of the code could have been improved for clarity. I had to manually refactor the code to make it more understandable, which is something that a more experienced developer would have to do even after Copilot’s suggestions.

Optimization Limitations: While Copilot was able to suggest and implement an optimization like Warnsdorf’s Rule, the Llama model struggled to generate even basic optimization suggestions, let alone implement them. This shows that while AI tools are helpful for basic tasks, they still struggle when the complexity increases, and they often fall short in terms of delivering more sophisticated solutions.

Handling Code Across Multiple Files: One issue I encountered with GitHub Copilot was its integration with VS Code. While Copilot works well for generating code in a single file, it struggles to provide useful suggestions when the code spans multiple files or requires more context. In large projects, this could be a major limitation, as maintaining the right context across files is key to writing clean, cohesive code.

The Bigger Picture: AI Tools in Software Development

This experience aligns with what I’ve seen from other AI coding tools I’ve used professionally, like Google’s Gemini. While these tools can be helpful for providing starting points and solving simple problems, they often falter when the problem complexity increases. AI can assist in generating basic boilerplate code or solving straightforward problems, but they still need significant improvements to handle more complex tasks effectively.

One of the key areas where I see AI tools improving in the future is in their ability to handle larger codebases and more sophisticated optimization tasks. However, at this stage, I believe these tools are most useful for routine, repetitive tasks rather than high-level programming. For example, AI tools could be used for generating basic code scaffolding, writing unit tests, or solving well-defined, standard problems like sorting or searching algorithms. But when it comes to more complex, domain-specific code or nuanced problem-solving, developers will likely need to rely on their own expertise for now at least.

Conclusion: What’s Next for AI Coding Tools?

After using GitHub Copilot and the Llama model, I’m optimistic about the future of AI-assisted coding but believe we’re still in the early stages. While these tools can provide value for basic tasks, they’re not yet ready to replace the nuanced judgment and problem-solving skills of human developers. I’m excited to see how these tools will evolve over time. As AI models get more powerful and better at maintaining context across files, I imagine they’ll become more useful for larger, more complex projects.

For now, I’ll continue to write most of my own code, but I’ll definitely keep an eye on AI tools as they continue to improve. It’s clear that they have the potential to significantly augment software development, even if we’re not quite there yet.

Create a site map using CasperJS and Graphviz

Thu, 07 Jan 2016 22:30:00 +0000

I recently challenged myself to start learning CasperJS, a scripting tool that is using the PhantomJS WebKit headless browser to navigate the web. For one of my first projects, I decided to build a site map generation script. Since I was already familiar with the Graphviz tools I decided to use the Graphviz sfdp command-line tool to generate visual site maps. This post is a short introduction to the tool, and how to use it.

CasperJS is an open-source navigation scripting & testing utility written in Javascript for the PhantomJS WebKit headless browser and SlimerJS (Gecko). It eases the process of defining a full navigation scenario and provides useful high-level functions, methods & syntactic sugar. - casperjs

The site mapper tool is available on Github.

How to use:

Install CasperJS, for more instructions see here: docs.casperjs.org

$ brew install casperjs --devel

Run the scrip to crawl your website, in this case this site “http://markusos.github.io/:

$ casperjs sitemap.js http://markusos.github.io/ > map.dot

This scrapes the provided site by following all publicly accessible internal links the crawler can find on the site. The script outputs the site structure in DOT format to the terminals standard output and can easily be piped to a file for further processing.

If you want to generate a visual site map graph from the DOT file output, you need to have Graphviz installed. Use Brew to install it, or if you don’t have a Mac with Brew get it from here: www.graphviz.org

$ brew install graphviz --with-gts

Run Graphviz on the site map DOT file with this command:

$ sfdp -Tsvg map.dot -o sitemap.svg

This creates an SVG image of the site map. It could look something like this:

Have any questions or ways to improve the script? Leave a question here below!

Brew after updating to El Capitan

Wed, 30 Sep 2015 22:30:00 +0000

If you’re an early adopter, like me, you probably also spent a few hours today installing the latest version of Apple’s newest OS X version, El Capitan (OS X 10.11). If you use your Mac for any kind of development, chances are you also have the Mac’s missing package manager, Brew, installed. As with any time Apple releases an update, there is some tinkering to be done to get everything running again. In this post, I’ll go through some of the problems I ran into with Brew after updating to El Capitan and how to solve them.

The first step is to make sure that the latest version of Brew is installed. To do this, use the command brew update. You might run into this error message:

$ brew update
...
error: unable to unlink old '.gitignore' (Permission denied)
Error: Failure while executing: git pull --quiet origin refs/heads/master:refs/remotes/origin/master

This is probably caused by the OS X update process resetting the folder owner of the /usr/local folders. By running sudo chown -R $(whoami) /usr/local you can reset the folder owner for all subfolders to the currently logged-in user on your system. After that, the brew update command should hopefully work fine.

The next step is to run brew upgrade to upgrade all installed packages to the last version. Here you might run into a couple of different errors; I ran into this error, among a few others:

$ brew upgrade
...
configure: error: C compiler cannot create executables

This is most likely caused by an outdated or misconfigured C compiler. If you see this, you can try to update to the latest version of the Xcode command-line tools. By running the command xcode-select --install and following the steps of the installer will update the Xcode command-line tools to the new version 7.0. You can also install the newest version of Xcode through the App store. If you run into any more problems the brew doctor command can give you more hints as to what might be wrong.

I hope this post eases your transition to El Capitan!

How to build a Web scraper with DOM parsing in 10 minutes

Tue, 25 Aug 2015 22:30:00 +0000

API:s are a developer’s best friends when accessing remote data, but great API:s does not grow on trees. So what do you do when the data you need isn’t accessible through a well-designed API, or no API at all for that matter? As long as the data is accessible through your Web browser, you can always just scrape it yourself! In this post, I’ll go through how to build a simple Web scraper in 10 min using Guzzle and PHP’s DOM parser. I’ll also give a brief introduction to XPaths.

Web scraping is the art of fetching and parsing a Web document to extract information.

When scraping a Web site we first need to request the page and receive the page HTML DOM tree. There are a bunch of tools that could be used for this. One of the more well-known tools is cURL, a library and command-line tool that can be used to transfer data using a wide range of protocols, including HTTP.

After we have received the HTML DOM from the Web server it is time to parse the DOM tree to extract the information we want. The most naive way is to do this with string operations and regular expressions. This is usually very time-consuming for more complex HTML documents and it usually doesn’t work too well if the HTML DOM changes slightly. A more robust solution is to use a DOM parser library with either CSS selectors or XPath’s to query the DOM for the DOM elements that contain the information we want to extract. PHP has a decent built-in DOM parser in one of the languages default extention.

In the basic scrape class below I use the Guzzle PHP library to request and receive the site HTML DOM. Then I use PHP’s DOM parser library to parse the DOM tree for the nodes containing information.

<?php namespace Scrape;

/**
 * A basic web scraper class
 * @author Markus Östberg <markusos@kth.se>
 */

use \DOMXPath;
use \DOMDocument;
use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;

/**
 * Class Scrape
 * @package Scrape
 */
class Scrape
{
    /**
     * @var Client
     */
    private $webClient;
    /**
     * @var DOMDocument
     */
    private $dom;

    /**
     * Init scraper to scrape $site
     * @param string $site Site to scrape
     * @param int $timeout seconds before request times out. 
     */
    public function __construct($site, $timeout = 2)
    {
        $this->webClient = new Client([
                'base_uri' => $site,
                'timeout' => $timeout
            ]);
    }

    /**
     * Load sub page to site.
     * E.g, '/' loads the site root page
     * @param string $page Page to load
     * @return $this
     */
    public function load($page) {

        try {
            $response = $this->webClient->get($page);
        } catch(ConnectException $e) {
            throw new \RuntimeException(
                    $e->getHandlerContext()['error']
                );
        }

        $html = $response->getBody();

        $this->dom = new DOMDocument;

        // Ignore errors caused by unsupported HTML5 tags
        libxml_use_internal_errors(true);
        $this->dom->loadHTML($html);
        libxml_clear_errors();

        return $this;
    }

    /**
     * Get first nodes matching xpath query
     * below parent node in DOM tree
     * @param $xpath string selector to query the DOM
     * @param $parent \DOMNode to use as query root node
     * @return \DOMNode
     */
    public function getNode($xpath, $parent=null) {
        $nodes = $this->getNodes($xpath, $parent);

        if ($nodes->length === 0) {
            throw new \RuntimeException("No matching node found");
        }

        return $nodes[0];
    }

    /**
     * Get all nodes matching xpath query
     * below parent node in DOM tree
     * @param $xpath string selector to query the DOM
     * @param $parent \DOMNode to use as query root node
     * @return \DOMNodeList
     */
    public function getNodes($xpath, $parent=null) {
        $DomXpath = new DOMXPath($this->dom);
        $nodes = $DomXpath->query($xpath, $parent);
        return $nodes;
    }
}

Now how do we use this class to scrape a Web page? First we need to understand what an XPath is and how to use it.

XPath

If you’ve ever written CSS you should know what a CSS selector is. XPaths stands for XML Path Language and is a query language, just like CSS selectors, used for selecting nodes from an XML or HTML DOM tree. Most modern browsers support XPaths in their development console, so press Ctrl + Shift + I if you are on Chrome in Windows or Cmd + Opt + I on Crome in OSX and type in this:

$x('//b[@id="test"]');

When you hit enter is should return this element.

So how does it work? The first part, // tells the parser to start at the root of the document. b filters it down to all <b> tags on the page. The brackets [] right next to the b tell the parser to match attributes on the b elements. @id="test" is used to only match the node where the attribute id equals “test”. Let’s look at some more examples of how we can select the same node, If you want you can look at the DOM and try to figure out how it works:

$x('//b');
$x('//article/p/b');
$x('//article//b');
$x('//article/*/b');
$x('//article[@class="post-content"]/p/b');
$x('//*[@id="test"]');

Putting it all together

Now let’s use the scraper class and our knowledge about XPaths to scrape the root page of this site:

<?php

$scraper = new Scrape('http://markusos.github.io/');
$scraper->load('/');

$siteTitle = $scraper->getNode('//a[@class="site-title"]');
echo $siteTitle->nodeValue;

$posts = $scraper->getNodes('//ul[@class="post-list"]/li');

foreach($posts as $post) {
  $postLink = $scraper->getNode('./h2/a[@class="post-link"]', $post);
  $date = $scraper->getNode('./span[@class="post-meta"]', $post);
  $excerpt = $scraper->getNode('./p', $post);

  echo $postLink->nodeValue;
  echo $postLink->getAttribute('href');
  echo $date->nodeValue;
  echo $excerpt->nodeValue;
}

When you run this it should print out a list of all the posts and the corresponding data available on the home page of this site.

If you read this far, I hope that you found this introduction to scraping useful. If you have any questions regarding XPaths or something else in this article, just post in the comment section below and I’d be happy to help. The source code is also available on GitHub.

Dockerizing a PHP project - A short introduction

Tue, 21 Jul 2015 22:30:00 +0000

After using Vagrant and VirtualBox at Engage for several months I finally stumbled over a project at work where I got the opportunity to learn and use Docker. Inspired by my newly acquired knowledge I wanted to experiment more with Docker so I decided to “Dockerize” my latest side project.

Docker is a platform for developers and sysadmins to develop, ship, and run applications. Docker lets you quickly assemble applications from components and eliminates the friction that can come when shipping code. - Docker.com

The simple search engine project I worked on a few months back was a perfect candidate for Dockerization since it had dependencies on multiple services, databases, and libraries. Running the project on my local machine was a hassle and I had yet to set up a Vagrant box for it.

Get Docker up and running

There are many great tutorials out there, so I’m not going to dive too deeply into how to set up Docker on your local machine. If you, like me, are using a Mac for your development I suggest you start here: docs.docker.com/installation/mac/ to read about what you need to install Docker. In this post, I’ll use Boot2Docker and Docker-Compose to set up and run my PHP application.

Since Docker does not run natively on Mac you need to run it inside a Virtual Machine. Boot2Docker is a lightweight VM custom-built for running Docker. If you have used Vagrant before, you’ll feel right at home. Follow the installation guide in Dockers documentation for Mac and you should be ready in no time. If you already have VirtualBox and my favorite package manager for Mac, Brew, installed; all you need to do is to run:

$ brew install docker
$ brew install boot2docker
$ boot2docker init
$ boot2docker up

The next step is to install Docker-Compose. It is another great tool, used to defines multi-container applications. It basically keeps track of how to initialize and run applications that need several docker containers that are linked together. You can read more here: docs.docker.com/compose/

$ brew install docker-compose

Define the Applications Dockerfile

Let’s start by looking at a simple Dockerfile, the basic configuration that docker uses to build a container:

FROM ubuntu:latest

RUN apt-get update && apt-get install php5 php5-json php5-mysql php5-mongo php5-memcached php5-xdebug -y
RUN apt-get install php5-dev make php-pear -y
RUN yes '' | pecl install mongo
RUN yes '' | pecl install stem
RUN echo "extension=stem.so" | tee -a /etc/php5/cli/php.ini

ADD . /code

This basic file contains three commands FROM, RUN and ADD:

FROM ubuntu:latest pulls the latest official Ubuntu container image and uses it as a base for the container. The FROM command is used to define the parent image of the

RUN apt-get update && apt-get install php5 php5-json php5-mysql -y Updates the system and installs some common php5 dependencies, it is easy to install more dependencies if needed. The following lines with RUN commands install more dependencies and PHP libraries used through the pecl package manager.

ADD . /code mounts the current directory (the directory where the docker command is run from) to the code directory inside the container.

Docker-Compose

If you want you can run build and run the container by itself, but we are going to use this container in combination with some other containers hosting the caching provider Memchached and the databases MySQL and MongoDB. To set this up we are going to use Docker-Compose.

To use Docker-Compose we first need to define a docker-compose.yml configuration file:

web:
  build: .
  command: php -S 0.0.0.0:8000 -t /code/src/Demo
  ports:
    - "8000:8000"
  links:
    - db
    - mongo
    - memcached
  volumes:
    - .:/code
  environment:
    DB_HOST: db
    DB_NAME: test
    DB_USER: root
    DB_PASSWORD: password
    MEMCACHED_HOST: memcached
    MEMCACHED_PORT: 11211
    MONGO_HOST: mongo
    MONGO_PORT: 27017

db:
  hostname: db
  image: mysql
  environment:
    MYSQL_DATABASE: test
    MYSQL_ROOT_PASSWORD: password

mongo:
  image: mongo
  command: mongod --smallfiles --quiet --logpath=/dev/null

memcached:
   hostname: memcached
   image: memcached
   environment:
     MEMCACHED_MEMORY_LIMIT: 128

This configuration file defines four different services. When running this configuration, each of these services is started in its own Docker container.

build Defines the path to the Dockerfile that should be used to build the container.
image (alternative to build) Defines an image on Docker hub to use to build the container.
command Defines the command that will be run inside the container when started.
ports Defines the port binding between the docker container and the docker host environment.
links Connects the web container to the listed services. This means that the web container can access data in the databases and Memcached instance.
volumes Mounts volumes from the docker host environment to the service provider container.
environment Defines the system environment variables available to the application running inside the container. Here we define things like hostnames, usernames, and passwords used in the application.

The first one, web, is the Dockerfile we defined before. The following three service providers are defined to use the official images for MySQL, MongoDB, and Memcached. They all define the hostname that they are accessible through in the linked web container as well as environment variables. The command config row can be used to override the image’s default command, as seen in the MongoDB provider case.

We are now ready to build and start the containers and run the application.

$ docker-compose build
$ docker-compose up

The containers should now be running in your docker container host and should be accessible at the host’s IP on port 8000.

Below follows some good to know commands when getting started with Docker.

# SSH into the boot2docker VM
$ boot2docker ssh

# List running containers
$ docker ps

# List built docker images
$ docker images

# List built docker images
$ docker images

# Delete all containers
$ docker rm $(docker ps -a -q)

# Delete all images
$ docker rmi $(docker images -q)

Hope this gives you some ideas on how to use Docker and Docker-Compose to run your PHP applications. If you want to see this Docker configuration in action, head over to GitHub and check out my simple search engine project here: github.com/markusos/simple-search.

Automatically solve Kuku-Kube with JavaScript

Fri, 10 Apr 2015 23:59:00 +0000

Yesterday during my lunch break one of my co-workers introduced me, and the other developers at Engage, to the color puzzle game Kuku-Kube. The goal of the game is to as quickly as possible identify and click on a uniquely colored square in a grid of squares. The game has been featured on a number of big sites during the last day, including Reddit and Business Insider. This post is about how I build an automatic solver in JavaScript over a lunch break.

It starts off easy, with just a few squares and a significant difference between the colors. But as you play it quickly gets harder. The number of squares increase and the difference in color decrease. After just a few successful clicks there is only a small difference in hue between one square and all the other squares in the grid.

After giving it a few tries I came up with the idea of writing a script that automatically plays the game. The plan was to see what kind of score a perfect player would get compare to the scores we got when we played ourselves.

I wanted the script to play the perfect game, so just clicking on every square was out of the question. That is too easy and can be done with just a few lines of javascript.

A few minutes later the first version of the script was ready. The script algorithm was designed to as quickly as possible identify a uniquely colored square and then click on it. This is done by inspecting the color of the squares one by one until the script has seen two different colors, where one color has been seen only once and the other has been seen at least twice.

Since the game always only contains two colors we assume that if a color has been seen twice it is not a unique color. Therefore a square of another color that has only been seen once has to be unique. This assumption lets us avoid looking at every square before we can find the unique one.

After each click, the script sleeps 10 ms to let the game load a new set of squares. This also lets the game timer work (almost) normal.

Below follows the code for the script. It can also be found on Github.

/* KukuKube Solver
 *
 * Solves the color tests on www.kuku-kube.com automatically.
 * To run this, run the javascript code below in the
 * browsers developer console.
 * The script identifies the uniquely colored box in the box  
 * grid and uses javascript to click on it.
 *
 * WARNING: Don't run the script if you have epilepsy!
 *  It will flash different colors fast when it solves the color tests.
 *
 * MIT License (MIT)
 * Copyright (c) 2015 Markus Östberg
 */

var KukuKube = function ($) {

    function startTest() {
        var start = $('.play-btn:visible');

        if (start.size() === 1) {
            start.first().click();
            solveKukuKube();
        } else {
            console.log('Could not find start button.');
        }
    }

    function solveKukuKube() {
        var boxes = $("#box").find("span");

        // Get the color of the first box as a baseline
        var color = boxes.first().css('backgroundColor');

        var colorCount = 0, secondColorCount = 0;
        var colorPosition, secondColorPosition;

        boxes.each(function(i) {

            // Count color occurrences
            if ($(this).css('backgroundColor') === color) {
                colorCount += 1;
                colorPosition = i;
            } else {
                secondColorCount += 1;
                secondColorPosition = i;
            }

            if (secondColorCount === 1 && colorCount >= 2) {
                boxes[secondColorPosition].click();
                return false;
            }

            if (colorCount === 1 && secondColorCount >= 2) {
                boxes[colorPosition].click();
                return false;
            }
        });

        // Check if game is over
        if ($('.gameover:visible').size() > 0) {
            console.log('Game Over!');
        }
        else {
            // Game is still running, solve the next one!
            setTimeout(solveKukuKube, 10);
        }
    }

    return {
        init: function () {
            startTest();
        }
    };

}(jQuery);

jQuery(KukuKube.init);

So there we have it; a somewhat worthless piece of code that automatically plays a browser game that’s probably gone again in a few days. But it was a fun lunch break challenge and a way for me to clear my mind of my other work projects for a few minutes.

And one last thing. Try beating it yourself, if you can: