Playing With Pandas


This is the third part of the series of articles I am writing about a little project I am working on. In part 1, I created a web scraper to get the data I needed. In part 2, I added support for saving the collected data to a MongoDB database. In this part, I will look into how to clean up the collected data and add new features (columns) to make it more suitable for analysis.

My primary motivation here is to learn new technologies as I progress, so my baby steps may not be the state of the art in this area, and all tips, tricks and corrections are welcome.

For this project I am using Python, and each day I love it more and more. I will use some cool libraries for Python, such as pandas, and some cool tools, such as Python notebooks.

Getting Started With Handsontable


I am looking for a way to display and manipulate tabular data on a webpage. For this, I decided to evaluate a few JavaScript-based libraries and see how they can be used.

The first such library is called Handsontable. Handsontable is a JavaScript/HTML5 spreadsheet library for developers. It seems to be aimed more at building Excel-like user interfaces than at being a simple data grid control.

So let’s see how easy it is to build a minimal example with Handsontable.

Scraping Stackoverflow Careers for Fun and Profit


When you want to learn something new, the best way to do it is to come up with a problem whose solution can be useful to you, or maybe to others, and then solve it. Not so long ago I decided that I wanted to start learning something data related. So I came up with the idea of creating a solution that gathers data from Stackoverflow Jobs, does some manipulation on it, and then presents it in a nice format. This is the first step of the journey.

To present something, I need data. To get the data, I need to write a script that gathers it from the web. I decided to use Python and the Scrapy framework.

Serializing Empty IEnumerable With


I have recently been trying to use Protobuf for serializing objects for network communication in a .NET application. Protobuf is a great format for sending data. It tries to be very frugal to save bandwidth. However, this causes some unexpected side effects when using it with .NET.

The Problem With Wirecutter


I love Wirecutter. I like them a lot. They post really well-researched reviews about stuff I care about a lot: headphones, smartphones, TVs, NAS. Some of their articles are based on hundreds of hours of research, and they try out dozens of gadgets to find the best ones. They save me time, energy and money, and I trust them…

Whenever I am looking for something to buy or need to learn about stuff, I go there, read their article and learn a lot about the topic. After I am done reading, I am ready to buy whatever they recommend. If only I could buy it.

Recently, I was looking for a new smart TV that is cheap, so I read The Best $500 TV on my favourite site from start to finish. Their pick is the Samsung PN51F4500, which seems like an amazing TV, so let’s buy it. A quick Google search shows that I cannot buy this TV in the EU.

It turns out I need to decode the model number and try to find a similar TV. So PN51F4500 means that it is a P=Plasma TV made for N=North America in a 51 inch size. The model number also tells me that it is an F=2013 model from the 4500 series. Now I know enough, so I will search for PE51F4500 or one of its variants in the EU. I found the PE51H4500AW model, which I think is the 2014 model of the same series.
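The decoding I did by hand can be sketched as a few lines of Python. This is only a rough sketch: the lookup tables contain just the codes mentioned above, E=Europe and H=2014 are my own guesses, and the names `decode_model` and `to_region` are made up for this example. It is not an official Samsung naming scheme.

```python
import re

# Lookup tables limited to the codes from this post.
# "E" = Europe and "H" = 2014 are guesses, not confirmed facts.
PANEL = {"P": "Plasma"}
REGION = {"N": "North America", "E": "Europe"}
YEAR = {"F": 2013, "H": 2014}

# Assumed layout: panel letter, region letter, 2-digit size,
# year letter, 4-digit series, optional letter suffix.
MODEL_RE = re.compile(
    r"^(?P<panel>[A-Z])(?P<region>[A-Z])(?P<size>\d{2})"
    r"(?P<year>[A-Z])(?P<series>\d{4})(?P<suffix>[A-Z]*)$"
)


def decode_model(model):
    """Split a model number into its parts, using the assumed layout."""
    match = MODEL_RE.match(model.upper())
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    return {
        "panel": PANEL.get(match["panel"], "unknown"),
        "region": REGION.get(match["region"], "unknown"),
        "size_inches": int(match["size"]),
        "year": YEAR.get(match["year"], "unknown"),
        "series": match["series"],
        "suffix": match["suffix"],
    }


def to_region(model, region_code):
    """Swap the region letter (second character) to get a search term."""
    return model[0] + region_code + model[2:]


us_pick = decode_model("PN51F4500")
eu_search_term = to_region("PN51F4500", "E")
eu_find = decode_model("PE51H4500AW")
```

Here `eu_search_term` is the PE51F4500 string I searched for, and `eu_find` decodes the model I actually found.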

So the problem with Wirecutter is not really a problem at all. It is rather my problem. It is Amazon's problem that they do not deliver to remote places like the eastern part of the EU. It is the stupidity of the TV manufacturers that they give different model numbers to products that are the same, so I have to spend time decoding the model number and then translating it to a model that is available in my country.

In the end, a lot of factors make this a problem, and the easiest fix would be for Wirecutter to start posting the different model numbers for each market, if there are differences. It would make my life easier…

What Does the Term Porcelain Mean in Git


“Porcelain” is the material from which toilets are usually made (and sometimes other fixtures such as washbasins). This is distinct from “plumbing” (the actual pipes and drains), where the porcelain provides a more user-friendly interface to the plumbing.

Git uses this terminology in analogy, to separate the low-level commands that users don’t usually need to use directly (the “plumbing”) from the more user-friendly high level commands (the “porcelain”).

One of those programmer jokes…

Reading About Date and Time


So is Noda Time perfect then?

Of course not. Noda Time suffers several problems:

Despite all of the above, I’m a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven’t quite implemented it yet.

I am getting more confused about this simple thing called date and time minute by minute…

Simple.Data and Dynamic Calls Are Awesome


A couple of weeks ago I started to look into different open source packages on the .NET platform, such as Nancy and Simple.Data. They all looked very cool, but one paragraph in an article called Introducing Simple.Data, explaining how the code snippet below works, caught my attention:

getting data from db
var db = Database.Open(); // Connection specified in config.
var user = db.Users.FindByNameAndPassword(name, password);

That’s pretty neat, right? So, did we have to generate the Database class and a bunch of table classes to make this work?


In this example, the type returned by Database.Open() is dynamic. It doesn’t have a Users property, but when that property is referenced on it, it returns a new instance of a DynamicTable type, again as dynamic. That instance doesn’t actually have a method called FindByNameAndPassword, but when it’s called, it sees “FindBy” at the start of the method, so it pulls apart the rest of the method name, combines it with the arguments, and builds an ADO.NET command which safely encapsulates the name and password values inside parameters. The FindBy* methods will only return one record; there are FindAllBy* methods which return result sets. This approach is used by the Ruby/Rails ActiveRecord library; Ruby’s metaprogramming nature encourages stuff like this.

Just read that sentence once again…

That instance doesn’t actually have a method called FindByNameAndPassword, but when it’s called, it sees “FindBy” at the start of the method, so it pulls apart the rest of the method name, combines it with the arguments, and builds an ADO.NET command which safely encapsulates the name and password values inside parameters.

This sounds like crazy awesome magic and I need to understand it!

How does the magic work?

To find out how the magic works, I downloaded the source code from GitHub and started to poke around in it. The rest of the article explains what I got out of this experiment.
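Before digging into the real source, the idea from the quoted paragraph can be sketched in Python, which has a similar hook for members that do not exist (`__getattr__`). This is only an analogy and not how Simple.Data is actually implemented: the `Database` and `DynamicTable` classes below are hypothetical stand-ins, and the finder just returns the SQL text and parameters instead of running an ADO.NET command.

```python
class DynamicTable:
    """Hypothetical stand-in for the dynamic table object in the quote."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, method_name):
        # __getattr__ is only called when normal lookup fails,
        # i.e. for methods that were never actually defined.
        if not method_name.startswith("FindBy"):
            raise AttributeError(method_name)

        # "NameAndPassword" -> ["Name", "Password"]
        # (naive split: a column containing "And" would break this sketch)
        columns = method_name[len("FindBy"):].split("And")

        def finder(*values):
            if len(values) != len(columns):
                raise TypeError(
                    f"{method_name} expects {len(columns)} arguments"
                )
            # Values never end up in the SQL text, only placeholders do.
            where = " AND ".join(f"{column} = ?" for column in columns)
            sql = f"SELECT * FROM {self.name} WHERE {where} LIMIT 1"
            return sql, values

        return finder


class Database:
    """Hypothetical stand-in: any attribute is treated as a table name."""

    def __getattr__(self, table_name):
        return DynamicTable(table_name)


db = Database()
sql, params = db.Users.FindByNameAndPassword("alice", "s3cret")
print(sql)
print(params)
```

Neither `Users` nor `FindByNameAndPassword` is defined anywhere; both are resolved at call time from their names. The values travel separately from the SQL string, which is the "safely encapsulates the name and password values inside parameters" part of the quote.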