Define a simple rule for matching

OK, let’s start by understanding the rewrite engine.

Let’s do it with an example. Our goal: find all the methods that send the consecutive messages #globals and #at:.

In our example we want this method as a result:

 

findClassesForCategories: aCollection
    | items |
    aCollection isEmpty
        ifTrue: [ ^ self baseClass withAllSubclasses asSet ].
    items := aCollection
        gather: [ :category |
            ((Smalltalk organization listAtCategoryNamed: category)
                collect: [ :each | Smalltalk globals at: each ])
                select: [ :each | each includesBehavior: self baseClass ] ].
    ^ items asSet

But not with this variation:

findClassesForCategories: aCollection
    | items |
    aCollection isEmpty
        ifTrue: [ ^ self baseClass withAllSubclasses asSet ].
    items := aCollection
        gather: [ :category |
            ((Smalltalk organization listAtCategoryNamed: category)
                collect: [ :each | Smalltalk globals at: each ifAbsent: [ "ignore" ]])
                select: [ :each | each includesBehavior: self baseClass ] ].
    ^ items asSet

Why? Because in the second example we send the message #globals (OK, we want this!) but then we send #at:ifAbsent: (we don’t want this).

OK, so… which message do I send to which object to find my “problematic” methods?

Which object?
The answer is very simple: a RULE. For each rule you extend one flavor of rule and define what you want to do. Sounds easy, so let’s check the rule hierarchy.

All of those are abstract classes, so we will look into each of them and choose the one that fits our case best.

  • RBLintRule: defines the protocol used to execute a rule.
    This is a very general class; it doesn’t define:
    – what to do when we are checking a class; for this we have to implement #checkClass:. In our case we don’t care about the class, so we can keep the default, which does nothing with the class.
    – what to do when we are checking methods; for this we have to implement #checkMethod:.
    In our case we should obtain the source code from the method and search it for all the variants of the consecutive messages #globals #at:. A naive way to do it:

    checkMethod: aContext
    (aContext compiledMethod source findString: 'globals at:') ~= 0
    ifTrue:["we have a match in aContext selectedClass>> aContext selector we should handle it"]
    

    I call it a naive implementation because a method containing “(…) globals  "yes a comment here!"     at: (…)” or “(…) globals           at: number” does not produce a match although it should, while “(…) globals at: something ifAbsent: (…)” produces a match although it shouldn’t.
    We should probably always use a subclass of RBLintRule with more features already implemented; otherwise we have to extend RBLintRule ourselves to define what to do on a match, model a result, transform code (…)

  • RBBasicLintRule: another abstract class, which adds a result associated with the rule. We still have to redefine #checkMethod: and/or #checkClass:, but now we have a proposal for a result:
    checkMethod: aContext
    (aContext compiledMethod source findString: 'globals at:') ~= 0
    ifTrue:[result addClass: aContext selectedClass selector: aContext selector ]
    

    This can be a good extension point, but again… you have a lot to implement here.

  • RBBlockLintRule: an abstract class that by default specifies that the resultClass corresponds to a selector environment.
    This class doesn’t add a lot of behavior, but it reduces errors related to result handling.
  • RBParseTreeLintRule: here is where everything gets interesting.
    The main point of this class is to offer an implementation of #checkMethod: whose motivation is to check methods using their AST representation.
    We have a wanted tree, and we obtain the AST of the checked method; to find a match we check whether the wanted tree is a subtree of the method tree.
    It’s also possible to define metavariables, with which we specify that we want to match a node type but not necessarily its value.
    In our example the matching expression will look like:

    ``@lotOfStuffBefore globals at: ``@lotOfStuffAfter

    With this expression we are saying:
    – ``@lotOfStuffBefore: ignore everything before globals (receiver / message send)
    – ``@lotOfStuffAfter: ignore everything after at: (object / messages)
    (For more information about metavariables see my first post)
    The solution implemented for #checkMethod: introduces new objects:

    Matcher: is responsible for visiting an AST and verifying the matches. The matcher is the one that interprets the matching expressions. First of all we have to specify whether we are matching a method pattern (matchesMethod: aString do: aBlock) or an expression pattern (matches: aString do: aBlock).
    We can add more than one matching expression. It’s very important to know that every time we add an expression to the matcher, internally it adds an RBSearchRule for that expression.
    When the tree rule sends the message executeTree: to the matcher, the matcher iterates over all the rules while visiting the nodes, delegating the actual matching to the RBSearchRules.
    It’s important to say that the search rules aren’t deleted automatically, so if you want to reuse the object you should probably reset the rule by resetting the matcher.
    ParseTreeEnvironment: a specialized SelectorEnvironment that makes it possible to detect the selection interval of an expression inside the method, using the AST matcher.
    This class is still abstract because we have to categorize it, adding a name, and initialize the rule with the matching expression patterns.
    The functionality is almost the same as before, but at another level of abstraction: the result is handled automatically, and we don’t have to worry about checking a method or a class, only about defining the desired matching expression.

  • RBCompositeLintRule: just a composite of rules.
  • RBTransformationRule: the main idea here is to produce a transformation in the system. For this it implements #checkMethod: in a similar way to RBParseTreeLintRule; the difference is that when we find a match we produce a modification of the code and then replace the method with the new version of the code (recompiling the new method).
    To do this the rule adds some objects:
    RBParseTreeEnvironment: holds all the results; the results are RBAddMethodChange objects, each tracking a change to a method.
    RBParseTreeRewriter: a subclass of the matcher (RBParseTreeSearcher); again the main point is that this is a visitor that works over a method AST, changing it according to the matching and transforming expressions.
    The transforming expression also works with metavariables, and we usually use the metavariables defined in the matching expression to specify the transformation.
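
To make the matcher and the rewriter less abstract, here is a minimal workspace sketch that uses RBParseTreeSearcher and RBParseTreeRewriter directly, without defining any rule class. The method source and the ifAbsent: [ nil ] replacement are made up for illustration, and I treat the second block argument as the searcher’s running answer; selectors may differ slightly between Pharo versions, so take it as a sketch rather than the way the lint rules do it internally.

```smalltalk
| tree hits searcher rewriter |
"Parse a small, made-up method into an AST."
tree := RBParser parseMethod: 'lookup: aName
	^ Smalltalk globals at: aName'.

"Search: collect every node that matches the expression pattern."
hits := OrderedCollection new.
searcher := RBParseTreeSearcher new.
searcher
	matches: '``@lotOfStuffBefore globals at: ``@lotOfStuffAfter'
	do: [ :node :answer | hits add: node. answer ].
searcher executeTree: tree.
hits inspect.

"Rewrite: reuse the metavariables of the matching expression
 in the transforming expression (hypothetical transformation)."
rewriter := RBParseTreeRewriter new.
rewriter
	replace: '``@receiver globals at: ``@key'
	with: '``@receiver globals at: ``@key ifAbsent: [ nil ]'.
(rewriter executeTree: tree)
	ifTrue: [ rewriter tree formattedCode inspect ]
```

The searcher only reports nodes, while the rewriter builds a modified tree; nothing is compiled here, so the image is not changed.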

Summing up:

  1. Before you start, choose whether you want to:
    perform a search
    do a match
  2. Implement your rules, because all the provided ones are abstract. You will probably end up using the tree rules because they are more automatic and powerful than the basic ones.
  3. When you are defining your rule:
    – give it a name
    – define whether you will use a method or an expression pattern
    – write your patterns and add them to the rule:
    – in a search rule, to the matcher
    – in a transformation rule, to the rewriteRule
    – define what to do with a result
  4. run your rule
  5. use your results
  6. if you want to reuse it, reset your rule: again, if it’s a search rule reset the matcher, and if it’s a transformation rule reset the rewriteRule.

In our example:

  • we want to match, so let’s create a class that extends RBParseTreeLintRule:
 RBParseTreeLintRule subclass: #SearchGlobalsAtUsage
 instanceVariableNames: ''
 classVariableNames: ''
 poolDictionaries: ''
 category: 'Blog-example'
 
  •  we have to implement the abstract methods:
 name
 ^ 'Find all potentially wrong usages of globals'
 
  • I want to match an expression pattern because I do not care about the rest of the method; I want everything that contains the messages #globals #at:. We also have to say what to do with the matching node; in this example I will open an inspector:
 initialize
 super initialize.
 self matcher
 matches: '``@lotOfStuffBefore globals at: ``@lotOfStuffAfter'
 do: [:theMatch :theOwner | theMatch inspect].
 
  • now we should run the rule:

WARNING: this can take a while because you will check the whole system.

 SearchGlobalsAtUsage new run.

To avoid this you can restrict the environment for your rule, for example:

 rule := SearchGlobalsAtUsage new.
 environment := RBClassEnvironment class: Result.
 RBSmalllintChecker runRule: rule onEnvironment: (environment).
 

If you have matches then you will see the inspector.

So, as we can see, this is quite complex, and in the synopsis you can see that before doing anything you have to make too many decisions. The idea behind Flamel is to make it easier.

The equivalent code (with a restricted environment) using Flamel for all this is:

FlamelMatchAndTransformRule new
 matchingExpression: '``@lotOfStuffBefore globals at: ``@lotOfStuffAfter';
 scope: environment;
 run;
 result

If you evaluate that and inspect it, you can browse your results 🙂

I think it is quite cool to replace all that code (class creation included) with 5 lines.

And that was all for today: understanding the rewrite engine a little bit and seeing Flamel in action.

 

Just want to match!

My face inspecting the empty result set


It took me many hours (and too much coffee) to realize all of this, and I want to share some tips to help you define the matching pattern you really want to apply.

  • Pay attention to the dots! `sel ``@.Statements1. self subclassResponsibility. ``@.Statements2 is very different from `sel ``@.Statements1 self subclassResponsibility. ``@.Statements2
  • Matching the selectors is not a simple task
  • Matching an expression is not the same as matching a method; one line can change everything!

If you say:

matcher matchesMethod: aMatchingExpression

You are specifically saying that aMatchingExpression should be parsed with RBParser parseRewriteMethod: aMatchingExpression instead of RBParser parseRewriteExpression: aMatchingExpression.

And this means that:

  1. Your string must be well formed; if it has a syntax error you will see a window saying that the expression is not correct
  2. Your string must have a method structure
  3. Your metavariables will act as a message send or as a variable depending on the context

A quick example. Imagine this expression:

`anObject size.

Let’s pretend we are a cool parser and someone told us: hey… this is a string, but you should interpret it as a method… so:

`anObject = the selector, so I want all the unary methods

size = a variable named ‘size’: it can’t be a parameter because my message is unary, and it’s not a variable definition, so it has to be a variable.

So as a cool parser I say that I will match all the unary methods whose body is a single statement that consists of a variable called ‘size’.

Now, imagine someone told us this is an expression… so:
`anObject = the object; I know nothing about it, only that a single object goes here
size = before it I had an object, so here I must have a send of the message #size.

So as a cool parser I say that I will match all the expressions that send the message #size.
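
The two readings can be compared directly by parsing the same string both ways. This is only a small workspace sketch (the variable names are mine); inspecting the two results shows which node classes your image actually produces.

```smalltalk
| asMethod asExpression |
"Same string, parsed once as a method pattern and once as an expression pattern."
asMethod := RBParser parseRewriteMethod: '`anObject size'.
asExpression := RBParser parseRewriteExpression: '`anObject size'.

"Method reading: a method node whose selector is the pattern `anObject
 and whose body refers to a variable named size."
asMethod inspect.

"Expression reading: a message node sending #size to the pattern variable `anObject."
asExpression inspect
```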

  • If you are looking for a particular message send, you should care about the structure:
  1. Do you want all the senders? When defining the expression, object messageToFind is not the same as object message anotherMessage messageToFind moreMessages
  2. Can it be inside a block?
  3. Can it be invoked as a symbol? object perform: #messageTofind
  4. Can it be in any part of the method? (the first statement? can more message sends come after it?)
  • The matcher uses pattern matching! You give an alias, and when it matches it gets bound, and that’s it.

This pattern:

`sel ``@.Statements1. self messageTofind. ``@.Statements1

Does not match with:

example
self oneMessage.
self messageTofind.
self otherMessage.

Because we used the same variable name, Statements1, twice! During matching it was bound to the statement self oneMessage, which is NOT identical to self otherMessage, so we don’t get the match.
But it should match:

example
self oneMessage.
self messageTofind.
self oneMessage.

But this feature is really cool if you want to search for repeated code in the same method…
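
If you want to check this binding behavior in your own image, a workspace sketch along these lines should do. The helper block matchesSource is my own name, not part of the rewrite engine; if the rule described above holds, the first evaluation should answer false and the second true.

```smalltalk
| pattern matchesSource |
"The same metavariable name is used before and after the searched statement."
pattern := '`sel ``@.Statements1. self messageTofind. ``@.Statements1'.

"Hypothetical helper: answers whether the method source matches the method pattern."
matchesSource := [ :source |
	| found |
	found := false.
	RBParseTreeSearcher new
		matchesMethod: pattern do: [ :node :answer | found := true. answer ];
		executeTree: (RBParser parseMethod: source).
	found ].

"Different statements around the send."
(matchesSource value: 'example
	self oneMessage.
	self messageTofind.
	self otherMessage') inspect.

"Identical statements around the send."
(matchesSource value: 'example
	self oneMessage.
	self messageTofind.
	self oneMessage') inspect
```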

  • If you are matching inside a method, does the method define temporary variables?
  • Remember that when you parse, an empty variable definition gets ignored, but if you leave it out of your pattern and the method defines a temp, you are excluding that method.

An example

example
|aTempVar|
self messageTofind.

matches with method pattern:

`sel |`@vars| `object messageTofind

but does NOT match with:

`sel `object messageTofind 
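
The temporaries case can be tried the same way. Again this is a sketch with a helper block of my own (check); following the explanation above, the pattern that declares `@vars should match and the one without temporaries should not.

```smalltalk
| tree check |
"The example method from the text, which declares one temporary."
tree := RBParser parseMethod: 'example
	| aTempVar |
	self messageTofind'.

"Hypothetical helper: answers whether the method pattern matches the tree."
check := [ :pattern |
	| found |
	found := false.
	RBParseTreeSearcher new
		matchesMethod: pattern do: [ :node :answer | found := true. answer ];
		executeTree: tree.
	found ].

(check value: '`sel |`@vars| `object messageTofind') inspect.
(check value: '`sel `object messageTofind') inspect
```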

In the end I understand that all these “tiny” details are very important… but if you want to match a common case it can get really ugly, and it’s VERY easy to get it wrong.

But do not worry too much: one big goal of Flamel is to offer a simple API that avoids all these common errors.

Developing under develop

I think this has happened to a lot of Smalltalkers over time… After a while you just get used to doing some tasks that aren’t very intuitive for a newcomer, and they start to feel “natural”. Yes… the classic user who got used to a system…

These days I’ve been watching the frustration of someone who is starting down the path, and while trying to help I realized that I have a lot of tips and tricks that I follow unconsciously. So I decided to write an entry here; maybe it helps, and maybe people with more experience will share their tips.

Starting

Before you start coding you have to set up your environment, and one choice is: which image? Which version? I strongly suggest the latest, but maybe you should keep reading before you choose.

Why does the title say developing under develop?

Because when I start a new project in Pharo I want to use the latest features. Pharo is getting really cool and has improved a lot, and to use all those features I need the latest image.

Another reason is that while you are programming your system you are also testing Pharo; from time to time you find some problems, and correcting them is not so hard, so it pushes you to improve Pharo itself!

So in order to get all the benefits you have to use the latest image (under development), and with that you should be aware of the consequences and take some considerations into account.

Mental preparation

OK, you are in a system under development. IT IS NOT PERFECT, but it is on the path to getting one step closer to perfection, and to achieve that it needs your help. Basically the same rules that apply to your own development apply to the system:

  • The system may change: some APIs can change, and maybe you are a user of that API. This is not bad! It’s probably a refactoring to improve the code, so you benefit because your code will be more expressive.
  • The system can have bugs! Yes, as I said it is not perfect, and it’s not on purpose.
  • Sometimes the latest image has problems. Yes, we would love it to be perfect, but sometimes there are nasty side effects and some functionality stops working. Sometimes the only way to find that out is to release and wait for feedback, and you are a great candidate to give that feedback.
  • The debugger does not mean that everything is lost; your first reaction should NOT be to close the debugger and say abandon, abandon!
  • Getting attached to an image is not a great idea… Your “latest” image is not there to stay with you for a long time. You have chosen to use an image under development, which means that your image will change (along the happy path, but it will change anyway). Deliberately deciding to stick with a particular old version of a changing image gives you the worst of both worlds: you are not on a stable version, yet you are stuck on a version that nobody is using, and as a consequence you don’t get the new features/fixes.
  • The tests are your friends 🙂 (As usual!)

Tips and tricks!

If you are new to Pharo I strongly recommend taking a look at Mariano Peck’s blog, particularly the post called: Pharo tips and tricks. The shortcuts are really cool, and from your previous experience you might say… I can’t do that with only my keyboard… but the truth is that you probably can! Just take a look at Key mappings.

Mitigate the errors

Some tips when you are developing:

  • Use fresh images; if you are on the latest Pharo, a week with the same image is really A LOT
  • Always use a repository to share your code; I strongly recommend smalltalkhub
  • If your project has a complex setup, invest some time in making a configuration
  • Do not do a lot of work before committing; many small commits are much easier than one big commit
  • Reverting changes is not so easy. A lot of effort is going into this topic and it is coming, but it is not ready! So try to keep it simple
  • If you find strange behavior, send an email to the user mailing list: pharo-users@lists.pharo.org. Nobody will judge you for asking!
  • The language barrier is not so hard with a tolerant community. The Pharo community is full of non-native English speakers, and improving Pharo is more important than writing poetry; just ask and try to make yourself understood (look… if I’m writing this blog with my horrible English, you shouldn’t worry at all)
  • If you can reproduce a problem, open an issue in the issue tracker
  • Save your image regularly; if you have worked for a while and you have uncommitted changes, save your image!
  • Before killing your image because it “does not respond”, you can try to interrupt the process: just press alt + ‘.’ or cmd + ‘.’
  • Contribute and give feedback, it’s a way to keep growing

I’m sure there are a lot more, but at least it is a beginning!

Developer’s tool in Pharo

I’ve finished an internship in the RMOD team at Inria and I have been doing a retrospective of the work I’ve done.

The idea for my internship was to improve the developer experience while using Pharo [http://www.pharo-project.org/], (quite challenging!), so in order to do this I was mainly focused on 4 points:

  • Suggestions
  • Code Navigation
  • Syntax Coloring
  • Class Definition representation

Suggestions
While we are coding we usually want to apply actions depending on the element we are writing/seeing; for example, if it’s a variable we may want to rename it. But to do this we have to go through big menus to find what we want, usually with a lot of options that don’t apply.
The idea of smart suggestions is to offer only the relevant options, based on the context.
We use the current AST to do this, through RBParser (parseFaultyMethod: / parseFaultyExpression:) and the Opal semantic analysis.

We chose RBParser because it can parse faulty expressions; with this feature we can offer suggestions while the user types the method.

Opal is the new compiler integrated in Pharo. It’s a very clean implementation and we think it will replace the old compiler, so we didn’t want to couple suggestions to an old compiler. The semantic analysis helps us complete the information about a piece of code, because there is information that we cannot know from the code without the context, for example whether a variable is a temporary or an instance variable.
From the best AST node for the selection we get the available suggestions, and with the semantic analysis we go one level deeper for variables, learning their nature (temporary, instance or class) to refine the suggestions.

If you want to define a new suggestion you only have to annotate a method with the scope where you would like to offer the suggestion and return an object respecting the SugsSuggestion API.
To return the object you can subclass SugsSuggestion and redefine the desired methods; you can see various examples of this in the current implementation, in the SmartSuggestions-Suggestion package.
You can also instantiate a generic suggestion with SugsSuggestion class>>#for:named:icon:, where the first parameter is a block with the action to execute, which receives a valid SugsAbstractContext.

The available scopes are:

  • <assignmentCommand>
  • <classCommand>
  • <classVarCommand>
  • <instVarCommand>
  • <literalCommand>
  • <messageCommand>
  • <methodCommand>
  • <sourceCodeCommand>
  • <tempVarCommand>
  • <undeclaredVarCommand>
  • <globalCommand>

Here we have an example defining a dummy action:

DummyAction class >> newFormattingSuggestion
 <globalCommand>
 ^ SugsSuggestion
 for: [ :context | context formatSourceCode ]
 named: 'My global formater'
 icon: Smalltalk ui icons smallFindIcon

Navigation
Sometimes while browsing code we think in programming terms instead of text; for example, we think of a message send or a statement instead of words, spaces or symbols.
The idea is to use context information and let the programmer navigate the code thinking in those terms.
To do this we find the best AST node through the RBParser and we offer navigation in different directions:
– Parent: the node that contains the selected one. For example, if we have the code ‘aNumber between: anotherNumber’ and we are selecting the variable anotherNumber, navigating to the parent takes us to the message send.
– Sibling: a node at the same level as the selected node. For example, in a temporary variables definition ‘| one two three |’, if we are on the variable one we can go to the siblings two or three.
– Child: a node contained by the selected node. For example, if we are on the message send ‘aNumber between: anotherNumber’ we will go to the parameter anotherNumber.
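
The three directions map naturally onto the plain AST accessors. The sketch below uses only the generic node protocol (parent, children, arguments), not the navigation tool’s own API, and computing the siblings with copyWithout: is my own shortcut for illustration.

```smalltalk
| send argument siblings |
"The example expression from the text, parsed into a message node."
send := RBParser parseExpression: 'aNumber between: anotherNumber'.

"Child: from the message send down to the parameter anotherNumber."
argument := send arguments first.

"Parent: from the variable back up to the message send that contains it."
argument parent inspect.

"Sibling: the other nodes hanging from the same parent (here, the receiver aNumber)."
siblings := argument parent children copyWithout: argument.
siblings inspect
```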

It’s very easy to see how the Suggestions + Navigation work together, and how with not too much effort you can improve a lot.
If you want to activate the Node navigation using Command (or Control if unix) + Arrows go to the System Settings and activate: AST Navigation.
ASTColoring
We want to color the syntax we are writing using as much information as possible, so that we are able to select the scope where we are or show information associated with that piece of code.
To do that we use the AST and the semantic analysis (we need the semantic analysis because we want to show different kinds of variables with different colors, like undeclared variables), obtaining the AST representation through the RBParser (parseFaultyMethod: / parseFaultyExpression:). The implementation is very simple because we can define a new visitor to which we delegate the coloring algorithm, and once we have defined the coloring for each syntax representation we just visit the tree: ast acceptVisitor: “visitorImplementation”.
To enable the syntax coloring activate in System Settings: Enable AST based coloring.

Class Definition
Include a representation that models the arguments and the message as first-class objects. I have written a whole blog post about this point.
Summary
After working with the AST I can say that it is a very powerful tool and also very easy to use. The visitor model is great and very flexible. But we must be careful, because we can easily add too many responsibilities there and complicate the design.
Changing the AST structure has an obvious impact in the visitor implementors, and in the current Pharo implementation this can be complicated because we have some interesting users of the AST (like the new Compiler), and if we make a mistake it is very easy to break the image, so if we are going to do some experiments with this it is better to change the compiler first.
Obtaining an AST representation is not so hard to do when we have valid code, but when we are trying to find the best representation for a faulty expression things get nasty, and if we want to colour the text while we are writing we need this feature.

But the main point of everything is the re-affirmation that Pharo is a great tool for doing experiments; the power of a live environment and reflectivity gives us an edge over other environments.
With not so much effort we can implement tools that improve the programming experience a lot, and everyone can do this!

The class definition message

Q: What is a class definition in Smalltalk?

A: Just another message send!

I love that simplicity… We don’t need control structures because we send messages, and since we have an abstraction that represents a piece of code, everything is happiness (yeah… blocks are cool stuff).

But even knowing that, I go to the System Browser… I see the class definition and I don’t think of it as just another message send. In fact, when I realized that I could write that “magic code” in a workspace with the same effect (creating the class) and then see the class in the System Browser, I said wow! After a while I thought “so lame… of course it is just a message send”.

However, I go to the class definition again, select an instance variable and want to apply the rename refactoring… so… I ask for suggestions… and I get… nothing… I’m just selecting a string, and that confused me. Then I realized: OK, it’s just a message send and the argument is just a string, so… that’s logical. The thing is, it’s not quite true that it is just a message send… because for some reason we expect more at some points, for example when we are in the class definition…

So I built some prototypes for changing the representation, and I changed my mind along the way.

1. The class definition is not represented as a message node

I was radical… I said: A Class Definition is always a class definition, so I changed the parser and when parsing I tried to parse a class definition
The prototype for this can be found in:

MCHttpRepository
location: 'http://smalltalkhub.com/mc/gisela/ParsingClassDefinition/main'
user: ''
password: ''

I ran into some issues here… The main one is:
How do I know I’m in a class definition? In the prototype you will see a very “naive” implementation of this… I just assume that if the message selector I parsed is #subclass:, it is a full class definition message. There is no way to know whether you are in a class definition until you finish parsing.
A lot of the behavior is similar to a message send, and to implement the program node API we have to see how to share this code. It seems an effort without a real motive, because all the problems I was having stem from the fact that I assumed that a class definition is not a message send…

2. The class definition is created from a message send

Something in between, I still changed the parser but… transforming the message node parsed only if it was a class definition.
The prototype is in:

MCHttpRepository
location: 'http://smalltalkhub.com/mc/gisela/MessageClassDefinition/main'
user: ''
password: ''

I had fewer problems 🙂 but still some strange behaviors, because I changed the AST and all the users of the AST had to change, especially the visitors, because now you have classDefinitionNode, categoryNode, variableDefinitionNode (…).

And then I had lots of failing tests, and to solve most of them I ended up delegating to the message node reference I was holding.
But this was expected: since I changed the AST structure, at some point all the users of that structure need to be modified. Still, I ended up with pretty much the same as before… a lot of behavior shared between a message node and a class definition, and sometimes the “new representation” wasn’t even desirable.
For example, asking for suggestions… if I’m writing a method and write something like:

Object
subclass: #JunkClass
instanceVariableNames: 'zzz'
classVariableNames: ''
poolDictionaries: ''
category: 'DeleteMe-1'.

If I ask for suggestions on zzz I expect something like… “Extract local” and not “Rename”, but in the class definition pane… I do want that behavior. So I decided that only on some occasions do you look at the class definition message as something special, and that depends on the context where you are. Changing the parser wasn’t so great, because at that point you don’t know how the user would like to treat that particular message; that is mostly a user decision.

3. Request the class definition explicitly

I don’t change the parser, I add a message to the nodes “asClassDefinition”

MCHttpRepository
location: ‘http://smalltalkhub.com/mc/gisela/ClassDefEnhaceTree/main&#8217;
user: ”
password: ”

And then the user of that tree decides which case they are in and whether the transformation applies or not.

With this approach the direct user of the AST must know whether they want this representation or not. The original tree doesn’t change, and we let the two representations coexist. It is similar to the approach of the semantic analysis: by default you don’t have the information, and if you want it you explicitly need to say… OK, do the analysis.

And in the end

I end where I began: the AST represents the syntax structure, and that is how we understand it and how we manipulate it. Sometimes we want a transformation, we want more, and we enhance that structure for better understanding.

At least for the suggestions, that was needed only in the class definition view. And after dealing with the consequences of the change I am convinced that we send messages and that’s it; we neither have nor need a special syntax for defining a class. It is just sending the right message to the right object; it’s simple and it’s very good.

And when we want another approach now we can have it, very easily.

 

Defining the UI

The main feature we want for the rewriting tool UI is for it to be example driven. Some very similar options:

Option 1 – All in the same view

[Image]

Option 2 – A tabbed view

[Image]

[Image]

Option 3 – A 3-Column layout

Screenshot from 2013-06-14 21:57:42

We also want it to be possible to:

  • select a method and drop it into the example text editor.
  • manipulate the AST nodes in the text editor to make it simple to define transformations and matches.
  • highlight the matched parts of the example.
  • select the scopes in which to apply the rule.
  • preview the changes.
  • show the transformation in the example expression.
  • modify an expression through the example or by writing the pattern.
  • output the valid Smalltalk code that produces the effect we see in the example.

Next steps…

Now we have a very powerful tool, but one that is quite complex to use… The pattern definitions end up being confusing, and a lot of people dismiss the tool because of this.

The fact that the patterns are strings is one of the causes of misunderstanding and invites a lot of mistakes. When we want to match an expression in a method definition we think in terms of: a method contains this statement, or declares this variable, and not in terms of complex regular expressions. It’s also true that sometimes we end up with a cryptic expression that is hard for others (and even for ourselves) to understand.

It would be great to have a very expressive API for what we want to do, for example making it possible to say something like:

“some scope” pattern
includesStatement: anStatement;
hasTemporaryVariable: aVariable.

We should write some nice usage examples and also take a look at the rewriting engine; we could make a lot of improvements.

All of this sounds very ambitious, so I will start by defining a UI that generates the patterns automatically and easily, with lots of cool features, especially example-driven building of expressions.

Once we have the UI and a nice video showing how to use it… we can improve the API and make it more expressive, and if we have time we will take a look at the rewrite engine… but first let’s make this easy to use!