MIT Offers Picture-Centric Programming To the Masses With Sikuli

timothy posted about 5 years ago

GUI 154

coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it." Here's a video demo of the technology, and a paper explaining the concept (PDF).

Re:Fucking Communitst (1)

AlexLibman (785653) | about 5 years ago | (#30853398)

Good libertarian / Objectivist / Anarcho-Capitalist trolls at least try to post on topic... Watch me and learn, grasshopper. ;-)

Anyway, did MIT just figure out a way to make computers slower and GUI script kiddies more arrogant?! Yuck! C, perl, and OpenBSD FTW!

Re:How easy IS it? (1)

Hognoxious (631665) | about 5 years ago | (#30852474)

Have you seen his wife recently?

Hadlock (143607) | about 5 years ago | (#30852032)

The subtitles were a bit of a surprise. Can MIT not afford better than built-in microphones on cheap laptops? Between her vaguely Asian accent, the poor quality of the audio (seriously, you're TELLING people how to do something, the audio is important here - did they record this in a shower stall or something? my netbook's audio sounds 100x better than this), and then apparently some sort of wacky audio encoding, she is basically impossible to understand. People who speak English as a second language aren't going to be able to understand this, thank god they did the subtitles.
Neat concept though.

Re:MIT can't afford real microphones (1)

pclminion (145572) | about 5 years ago | (#30852348)

On the contrary, my experience has been that non-native speakers of English are actually better at understanding other non-native speakers. I don't know why that is, but intuitively it makes sense -- non-native speakers probably learned from a diversity of other non-native speakers.

I was at a WinHEC panel session in 2008 and the panel leader had absolutely horrible English (I'm sure he was intelligent, but he wasn't intelligible). Somebody else, clearly of another racial background (the specific ethnicities are unimportant) stood up and asked a question, also in completely unintelligible English. The questioner and speaker went back and forth for several minutes speaking. Other non-native speakers in the audience were nodding their heads emphatically, indicating they could understand as well. I looked around and every American in the room seemed completely baffled.

Re:MIT can't afford real microphones (1)

Yvan256 (722131) | about 5 years ago | (#30852704)

That's because non-native speakers can't string the words together, they have to cut them up individually. If that makes any sense.

Cut up words? (0)

Anonymous Coward | about 5 years ago | (#30854198)

Now why would you want to do that?

Re:How easy IS it? (1)

0100010001010011 (652467) | about 5 years ago | (#30852456)

Wow, no one has watched the movie Swordfish [imdb.com] have they?

Re:How easy IS it? (2, Funny)

Anonymous Coward | about 5 years ago | (#30852844)

Wow, no one has watched the movie Swordfish have they?

We're trying to repress those memories, you insensitive clod!

FrontPage? (2, Interesting)

Itninja (937614) | about 5 years ago | (#30851710)

Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development....that had to be fixed by a real web developer later.

But on the upside, dedicated FTE's for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.

Re:FrontPage? (0)

Anonymous Coward | about 5 years ago | (#30851750)

Sounds like LabView - very useful for some things, painfully tedious for others.

Re:FrontPage? (1)

ArhcAngel (247594) | about 5 years ago | (#30852448)

That was my first thought as well. I programmed in HP VEE [agilent.com] and Labview [ni.com] in the early nineties.

Better (2, Interesting)

pavon (30274) | about 5 years ago | (#30851920)

Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

How useful it is will depend on how well the image pattern matching deals with corner cases. Consider you need to click on a text field, however there are many identically looking (empty) text fields, with the only distinguishing factor being the label beside them, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as much as other GUI scripting tools that rely on pixel location).
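
One possible workaround for the identical-text-field case is to match the label, which is unique, and then aim at a fixed offset from it. The sketch below is only an illustration under loud assumptions: the "screen" is a list of strings with one character per pixel, exact comparison stands in for real image matching, and `find_all` and `click_target_right_of` are hypothetical helpers, not part of Sikuli.

```python
# Hypothetical sketch: the "screen" is a list of equal-length strings, one
# character per pixel, so exact matching stands in for real image matching.

def find_all(screen, template):
    """Return (row, col) of every position where template matches exactly."""
    t_h, t_w = len(template), len(template[0])
    hits = []
    for r in range(len(screen) - t_h + 1):
        for c in range(len(screen[0]) - t_w + 1):
            if all(screen[r + i][c:c + t_w] == template[i] for i in range(t_h)):
                hits.append((r, c))
    return hits


def click_target_right_of(screen, label, offset_cols):
    """Locate a unique label, then aim offset_cols to the right of its end."""
    hits = find_all(screen, label)
    if len(hits) != 1:
        raise LookupError(f"expected one label match, found {len(hits)}")
    row, col = hits[0]
    return (row, col + len(label[0]) + offset_cols)


screen = [
    "Name: [    ]",
    "City: [    ]",
]
# The two empty fields look identical, but the labels do not.
print(find_all(screen, ["[    ]"]))                 # two indistinguishable fields
print(click_target_right_of(screen, ["City:"], 2))  # a point inside the City field
```

The offset is still a layout assumption, so this trades one kind of fragility (ambiguous matches) for a smaller one (the field moving relative to its label).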

Re:Better (0)

Anonymous Coward | about 5 years ago | (#30852048)

This isn't that new. What about Logo or Turtle or whatever it was called back in the '80s. Programming with pictures.

Re:Better (1)

BitZtream (692029) | about 5 years ago | (#30852256)

I can think of at least 3 ways of doing (scripting gui apps that aren't scriptable) already that have been around for years.

Re:Better (0)

Anonymous Coward | about 5 years ago | (#30854010)

GUI automation has been around for quite some time.
I personally have written programs to automate GUIs both web pages and desktop applications.

What is new here is the unnecessary extra work of image recognition.
I hope it doesn't try to do recognition every time and instead stores the UI element and uses the element directly.

What happens if your background changes?
Does the script break?

Re:Better (0)

Anonymous Coward | about 5 years ago | (#30855256)

Just remember to never change themes.

Re:FrontPage? (4, Informative)

gad_zuki! (70830) | about 5 years ago | (#30852272)

>And we all know FrontPage went on to become the de facto standard for web development....that had to be fixed by a real web developer later.

Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

While I'm certainly no fan of FrontPage, I feel that it wasn't much worse than Mozilla Composer or other WYSIWYG HTML composers.

Re:FrontPage? (1, Insightful)

mustafap (452510) | about 5 years ago | (#30852726)

>Do you want to democratize technology or just have it controlled by elites?

Neither. I'd like to see people who wish to program, learn how to.

Re:FrontPage? (3, Insightful)

AardvarkCelery (600124) | about 5 years ago | (#30855752)

Yeah, that's real easy for a programmer to say. Ever used a brownie mix? I'll bet a pastry chef would say, "I'd like to see people who wish to bake brownies actually learn how to bake brownies properly." Tools like Sikuli are the programming equivalent to brownie mix. It's easy gratification. (... or at least easier than learning to capture part of the screen and then do fuzzy image pattern matching on it.) If I were a very casual, light duty programmer, this would be pretty helpful sometimes.

Re:FrontPage? (1)

Yvan256 (722131) | about 5 years ago | (#30852754)

The problem with FrontPage wasn't the users, it was the code that it produced.

Re:FrontPage? (1)

Xiaran (836924) | about 5 years ago | (#30852764)

Elite or competent? I'm all for people tinkering with software in their spare time; the problem is people who aren't qualified start thinking *everything* in software development is as simple as the tiny little things they are doing. Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).

Re:FrontPage? (4, Insightful)

BobMcD (601576) | about 5 years ago | (#30853586)

Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).

From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes, it did eventually need to be changed out, but in its time, for its purpose, you can't really fault it. It truly did work.

Re:FrontPage? (0)

Anonymous Coward | about 5 years ago | (#30855224)

Changed out? Then why is the local hospital here hiring VB developers?

Re:FrontPage? (1)

idontgno (624372) | about 5 years ago | (#30853516)

Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

Thousands of cars on cinderblocks and dozens of houses with flooded basements are testimony that sometimes, paying someone is the only thing that isn't ridiculous. There's DIY, and there's "OMG you are SO in over your head." Anyone whose software development abilities are so stunted that the "advancement" outlined in TFA would help them is absolutely in the latter category.

Re:FrontPage? (1)

ilsaloving (1534307) | about 5 years ago | (#30855446)

There is a minimum level of skill and talent required to do anything. The only thing that happens when you make something "so simple anyone can do it" is a minefield of crap software. Instructing a computer to do something requires the ability to think abstractly, and organize/plan with orders of magnitude more sophistication than "Do I want eggs or pancakes for breakfast?". Arguing that the 'elites' are pushing down the 'DIYers' is disingenuous. A real DIYer will overcome the learning curve of whatever they're trying to do, because they care enough to put the effort into it.

There's a big difference between that, and someone who just wants to slap a bunch of widgets together and expect it to work.

The end result is a bunch of people who don't know what the hell they're doing, but demand that they be called programmers. You also get other people who, when needing specialized software to run some key part of their business, look to these non-skilled 'programmers', and then turn to the skilled people and complain how unreasonable their higher rates are. It's downright insulting.

It's the same mindset (or lack thereof) that leads many people to think Y2K was a big waste of time and money because 'nothing happened'.

Hell, it's (relatively) easy to program an iPhone too. What do we have? Tens of thousands of apps that emit varying types of fart sounds.

Potential (2, Insightful)

zero0ne (1309517) | about 5 years ago | (#30851804)

Especially for Testing your GUI.

This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).

Re:Potential (0)

Anonymous Coward | about 5 years ago | (#30852100)

The only problem is that it is non-deterministic. It works off of pattern matching. There are cases where you can have statistical matches that are wrong, such as having multiple "Add" or "Remove" buttons on your screen. Not to mention upgrades with new icons/graphics/layout/test will add "noise" to the search domain.
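
A script can at least refuse to act when several candidates score almost equally, instead of clicking whichever "Add" button happens to win. A minimal sketch, assuming some image matcher has already produced `(x, y, score)` tuples; the matcher is not shown, and `pick_unique`, `threshold` and `margin` are hypothetical names and values rather than anything Sikuli is known to provide.

```python
# Hypothetical sketch: candidates are (x, y, score) tuples that some image
# matcher has already produced; the matcher itself is not shown.

class AmbiguousMatch(Exception):
    """Raised when two or more candidates are too close to call."""


def pick_unique(candidates, threshold=0.9, margin=0.05):
    """Return the best candidate only if no rival scores within `margin`."""
    good = sorted((c for c in candidates if c[2] >= threshold),
                  key=lambda c: c[2], reverse=True)
    if not good:
        raise LookupError("no candidate reached the threshold")
    if len(good) > 1 and good[0][2] - good[1][2] < margin:
        raise AmbiguousMatch(f"{len(good)} near-identical matches")
    return good[0]


# One clear winner: safe to click.
print(pick_unique([(10, 20, 0.99), (300, 20, 0.60)]))

# Two "Add" buttons that look the same: refuse rather than guess.
try:
    pick_unique([(10, 20, 0.99), (300, 20, 0.98)])
except AmbiguousMatch as err:
    print("refusing to click:", err)
```

Failing loudly does not remove the ambiguity, but it turns a silent wrong click into an error the script author can see.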

Re:Potential (1)

Jonah Hex (651948) | about 5 years ago | (#30852278)

Watching the YouTube demo, I immediately thought of how basic this is compared to AutoIT's functions, and even the quick record function is faster to "program" with than this screenshot function.

It says it can tolerate some changes, but what if there is a completely different visual theme installed? What if a drop down is not on the same item it was when you made the script? AutoIT can take care of this by reading the underlying GUI code to allow for these kinds of things. As someone who has been automating OS/Software installs since before Windows, I know you cannot expect things to work the same way every time when doing so.

Jonah HEX

Re:Potential (1)

BitZtream (692029) | about 5 years ago | (#30852420)

The MS test crap in the latest versions of Visual Studio does it as well, and it'll be happy to find a button (if it's a standard control) to click on using other data rather than mouse coordinates as well.

Re:Potential (1)

gad_zuki! (70830) | about 5 years ago | (#30852422)

>This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).

Right, it's AutoHotKey/AutoIT with a nicer OCR library. Perhaps this will light the fire under the butts of the AutoHotKey devs and get them to add in some smarter screen reading and browser integration.

Re:Potential (1, Interesting)

Anonymous Coward | about 5 years ago | (#30853390)

Eggplant [testplant.com] says hi.

As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out. Web sites change way too often for something like this.

The key to writing maintainable scripts is finding and hooking into the property that is least likely to change. If you're automating Windows Forms .NET apps, you might be able to get the actual variable name. If you're automating web pages you could look at the id or name of the control. You can look at the text of a button or the label of a textbox. You find whatever you can that won't change.
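
The "hook into the property least likely to change" idea can be sketched as a prioritized lookup. This is a toy model only: the controls are plain dicts, `find_control` and `LOCATOR_PRIORITY` are hypothetical names, and no real automation library is involved.

```python
# Hypothetical sketch: controls are plain dicts; no real automation library
# is used, and the attribute names are illustrative.

LOCATOR_PRIORITY = ("id", "name", "text")  # most stable first


def find_control(controls, **wanted):
    """Try each attribute in priority order; return the first unique hit."""
    for attr in LOCATOR_PRIORITY:
        if attr not in wanted:
            continue
        hits = [c for c in controls if c.get(attr) == wanted[attr]]
        if len(hits) == 1:
            return hits[0], attr
    raise LookupError(f"no unique control for {wanted}")


controls = [
    {"id": "btn-save", "name": "save", "text": "Save"},
    {"id": "btn-cancel", "name": "cancel", "text": "Cancel"},
]
# The id changed in a redesign, so the lookup falls back to the name.
control, used = find_control(controls, id="submit-old", name="save", text="Save")
print(used, control["id"])
```

Which attribute deserves top priority depends on the application; the ordering here is an assumption, not a rule.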

On Windows, use AutoIT [autoitscript.com] if you want something free. There's better commercial tools but they start in the hundreds of dollars and only go up from there.

For web automation, look at watir [watir.com] , WebDriver/Selenium [google.com] , or WatiN [sourceforge.net] .

On Macs you get these nice tools called AppleScript and Automator. These are made for end users. They don't use the UI, but instead use an interface made just for automation.

If you can at all avoid it, I recommend not using image recognition tools. They're extremely fragile. That said, sometimes it can't be avoided. I'll probably take a look at the source to see if there's anything I can use in those few cases where image recognition is unavoidable.

MMO macro maker? (4, Interesting)

visgoth (613861) | about 5 years ago | (#30851842)

This looks like a powerful tool for gold / isk / whatever farming. I'm tempted to resurrect my eve account and see if I can make an auto-miner script.

Re:MMO macro maker? (1)

BoppreH (1520463) | about 5 years ago | (#30852690)

Things to take into account:

- selecting and clicking on see-through buttons (the background will change too much)
- the program access to the actual game for seeing, clicking and typing
- the game's anti-hack detection / counter-measures
- macro playing lag (see video)

But it seems very promising nevertheless.

Re:MMO macro maker? (1)

Arimus (198136) | about 5 years ago | (#30853124)

Add in the number of pilots who even if they're anti-pirate operate a KOS policy when it comes to macro miners....

Re:MMO macro maker? (1, Offtopic)

burkmat (1016684) | about 5 years ago | (#30853034)

I don't know how much experience you have in EVE, but generally, if you're AFK you're dead meat. Suiciding miners even in hisec is quite fashionable these days.

Re:MMO macro maker? (1, Offtopic)

visgoth (613861) | about 5 years ago | (#30853440)

I've done a fair bit of mindless semi-afk mining during my time playing eve, and never had much trouble with suicide attackers, can flippers, or other such stuff. I'd imagine that taking the usual minimal precautions like parking in a dead end, low traffic system would work relatively well.

Depending on how robust sikuli is, it might be possible to make a mission running macro, which could be even safer than blasting rocks (with the right ship setup, and such). Barring that I'd likely use sikuli on a second account to automate monkey work. Things like post-mission looting/salvaging, hauling, etc. are wonderful candidates for macro abuse.

Re:Click Fraud Boosters Away!! (1)

BitZtream (692029) | about 5 years ago | (#30852344)

There are far easier ways to commit click fraud than actually looking at the screen to do it. The ad companies tend to ignore the same request multiple times from the same IP so this changes nothing.

People who commit 'click fraud' aren't writing crappy little screen scrapers to do it; it's far easier and faster to write a plugin for Firefox to do what you're saying and just find the text of your ad on the page and trigger the link. No need to futz with what's displayed or 'moving the mouse' to the right spot, you just tell Firefox to find the link and trigger it.

A relatively simple WebKit wrapper would work equally well.

My grandmother knows python (5, Insightful)

Anonymous Coward | about 5 years ago | (#30851964)

"Computer users with rudimentary skills"..... "with a basic understanding of Python"?

Re:My grandmother knows python (0)

BitZtream (692029) | about 5 years ago | (#30852396)

You're reading a story about MIT on slashdot.

Two groups that are so utterly disconnected from the real world that they both have no idea why their favorite toy hasn't taken over the world even though it's the simplest, most efficient, easiest to use, most feature rich (insert whatever here) on the planet.

Most of both groups probably think grandma knows assembly as well.

Re:My grandmother knows python (5, Funny)

Fred_A (10934) | about 5 years ago | (#30852710)

"Computer users with rudimentary skills"..... "with a basic understanding of Python"?

Computer users with a rudimentary skill who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.

Re:My grandmother knows python (0)

Anonymous Coward | about 5 years ago | (#30853008)

I don't see why not.

Re:My grandmother knows python (1)

Alex Belits (437) | about 5 years ago | (#30855604)

Moar liek BASIC understanding of a python.

Re:My grandmother knows python (3, Informative)

AardvarkCelery (600124) | about 5 years ago | (#30855796)

If a friend wanted to learn just enough programming to do a few light chores, what would you recommend? Python is arguably one of the easiest languages to learn. Randy Pausch used it for Alice [alice.org] , which has been successful for teaching middle school girls how to program. So if "computer users with rudimentary skills" means rudimentary programming, then that works for me.

The Cow pat model (5, Funny)

Anne Thwacks (531696) | about 5 years ago | (#30852030)

Yeah - lets hear it for a new development model:

For years I have been asking for a softwsare development tool that allows me to write PHP code by throwing cow-pats at the screem with the Wiimote.

And my colleagues wat a tool that allows dispatching my bugs with the Wii gun attachment they use in "Quantum of Solace".

Re:The Cow pat model (-1, Troll)

Anonymous Coward | about 5 years ago | (#30853304)

First, they need to develop some system that alerts you to misspellings.

High? (1)

instagib (879544) | about 5 years ago | (#30852056)

FTFA: "Sikuli -- which means God's eye in the language of the Huichol Indians in Mexico". Mexican Indians love their hallucinogenic Peyote [wikipedia.org] . On the other hand, MIT researchers want the masses to program with the mouse. Well, I know about "correlation is not causation", but MIT sure is an interesting place to be.

Expect (0)

Anonymous Coward | about 5 years ago | (#30852164)

This is a GUI version of Expect. Nothing really groundbreaking. It will also break as soon as the app changes how it looks, just like Expect. I hate expect passionately.

Re:Expect (1)

Razalhague (1497249) | about 5 years ago | (#30852924)

How would it not break? You don't expect your regular program to work if the API it's using changes, do you?

Right hands great- chances are more harm than good (1, Interesting)

Anonymous Coward | about 5 years ago | (#30852190)

Yeah - this might work until the icons change. I don't see this working too well in practice. I don't know about Mac - but on my Ubuntu system the icons got updated last week. And it happens often enough that updating these scripts would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in practice too because they "can" do it - it's just that the results of their work take more maintenance. It reminds me of this .bat file written for this video store that backs up a database to a flash drive. If it had only had a statement to check if the flash drive were present and alert the user they wouldn't have wasted $80 calling me to come and find out why the backup program wasn't working. Seriously dumb programmer. In the right hands this kind of thing is good. In the wrong hands it is bad.
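
The missing check described above is small. Here is a rough Python equivalent (the original was a .bat file, which is not reproduced here); `backup`, the file names and the error messages are all hypothetical.

```python
import os
import shutil


def backup(source_file, target_dir):
    """Copy source_file into target_dir, failing loudly if the drive is absent."""
    if not os.path.isdir(target_dir):
        # This is the check the anecdote says was missing.
        raise FileNotFoundError(
            f"backup target {target_dir!r} not found; is the flash drive plugged in?")
    if not os.path.isfile(source_file):
        raise FileNotFoundError(f"nothing to back up at {source_file!r}")
    # copy2 also preserves timestamps where the platform allows it.
    return shutil.copy2(source_file, target_dir)
```

A caller would wrap `backup(...)` in a `try`/`except FileNotFoundError` and show the message to the user instead of letting the copy fail silently.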

Program, NOT code. Think MACRO (3, Insightful)

SmallFurryCreature (593017) | about 5 years ago | (#30852232)

From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.

But it seems very interesting once you get past this difference. Macros are very handy for testing in my experience but often have a problem because a tiny misalignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well, that would certainly make automated tests a bit easier.

Interesting stuff. Just don't think you will be writing software with this.

Re:Program, NOT code. Think MACRO (1)

eulernet (1132389) | about 5 years ago | (#30853714)

Interesting stuff. Just don't think you will be writing software with this.

For a few years now, programming has been equivalent to placing Lego bricks in the correct order (I'm working with Microsoft .NET and tons of components).

So I'm not very surprised by the approach, as long as we can find all the possible varieties of pieces.

Re:Program, NOT code. Think MACRO (1, Interesting)

Anonymous Coward | about 5 years ago | (#30854196)

Don't use a tool like this for testing. Start with AutoIt or nunit+white [codeplex.com] , and look at commercial tools if those don't do what you need.

Re:Program, NOT code. Think MACRO (1, Interesting)

Anonymous Coward | about 5 years ago | (#30854796)

Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macroing system like AutoIt or AutoHotKey.

bad VB flashbacks (1)

mirix (1649853) | about 5 years ago | (#30852378)

I'm suddenly reminded of horrible apps written in VB97, with no concern for the back end, horrible input kludge, etc.

Re:bad VB flashbacks (1)

YourExperiment (1081089) | about 5 years ago | (#30853096)

I'm suddenly reminded of horrible apps written in VB97

You're 93 versions ahead of your time - VB6 was the last version of Visual Basic before .NET.

Perhaps more to the point, this not only targets a completely different purpose than Visual Basic, but also looks nothing like it whatsoever.

Re:bad VB flashbacks (0)

Anonymous Coward | about 5 years ago | (#30853524)

Visual Basic 5 was released in 1997, as part of Visual Studio 5. It installed itself in a directory called VB97. VB6, incidentally, installed itself in a directory called VB98.

The smart-assery is weak with this one.

Re:bad VB flashbacks (0)

Anonymous Coward | about 5 years ago | (#30855464)

Not to mention that 97-6=91 not 93.

Re:bad VB flashbacks (1)

ClosedSource (238333) | about 5 years ago | (#30854210)

That's OK. For most VB apps there wasn't any "back end".

pushing-robot (1037830) | about 5 years ago | (#30852848)

Sorry, there are some things even Sikuli can't process.

Sikuli (0)

Anonymous Coward | about 5 years ago | (#30852428)

Sikuli velly nice. Near Itari. Parelmo, velly nice. Except warret got storen.

CLI (0)

Anonymous Coward | about 5 years ago | (#30852514)

This is where we get when everything is a GUI. As long as I have a decent shell & environment, I think I prefer shell scripting.

Yes, but can Sikuli be used to write Sikuli? (3, Funny)

hellop2 (1271166) | about 5 years ago | (#30852520)

Otherwise it's just not complete, IMHO.

Re:Yes, but can Sikuli be used to write Sikuli? (2, Interesting)

Señor Jojoba (519752) | about 5 years ago | (#30853158)

Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.

Perfect Macro program... (1)

BoppreH (1520463) | about 5 years ago | (#30852574)

... but does anyone know if the program is always that slow?

I understand that it has to visually find the button and this is computationally expensive, but the 2~3 seconds lag didn't seem compatible with the task.

On a sidenote, the video states that there's no "internal API" dependence, but it clearly has to send "click" and "type" signals. Is that really OS independent or was it just an overstatement?

Re:Perfect Macro program... (1)

babyrat (314371) | about 5 years ago | (#30854420)

the video states that there's no "internal API" dependence

I suspect they were referring to internal API of the program being controlled. ie COM, Corba, etc...

lame (2, Insightful)

Charliemopps (1157495) | about 5 years ago | (#30852604)

This is the same sort of scripting you can do with many already existing languages. Autohotkey for example. The only new feature would be the ability to copy the screenshot directly into the program as opposed to taking it outside the program and referencing the file directly. I'd say that this scripting language is actually weaker because of it. As far as using this inside a game... they are already hardened against this sort of thing. For example, next time you're in EVE look at the buttons you use. They are semi-transparent. This is not just for aesthetics. If you take a screenshot of the button, and then change your camera angle the button looks different because what's behind it is different. That doesn't mean you can't script inside EVE, you just have to be a lot more clever than using a script to click on a static image of the GUI. This language would be almost completely useless in any GUI that has any transparency. Which I'd think would include Vista, Win7 and even Macs with the right stuff turned on.
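
The transparency point follows from ordinary alpha compositing: a blended pixel is alpha * foreground + (1 - alpha) * background, so the same button produces different pixels over different backdrops. A small sketch with made-up colors and opacity (nothing here is taken from EVE or Sikuli):

```python
def blend(fg, bg, alpha):
    """Standard 'over' compositing of one RGB pixel onto an opaque background."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))


button = (200, 200, 200)     # hypothetical button color
alpha = 0.5                  # hypothetical 50% opacity

over_black = blend(button, (0, 0, 0), alpha)
over_white = blend(button, (255, 255, 255), alpha)
print(over_black, over_white)
print(over_black == over_white)   # same button, different pixels
```

An exact pixel comparison against a stored screenshot would therefore fail as soon as the background changes; whether a tolerance-based match survives depends on how opaque the control is.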

Re:lame (1)

misexistentialist (1537887) | about 5 years ago | (#30852954)

Using screenshots seems more effective than instructing autohotkey to click on coordinates

Re:lame (1)

HaeMaker (221642) | about 5 years ago | (#30853644)

So, you tried it and it didn't work?

Re:lame (1)

sky289hawk1 (459600) | about 5 years ago | (#30854624)

The sikuli language supports fuzziness. You can actually have a "close match", and you can set the tolerance.
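
What a "close match" with a settable tolerance could look like, in the simplest possible form: score two grayscale patches by mean absolute difference and accept anything above a threshold. This is an illustration only; Sikuli's actual matching algorithm is not described in this thread, and `similarity`, `matches` and the 0.7 default are assumptions.

```python
def similarity(region, template):
    """1.0 for identical grayscale patches, falling toward 0.0 as they differ."""
    diffs = [abs(a - b)
             for row_r, row_t in zip(region, template)
             for a, b in zip(row_r, row_t)]
    return 1.0 - sum(diffs) / (255 * len(diffs))


def matches(region, template, min_similarity=0.7):
    """Accept the region if it is at least min_similarity like the template."""
    return similarity(region, template) >= min_similarity


template = [[0, 255], [255, 0]]
slightly_off = [[10, 245], [250, 5]]     # e.g. anti-aliasing or compression noise
inverted = [[255, 0], [0, 255]]

print(matches(slightly_off, template))   # close enough
print(matches(inverted, template))       # not the same thing at all
```

Lowering the threshold tolerates more theme and rendering drift, at the cost of more false positives.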

Re:lame (1)

mattack2 (1165421) | about 5 years ago | (#30855630)

I didn't RTFA, but basing this stuff on the *accessibility* view of the screen is/can be useful.

Applescript was invented a LONG time ago people... (1)

RocketRabbit (830691) | about 5 years ago | (#30853118)

It can script GUI actions in much the same way. Granted it's not a very nice environment for more complicated work, but still.

Re:Applescript was invented a LONG time ago people (1)

babyrat (314371) | about 5 years ago | (#30854436)

The last time I tried to use Applescript on windows or linux, it wouldn't even start up.

Its a brilliant idea. (2, Insightful)

Señor Jojoba (519752) | about 5 years ago | (#30853122)

Come on, let's cut through the default Slashdot snark. The image capture aspect of Sikuli is brilliant! I don't like the tagline "program anything with Sikuli" because 99% of software should be written in something else. But think of writing test scripts that can use the image matching features. If the software works as advertised, then you could throw together UI test cases way faster than anything else I've seen. System administration tasks should be a good match too. The resulting code would be brittle and hard to maintain, but for quick one-off scripts, sure... I can see it.

Re:Its a brilliant idea. (1)

rmcd (53236) | about 5 years ago | (#30856030)

Couldn't agree with you more. I'm surprised by all the negativity. And it seems to me this is innovative enough to have uses that no one here is thinking about right now.

Problems (1)

master_p (608214) | about 5 years ago | (#30853314)

The script may not work if the UI style is different from the one recorded or if the UI language is different from the one recorded. Generally, any option that can change the UI from computer to computer will create a problem for Sikuli.

Re:Problems (1)

VortexCortex (1117377) | about 5 years ago | (#30854030)

It's even worse than that... Just change your icon or window border theme and watch every Sikuli script break.

The great thing about all other languages except Sikuli is: When you change your Icon or window border theme the programs still run.

fork bomb, or loop? (0)

Anonymous Coward | about 5 years ago | (#30853434)

Has anyone tried writing a Sikuli script that finds the Sikuli IDE window and clicks the green run button?

Again!?! That trick never works. (1, Insightful)

Anonymous Coward | about 5 years ago | (#30853508)

This time for sure!

The Sikuli School of Programming (2, Funny)

presidenteloco (659168) | about 5 years ago | (#30853784)

if NOT understand logic then
      talkTo (self, "Don't program!")
      Look (@ Pretty pictures)

Google Video Search? (0)

Anonymous Coward | about 5 years ago | (#30854184)

This might have potential, depending on how flexible the pattern match is when looking for thumbnails of, ahhh, things...

It's Not Going Anywhere (1)

Clugy (1325793) | about 5 years ago | (#30854360)

I'd be curious to see how they handle the back end, especially as some others pointed out it does make calls that seemingly require some hook into the OS. As for its usefulness, I doubt it will really take off beyond being a decent prototype. It relies on image matching so if you use and change a custom icon set all your scripts would be kinda worthless. Same goes if the programs you are "screenshot scripting" receive a major overhaul in the GUI department. Until it can address those issues, I doubt it will really take off.

Think executable step-by-step tutorials (4, Insightful)

tucuxi (1146347) | about 5 years ago | (#30854416)

Sikuli is certainly not commercial-grade UI testing software. It was never intended to be, this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters that compared it to video-programming are right: not all programs have to target complicated algorithms and data-structures, there is plenty of space for automating "simple stuff".

As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.

Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.
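To make the theme-mapping idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the theme names, the file names, and the resolve_image() helper are illustrative assumptions, not anything Sikuli provides. A script would refer to images by their canonical name, and the lookup would hand back the screenshot for whatever theme the user is running.

```python
# Hypothetical sketch: theme names, file names, and resolve_image() are
# illustrative assumptions, not part of Sikuli.

CANONICAL_THEME = "canonical"

# For each supported theme, map a canonical image name to that theme's
# screenshot file.
THEME_MAPPINGS = {
    "canonical": {
        "run_button": "canonical/run_button.png",
        "close_button": "canonical/close_button.png",
    },
    "dark": {
        "run_button": "dark/run_button.png",
        "close_button": "dark/close_button.png",
    },
}


def resolve_image(name, theme):
    """Return the screenshot file to search for under the given theme.

    Falls back to the canonical theme's image when the theme, or the
    image within that theme, has no mapping.
    """
    themed = THEME_MAPPINGS.get(theme, {})
    if name in themed:
        return themed[name]
    return THEME_MAPPINGS[CANONICAL_THEME][name]
```

With this, resolve_image("run_button", "dark") picks the dark-theme screenshot, while an unsupported theme quietly falls back to the canonical one. The obvious cost is that somebody has to capture every image once per supported theme.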

Re:Think executable step-by-step tutorials (3, Interesting)

tristanreid (182859) | about 5 years ago | (#30855082)

I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this is basically how to do it with trivial coding.
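For anyone wondering what "fuzzy" means in practice, here is a toy sketch in plain Python. It is not Sikuli's algorithm (I don't know what it uses internally); the function names, the grayscale-grid representation, and the 0.9 threshold are all assumptions made for illustration. It slides a small template over a "screen" and accepts the best position if it is similar enough, rather than requiring a pixel-exact match.

```python
# Illustrative only: a toy fuzzy matcher on grayscale grids (lists of lists
# of 0-255 ints). This is NOT Sikuli's algorithm; the names and the default
# threshold are assumptions.


def similarity(screen, template, top, left):
    """Score in [0, 1]; 1.0 means the region equals the template exactly."""
    h, w = len(template), len(template[0])
    total_diff = 0
    for r in range(h):
        for c in range(w):
            total_diff += abs(screen[top + r][left + c] - template[r][c])
    return 1.0 - total_diff / (255.0 * h * w)


def find(screen, template, min_similarity=0.9):
    """Return (row, col, score) of the best match, or None if too dissimilar."""
    h, w = len(template), len(template[0])
    best = None
    # Try every position where the template fits entirely on the screen.
    for top in range(len(screen) - h + 1):
        for left in range(len(screen[0]) - w + 1):
            score = similarity(screen, template, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= min_similarity:
        return best
    return None
```

A "click" would then act on the returned coordinates, and a "wait" would just call find() in a loop until it stops returning None. Raising min_similarity toward 1.0 turns this into the exact-match behavior that breaks as soon as a pixel changes.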

I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection than try to figure out how the VPN API works.


You're doing it wrong. (0)

Anonymous Coward | about 5 years ago | (#30854674)

If you have to write a script to automate GUI applications you're undermining the purpose of computers. I'm sitting here imagining people automating deletion.

Re:You're doing it wrong. (1)

tomhath (637240) | about 5 years ago | (#30854980)

I mostly agree with you; it's always silly to automate a sequence of GUI actions.

However I can see where they're going here; the program examines your screen and finds the widget to click on or enter data into, much like a human looking at the screen and deciding what to do next. Extend that to the real world, a robot that looks around your room for the remote control and turns on the TV, then surfs through the channels until it recognizes something you like to watch. By then it will also be capable of understanding speech and making decisions autonomously. Computers will be thinking like humans within just a few years. Oh wait.

Use This for Software Testing, and Scripting? (1)

LifesABeach (234436) | about 5 years ago | (#30855062)

I may just be opening a can of worms, but the first thing I thought of after seeing the demo was, "Can I push a button on a Flash page?"

Re:Use This for Software Testing, and Scripting? (1)

phi2one (762028) | about 5 years ago | (#30855734)

I am wondering the same thing myself. If all it's doing is scraping the screen buffer somehow, I don't see why not.

What's so wrong with TurboTax? (2, Interesting)

AardvarkCelery (600124) | about 5 years ago | (#30855292)

Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.

(BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)

SendKeys (1)

codepunk (167897) | about 5 years ago | (#30855704)

Wow, they just re-created the old VB SendKeys command. I was actually doing stuff like this 12-14 years ago with the SendKeys command in VB. In "practical" use back then it sucked, and I am certain that has not changed.

AutoIt (1)

White Flame (1074973) | about 5 years ago | (#30855730)

I did this exact same thing in AutoIt [autoitscript.com], except that it needs exact matches of images instead of a fuzzy recognizer. (Plus, I also had rule triggers and state vs just a single list of imperative commands.)

The fuzzy match is a nice addition, but this automation concept has been available for years.

Better Solution one line (1)

codepunk (167897) | about 5 years ago | (#30855778)

man ifconfig

Spammers Rejoice! (1)

VortexCortex (1117377) | about 5 years ago | (#30855842)

Just great... all the spammers need now are a few CAPTCHA-deciphering Sikuli plug-ins.

Once that's done we can all go back to manually removing spam from our web forums and in-boxes.

Bobby Tables (1)

gmuslera (3436) | about 5 years ago | (#30855856)

How do you sanitize your inputs in a language that checks what is displayed on the screen? Instead of XSS or SQL injection, you could end up being hacked just by viewing an ordinary picture attached to an email, if that kind of programming becomes popular.
