
Why Don't Open Source Databases Use GPUs?

Unknown Lamer posted about 9 months ago | from the connection-machines-rise-from-the-grave dept.

Databases 241

An anonymous reader writes "A recent paper from Georgia Tech (abstract, paper itself) describes a system that can run the complete TPC-H benchmark suite on an NVIDIA Titan card, at a 7x speedup over a commercial database running on a 32-core Amazon EC2 node, and a 68x speedup over a single-core Xeon. A previous story described an MIT project that achieved similar speedups. There has been a steady trickle of work on GPU-accelerated database systems for several years, but it doesn't seem like any code has made it into open source databases like MonetDB, MySQL, CouchDB, etc. Why not? Many queries that I write are simpler than TPC-H, so what's holding them back?"


241 comments


Something something online sorting (5, Informative)

Anonymous Coward | about 9 months ago | (#45781889)

...because I/O is the limiting factor of database performance, not compute power?

Re:Something something online sorting (5, Insightful)

Arker (91948) | about 9 months ago | (#45781975)

Wow, a fp that hit the nail on the head.

Indeed, database applications tend to bottleneck on I/O, not processor, so most uses would see little gain from this. That's probably the biggest reason no one has bothered to do it.

Certain uses would probably benefit, but then there are other reasons too. You run databases on machines built for it, not gaming machines, so it's not like they already have this hardware. You would have to buy it and add it as an expense. And GPUs are error prone. Not what you want in most database applications either (although again, there may be niches where this would be ok.)

Re:Something something online sorting (4, Insightful)

Runaway1956 (1322357) | about 9 months ago | (#45782005)

I'll add that most people who put up the cash for high-performing GPUs aren't much interested in actually "computing" with them. They are far more interested in "gaming". They demand video performance, as opposed to crunching database numbers. Those companies that are most likely to pay people for manipulating databases generally have little interest in top-notch video, so they aren't going to pay for hundreds of GPUs.

Re:Something something online sorting (5, Insightful)

houstonbofh (602064) | about 9 months ago | (#45782065)

... so they aren't going to pay for hundreds of GPUs.

Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.

Re:Something something online sorting (4, Insightful)

girlintraining (1395911) | about 9 months ago | (#45782617)

Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.

Is now a bad time to point out that many researchers have built clusters out of thousands of GPUs to model the weather, protein folding, and other things? As it turns out, gamers aren't the only ones who buy GPUs. And GPUs aren't functionally all that different from FPGAs, which, as I understand it, is the sort of architecture Linus went off to Transmeta to build CPUs around.

I'm irritated whenever people here on slashdot can't see past their own personal experience; it's become quite sad. The true innovators don't see something that's already been done and figure out how to do it better. They see the same things as everyone else, but put them together in radically new ways nobody's ever thought of before.

GPUs for database processing? That's crazy! Which is why it's innovative and will push the limits of information technology. Three hundred quintillion polygasmic retina displays with 99 billion pixels to play Call of Duty 27 will never do that. Most slashdotters who put down an idea like this really have no concept of what geeks and hackers do.

We push the limits. We fuck with things that ought not to be fucked with. We take the OSI 7-layer model, set it on fire, turn it inside out, and hack out new ways to do it by breaking every rule we can find. We go where we aren't wanted, aren't expected, and we push every button we can find. We do things precisely because people tell us it's impossible, that it can't or shouldn't be done, and take great pleasure in finding novel new ways to do something even if there are already twenty proven ways to do it.

And while probably 99 times out of 100 the experience matters only for the hacker or geek doing it, and is done merely to learn... that glorious one time when something unexpected and interesting happens is what all progress in this industry is based on. And people like you who belch about "synthetic benchmarks" and insist nobody would do X because that's just stupid will never understand.

Re:Something something online sorting (0, Informative)

Anonymous Coward | about 9 months ago | (#45782111)

Try getting a top end Radeon card for a reasonable price at the moment.

They're being bought out by cryptocoin miners (LTC, for example) to the point that there's a supply shortage that's pushed the price way above MSRP.

GP's point about GPUs being error prone is only partially correct; they're prone to errors if pushed beyond their power or thermal limits, and most DIY machines don't pay enough attention to either. I'm running an old 560ti on a number of BOINC projects, underclocked slightly and well cooled. Still much, much faster than CPU processing (an FX-8350), yet will crunch happily for as long as I leave the machine running and not produce any validation failures.

Re:Something something online sorting (0)

Anonymous Coward | about 9 months ago | (#45782255)

For Litecoin, yes, because it was basically designed to work best on a GPU, but it is a really small market. For Bitcoin, the one most people think of, if you are still using GPUs you are losing money unless you are stealing your electricity. The FPGA and ASIC rigs are the only ones doing enough hashes per watt to still make money, and the FPGA-based ones probably have less than a year of profitable life left in them.

Re:Something something online sorting (4, Interesting)

ron_ivi (607351) | about 9 months ago | (#45782339)

performance ... put up cash...

The biggest opportunity for GPUs in Databases isn't for "performance". As others pointed out - for performance it's easier to just throw money at the problem.

GPU powered databases do show promise for performance/Watt.

http://hgpu.org/?p=8219 [hgpu.org]

However, energy efficiency is not enough, energy proportionality is needed. The objective of this work is to create an entire platform that allows execution of GPU operators in an energy proportional DBMS, WattBD, and also a GPU Sort operator to prove that this new platform works. A different approach to integrate the GPU into the database has been used. Existing solutions to this problem aims to optimize specific areas of the DBMS, or provides extensions to the SQL language to specify GPU operation, thus, lacking flexibility to optimize all database operations, or provide transparency of the GPU execution to the user. This framework differs from existing strategies manipulating the creation and insertion of GPU operators directly into the query plan tree, allowing a more flexible and transparent framework to integrate new GPU-enabled operators. Results show that it was possible to easily develop a GPU sort operator with this framework. We believe that this framework will allow a new approach to integrate GPUs into existing databases, and therefore achieve more energy efficient DBMS.

Also note that you can write PostgreSQL stored procedures in OpenCL - which may be useful if you're doing something CPU intensive like storing images in a database and doing OCR or facial recognition on them: http://wiki.postgresql.org/images/6/65/Pgopencl.pdf [postgresql.org]

Introducing PgOpenCL - A New PostgreSQL Procedural Language Unlocking the Power of the GPU

Re:Something something online sorting (1)

JWSmythe (446288) | about 9 months ago | (#45782539)

Well, gaming machines do make great servers. What is a gaming machine? Fast CPU, lots of memory, fast storage. The only difference is the video card. For home-built servers in PC cases, I just don't bother with the pesky high-end video cards. They run so much cooler and quieter. I'd hate to have a rack of servers at the house; I'd rather not have a jet engine running in the next room. :)

Re:Something something online sorting (1, Troll)

Arker (91948) | about 9 months ago | (#45782569)

"The only difference is the video card."

ROFL you may know how to build a gamer rig but you certainly know nothing about servers, to have said that.

You'll use an entirely different class of hardware from the ground up. Different class of motherboard, different class of RAM; at most you *might* use the same box and power supply. You don't need a video card at all, just a serial port, and these days it's more likely to be a rackmount than a box anyway.

Re: Something something online sorting (0)

Anonymous Coward | about 9 months ago | (#45782827)

A lot of people don't get this. It is worth calling out.

A database is a development environment. You build the tool you need. There are serious tradeoffs, and yes, IO is the usual limiting factor.

Packaged tools that include ...and install MySQL are great, but that isn't what most businesses need.

And here is my opportunity to piss people off about why typed data is good, compute farms are great at being compute farms, but they aren't a database, and Ruby can go die in a fire.

Re:Something something online sorting (1)

the_B0fh (208483) | about 9 months ago | (#45782863)

Curious: why do you think GPUs are error-prone?

Not true (4, Insightful)

kervin (64171) | about 9 months ago | (#45782127)

...because I/O is the limiting factor of database performance, not compute power?

Just a few projects in database performance optimization would convince you that's not a true statement. IO/Memory/CPU are in fact largely interchangeable resources on a database, and depending on your schema you can just as easily run out of any of them.

For instance, I'm currently tuning a SQL Server database that's CPU-heavy based on our load projection targets. We could tweak/increase query caching so that more resultsets stay in memory. Fewer complex queries would then be run, drastically reducing I/O and some CPU usage, but drastically increasing memory usage. This is just a simple example, of course, to illustrate the point.

Databases run out of CPU resources all the time. And a CPU advancement would be very well received.

My guess as to why this hasn't been done is that it would require end-users to start buying/renting/leasing GPU enabled hardware for their Database infrastructure. This would be a huge change from how we do things today and this sector moves very slowly.

Also, we have many fairly old but more important database advancements which have been around for years and are still almost unusable. If you have ever tried to horizontally scale most popular open-source databases you may know what I'm talking about. Multi-master, or scaling technology in general, is required by about every growing "IT-dependent" company at some point. But that technology (though available) is still in the dark ages as far as I'm concerned, based on reliability and performance measurements.
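The memory-for-CPU/IO trade described above can be sketched in Python with SQLite; the schema and the total_over helper are hypothetical, and a real engine's resultset cache is far more involved than an LRU dict:

```python
import sqlite3
from functools import lru_cache

# Toy table standing in for a real workload (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, float(i)) for i in range(1000)])

@lru_cache(maxsize=128)      # resultsets stay in memory (more RAM used)...
def total_over(threshold):   # ...so repeated queries skip the engine (less CPU/IO)
    cur = conn.execute("SELECT SUM(price) FROM orders WHERE price > ?",
                       (threshold,))
    return cur.fetchone()[0]

first = total_over(500.0)    # runs the query: CPU plus page reads
second = total_over(500.0)   # served straight from cache: memory only
print(first == second, first)  # True 374250.0
```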

Re:Not true (1)

Bengie (1121981) | about 9 months ago | (#45782565)

Rule of thumb: if your dataset can fit in memory, it probably won't benefit from GPUs. We're talking about 10TB+ datasets and a few long-running data-warehouse-style queries, not small OLTP-style queries. GPUs take a crap if you have any branching, so to be useful a query must not have conditions that cause different rows to take different branches; that means only very basic WHERE clauses.
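The constraint described above can be sketched in plain Python: the predicate is evaluated uniformly over the whole column as a mask (every lane does identical work, no divergence), and rows are compacted afterwards. The column data here is invented; real GPU engines use selection bitmaps and prefix sums for the compaction step.

```python
# Hypothetical price column.
prices = [120, 80, 310, 45, 200, 99]

# Step 1: branch-free predicate, the same operation applied to every element.
mask = [p > 100 for p in prices]

# Step 2: stream compaction driven by the mask (on a GPU, a prefix-sum pass).
selected = [p for p, keep in zip(prices, mask) if keep]

print(selected)  # [120, 310, 200]
```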

Re:Not true (0)

Anonymous Coward | about 9 months ago | (#45782599)

Databases run out of CPU resources all the time. And a CPU advancement would be very well received.

I'd have to agree that database servers can certainly run out of CPU resources before anything else (it may not be the most common bottleneck, but it happens frequently enough). However, in the cases where I've seen this happen - and where it wasn't caused by badly tuned queries or client code, it was mostly because the server was simply underspecced. I.O.W. there were still valid upgrade paths to expand the CPU power the conventional way. I'm not entirely up to date on latest GPU prices but when they say "top notch" and it's 68 times as fast as a single core, I'm not so sure the GPU is going to be more cost effective than using 8 core CPUs when you want to cram them in server rack space. And what's the calculation on power usage and cooling requirements? Can you even put more than 1 in a pizza box without burning down your data center? Will my hardware provider support emergency replacements for such a setup?

High single thread performance in a TPC-H record setup is nice, but is it viable in a real world scenario? As long as they don't show that part, it's going to remain in the academic environment, like much they seem to be doing with GPUs these days.

true. large, busy Wordpress CPU bound (1)

raymorris (2726007) | about 9 months ago | (#45782621)

Indeed. We have a large WordPress based site and it is bound by database CPU despite the fairly powerful CPU it uses. It should scale to many cores, so I'm thinking of trying a pair of the 8 core AMD processors. Intel is faster PER CORE, but an AMD rig could have 16 cores.

GPU-CPU link is slow -- AMD HSA will help (0)

Anonymous Coward | about 9 months ago | (#45782663)

I think the problem is that copying data from the GPU to the CPU is too slow to work on all queries and would be hard to model into a query optimizer. When AMD's Kaveri processors are released in a few weeks, I think several DBs will add patches to improve DB performance. The 7x improvements are for queries where the entire DB is stored in GPU memory.

Re:Not true (0)

girlintraining (1395911) | about 9 months ago | (#45782691)

IO/Memory/CPU are in fact largely interchangeable resources on a database.

I can't believe someone up-modded you for saying such a patently stupid thing. This is like saying the tires and the gas tank on a car are interchangeable. These are separate resources, and any competent network administrator will conduct simulations to find out what the proportions of each will be. And it's different for every project and use scenario.

Facebook has far different needs for its database than Google does; Even if they are both websites. Google needs to take large amounts of data which is randomly accessed and perform complex queries on it; it is much more cpu-intensive than Facebook, which while its pages are rendered dynamically, can be heavily cached and predictive algorithms will be highly effective -- people don't search on Facebook, they view.

You're full of shit suggesting that you can make "drastic" tradeoffs here. You might be able to make a non-trivial tradeoff, even a significant one in certain use scenarios -- but to suggest that everything is interchangeable is absurd. There are limits that simply cannot be exceeded no matter how much wishful thinking you want to throw at it.

Re:Something something online sorting (2)

fzammett (255288) | about 9 months ago | (#45782133)

Very good point, entirely correct. However... for an in-memory database I wonder if there's gains to be had? I'm not sure CPU-memory I/O is much of a bottleneck, though such DBs aren't suitable to every task of course.

Re:Something something online sorting (1)

K. S. Kyosuke (729550) | about 9 months ago | (#45782395)

Even with in-memory databases, most of the work is simple computation over a large amount of data, often with random access, while GPUs like a lot of computation over streaming data. There are also the problems of data copying and virtual memory support. But perhaps Kaveri and its successors will be more useful for that.

Re: Something something online sorting (0)

Anonymous Coward | about 9 months ago | (#45782253)

I've seen many databases that are cpu limited. Things like joins and sorts are compute expensive, and can easily become the major bottleneck when most of your hot data fits in memory and you have reasonably fast io for the stuff that doesn't (gigabit connection to fast ssd storage). Io bound dbs usually have low memory, or the storage backend is slow (ebs).

Re: Something something online sorting (1)

K. S. Kyosuke (729550) | about 9 months ago | (#45782437)

Things like joins and sorts are compute expensive, and can easily become the major bottleneck when most of your hot data fits in memory

What kinds of joins on data that fits into a computer's operating memory get actually accelerated by something that has its own (seriously limited) physical address space and needs serious data copying to get proper random access to the data in the first place? (Not to mention the effective call latencies.)

(Oh, BTW, and when did "compute" become a noun? Nouning Latin verbs feels ridiculously retarded.)
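The copy cost being pointed at above can be put on the back of an envelope: before a GPU can join anything, the operands have to cross the PCIe bus. The bandwidth figures below are rough assumptions for illustration, not measurements.

```python
# Rough, assumed figures: PCIe 3.0 x16 at ~12 GB/s effective, and a CPU
# streaming ~40 GB/s from local DRAM. Both are illustrative, not benchmarks.
table_bytes = 4 * 2**30            # a hypothetical 4 GB working set
pcie_bps    = 12 * 2**30           # host-to-device transfer bandwidth
dram_bps    = 40 * 2**30           # CPU sequential scan bandwidth

copy_s = table_bytes / pcie_bps    # time just to ship the data to the GPU
scan_s = table_bytes / dram_bps    # time for the CPU to scan it in place

print(round(copy_s, 3), round(scan_s, 3))  # 0.333 0.1
```

Under these assumptions, merely copying the table to the card takes longer than scanning it in RAM, so the GPU's arithmetic advantage has to be large before the trip pays for itself.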

Re:Something something online sorting (1)

CODiNE (27417) | about 9 months ago | (#45782381)

In other words... For databases that fit in memory GPU makes a lot of sense. For really large data sets the limit is how fast you can get the data off the hard disk.

But what "io bottleneck" people may be missing is that an io bound server could still benefit from this if the freed up CPU time can be used for other things when it's not shuttling data to and from the GPU. It also could end up saving a lot of energy, and that's money.

Re:Something something online sorting (1)

marcosdumay (620877) | about 9 months ago | (#45782547)

Except that GPUs are bad at most of the tasks a database does. Normally, databases require random memory access (not mapping arrays) and complex selection rules. GPUs are best at doing maps over contiguous arrays, with very simple (ideally no) conditionals.
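The contrast can be shown in miniature with Python; the data is invented. A columnar aggregate is a pure elementwise map plus a reduce over contiguous arrays (the shape GPUs like), while a row store walks per-row objects through indirection (the shape they dislike).

```python
# Hypothetical columns, stored contiguously.
price = [10.0, 20.0, 30.0]
qty   = [1, 2, 3]

# Columnar: elementwise map + reduce over contiguous data, no branching.
revenue = sum(p * q for p, q in zip(price, qty))

# Row-oriented: each row is a separate object reached through indirection,
# i.e. the random-access pattern described above.
rows = [{"price": p, "qty": q} for p, q in zip(price, qty)]
revenue_rows = sum(r["price"] * r["qty"] for r in rows)

print(revenue)  # 140.0
```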

Re:Something something online sorting (2)

Bengie (1121981) | about 9 months ago | (#45782587)

For databases that fit in memory GPU makes a lot of sense.

A bit more selective than that: for datasets that fit in memory, where memory access patterns are sequential and the queries have almost no branching. GPUs are very picky.

Re:Something something online sorting (5, Informative)

fatphil (181876) | about 9 months ago | (#45782433)

Read the paper - page 7 (which bizarrely doesn't render clearly for me at all, and I can't copy/paste)
"Scale Factor 1 (SF 1) ... data fits in GPU memory"

They ran the TPC-H ("H"="Huge") with a dataset that was ABSOLUTELY FUCKING TINY.

No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit with micro-optimisation rather than making actual advancements in the field of databases.

Frauds.

Re:Something something online sorting (4, Interesting)

TheRaven64 (641858) | about 9 months ago | (#45782849)

No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit with micro-optimisation rather than making actual advancements in the field of databases.

Any paper that does X on a GPU generally fits into this category. It's not science to run an existing algorithm on an existing Turing-complete processor; at most it's engineering. But it's a fairly easy way to churn out papers. Doing X 'in the cloud' or 'with big data' follows a similar strategy. It's usually safe to ignore them.

Re:Something something online sorting (0)

Anonymous Coward | about 9 months ago | (#45782537)

THIS. But also, all the work this is capable of is being made moot by most of the NoSQL technologies. Without joins, regexes, and all the other capability in the database, you remove the need for a lot of CPU.

Take a look at the new i2 instances on AWS for instance. These were designed for databases for maximum IOPS. Few (powerful) CPU cores, and buttloads of RAM and fast PCIe SSDs. Maybe graph DB's should jump on the GPU bandwagon so AWS can build SSD instances with GPUs.

Re:Something something online sorting (0)

TrollstonButterbeans (2914995) | about 9 months ago | (#45782695)

Also because most servers aren't plugged into a monitor.

If every server had to have a monitor, it would take more space and no longer be economical.

Re:Something something online sorting (1)

gweihir (88907) | about 9 months ago | (#45782753)

Bah, pesky facts! Don't you know that the latest buzzwords have to be accepted unquestioningly to be truly hip (and utterly incompetent)?

Cost? Time? Hardware? Skill? (4, Interesting)

AHuxley (892839) | about 9 months ago | (#45781891)

The people with the skills have day jobs and want to enjoy time off with other projects.
The people with the skills have no jobs and want to write the code but the hardware is too expensive.

Because SQL is basically dead (2, Insightful)

Maury Markowitz (452832) | about 9 months ago | (#45781907)

The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.

It's bad enough that the only standardized access system is ODBC, designed 25 years ago when pipes were short and thin and a WAN was the next building over. If we can't get that problem fixed, what's the hope for integrating new technologies?

Re:Because SQL is basically dead (2, Informative)

Anonymous Coward | about 9 months ago | (#45782045)

The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.

Except for the part where errybody's keeping up with the latest developments. They're just actually looking at developments that matter. GPUs... Do not matter. If you want to know more, check the first post.

Processing power is inconsequential compared to I/O. RAM is pretty straightforward; newer, faster RAM comes out, larger amounts become cheaper, you buy it, you throw it into the mix.

The cool stuff is happening around SSDs (which are also pretty straightforward), solid-state memory devices (think FusionIO-style cards, Violin devices, RAMSANs), and crazy-arse storage solutions.

Re:Because SQL is basically dead (0)

Anonymous Coward | about 9 months ago | (#45782391)

But I/O inside GPU memory is the whole point, not just counting floats.

A lot of building graphics for your screen is just I/O. Heck, that's basically what I/O is.

Re:Because SQL is basically dead (2)

houstonbofh (602064) | about 9 months ago | (#45782077)

Run a big query on your database. Now, while the hard drive light is solid red, look at your CPU load. See how it is not using all the CPU because it is waiting on the hard drive? A GPU will not help that.
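A rough way to see the effect described above without staring at a drive light, sketched in Python: compare wall-clock time against CPU time around a query. If wall time far exceeds CPU time, the process spent the difference waiting. Here the disk-bound query is simulated with a sleep.

```python
import time

t0_wall, t0_cpu = time.perf_counter(), time.process_time()
time.sleep(0.2)  # stand-in for a query stalled on disk reads
wall = time.perf_counter() - t0_wall   # elapsed wall-clock time
cpu  = time.process_time() - t0_cpu    # time actually spent computing

# Nearly all the elapsed time was waiting, not computing; a faster
# processor (or a GPU) would not shorten it.
print(f"wall={wall:.2f}s cpu={cpu:.4f}s")
```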

Re:Because SQL is basically dead (0)

Anonymous Coward | about 9 months ago | (#45782497)

No, but a multi-channel SSD RAID would. Expensive, yes, but certainly possible. Then where is your bottleneck?

Again, the I/O from a hard drive being a bottleneck is a matter of choice [budget], not a real technical limitation. The reality is that most systems work 'good enough', and to get open source to work at the same level as, say, an Oracle database you just throw more hardware at it and generally still save money. Because of that, building an expensive high-end machine to host a database is pointless.

However, on a server where you pay by CPU, having a non-CPU extension makes a lot of sense. This would be why proprietary systems have GPU extensions and open source systems do not.

In short: if you need CPU in open source, get another CPU, it's cheap. If you need CPU in a closed-source application, you get a GPU; it doesn't work as well as a CPU, but it adds performance without incurring more licensing fees.

Question Answered

Re:Because SQL is basically dead (1)

advocate_one (662832) | about 9 months ago | (#45782557)

16 gigs of fast RAM as well would be a good boost... but with really big databases, you go parallel... with multiple machines running against small subsets of the data at the same time...

I'm surprised we haven't seen FPGAs being deployed instead of GPUs...

Re:Because SQL is basically dead (0)

Anonymous Coward | about 9 months ago | (#45782631)

Parallel, well that depends on the definition and use. I for one would not want to add another IO bottleneck to a 50TB Data Warehouse just because someone thought going parallel is a good idea. I would advise you to check the latest hardware solutions from Oracle, since very large databases systems have: stupid amounts of ram (more than you have on all your HDs), stupid amounts of SSD (for indexes and highly used relations), tons of add-on processor boards, all this sharing a gigantic bus.

Re:Because SQL is basically dead (1)

Anonymous Coward | about 9 months ago | (#45782293)

Rubbish. GPUs do not access I/O. DBs that aren't toy systems use a fuckton of storage.

"Them"? (2)

FaxeTheCat (1394763) | about 9 months ago | (#45781909)

so what's holding them back?

Wrong question. It is open source. If you need it, you fix it.

Re:"Them"? (1)

houstonbofh (602064) | about 9 months ago | (#45782087)

so what's holding them back?

Wrong question. It is open source. If you need it, you fix it.

No, it is the right question. And the answer is, the people that actually understand these things work also know this will not help anything in real world applications. They are also busy optimizing for additional cheap ram, and the new and fast SSD cards that are almost affordable.

Risk aversion. (2, Interesting)

Anonymous Coward | about 9 months ago | (#45781919)

Because a lot of us have personal experience on how "reliable" GPU calculations are.

A few screen "artifacts" tend to be less painful than db "artifacts". Maybe things have changed. But it's not been that long since nvidia had a huge batch of video cards that were dying in all sorts of ways.

As for AMD/ATI, I suspect you'd normally use some of their crappy software when doing that GPU processing.

Re:Risk aversion. (1)

Anonymous Coward | about 9 months ago | (#45781987)

Business-grade GPUs exist; they are not the default, are not cheap, and lose some speed to error correction. A lot of code already uses APIs like CUDA and OpenCL, but that code mostly does floating-point computation, which is extremely optimized and redundant on GPUs; computing pointers and integer operations (likely heavily used by DBs) is a lesser priority.

Re:Risk aversion. (0)

Anonymous Coward | about 9 months ago | (#45782181)

GPUs for DBs will at most be niche for a long time. Most DB people are more likely to be waiting for SSDs that are more reliable. I bet many have looked at the IOPS and gone "what the heck" and have resigned themselves to replacing SSDs (with SSDs) every 6 months on their RAIDs either due to failures or as a precautionary measure. Many DB admins would put up with worse for 30000 IOPS per drive vs 200 IOPS.

The other sort of hardware that would help DB folk is probably something that would allow easy fast low latency low overhead locking/synchronizing in a cluster. Might be a pipe-dream but if you can do that it'll be easier to have a big DB on many machines as opposed to many small DBs (shards) on many small machines (which while doable for many scenarios, isn't as nice).

Re:Risk aversion. (0)

Anonymous Coward | about 9 months ago | (#45782053)

Why was this stupidity modded up? Compute clusters aren't using the low- to mid-range consumer GPUs this post alludes to.

Nope (0)

Anonymous Coward | about 9 months ago | (#45782273)

You mean you don't understand what a compute cluster IS.

Re:Nope (0)

Anonymous Coward | about 9 months ago | (#45782781)

In what way? GPU compute clusters use things like FirePros or Quadros/Teslas. Not toy, low-end consumer shit.

You just answered your own question (4, Insightful)

vadim_t (324782) | about 9 months ago | (#45781927)

"Many queries that I write are simpler than TPC-H, so what's holding them back?" -- simple queries don't need acceleration.

A "SELECT * FROM users WHERE user_id = 12", or a "SELECT SUM(price) FROM products" doesn't need a GPU, it's IO bound and would benefit much more from having plenty cache memory, and a SSD. A lot of what things like MySQL get used for is forums and similar, where queries are simple. The current tendency seems to be to use the database as an object store, which results in a lack of gnarly queries that could be optimized.

I do think such features will eventually make it in, but this isn't going to benefit uses like forums much.
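For illustration, here are the two queries from the comment above run against a throwaway SQLite database (the schema is invented). Each is one B-tree probe or one sequential scan; the work is fetching pages, not arithmetic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
db.execute("INSERT INTO users VALUES (12, 'alice')")
db.executemany("INSERT INTO products VALUES (?, ?)", [(1, 9.99), (2, 20.00)])

# Point lookup: a single index probe, essentially no computation.
row = db.execute("SELECT * FROM users WHERE user_id = 12").fetchone()

# Aggregate: one sequential scan; bounded by how fast pages arrive.
total = round(db.execute("SELECT SUM(price) FROM products").fetchone()[0], 2)

print(row, total)  # (12, 'alice') 29.99
```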

Re:You just answered your own question (4, Insightful)

tranquilidad (1994300) | about 9 months ago | (#45781997)

This...

If you go beyond the abstract and read the paper you'll notice that they chose a TPC-H scale factor of 1 (1 GB of data) so that the entire dataset would fit in the GPU.

The question they seem to really be asking is more akin to, "Why don't we make our datasets small enough for complex queries that it can all fit in the storage attached to a processor we like?"

They continue to answer their own question when discussing results and admit they can't compare costs of "traditional" implementations because those tests were all run with scale of 100 (100 GB of data).

They say the comparison is difficult against complete systems because of the scaling factor and "...this paper is about the effectiveness of mapping relational queries to utilize the compute throughput [of] GPUs".

So, it seems to boil down to a test of compute power on data sets small enough to fit in memory rather than an effective test of relational query processing, though they did use relational queries as their base testing model.

Re:You just answered your own question (1)

fahrbot-bot (874524) | about 9 months ago | (#45782049)

So, it seems to boil down to a test of compute power on data sets small enough to fit in memory rather than an effective test of relational query processing, though they did use relational queries as their base testing model.

Or... Just because you can do something, doesn't mean you should.

Re:You just answered your own question (1)

fatphil (181876) | about 9 months ago | (#45782485)

Exactly!

"They say the comparison is difficult against complete systems because of the scaling factor[...]"

The TPC go a little bit further:
"""
Note 1: The TPC believes that comparisons of TPC-H results measured against different database sizes are misleading and discourages such comparisons. The TPC-H results shown below are grouped by database size to emphasize that only results within each group are comparable.
"""

Their toy is simply irrelevant in the field of real world databases.

Servers (0)

Anonymous Coward | about 9 months ago | (#45781953)

Most servers do not have powerful GPUs, and that is where heavy production databases are run.

Re:Servers (2)

fuzzyfuzzyfungus (1223518) | about 9 months ago | (#45782007)

Most servers do not have powerful GPUs, and that is where heavy production databases are run.

Servers turn over comparatively quickly, though (sure, every shop has ol' reliable trucking away on the 13GB SCSI drive that was pretty cool when it left the factory, doing something obscure but vital; but the population as a whole churns faster than that), and servers with nice chunks of PCIe (typically intended for your zippy network cards or fancy storage HBAs; but they are perfectly normal PCIe slots) aren't at all difficult to find. Nor has (Nvidia in particular, AMD trailing a touch) Team Graphics been shy about pushing server-suitable GPU compute parts.

It is true that servers today mostly have little to no GPU power; but if the case were made, that would change rather quickly.

Re:Servers (2)

houstonbofh (602064) | about 9 months ago | (#45782099)

But for that money, more RAM or faster drives make more of a difference...

Re:Servers (1)

fuzzyfuzzyfungus (1223518) | about 9 months ago | (#45782647)

Oh, with the exception of dedicated GPU compute setups, definitely, that's why the servers in use are configured as they are. My point was not that servers should have more GPU power; but that (if a change in software made doing so a good idea) the existing hardware wouldn't provide too much 'inertia' to stop or slow adoption.

There doesn't seem to be too much interest, on the whole; but if one were interested they could change the composition of their servers in fairly short order; and a broader shift could happen comparatively quickly (again, given suitable software).

Re:Servers (1)

PPH (736903) | about 9 months ago | (#45782155)

IT staff needs GPUs to play Crysis. Your DBMS gets a lower priority.

Why? (1)

koan (80826) | about 9 months ago | (#45781955)

Isn't everyone using them? I do 3D, and the one drag about 3D is render time; I have a piece of software that uses the GPU and I am able to get a decent render in real time.
Premiere and AfterFX run much better, and quite often render in real time too.
The way GPUs work seems to be the future, so I am puzzled why it isn't more prevalent; I'm sure there is some technical reason I'm not aware of... right?

Re:Why? (2)

laffer1 (701823) | about 9 months ago | (#45781983)

One problem is OS and toolchain support. You might get something together for Windows, OS X and Linux, but that's as far as it goes.

The next problem is that standalone compute cards are rather expensive and putting in a high power GPU has considerable power requirements. Then most server racks are full of 1u wonders not designed to get rid of heat or even hold a huge AMD or NVIDIA GPU.

Open source databases are great, but they're often pushed as a cost savings to companies. To turn around and buy extra hardware to make them faster isn't going to cut it.

Finally, there's the oddity of the programming languages some databases are written in. The popular SQL databases are in C, so that's not a problem. Some of the others are in Java, Erlang, or some other crazy language that may or may not have OpenCL or CUDA support.

Re:Why? (1)

koan (80826) | about 9 months ago | (#45782121)

The cards would come down in price if they became popular; most GPU price points drop rapidly after release (except for the higher-end cards). Power is a problem; power is always a problem.

Fix the power problem and you're the richest man/woman in the World.

On the software side, what if a company took the approach of building out the hardware, optimizing an OS, and then writing their apps for that hardware (like Apple)?
I'm actually asking, because it seems like a good idea; who wouldn't want a database 12 times faster?

Re:Why? (1)

hairyfeet (841228) | about 9 months ago | (#45782483)

Uhhh... the price drops in CONSUMER GPUs, you know, the thing you play the shooty boomy games on? Those are NOT what you run your multimillion-dollar DBs on, not unless you have no problem with 1+1 occasionally equaling 4.

The cards you use for the kind of number crunching in TFA would be the Tesla and FirePro cards, and ya know what? The price on those does NOT drop quickly; just the opposite, in fact, with cards a couple of generations behind still fetching high dollar. You really can't compare the two: one has massive economies of scale while the other is a teeny tiny niche.

Re:Why? (1)

koan (80826) | about 9 months ago | (#45782785)

Demand grows, price drops.

What's cheaper: a handful of high-end GPUs bought in bulk, or the standard DB hardware used today? And don't just say one or the other; show some proof.

Re:Why? (1)

michaelmalak (91262) | about 9 months ago | (#45782903)

1+1 occasionally equaling 4

Are you referring to the lack of ECC RAM on consumer grade GPUs or are you saying you know of FDIV or overclocking style unreliability in the compute engines themselves?

Re:Why? (1)

countach74 (2484150) | about 9 months ago | (#45782555)

Sorry, had to chime in on your claim that the price would come down if they became popular. That is contrary to the law of supply and demand. Top-end GPUs are expensive because there is demand for the latest and greatest, which then justifies the cost of making very expensive, cutting-edge cards.

Re:Why? (1)

koan (80826) | about 9 months ago | (#45782799)

It used to be the same for CPUs; is it still that way?

Re:Why? (1)

Anonymous Coward | about 9 months ago | (#45782001)

The subject is databases. You're working in graphics... it seems that graphics processing units are good at... processing graphics. But GPUs are not database processing units.

What's holding them back? (1)

Culture20 (968837) | about 9 months ago | (#45781971)

Many queries that you write are simpler than TPC-H. Necessity is the mother of invention.

real world databases are usually not cpu bound. (0, Informative)

Anonymous Coward | about 9 months ago | (#45781993)

Databases in the real world are rarely CPU bound (and when I have seen them CPU bound, it was because something was going badly wrong). Generally they are data bound, and the GPU has several times less bandwidth to the data than the real CPUs, so it will effectively be even slower: the computation on the GPU may be 10x faster, but feeding the data in and out is 10x slower, meaning it did nothing for you except add a lot of extra coding complexity.

Benchmarks tend not to look like real-world queries; often you can do something that helps a benchmark but does nothing in the real world.

Bus-installed co-processors (PCI/PCIe/VME) are only useful if the entire dataset fits in the co-processor's memory. When you have to make large accesses outside that RAM because the data does not fit, the co-processor usually becomes much slower and all the advantages go away. That is why it works for supercomputing: in the cases where the GPU works well, the dataset being worked on is tiny.
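The bandwidth argument above can be made concrete with a quick back-of-envelope model (all numbers below are illustrative assumptions, not measurements from TFA):

```python
# Back-of-envelope model of GPU offload for a table scan.
# All figures are illustrative assumptions, not benchmarks.

def cpu_time(data_gb, cpu_scan_gbps=10.0):
    """Seconds to scan the data with the host CPU."""
    return data_gb / cpu_scan_gbps

def gpu_time(data_gb, pcie_gbps=12.0, gpu_scan_gbps=200.0):
    """Seconds to copy the data over PCIe, then scan it on the GPU."""
    return data_gb / pcie_gbps + data_gb / gpu_scan_gbps

data_gb = 100.0
print(f"CPU: {cpu_time(data_gb):.1f} s, GPU: {gpu_time(data_gb):.2f} s")
# The GPU scan itself is 20x faster here, but the PCIe copy dominates,
# so the end-to-end gain is modest; with a slower bus, or data that
# must stream across repeatedly, the GPU path loses outright.
```

With these assumed figures the end-to-end win is small despite a 20x faster scan, and it only looks dramatic when the working set already sits in GPU memory, which is exactly the toy-benchmark situation the parent describes.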

Re:real world databases are usually not cpu bound. (1)

StripedCow (776465) | about 9 months ago | (#45782063)

so the computation on the GPU may be 10x faster, but feeding the data in and out is 10x slower, meaning it did nothing for you except add a lot of extra coding complexity. ...
Benchmarks tend not to look like real-world queries; often you can do something that helps a benchmark but does nothing in the real world.

But what if the benchmark is larger than the memory size of the GPU? I don't know the actual size, but I guess they use at least realistic amounts of data (larger than the memory of the GPU card), so that would prove your theory wrong!

By the way, there's more to databases than just queries. Skimming through the abstract, I see that they only address speeding up the queries. The commit phase of a database is also interesting, but they don't seem to address it.

Re:real world databases are usually not cpu bound. (0)

Anonymous Coward | about 9 months ago | (#45782141)

Commits are generally I/O bound to the long-term storage device... nothing a GPU can do for you there.

I had to look it up... TPC-H is decision support, not queries/sorts... so likely it is a tiny, CPU-bound dataset being processed to make a decision/model. The dataset probably does fit in the GPU's memory, and overall this is not useful for the rest of the database load (transactions, commits, sorts, merges).

Why not? (3, Funny)

Black Parrot (19622) | about 9 months ago | (#45782043)

It's waiting for you to get on it.

Hardware costs are limiting factor (1)

dkf (304284) | about 9 months ago | (#45782081)

What's holding them back? I'd have thought it was obvious!

The big issue with GPGPU for DB work is that you have to have the DB entirely in memory or your performance will suck (even SSDs aren't that fast). To get a big database to work in such a scenario, you have to split it into many smaller pieces, but that makes working with these sorts of things expensive even with an open source DB. The paper even says this. That makes this sort of work only really interesting for people with significant budgets, and they can easily use a commercial DB; the additional cost isn't prohibitive in that scenario.

Without general hardware availability, there just aren't that many people pushing for the feature; OSS thrives on having many people who want it and many developers able to work on it.

Improvements have to come a few at a time (1)

leandrod (17766) | about 9 months ago | (#45782101)

All of these SGBDs are actually toys being sold for more than they are capable of. So developers there have to try to catch up to PostgreSQL before it becomes (even) easier to use and eats their lunch.

Meanwhile, the issues meriting scarce development and, mainly, review time at PostgreSQL are more interesting than accelerating a few workloads on hardware which is not yet in the servers out there: things like making PostgreSQL even easier to install, set up and manage, even more ISO SQL compliant, even more capable, even better than NoSQL at NoSQL loads.

Now, if you can show your GPU-aware PostgreSQL extension or modification, and show it is generally useful enough to merit review time for the next release, why not?

Re:Improvements have to come a few at a time (1)

cyber-vandal (148830) | about 9 months ago | (#45782231)

SGBD = DBMS in English ;-)

Re:Improvements have to come a few at a time (1)

leandrod (17766) | about 9 months ago | (#45782305)

Thank you, even if I fear it is too late to fix.

While I do speak French too, the mistake is probably from my native (Brazilian) Portuguese.

It depends (5, Funny)

Waffle Iron (339739) | about 9 months ago | (#45782109)

Research shows that there is good news and bad news on this approach.

The good news: Certain SQL queries can get a massive speedup by using a GPU.

The bad news: Only a small subset of queries got any benefit. They generally looked like this:


SELECT pixels FROM characters
JOIN polygons ON polygons.character_id = characters.character_id
JOIN textures ON textures.character_id = characters.character_id
WHERE characters.name = 'orc-wielding-mace'
  AND textures.name = 'heavy-leather-armor'
  AND color_theme = 'green'
ORDER BY y, x

there are certainly CPU-bound databases (1)

hedrick (701605) | about 9 months ago | (#45782137)

I'm responsible for a large university learning management system (Sakai). The database is completely CPU limited; I assume that's because the working set of data fits in memory. I would think lots of university and enterprise applications would be similar. Another data point is the experiments done on a NoSQL interface to InnoDB, which show very large speedups. Surely some of this is due to the CPU overhead of processing SQL.

Re:there are certainly CPU-bound databases (1)

marcosdumay (620877) | about 9 months ago | (#45782567)

I assume that's because the working set of data fits in memory.

As memory accesses count as CPU time, not I/O, any query on a dataset that is in memory will be CPU bound. But that does not mean you'll get improvements from adding CPU speed.

postgres can. (1)

jdew (644405) | about 9 months ago | (#45782143)

Guess nobody ever heard of PG-Strom.

Re:postgres can. (0)

Anonymous Coward | about 9 months ago | (#45782191)

Yes, some did. Here's a link [postgresql.org].

Brought to us by Kohei Kaigai, a very prolific PostgreSQL hacker indeed.

More interesting still is the architectural groundwork of PostgreSQL which makes such fancy things feasible (in this case the foreign data wrapper: rather cool).

Most servers do not have GPU's. (0)

Anonymous Coward | about 9 months ago | (#45782169)

It is really that simple. The companies that would gain the most from this do not (as a general statement) equip their servers with GPUs. Even if the DBs started supporting it first, giving people a reason to add GPUs to servers, processing isn't the major bottleneck for DB servers. So there isn't tremendous value in either adding GPUs to the servers (so they are useful), or in adding code to support the GPU (when there aren't many servers that have one).

It's a chicken-and-egg and usefulness problem. ;)

Plenty to do first... (1)

fostware (551290) | about 9 months ago | (#45782179)

Besides datasets not fitting into GPGPU memory, and I/O bottlenecks, I'm still seeing plenty of badly written SQL.

A current contract has plenty of SQL work (not for me though), and the bulk of their time is spent cleaning up data exceptions, rewriting badly written report queries, and moving oft-used or large-dataset queries to stored procedures. GPGPUs will hide some of the rot, but if the SQL were written better in the first place, we'd be able to use parallelism and make better use of existing commodity hardware in clients' virtualised environments.

I'm not dissing the prospect of GPU acceleration, just the priority TFA gives to it.

Re:Plenty to do first... (1)

Grishnakh (216268) | about 9 months ago | (#45782297)

This might be a stupid question as I'm not a DB expert, but isn't the problem of badly-written SQL something that could be mitigated by improvements in the SQL parser of an RDBMS? Other programming language compilers are frequently designed to optimize output code despite non-optimal constructs written by programmers. It seems to me that some of the improvements you talk of could be automated, especially moving oft-used queries to stored procedures.
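To a large extent engines already do this: the query planner normalizes equivalent formulations before picking an execution strategy. A tiny illustration (using SQLite purely for convenience; it is not one of the engines discussed here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX t_name ON t(name)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports how SQLite intends to execute a query;
    # the human-readable detail string is the fourth column of each row.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Two syntactically different but equivalent queries...
p1 = plan("SELECT id FROM t WHERE name = 'x'")
p2 = plan("SELECT id FROM t WHERE 'x' = name")

# ...are normalized by the planner to the same index search.
print(p1 == p2)
```

Real optimizers go much further (join reordering, predicate pushdown, constant folding), but the deeper rewrites the parent imagines, like automatically materializing repeated work, are still largely manual.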

Re:Plenty to do first... (1)

fostware (551290) | about 9 months ago | (#45782371)

I honestly don't know of any decent SQL optimisers...

I know MS SQL Management Studio has SQL Profiler, Index Tuning Advisor, and Database Performance Tuning Advisor.
But there's nothing in Aqua Data Studio that works with PostgreSQL, which means co-workers and I must rely on good looks and mad skillz (I'm only passable on both)

Re:Plenty to do first... (0)

Anonymous Coward | about 9 months ago | (#45782767)

I know MS SQL Management Studio has SQL Profiler, Index Tuning Advisor, and Database Performance Tuning Advisor.

These are all just tools to aid a skilled person in optimizing. They are comparable to tools offered by companies like Embarcadero, Quest, Toad (which generally work on any major SQL database system), but integrated into the whole development & management environment. MS takes care of its developers when it comes to that stuff. But in the end, they don't do any optimization for you; they just point out where the resources go, and where optimization is likely to have the biggest effect.

The only truly optimizing part of an RDBMS is the query optimizer, which is present in every big SQL database system, be it Oracle, MS or IBM. This is a fairly clever optimizing engine which tries to ensure the query is executed in the most efficient manner, no matter how it is written, by examining available indexes, rowcounts, and statistics the database engine keeps on column values. These optimizers are pretty much as clever as they are ever going to be; you won't find many improvements there, and things like twiddling with the caching of the optimized query plans will only give you small improvements, not breakthroughs.

I tend to think that if there are going to be large improvements in SQL databases, and probably more generally in systems dealing with relational data sets, it will be in the storage engine. It's possible I'm just thinking that because it's the part of database systems I know least about. It is what the NoSQL guys are trying to do, though: they're basically highly optimizing the storage and retrieval process by yanking out the relational part. This leaves the burden of data stewardship to the user (coder). It essentially 'optimizes' the query optimization process by forcing the coder to implement it himself, and thus hopefully optimizing it for the task at hand, which should make it at least as fast as any generic solution. In practice, (enterprise) coders don't work that way. They want to compartmentalize that part, put it in a library so that nobody can do unexpected things that may bring the system down, and consequently you end up with a query optimizer written by people who really weren't trained for that. But who knows; at least it opens up the option for clever coders to rewrite the query optimization engines, an option you won't be getting from any of the big players.

Re:Plenty to do first... (1)

Bengie (1121981) | about 9 months ago | (#45782677)

When your queries start getting into 10-table joins, the join optimizer has to make educated guesses because of the number of possible join arrangements. The metadata it uses is based on samples of the current data: to avoid keeping the metadata perfectly up to date, which would be very expensive and slow, the RDBMS only samples a subset.

While this works most of the time, there are some cases where it doesn't. I've had quite a few times where I had to force join orders and/or join types to get a query to perform well; we're talking about differences of 1-4 orders of magnitude in many cases. Since I have control over the DB, I know how the data relates and can force the DB into certain join orders.

Sometimes, breaking the query up and loading the output into temp tables can speed things up. I do not recommend this as the norm, but some cases warrant it.
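The temp-table trick described above looks roughly like this (SQLite syntax and made-up tables, purely for illustration; the mechanisms for forcing join order differ per engine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, "EU" if i % 2 else "US") for i in range(100)])
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, i % 100, float(i)) for i in range(1000)])

# Instead of one large join that the planner might misjudge,
# materialize the small, highly selective side first...
con.execute("""
    CREATE TEMP TABLE eu_customers AS
    SELECT id FROM customers WHERE region = 'EU'
""")
# ...then join against the temp table, which pins the evaluation order:
total = con.execute("""
    SELECT SUM(o.total) FROM orders o
    JOIN eu_customers c ON o.customer_id = c.id
""").fetchone()[0]
print(total)
```

Materializing the selective side hands the planner a small, already-filtered input; as the parent says, this is a targeted fix for misestimated joins, not a default style.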

Re:Plenty to do first... (1)

Grishnakh (216268) | about 9 months ago | (#45782901)

See, (again I'm speaking from a position of relative ignorance here) it seems like the RDBMS should be intelligent enough to figure this stuff out automatically, instead of requiring an in-house expert. It should be adaptive and learn from the current usage patterns, in relation to the data it stores. So if, for instance, breaking the query up and using temp tables speeds things up, the DB should figure this out and do it automatically. It wouldn't work for one-time queries, but if the same kind of queries are being done over and over, it should recognize the common queries, and behind the scenes look for ways of speeding them up, so that when they're run in the future, it can apply these improved methods and deliver much faster results.

Re:Plenty to do first... (0)

Anonymous Coward | about 9 months ago | (#45782861)

It seems to me that some of the improvements you talk of could be automated, especially moving oft-used queries to stored procedures.

Coming from a MS SQL Server background, this is a process that has already happened, and it has gone in two directions. On the one hand, instead of caching complete batches/stored procedures as one script, the caching has moved to the statement level. On the other hand, parameterization of queries is (or can be) done automatically, meaning that literal values in the query are replaced by parameter placeholders before the query is hashed and a cache search is done. The result is that many of the disadvantages of ad-hoc queries compared to stored procedures have disappeared. I'm not 100% sure of the exact versions, but I think statement-level caching was introduced in 2005 and the option for forced parameterization in 2008. I assume Microsoft's competitors have made similar steps, if not long before them then surely shortly after, as it has some very clear advantages.
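The forced-parameterization step described above can be sketched in a few lines (a deliberate simplification: real engines parameterize the parse tree, not the SQL text, and the regex here is only a toy literal detector):

```python
import re

# Crude literal detection: quoted strings and bare numbers.
# Real engines do this on the parse tree, not on the SQL text.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def parameterize(sql):
    """Replace literal values with placeholders, returning the
    normalized query text plus the extracted parameter values."""
    params = LITERAL.findall(sql)
    return LITERAL.sub("?", sql), params

plan_cache = {}

def get_plan(sql):
    # Two queries differing only in literal values now hash to the
    # same cache key, so the compiled plan is reused.
    key, params = parameterize(sql)
    if key not in plan_cache:
        plan_cache[key] = f"<compiled plan for: {key}>"
    return plan_cache[key], params

get_plan("SELECT * FROM orders WHERE id = 42")
get_plan("SELECT * FROM orders WHERE id = 99")
print(len(plan_cache))  # one cached plan serves both queries
```

This is the mechanism that closes most of the gap between ad-hoc queries and stored procedures: the cache lookup no longer depends on the literal values.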

Wrong (0)

Anonymous Coward | about 9 months ago | (#45782347)

If you need more computing power, you're doing it wrong.

Postgres (2)

slackergod (37906) | about 9 months ago | (#45782355)

Looks like exactly what PostgreSQL's PGStrom [postgresql.org] project is trying to achieve.

PGStrom (0)

Anonymous Coward | about 9 months ago | (#45782365)

In PostgreSQL we have a project called PGStrom http://wiki.postgresql.org/wiki/PGStrom

Let's see... (0)

Anonymous Coward | about 9 months ago | (#45782397)

...maybe it has something to do with the fact that it's called a Graphics Processing Unit? Why the fuck are we using them as CPUs?

Re:Let's see... (1)

mc6809e (214243) | about 9 months ago | (#45782821)

...maybe it has something to do with the fact that it's called a Graphics Processing Unit? Why the fuck are we using them as CPUs?

We use them as CPUs because we don't suffer from that cognitive bias [wikipedia.org] known as functional fixedness [wikipedia.org].

Conspicuous omission - PostgreSQL (1)

bill_mcgonigle (4333) | about 9 months ago | (#45782611)

it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc.

Lemme guess, MySQL fanatic?

You can already go download:

    https://wiki.postgresql.org/wiki/PGStrom [postgresql.org]

if it fits your problem domain and PostGIS has some hackers adding GPU support:

    http://data-informed.com/fast-database-emerges-from-mit-class-gpus-and-students-invention/ [data-informed.com]

Why not the others? Perhaps because PostgreSQL makes developing extensions easier - it's got the largest extension ecosystem, so I'm just presuming there. If it turns out well in Pg-land, the others will naturally adopt it.

So the answer to the story title is "they do." The next question would be, "why isn't it widely deployed", and the answer would be, "it's not done yet." Yadda, yadda, yadda, patches welcome. If the whole summary is just a way to try to turn "hey this is neat" (it is) into an ill-founded [google.com] complaint story, then write a better story next time. It's neat stuff, no need to whine.

because open source fucking blows (-1)

Anonymous Coward | about 9 months ago | (#45782615)

What do you expect from shit cobbled together by hobbyists?

GPU not 7 times faster than 32 CPU cores (2)

loufoque (1400831) | about 9 months ago | (#45782749)

A GPU, even a GTX Titan, simply isn't 7 times faster than a modern 32-core x86 CPU in real life. Most of the gain probably comes from just general optimization that could have been done on the CPU too.

Use SQL Server (0)

Anonymous Coward | about 9 months ago | (#45782759)

Don't use open source db. Use SQL Server for security and speed.

does not calculate in real life (1)

Domas Mituzas (3475087) | about 9 months ago | (#45782807)

Putting MonetDB, CouchDB and MySQL in a single line already shows how serious the question is. First of all, TPC-H is a decision-support (ad-hoc, analytic) workload, and putting all the data into memory calls for a comparison with in-memory ad-hoc platforms like MemSQL, where the difference might not be as pronounced. Also, it is easy to have hundreds of gigs of RAM in CPU-driven systems, whereas GPU memory is still tiny. Yes, doing complex window functions on streaming data may seem fine, but anything requiring larger arenas of random data access would fall off a cliff.

For all the people talking about speeding up your Wordpress: you need to look at OLTP or even more read-only small-data benchmarks (TPC-C is already too complex). A database that is efficient at transactional workloads has too much overhead for analytical processing. Then there are all the datacenter considerations: getting rid of heat once you have thousands of GPUs around is no longer a trivial task, and may involve oil immersion or water cooling.

Yes, sorting a dataset can be faster, but assembling it from all the I/O devices and memory is the major expense. That's why multithreading works: there are already lots of waits for memory, and making them even longer would be difficult. The I/O we talk about is not just reading from a disk, disks or SSDs; it is also about getting data into the chip, and bandwidth there is still constrained. And yes, the CPU can get quite busy in a database server, since all the network, storage, compression, and page and row mangling code has to run somewhere; sizing hardware for large-scale databases is a tough balancing act. But very little of that can work on a GPU in the online world. Nice research though :)

not much math (0)

Anonymous Coward | about 9 months ago | (#45782897)

From my experience, I don't think database programs do many mathematical computations that would benefit from a graphics processing unit, or even a floating-point unit.

A spreadsheet, on the other hand, might be able to take advantage of a GPU.

I should move my personal databases (OpenOffice.org base, Oracle Express) to an SSD drive. I can't afford an SSD yet. Progsql MySQL won't run properly on my Windows XP box for some odd reason.
