
The Good, The Bad and The WebGL-y

The online world is in the midst of a major evolution. Old HTML ways are making way for the new, improved and interactive world of HTML5 and WebGL. The excitement of the static internet has long since settled down, giving visionaries a clear view of what the future of online means for consumers and developers. The future of online is fun and games; the future is immersive and interactive; the future is WebGL.

ThreeJS was my first venture into WebGL.



ThreeJS caught my attention because it allowed games to be built directly into a browser with no need for plugins. While great in theory, it comes with a huge learning curve, and 3JS, in its current state, is the toy of elite coders and pretty much inaccessible for someone wanting to add simple WebGL to their current online presence.


Attached Image: 2014-10-13a-instruments.jpg
Import test of the instruments from "The Music of Junk".


By following tutorials and opening up working examples, I was able to create many successful tests, but eventually hit a roadblock. When it came to getting animated characters into the browser via 3JS, I was unable to wrap my drummer's mind around the code to make it work. Relief from my frustration arrived in the form of the Sea3D file format, which allowed very easy export of character models from 3ds Max into the 3JS world.


Attached Image: elvisCollideWalls-KEEP-01.jpg
Hit box and physics test


So far as I know, 3JS does not have a GUI to work with; it's all back-end code to bring models into the scene. While that worked great once I figured out the code, I eventually lost interest when I was unable to make walls impassable. Soon after, I put 3JS to the side and took on other projects to entertain myself.

A short stop with X3Dom.


A little while later, I got a new job and was given some freedom to experiment for marketing. I messed around a bit with 3JS and product displays, but was hindered by quality and file size. In the time between my first venture into 3JS and my new job, I had abandoned 3ds Max, as I no longer had a system capable of running it. In November 2013, I decided to take up 3D again, and since enough time had passed that I would basically have to relearn 3ds Max, I decided to learn Blender instead. There I hit another roadblock when wanting to work with 3JS, as the Sea3D character export only works with 3ds Max, and the developer never got around to the promised Blender exporter.

Basic X3Dom embed code
  <head>
    <meta http-equiv='Content-Type' content='text/html;charset=utf-8'>
    <link rel='stylesheet' type='text/css' href='http://www.x3dom.org/x3dom/release/x3dom.css'>
    <script type='text/javascript' src='http://www.x3dom.org/x3dom/release/x3dom.js'></script>
  </head>
  <body>
    <x3d id='someUniqueId' showStat='false' showLog='false' x='0px' y='0px' width='400px' height='400px'>
      <scene>
        <!-- Some info about your model -->
        <inline url='yourModel.x3d'></inline>
      </scene>
    </x3d>
  </body>

Blender comes equipped with an exporter for the X3Dom file format, a great format for product visualization, but one hampered by file size and quality issues, like wireframe edges showing up in rendered models. With the limits of X3Dom and the dead end of 3JS when working with Blender, I figured I would have to wait for a dedicated development team to come along and take up the WebGL cause.

That team arrived in the form of Blend4Web.


Attached Image: godzilla.jpg
Quick Godzilla Test


Blend4Web is where I currently sit, watching the World Wide WebGL take shape in function, design and, most important to me, fun implementation of this new tech. While it is fully capable of making games that run entirely in a browser with no plugins required, what caught my attention was Blend4Web's focus on the product's potential for the retail world of online sales and interactive stores. Games are always fun and popular, and B4W's excellent system for making online games easily deserves commendation; however, for me, retail is my type of game, and here B4W shines.


Attached Image: smoker.jpg
Interactive Beehive Smoker


B4W has taken great care in producing an interface that covers all the important aspects of online retail, such as proper Search Engine Optimization tags, meta descriptions and titles, all within the B4W Blender interface. Files can be exported with a single click, resulting in a fully self-contained HTML file with a full 3D product, including hotlinks, reflections, glow effects, audio, and much more, all with no coding required. If one so chooses, models can be exported to individual JSON files for assembly later in a main scene, again with all hotlinks and glow in place.


Attached Image: thicket.jpg
A god rays and JSON test


To me, this is the future of the internet: interactive, user-friendly interfaces on a website that put the product virtually into the hands of consumers for perusal and further details. Blend4Web is an example of a company with forethought and vision. Retail may not be exciting to gamers, but to retailers, games are another product for the shelf, and Blend4Web makes putting those products on the shelf as easy as it has made making online games. With Blend4Web, everything in WebGL is simply a few clicks away.


Attached Image: b4w_001_21042015_210722.jpg


With constant updates, fast responses to questions on their forum, excellent detailed tutorials, and their ability to produce a quality product that easily makes fun and interesting web experiences for gamers and consumers, Blend4Web stands out in the new internet of The Good, The Bad and The WebGL-y.

Are You Letting Others In?


Introduction


A good friend and colleague of mine recently talked about the realization of not letting others in on some of his projects. He expressed how limiting it was to try and do everything by himself. Limiting to his passion and creativity on the project. Limiting to his approach. Limiting to the overall scope and impact of the project. This really struck a chord with me as I’ve recently pushed to do more collaborating in my own projects. In an industry that is so often one audio guy in front of a computer, bringing in people with differing, new approaches is not only freeing, it’s refreshing.

The Same Ol' Thing


If you’ve composed for any amount of time, you’ve noticed that you develop ruts in the grass. I know I have. Same chord progressions. Same melodic patterns. Same approaches to composing a piece of music. Bringing in new people to help you branch out exposes your work to new avenues. New opportunities. So, on your next project I’d challenge you to ask yourself: am I letting others in? Even just to evaluate the mix and overall structure of the piece? To review the melody and offer up suggestions? I’ve been so pleasantly surprised and encouraged by sharing my work with others during the production process. It’s made me a better composer, better engineer and stronger musician. Please note that while this can be helpful for any composer at ANY stage of development, it's most likely going to work best for someone with at least some experience and a set foundation. This is why I listed this article as "intermediate."

Get Out of the Cave


In an industry where so many of us tend to hide away in our dark studios and crank away on our masterpieces, maybe we should do a bit more sharing? When it’s appropriate and not guarded by NDA, of course! So reach out to your friends and peers, folks that play actual instruments (gasp!), and see how they can breathe life into your pieces and suggest how your piece can be stronger. More emotional. For example, I’d written out a flute ostinato that worked well for the song but was very challenging for a live player to perform. My VST could handle it all day… but my VST also doesn’t have to breathe. We made it work in a recording studio environment, but if I ever wanted to have that piece performed live, I’d need to rethink that part some.

Using live musicians or collaborating can also be more inspiring and much more affordable than you might first think! Consult with folks who are talented and knowledgeable at production and mixing, because even the best song can suck with terrible production. I completely realize you cannot, and most likely WILL NOT, collaborate on every piece you do. But challenging yourself with new approaches and ideas is always a good thing. Maybe you’ll use them or maybe you’ll confirm that your own approach is the best for a particular song. Either way, you’ll come out ahead for having passed your piece across some people you admire and respect.

My point?


Music composition and production is a lifelong path. No one person can know everything. This industry is actually much smaller than first impressions suggest, and folks are willing to help out! Buy them a beer or a coffee, or do an exchange of services. When possible, throw in cash. Or just ask and show gratitude! It’s definitely worked for me and I think it will work for you as well. The more well-versed you are, the better. It will never hurt you.

Article Update Log


28 January 2015: Initial release


GameDev.net Soapbox logo design by Mark "Prinz Eugn" Simpson

Weiler-Atherton in 3D

The well-known Weiler-Atherton polygon clipping algorithm is usually demonstrated in 2D. Nevertheless, the idea also works in 3D.

The demo programs discussed here, Weiler2D.exe and Weiler3D.exe, are in the Weiler3D directory, to be unpacked from the attached article resource archive.

1. Weiler-Atherton in 2D


The Weiler-Atherton clipping of two 2D polygons may be performed in 3 steps:

  1. Create the set of segments consisting of the vertices of the 1st polygon contained inside the 2nd polygon, including the points of edge intersection.
  2. Create the set of segments consisting of the vertices of the 2nd polygon contained inside the 1st polygon, including the points of edge intersection.
  3. Merge the two sets of segments above at the intersection points.

The following illustrations have been created with the demo program Weiler2D.exe:

In fig 1.1, two randomly created polygons, Red and Blue, are to be clipped.

Attached Image: fig_1_1.JPG

In fig 1.2, the Magenta segments are the parts of the Red polygon contained inside the Blue polygon, and the Aqua segments are the parts of the Blue polygon contained inside the Red polygon.

Attached Image: fig_1_2.JPG

In fig 1.3, the sets of Magenta and Aqua segments are moved aside for demonstration purposes.

Attached Image: fig_1_3.JPG

In fig 1.4, the sets of Magenta and Aqua segments are moved together to create the clipped polygons.

Attached Image: fig_1_4.JPG

In fig 1.5, the Yellow clipped polygons are shown together with the original Red and Blue polygons.

Attached Image: fig_1_5.JPG

You may create your own runs of the 2D Weiler-Atherton algorithm with the program Weiler2D.exe.

To watch the step-by-step performance, use the Right Arrow key while the Play timer is stopped.

To start clipping, press the Enter key.

All the commands available for Weiler2D.exe are shown in the Help dialog (press F1 to show it).

Attached Image: FIG_1_6.jpg

To start a new scenario, just press the Space Bar. The polygons are randomly created, randomly oriented and randomly rotated.

2. Weiler-Atherton in 3D


The Weiler-Atherton clipping of two polyhedrons may be performed in 3 steps:

  1. Create the set of polygons consisting of the vertices of the 1st polyhedron contained inside the 2nd polyhedron, including the points of polygon intersection.
  2. Create the set of polygons consisting of the vertices of the 2nd polyhedron contained inside the 1st polyhedron, including the points of polygon intersection.
  3. Merge the two sets of polygons above at the intersection points.

The next illustrations have been created with the demo program Weiler3D.exe:

In fig 2.1, two randomly created and randomly oriented Red and Blue polyhedrons are to be clipped.

Attached Image: fig_01.JPG

In fig 2.2, the Red and Blue polyhedrons are moved into a random position to start clipping.

Attached Image: fig_02.jpg

In fig 2.3, the Red and Blue polyhedrons in their random position are shown in blending mode as semi-transparent.

Attached Image: fig_03.jpg

In fig 2.4, the set of the Red polyhedron's faces inside the Blue one and the set of the Blue polyhedron's faces inside the Red one, including the segments of intersection, are moved aside for demonstration purposes.

Attached Image: fig_04.jpg

In fig 2.5, the set of the Red polyhedron's faces inside the Blue one and the set of the Blue polyhedron's faces inside the Red one, including the segments of intersection, are moved together to obtain the clipped polyhedron.

Attached Image: fig_05.jpg

You may select the Play menu to watch the clipped polyhedron faces, and/or you may move the mouse with the left mouse button pressed.

To watch the step-by-step performance, use the Right Arrow key while the Play timer is stopped.

All the commands available for Weiler3D.exe are shown in the Help dialog (press F1 to show it).

Attached Image: FIG_06.png

To start a new scenario, just press the Space Bar. The polyhedrons are randomly created, randomly oriented and randomly rotated.

The programs above have been developed on the MFC platform. Needless to say, it would not be a problem to develop them in Win32 or on any other platform. The pseudocode of the procedures used in Weiler3D is provided below:

declare:
Plane  :    a plane determined by its normal vector and its distance from the origin
Polygon:    a list of vertices lying in one plane
Polyhedron: a list of connected polygons
//////////////////////////////////////////////////////////////////
Procedure main
begin
Polyhedron Red
Polyhedron Blue
Polyhedron Mixed
ClipPolyHedrons( Red, Blue, &Mixed)
end
//////////////////////////////////////////////////////////////////
Procedure ClipPolyhedrons( Polyhedron p0, Polyhedron p1, Polyhedron * pRslt)
begin
ClipPolyhedronIn(p0, p1, pRslt)
ClipPolyhedronIn(p1, p0, pRslt)
end Proc
///////////////////////////////////////////////////////////////////
Procedure ClipPolyhedronIn( Polyhedron p0, Polyhedron p1, Polyhedron * pRslt)
//pRslt is a list of the polygons of Polyhedron p0 contained inside
//the Polyhedron p1, intersected polygons included
begin
with Polyhedron p0 
   for every polygon
      Polygon pCur = the current polygon;
      Polygon pNew = the result of the intersection of the Polygon pCur and Polyhedron p1
	  IntersectPolygon(p1, pCur, &pNew)
	  if there are any vertices in the Polygon pNew
	      Polygon pNew is appended to the polygon list in Polyhedron * pRslt
      end if 
    end for
end Proc
/////////////////////////////////////////////////////////////////////////////
Procedure IntersectPolygon(Polyhedron  phdr, Polygon plgn, Polygon * pRslt)
//pRslt is a list of vertices of Polygon plgn contained inside
//the Polyhedron phdr, vertices of the intersection included
begin
if Polygon plgn is completely inside of the Polyhedron  phdr  
   make Polygon * pRslt a copy of Polygon plgn;
   return;
end if

Plane pA    //The Plane of the Polygon plgn vertices
Polygon pT  //The Polygon obtained with the intersection of the Polyhedron  phdr by the Plane pA

IntersectPlane(phdr, pA, pT);
if Polygon pT has no vertices
   return;
end if

ClipPolygons(plgn, pT, pRslt);
end Proc
//////////////////////////////////////////////////////////////////////////
Procedure IntersectPlane(Polyhedron  phdr, Plane pA, Polygon * pRslt)
//pRslt is a list of vertices of the intersection of the Polyhedron phdr with the Plane pA
begin
with Polyhedron phdr 
   for every polygon
      Polygon pCur = the current polygon;
	  if all the vertices of the Polygon pCur lie in the Plane pA
        make Polygon * pRslt a copy of Polygon pCur;
        return;
      end if
	  let plt - the list of vertices of the intersection of the Polygon pCur with the Plane pA 
	  IntersectByFlat(pCur, pA, &plt);
	  with the list of vertices plt
   	     for all the vertices 
		    if the current vertex is not in the list of the Polygon * pRslt
			    append the current vertex to the list of the Polygon * pRslt
            end if
         end for
   end for
end Proc
//////////////////////////////////////////////////////////////////////////
Procedure IntersectByFlat(Polygon plgn, Plane pA, list of intersection vertices &plt)
begin
with Polygon plgn
   for all the vertices
    let pV = the current vertex;
    let pVn = the next vertex in the list Polygon plgn
	double d0 = Distance of pV to Plane pA;
	double d1 = Distance of pVn to Plane pA;
	if(d0 > 0 && d1 >= 0 || d0 < 0 && d1<=0)
	  continue;
    end if 
    Intersection vertex pU:
    Vector * pU =  new Vector(* pV -(* pVn - * pV)*d0/(d1 - d0));
	  Append vertex pU to the list of vertices plt 
   end for
end Proc
///////////////////////////////////////////////////////////////////////////////////

Only the pseudocode of the ClipPolygons procedure has been omitted, because it is the standard Weiler-Atherton algorithm in 2D.
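For illustration only, here is a small compilable C++ sketch of the segment/plane intersection step used inside IntersectByFlat; the Vec3 and Plane types and the function names are my own placeholders, not taken from the demo source:

struct Vec3 { double x, y, z; };

static Vec3 operator-(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 operator*(const Vec3& a, double s)      { return { a.x * s, a.y * s, a.z * s }; }
static double dot(const Vec3& a, const Vec3& b)     { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Plane { Vec3 n; double d; }; // points p on the plane satisfy dot(n, p) + d == 0

// Signed distance from a point to the plane (n is assumed to be of unit length).
static double distanceToPlane(const Vec3& p, const Plane& a) { return dot(a.n, p) + a.d; }

// Intersection of the segment [pV, pVn] with the plane, assuming d0 and d1 have opposite signs,
// i.e. the same formula as in IntersectByFlat: pV - (pVn - pV) * d0 / (d1 - d0).
static Vec3 intersectSegmentPlane(const Vec3& pV, const Vec3& pVn, const Plane& a) {
    const double d0 = distanceToPlane(pV, a);
    const double d1 = distanceToPlane(pVn, a);
    return pV - (pVn - pV) * (d0 / (d1 - d0));
}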

Conclusion


The demo above shows that the Weiler-Atherton clipping algorithm works in 3D as well. Weiler3D.exe has been created on the basis of NeHe's OpenGL Lessons mentioned in my former article. It seems worthwhile to use the Weiler-Atherton clipping algorithm in simple 3D applications, and I believe it will work in 4D and 5D as required.


Writing Efficient Endian-Independent Code in C++

Once upon a time, there was an article published on gamedev.net [Roy2013], which described a way (as he says, mostly taken from Quake2 engine) to deal with Little-Endian/Big-Endian issues in games. While this approach is mostly sound (“mostly” because of unaligned-read issues which will be discussed below), it is not the most efficient one. Better (simpler, faster, and more general) approaches do exist, and they will be discussed below.

What is Endianness


Endianness itself has been described in many different works, including [Roy2013] and [WikipediaEndianness]. Basically, it is the way a CPU stores multi-byte data in memory: little-endian systems store the least significant byte first, and big-endian ones store the most significant byte first. So, if you have

uint16_t x = 1234; 

then x will look as {0xD2, 0x04} on a little-endian system, and as {0x04, 0xD2} on a big-endian system. As a result, code such as

send(socket,&x,2);

will send different data over the wire depending on system endianness (little-endian systems will send {0xD2,0x04}, and big-endian ones will send {0x04,0xD2}).

It is important to note that endianness effects cannot be observed unless we have some kind of cast between pointers on data of different sizes (in the example above, there was an implicit cast from &x, which is uint16_t*, to void*, which is actually treated as byte pointer by the send() function). In other words, as long as we keep away from casts and stay within arithmetical and bitwise operations without pointer casts, the result is always the same regardless of endianness. 2+3 is always 5, and (((uint16_t)0xAA)<<3)^0xCCCC is always 0xC99C, regardless of the system where our code is running. Let's name such calculations endianness-agnostic.

Scope


First of all, where do we need to deal with little-endian/big-endian issues? In fact, there are only two scenarios I know of where it is important. The first one is reading files (which might have been written on a different machine), and the other one is network communication. From our perspective, both of these cases are essentially the same: we're transferring data from one machine to another.

Serialization/Marshalling


One thing which should be noted for both of these data-transfer-between-machines scenarios is that you should never transfer data as a C structure; instead, you should serialize/marshal it. Putting a C structure into a file (which may be read on another machine) or onto the network is a Really Bad Idea for several reasons.

First of all, when writing/reading C structure to external storage, you're becoming a hostage of implicit alignment rules of the compiler you're using. In general, when you have a structure such as

struct X {
uint8_t a;
uint32_t b;
};

then sizeof(X) won't be 5 bytes as some might expect; in many cases sizeof(X) will be 8 bytes (1 byte of a, then 3 unused bytes of padding just to make b aligned on a 4-byte boundary, and then 4 bytes of b), but this is not guaranteed at all. To make things worse, the amount of alignment is not specified by standards, so when you're switching from one compiler to another one, it may change (not to mention switching between CPUs); to make things even worse, it can be affected by compiler switches and on a struct-by-struct basis by things such as #pragma pack.

If you are using types such as int or long (rather than guaranteed-size types such as uint8_t and uint32_t), things get even worse (yes, this is possible) due to the different sizes of these types on different platforms. Oh, and don't forget that variable-length strings and C++ containers are clearly off-limits.

There are other (rather minor) reasons for avoiding writing C structures directly: you'll write more data than necessary, the data written will include garbage (which will affect the ability to compress it), and so on. However, the most important issue is the (lack of) inter-platform and inter-compiler compatibility mentioned above.

These issues are so important, that in the networking world sending C structures over the network is universally considered a Big No-No.

So, what should you do when you need to send a C structure over the network (or save it to a file)? You should serialize it first (in the networking world the term "marshal" is generally preferred, though it is essentially the same thing).

Implementing Serialization


The idea behind serialization is simple: for the struct X above you write one byte of a, and 4 bytes of b, avoiding alignment issues. In fact, you can go further and use, for example, VLQ [WikipediaVLQ] variable-length encoding, or put null-terminated strings into your serialized data.

One way of serializing data (the one I prefer), is to have serialize/deserialize functions such as

void serialize_uint16(DataBlock&, uint16_t);//DataBlock should grow as the data is serialized
uint16_t deserialize_uint16(Parser&);//there is constructor Parser(DataBlock&)

When we have these functions implemented, then serializing our struct X will look as

DataBlock data;
serialize_uint8(data,x.a);
serialize_uint32(data,x.b);

(and deserializing will look similar).
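For symmetry, deserializing the same struct might look like the following sketch (assuming deserialize_uint8() and deserialize_uint32() counterparts analogous to the deserialize_uint16() declared above):

Parser parser(data);
X x;
x.a = deserialize_uint8(parser);
x.b = deserialize_uint32(parser);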

So far so good; now let's see how we can implement our serialize_uint16() function. If we implement it according to [Roy2013], it would look like this:

void serialize_uint16(DataBlock& data,uint16_t u16) {
  void* ptr = data.grow(2);//add 2 bytes to the end of data, and return pointer to these 2 bytes
  u16 =  LittleShort(u16);//calling ShortSwap on big-endian systems, and ShortNoSwap on little-endian systems
  *(uint16_t*)ptr = u16; //(*)
}

This would work fine on x86 and x86-64, but on the other platforms the line marked as (*) may run into problems. The problem is that our ptr might be either even or odd; and if it is odd – some CPUs will refuse to read 2-byte data from it (also they will usually refuse to read 4-byte data unless its address is a multiple of 4, and so on). This never happens on x86/x86-64, but happens on SPARC, and may or may not happen on ARM (unless we specify __packed qualifier for uint16_t*, but it is not universally available).

Another Popular Alternative


Another popular alternative (thanks to Servant of the Lord for reminding me of it) is based on LITTLE_ENDIAN and BIG_ENDIAN macros. In some sense, it can be seen as the same serialize_uint16() as above, but using different implementation for BigShort() etc.:

//code courtesy of Servant of the Lord
#ifdef LITTLE_ENDIAN
    #define BigShort(x)     ShortSwap(x)
    #define LittleShort(x)  (x) //Do nothing, just 'return' the same variable.
    #define BigLong(x)      LongSwap(x)
    #define LittleLong(x)   (x)
    #define BigFloat(x)     FloatSwap(x)
    #define LittleFloat(x)  (x)
#elif defined(BIG_ENDIAN)
    #define BigShort(x)     (x)
    #define LittleShort(x)  ShortSwap(x)
    #define BigLong(x)      (x)
    #define LittleLong(x)   LongSwap(x)
    #define BigFloat(x)     FloatSwap(x)
    #define LittleFloat(x)  (x)
#else
    #error No idea about endianness
#endif

While it is faster and less bulky than the previous one (see the "Performance Analysis" section below), it has the same problem with unaligned reads/writes on non-x86 platforms :-(. In other words, for serialization purposes it won't work, for example, on SPARC (and working on ARM is not guaranteed).

What Is to be Done?


What is to be done?
-- name of the novel by Nikolay Chernyshevsky, 1863 --


The answer is quite simple, actually. Instead of writing two bytes as one chunk, we can always write it byte-by-byte:

void serialize_uint16(DataBlock& data,uint16_t u16) {
  uint8_t* ptr = data.grow(2);
  *ptr++ = (uint8_t)u16;
  *ptr = (uint8_t)(u16 >> 8);
}//deserializing is very similar

This technique is well-known (see, for example, [Pike2012], though the idea has been known for years); a Really Good Thing about it is that we don't need to care about endianness at all(!). This is because the code above doesn't perform any casts and calculates all the bytes in a completely endianness-agnostic manner (see the "What is Endianness" section above); and all writes are exactly 1 byte in size, so there is no chance for endianness to manifest itself.
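For completeness, the matching byte-by-byte deserializer might look like this sketch; the Parser::read() helper (returning a pointer to the next two bytes and advancing the parser) is my assumption, mirroring DataBlock::grow() on the writing side:

uint16_t deserialize_uint16(Parser& parser) {
  const uint8_t* ptr = parser.read(2); // assumed helper: returns next 2 bytes, advances the parser
  uint16_t u16 = (uint16_t)ptr[0];           // low byte first, matching the little-endian wire format
  u16 |= (uint16_t)((uint16_t)ptr[1] << 8);  // high byte
  return u16;
}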

While [Pike2012] argues that all the other marshalling methods represent a fallacy, I'm not so sure about it, and will describe an improvement over byte-by-byte marshalling in a moment.

Further Optimization


When I really care about performance (which I usually do, as server-side handling of billions of network messages per day is quite a significant load), I often add special handling for the platforms of most interest, for example (taken from [NoBugs2015]):

void serialize_uint16(DataBlock& data, uint16_t u16) { //assuming little-endian order on the wire 
  uint8_t* ptr = data.grow(2);
#if defined(__i386) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64) 
  *(uint16_t*)ptr = u16; // safe and fast as x86/x64 are ok with unaligned writes
#else 
  *ptr++ = (uint8_t)u16;
  *ptr = (uint8_t)(u16 >> 8);
#endif 
} 

With this approach, we have the best of both worlds: (a) universal version (the one under #else) which works everywhere, and (b) optimized version for specific platforms which we know for sure will work efficiently there.

Performance analysis


Now, let's analyze the relative performance of all four approaches: (a) the one from [Roy2013], (b) the LITTLE_ENDIAN/BIG_ENDIAN-based one, (c) the endianness-agnostic one (see [Pike2012]), and (d) the one from [NoBugs2015]. For the purposes of our analysis let's assume that all the data is already in L1 cache (it is the most common case for continuous reading; if it isn't, penalties will be the same for all the methods). Also, let's assume that L1 reads cost 2-3 clocks (L1 latencies with modern x86 CPUs are around 4-5 clocks, but latency isn't exactly translated into overall execution time, so 2-3 clocks works as a reasonable estimate); writes are usually around 1 clock. Also, we won't count the costs of data.grow() for writing and of parser offset management for reading; it can be done in a manner that ensures amortized costs are quite low (of the order of single-digit clocks), and it will be the same regardless of the endianness handling method.

LITTLE_ENDIAN/BIG_ENDIAN-based marshalling is certainly not bad performance-wise: it can be inlined easily, and has a cost of 1 clock for writing and 2-3 clocks for reading.

[Roy2013] causes a function-call-by-pointer on each conversion; most importantly, such calls cannot possibly be inlined. From my experience, on x86 function calls with such parameters usually cost around 15-20 CPU clocks (while CALL/RET instructions are cheap, all the required PUSH/POPs and creating/destroying stack frame, taken together, are not), compared to inlined version. In addition, it will need roughly 1 clock to write the data (and 2-3 clocks to read), making the total for writing around 16-21 clocks and total for reading around 17-23 clocks.

The endianness-agnostic approach can be inlined easily; however, it causes 2 writes/reads instead of 1 (and compilers don't combine them; at least I've never seen a compiler combine 2 writes/reads), which normally translates into 2 clocks for writing and 4-6 clocks for reading. It also requires shifts and casts, which cost around 2 additional clocks, making the total for writing around 4 clocks and the total for reading around 6-8 clocks.

[NoBugs2015] is optimized for x86, and can be inlined too; same as LITTLE_/BIG_ENDIAN one, it has cost of 1 clock for writing and 2-3 clocks for reading.

Of course, the analysis above is very approximate and there are other things not taken into account (such as larger size of inline functions, which in general may affect caches), but I feel that these other considerations in most cases won't affect the overall picture.

Implementation Notes


One thing to note when implementing marshalling is that in most cases it is simpler to do it using unsigned integers rather than signed ones; while using signed types isn't formally a bad thing, in practice it tends to cause trouble. Not that it isn't possible to implement marshalling with signed ints - it is just simpler to implement it with unsigned ints, with one less thing to worry about. For a list of troubles which can be caused by using signed types for marshalling, see the comment by SICrane below; however, you don't really need to care about them - just use unsigned and you'll be fine :-).

Another thing to keep in mind is to use those guaranteed-size types such as uint32_t and uint16_t (rather than int and short). You never know where your code will be compiled in 5 years from now, and just recently I've seen a guy who needed to fix his code because when compiling for AVR8, sizeof(int) is 2 (but sizeof(uint32_t) is always 4, regardless of the platform).

Summary


Properties of the four approaches to handling endianness can be summarized in the following table:






                      Applicability                              Clocks on x86,  Clocks on x86,  Clocks on x86,  Clocks on x86,
                                                                 write uint16_t  read uint16_t   write uint32_t  read uint32_t
[Roy2013]             Only C-structure based                     16-21           17-23           16-21           17-23
LITTLE_/BIG_ENDIAN    Only C-structure based                     1               2-3             1               2-3
Endianness-agnostic   Both C-structure based and serialization   4               6-8             8               12-16
[NoBugs2015]          Both C-structure based and serialization   1               2-3             1               2-3


Of course, on non-x86 platforms the picture won't be as good for [NoBugs2015] as written above, but it will still perform exactly like the endianness-agnostic one, and there will be an option to optimize it for a specific platform (in a manner similar to the x86 optimization) if necessary and possible.

References


[Roy2013] Promit Roy, Writing Endian Independent Code in C++, 2013, http://www.gamedev.net/page/resources/_/technical/general-programming/writing-endian-independent-code-in-c-r3301
[WikipediaEndianness] https://en.wikipedia.org/wiki/Endianness
[WikipediaVLQ] https://en.wikipedia.org/wiki/Variable-length_quantity
[Pike2012] Rob Pike, The Byte Order Fallacy, 2012, http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
[NoBugs2015] 'No Bugs' Hare, 64 Network DO's and DON'Ts for Game Engine Developers, Part IIa, Protocols and APIs, 2015 http://ithare.com/64-network-dos-and-donts-for-game-engine-developers-part-iia-protocols-and-apis/

Article Update Log


Jun 5, 2015: Minor change to explanation "why signed is bad" in Implementation Detail section
Jun 4, 2015: Added LITTLE_/BIG_ENDIAN approach to those being analyzed
Jun 4, 2015: Added 'Implementation Notes' section
Jun 2, 2015: Initial release

How to Expand Games (and more) Eluding App Store Updating


a-dlc-expandable-00.jpg


One key to video game success is to be constantly releasing small updates. This alone might be a compelling reason to develop expandable games.


There are a plethora of reasons why games are successful, but what’s for sure is that players want to feel their favorite games are alive. They demand that you solve annoying bugs that affect gameplay, they like to have extra content to play through, and they love to see when you are able to introduce community proposals into the game!


This ideal situation means:


  1. You are going to be dealing with the app stores very frequently, which will delay your updates.
  2. This will impact your ability to appropriately schedule updates, limiting your marketing strategy.

It’s not all bad news though. There are ways to add new content to your game while keeping control of when new updates are launched, so you can keep your customers happy and run reliable marketing campaigns.


 


What are expandable games used for

Imagine you are creating a game with lots of levels, say a match-three game such as Candy Crush. You’ll have to design a lot of levels. Developing a triple-digit number of levels takes a lot of time, and you still have to figure out the kind of levels that are going to work best with your audience.


Should you spend all your efforts, money and brains designing all levels for the very first game version?


You could do all of them at once, but luckily, you don’t have to. You can set up a system that allows you to publish new levels independently of when new versions are published to the stores.


When well structured, using a downloadable content system will help free your team from publishing all the small tweaks and updates to the stores. Below, I’ll detail how you can expand your game using DLC.


a-dlc-expandable-01.jpg


First steps to develop expandable games and bypass stores

This is the basic outline you should consider to expand your games. I’ll stick to the match-three example I started with, but it can be applied to any kind of game you’re developing.


 


Step One


Use a mechanism to define your levels with an XML file or JSON document.


  • It could be the complete level definition, or just a meta description and a binary file to be downloaded and imported into your game.
  • You could include any information you need:
    • Name
    • Difficulty
    • Level order in a map
    • Images shown to the user
    • Price in virtual coins (when needed)
    • Publish date (for extra control)
    • Version of the content
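To make the idea concrete, here is a minimal C++ sketch of the kind of metadata a level descriptor could carry once parsed from that XML/JSON file. The struct and field names are purely illustrative assumptions, not part of any particular engine or SDK:

#include <string>

// Hypothetical in-game representation of one downloadable level's metadata.
struct LevelDescriptor {
    std::string name;           // display name
    int         difficulty;     // e.g. 1 (easy) .. 5 (hard)
    int         mapOrder;       // position of the level on the map screen
    std::string previewImage;   // image shown to the user
    int         priceInCoins;   // price in virtual coins (0 for free levels)
    std::string publishDate;    // publish date, for extra control over when it goes live
    int         contentVersion; // version of the content, to detect updates
};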

Step Two


Make your game check for new content every time it’s executed.


  • Either from the splash screen or the map screen.
  • In case the new levels are premium content, you could check while players are at the in-game shop.

a-dlc-expandable-02.jpg


Step Three


Download new or updated content when it is found, and import it into your game. This is where DLC comes into play.


Step Four


Let the user know that new content is available.


  • Automatically scroll the map screen to show the new levels, whether they’re locked or unlocked.
  • Set a “New” badge, or banner notification on the content category that got updated in your shop. 

a-dlc-expandable-04.jpg


Now that you know the mechanism, publishing new content should be easy:


  1. Create the new content
  2. Create the XML file or JSON document that defines it
  3. Make it available on the content server

By doing this, all your active players will have access to new content automatically.


 


What can you make of this mechanism

The advantages of developing your match-three game (or any game) in small pieces of downloadable content are plentiful! You get a more flexible game that can adapt faster and make players happier.


Let’s dive into some scenarios in which it is beneficial for you as well.


 


Scenario #1 Difficulty adjustment


Let’s say your analytics reveal your game has a severe player drop at level 6, indicating that maybe this level is too difficult for most players. You don’t want a majority of frustrated players, because they could decide to give up on the game.


You could address this player drop by adjusting some key parameters in the level. In a matter of hours, all players on all devices would have the updated level 6, avoiding all store delays. Reacting quickly to player behavior is really important to optimize your business model.


 


Scenario #2 Bug fixing


Some bugs, especially in the first versions of the game, are unavoidable. Imagine you have a bunch of players complaining about an annoying bug that happens only in certain levels. 


If the bug is easy to fix, you could have the error free level up and running as soon as you solve it. It makes your players happier, and hopefully, more loyal to the game they like playing.


 


Scenario #3 Game extension


As mentioned before, you don’t have to release the complete package of levels at once. You could launch with a fraction of the total, and make your game download the following levels according to the player’s progression.


What are the upsides?


  • Your game is smaller, meaning it’s easy to download and test. Win-win situation.
  • Your game is lighter on the device because it simply takes the minimum necessary space. In case a player needs to free disk space to install new games or apps, your game will appear at the bottom of the disk-usage list, and will be more likely to live another day. Once again, a winning situation.
  • Your game is updated, and players will have downloaded the levels only when needed. They will always have the latest version, which is bug-fixed and has the latest additions to keep them loving the game. Player engagement is also a winning situation.

Once you get the hang of the technique, I’m sure you’ll be able to implement more complex solutions and deliver players a better gaming experience.


That's all for now! I hope this introduction on how to expand your games got you thinking of new interesting ways of using DLC and server code in your games.


How do you plan to use this? Please leave a comment.


 


This was originally posted in Gamedonia blog.

12 mistakes of The Powder Toy Simulator

The Powder Toy is a free physics sandbox game, which simulates air pressure and velocity, heat, gravity and a countless number of interactions between different substances. The game provides you with various building materials, liquids, gases and electronic components which can be used to construct complex machines, guns, bombs, realistic terrains and almost anything else.

You can browse and play thousands of different saves made by the community or upload your own. However, not everything is that good in the game: for a small project of about 350 files, it triggers too many warnings from our static analyzer. In this article, I'm going to show you the most interesting issues found in the project.

Attached Image: image1.png

The Powder Toy was checked by PVS-Studio. The project is built under Windows in msys with the help of a Python script - that's why we had to use a special utility PVS-Studio Standalone to do the check. To learn more about the standalone version, see the article: PVS-Studio Now Supports Any Build System under Windows and Any Compiler. Easy and Right Out of the Box.

Analysis results


V501 There are identical sub-expressions to the left and to the right of the '||' operator: !s[1] ||!s[2] ||!s[1] graphics.cpp 829

void Graphics::textsize(const char* s, int& width, int& height)
{
  ....
  else if (*s == '\x0F')
  {
    if(!s[1] || !s[2] || !s[1]) break;     //<==
    s+=3;                                  //<==
  }
  ....
}


Under a certain condition, a series of three items of a character array is supposed to be checked, but because of a typo, item s[3] is never checked, which is probably the reason for the program's incorrect behavior in certain situations.
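The likely intended check (my assumption based on the diagnostic; the project may have fixed it differently) is:

if(!s[1] || !s[2] || !s[3]) break;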

V523 The 'then' statement is equivalent to the 'else' statement. button.cpp 142

void Button::Draw(const Point& screenPos)
{
  ....
  if(Enabled)
    if(isButtonDown || (isTogglable && toggle))
    {
      g->draw_icon(Position.X+iconPosition.X,
                   Position.Y+iconPosition.Y,
                   Appearance.icon, 255, iconInvert);
    }
    else
    {
      g->draw_icon(Position.X+iconPosition.X,
                   Position.Y+iconPosition.Y,
                   Appearance.icon, 255, iconInvert);
    }
  else
    g->draw_icon(Position.X+iconPosition.X,
                 Position.Y+iconPosition.Y,
                 Appearance.icon, 180, iconInvert);
  ....
}


This is a function fragment with suspiciously similar code blocks. The conditional expression contains a series of logical operations, so I assume that it's not that this code fragment contains a pointless check; rather, there is a typo in the next-to-last parameter of the 'draw_icon()' call. That is, a value other than 255 should probably be written in one of the branches.

Similar fragments:
  • V523 The 'then' statement is equivalent to the 'else' statement. luascriptinterface.cpp 2758
  • V523 The 'then' statement is equivalent to the 'else' statement. searchview.cpp 305
V530 The return value of function 'empty' is required to be utilized. requestbroker.cpp 309

std::vector<Request*> Children;

RequestBroker::Request::~Request()
{
  std::vector<Request*>::iterator iter = Children.begin();
  while(iter != Children.end())
  {
    delete (*iter);
    iter++;
  }
  Children.empty();             //<==
}


Instead of clearing the vector, the programmer called the 'empty()' function, which doesn't change it. Since the code is inside a destructor, this error doesn't seem to affect program execution in any way. But I still thought this issue worth mentioning.
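The call the author presumably meant is 'clear()', which actually empties the vector ('empty()' only returns whether it is empty):

Children.clear();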

V547 Expression 'partsData[i] >= 256' is always false. The value range of unsigned char type: [0, 255]. gamesave.cpp 816

#define PT_DMND 28
//#define PT_NUM  161
#define PT_NUM 256

unsigned char *partsData = NULL,

void GameSave::readOPS(char * data, int dataLength)
{
  ....
  if(partsData[i] >= PT_NUM)
    partsData[i] = PT_DMND; //Replace all invalid elements....
  ....
}


This code contains a suspicious piece that only its author can explain. Earlier, if the i-th item of the 'partsData' array was larger than or equal to 161, the value 28 was written into the item. Now the constant 161 is commented out and replaced with 256, which causes the condition to never be true, as the maximum value of 'unsigned char' is 255.

V547 Expression is always false. Probably the '||' operator should be used here. previewview.cpp 449

void PreviewView::NotifySaveChanged(PreviewModel * sender)
{
  ....
  if(savePreview && savePreview->Buffer &&
     !(savePreview->Width == XRES/2 &&           //<==
       savePreview->Width == YRES/2))            //<==
  {
    pixel * oldData = savePreview->Buffer;
    float factorX = ((float)XRES/2)/((float)savePreview->Width);
    float factorY = ((float)YRES/2)/((float)savePreview->Height);
    float scaleFactor = factorY < factorX ? factorY : factorX;
    savePreview->Buffer = Graphics::resample_img(....);
    delete[] oldData;
    savePreview->Width *= scaleFactor;
    savePreview->Height *= scaleFactor;
  }
  ....
}


Thanks to pure luck, part of the condition is always true. It is very probable that we are dealing with a typo here: perhaps it was the '||' operator that should have been used instead of '&&', or 'savePreview->Height' should be checked in one of the cases, for example.

V560 A part of conditional expression is always true: 0x00002. frzw.cpp 34

unsigned int Properties;

Element_FRZW::Element_FRZW()
{
  ....
  Properties = TYPE_LIQUID||PROP_LIFE_DEC;
  ....
}


Everywhere else in the code, bitwise operations are performed on the 'Properties' variable, but in two places '||' is used instead of '|'. This means that 1 will be written into Properties there.
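The presumably intended expression combines the flags with the bitwise OR (my assumption, matching how 'Properties' is used elsewhere in the code):

Properties = TYPE_LIQUID | PROP_LIFE_DEC;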

Here's another issue of this kind:
  • V560 A part of conditional expression is always true: 0x04000. frzw.cpp 34
V567 Undefined behavior. The 'sandcolour_frame' variable is modified while being used twice between sequence points. simulation.cpp 4744

void Simulation::update_particles()
{
  ....
  sandcolour_frame = (sandcolour_frame++)%360;
  ....
}


The 'sandcolour_frame' variable is both read and modified between sequence points, so the result of the expression is unpredictable. To learn more, see the description of the V567 diagnostic.
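A straightforward rewrite without the undefined behavior (my guess at the intent) would be:

sandcolour_frame = (sandcolour_frame + 1) % 360;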

V570 The 'parts[i].dcolour' variable is assigned to itself. fwrk.cpp 82

int Element_FWRK::update(UPDATE_FUNC_ARGS)
{
  ....
  parts[i].life=rand()%10+18;
  parts[i].ctype=0;
  parts[i].vx -= gx*multiplier;
  parts[i].vy -= gy*multiplier;
  parts[i].dcolour = parts[i].dcolour;              //<==
  ....
}


A suspicious assignment of a field to its own value.

V576 Incorrect format. Consider checking the third actual argument of the 'printf' function. To print the value of pointer the '%p' should be used. powdertoysdl.cpp 3247

int SDLOpen()
{
  ....
  SDL_SysWMinfo SysInfo;
  SDL_VERSION(&SysInfo.version);
  if(SDL_GetWMInfo(&SysInfo) <= 0) {
      printf("%s : %d\n", SDL_GetError(), SysInfo.window);
      exit(-1);
  }
  ....
}


To print a pointer, the %p specifier should be used.
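A corrected call might look like this, assuming SysInfo.window is a handle/pointer-sized value as the diagnostic implies:

printf("%s : %p\n", SDL_GetError(), (void*)SysInfo.window);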

V595 The 'gameSave' pointer was utilized before it was verified against nullptr. Check lines: 1063, 1070. gamecontroller.cpp 1063

void GameController::OpenLocalSaveWindow(bool asCurrent)
{
  Simulation * sim = gameModel->GetSimulation();
  GameSave * gameSave = sim->Save();                        //<==
  gameSave->paused = gameModel->GetPaused();
  gameSave->gravityMode = sim->gravityMode;
  gameSave->airMode = sim->air->airMode;
  gameSave->legacyEnable = sim->legacy_enable;
  gameSave->waterEEnabled = sim->water_equal_test;
  gameSave->gravityEnable = sim->grav->ngrav_enable;
  gameSave->aheatEnable = sim->aheat_enable;
  if(!gameSave)                                             //<==
  {
    new ErrorMessage("Error", "Unable to build save.");
  }
  ....
}


It would be more logical to check the 'gameSave' pointer for null first, and only then fill in the fields.

A few other similar issues:
  • V595 The 'newSave' pointer was utilized before it was verified against nullptr. Check lines: 972, 973. powdertoysdl.cpp 972
  • V595 The 'gameSave' pointer was utilized before it was verified against nullptr. Check lines: 1271, 1278. gamecontroller.cpp 1271
  • V595 The 'gameSave' pointer was utilized before it was verified against nullptr. Check lines: 1323, 1330. gamecontroller.cpp 1323
  • V595 The 'state_' pointer was utilized before it was verified against nullptr. Check lines: 220, 232. engine.cpp 220
V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] userSession;'. apirequest.cpp 106

RequestBroker::ProcessResponse
APIRequest::Process(RequestBroker & rb)
{
  ....
  if(Client::Ref().GetAuthUser().ID)
  {
    User user = Client::Ref().GetAuthUser();
    char userName[12];
    char *userSession = new char[user.SessionID.length() + 1];
    ....
    delete userSession;          //<==
  }
  ....
}


Operators new, new[], delete, and delete[] should be used in corresponding pairs, i.e. a correct way to write this code is as follows: "delete[] userSession;".

It's not the only issue of this kind in the project:
  • V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] userSession;'. webrequest.cpp 106
  • V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] workingDirectory;'. optionsview.cpp 228
V614 Uninitialized pointer 'ndata' used. simulation.cpp 1688

void *Simulation::transform_save(....)
{
  void *ndata;
  ....
  //ndata = build_save(....); //TODO: IMPLEMENT
  ....
  return ndata;
}


Until the intended modification of this fragment is carried out, the function will keep returning an uninitialized pointer.

Another similar place:
  • V614 Potentially uninitialized pointer 'tempThumb' used. saverenderer.cpp 150

Conclusion


The Powder Toy is an interesting cross-platform project that can be used for games, education and experiments. Despite its small size, I found it interesting to look into. I hope the authors will find time to carry out analysis of the source code and study the complete analysis log.

Using static analysis regularly will help you save plenty of time that can be spent on more serious tasks and TODOs.

Calling Functions With Pre-Set Arguments in Modern C++


Introduction


A good fellow of mine gave me this interesting problem: pass a pre-stored set of arguments into a function without using std::function. I'd like to share my solution to this problem with you. Please don't judge it too strictly; I never meant it to be perfect or ready for production use. Instead, I wanted to keep everything as simple as possible: minimalistic but sufficient. Besides, there will be two solutions in this article, and one of them I like more than the other.

Implementation


Good Solution


The first way of solving the task exploits the fact that C++ already has a mechanism that allows us to capture variables: lambda functions. Of course, it would be great to use lambdas for this task. I'll show you a simple code snippet that has a lambda in it, just in case some of you are not familiar with C++14:

auto Variable = 1;

auto Lambda = [Variable]() {
    someFunction(Variable);
};

A lambda function is created in this snippet. This lambda captures the value of the variable named Variable. The lambda function object is copied into a variable named Lambda. One can later call the lambda through that variable. A call to the lambda will look like this:

Lambda();

It seems at first that the problem is solved, but really it's not. A lambda function can be returned from a function, a method or another lambda function, but it is hard to pass a lambda as an argument unless the receiver of that argument is a template.

auto makeLambda(int Variable) {
    return [Variable]() {
        someFunction(Variable);
    };
}

auto Lambda = makeLambda(3);

// What should be the signature of someOtherFunction()?
someOtherFunction(Lambda);

Lambda functions are objects of anonymous types. They have an internal structure which only the compiler knows. Pure C++ (I mean C++ as a language, without its libraries) does not give a programmer many operations to work with:

  • a lambda can be called;
  • a lambda can be converted to a function pointer, when the lambda is not capturing anything;
  • a lambda can be copied.

Frankly speaking, these operations are more than enough, because there are other mechanisms in the language which, when combined, give us a lot of flexibility. Let me share with you the solution to the problem which I ended up with.

#include <utility>
#include <cstdint>
#include <vector>

template <typename Function> class SignalTraits;

template <typename R, typename... A> class SignalTraits<R(A...)> {
public:
  using Result = R;
};

template <typename Function> class Signal {
public:
  using Result = typename SignalTraits<Function>::Result;

  template <typename Callable> Signal(Callable Fn) : Storage(sizeof(Fn)) {
    new (Storage.data()) Callable(std::move(Fn));

    Trampoline = [](Signal *S) -> Result {
      auto CB = static_cast<Callable *>(static_cast<void *>(S->Storage.data()));
      return (*CB)();
    };
  }

  Result invoke() { return Trampoline(this); }

private:
  Result (*Trampoline)(Signal *Self);

  std::vector<std::uint8_t> Storage;
};

I'll explain briefly what is happening in that code snippet: the created non-capturing lambda function knows the type of Callable because it (the lambda) is constructed in the template constructor. That's why the lambda is able to cast the data in Storage to the proper type. Really, that's it. All the heavy lifting is done by the compiler. I consider this implementation to be simple and elegant.
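For reference, a minimal usage sketch of this Signal class (someFunction and the pre-set value are placeholders of mine):

int someFunction(int x) { return x * 2; }

int main() {
  int Preset = 21;
  Signal<int()> S([Preset]() { return someFunction(Preset); }); // the argument is captured now...
  return S.invoke();                                            // ...and the call happens later, yielding 42
}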

Not So Good Solution


I like the other solution less, because it is filled with handmade machinery. And all that machinery is needed to capture variables, something the C++ language already does for us out of the box. I don't want to spend a lot of words on this, so let me show you the implementation, which is large and clumsy.

#include <cstdarg>
#include <cstdint>
#include <vector>

template <typename T> struct PromotedTraits { using Type = T; };
template <> struct PromotedTraits<char> { using Type = int; };
template <> struct PromotedTraits<unsigned char> { using Type = unsigned; };
template <> struct PromotedTraits<short> { using Type = int; };
template <> struct PromotedTraits<unsigned short> { using Type = unsigned; };
template <> struct PromotedTraits<float> { using Type = double; };

template <typename... Arguments> class StorageHelper;

template <typename T, typename... Arguments>
class StorageHelper<T, Arguments...> {
public:
  static void store(va_list &List, std::vector<std::uint8_t> &Storage) {
    using Type = typename PromotedTraits<T>::Type;
    union {                                       
      T Value;                                    
      std::uint8_t Bytes[sizeof(void *)];         
    };                                            
    Value = va_arg(List, Type);
    for (auto B : Bytes) {
      Storage.push_back(B);
    }
    StorageHelper<Arguments...>::store(List, Storage);
  }
};

template <> class StorageHelper<> {
public:
  static void store(...) {}
};

template <bool, typename...> class InvokeHelper;

template <typename... Arguments> class InvokeHelper<true, Arguments...> {
public:
  template <typename Result>
  static Result invoke(Result (*Fn)(Arguments...), Arguments... Args) {
    return Fn(Args...);
  }
};

template <typename... Arguments> class InvokeHelper<false, Arguments...> {
public:
  template <typename Result> static Result invoke(...) { return {}; }
};

struct Dummy;

template <std::size_t Index, typename... Types> class TypeAt {
public:
  using Type = Dummy *;
};

template <std::size_t Index, typename T, typename... Types>
class TypeAt<Index, T, Types...> {
public:
  using Type = typename TypeAt<(Index - 1u), Types...>::Type;
};

template <typename T, typename... Types> class TypeAt<0u, T, Types...> {
public:
  using Type = T;
};

template <typename Function> class Signal;

template <typename Result, typename... Arguments>
class Signal<Result(Arguments...)> {
public:
  using CFunction = Result(Arguments...);

  Signal(CFunction *Delegate, Arguments... Values) : Delegate(Delegate) {
    initialize(Delegate, Values...);
  }

  Result invoke() {
    std::uintptr_t *Args = reinterpret_cast<std::uintptr_t *>(Storage.data());
    Result R = {};
    using T0 = typename TypeAt<0u, Arguments...>::Type;
    using T1 = typename TypeAt<1u, Arguments...>::Type;
    // ... and so on.
    switch (sizeof...(Arguments)) {
    case 0u:
      return InvokeHelper<(0u == sizeof...(Arguments)),
                          Arguments...>::template invoke<Result>(Delegate);
    case 1u:
      return InvokeHelper<(1u == sizeof...(Arguments)),
                          Arguments...>::template invoke<Result>(Delegate,
                                                                 (T0 &)Args[0]);
    case 2u:
      return InvokeHelper<(2u == sizeof...(Arguments)),
                          Arguments...>::template invoke<Result>(Delegate,
                                                                 (T0 &)Args[0],
                                                                 (T1 &)Args[1]);
      // ... and so on.
    }
    return R;
  }

private:
  void initialize(CFunction *Delegate, ...) {          
    va_list List;                                      
    va_start(List, Delegate);                          
    StorageHelper<Arguments...>::store(List, Storage); 
    va_end(List);                                      
  }                                                    

  CFunction *Delegate;

  std::vector<std::uint8_t> Storage; 
};

To my mind, the only interesting parts are the two helper classes: StorageHelper and InvokeHelper. The first combines the ellipsis mechanism with a recursive type-list algorithm to put the arguments into Storage. The second provides a type-safe way of fetching the arguments back from that storage. And there's a small but important detail: the ellipsis promotes some types to others, e.g. float is promoted to double, and char and short are promoted to int.
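For completeness, here is a hedged usage sketch of this second Signal; the delegate add() and the stored values are illustrative only:

int add(int A, int B) { return A + B; }

int main() {
  Signal<int(int, int)> Sig(&add, 2, 3); // the arguments are captured into Storage at construction time
  return Sig.invoke() == 5 ? 0 : 1;      // and fed back to the delegate via the switch in invoke()
}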

Summary


To summarize: I don't think either of the two solutions is perfect. They lack a lot and they reinvent the wheel. I'd say the best way to pass pre-stored arguments into a function is still std::function + a lambda. As a mind exercise, though, the problem is a lot of fun indeed.

I hope you liked what you read and learned something useful. Thanks a lot for reading!

Article Update Log


9 June 2015: Initial release

Cache In A Multi-Core Environment



In my previous article I discussed the use of cache and some practices that can provide increased performance while also teaching you what cache is. I also stated that cache in a multicore environment is a whole other topic, so I've written this article to cover the different considerations that come along with multicore programming.

Why does it matter if we’re using two cores?


Cache comes in levels, typically three, each with its own group of cores that can access it. L1 cache is only visible to a single core, with each core having its own private cache, and it is the fastest of all caches. L2 cache is usually visible to a group of cores; for instance, the AMD 8150 shares L2 cache between two cores. Finally, there's L3 cache, which is accessible to all cores and is the slowest of the caches, but still much faster than RAM.

Now that we know there are different banks of cache for each core, what happens when two cores access the same memory? If there were no system in place, both cores would cache the memory; then, say, one core writes to that memory. The write would be visible in RAM, but the other core would still have its cached copy of the old value. To solve this, when a core writes to its cached memory, any other core that holds that cache line has its copy invalidated or updated, which is where our problem comes into play.

Let's say you have two integers next to each other in an array, on a single cache line, and each core is writing to one of them. Although they're not the same variable and no incorrect results occur, they share a cache line, so every time one core writes to its integer, the other core loses its cached copy. This is referred to as False Sharing, and there's a simple solution to it; the hardest part is determining whether you're having this problem.

False Sharing


False Sharing can hinder the performance of any program. For this example I'll go through the optimisations I did on a single-producer single-consumer queue and provide a few steps for solving most of your False Sharing problems. To test the queue I have two threads, one writing integers from 0 to 1 million and another reading them and checking that they're all in order. The queue doesn't undergo any resizing and is allocated with enough capacity for 1 million objects.

template<typename T>
class alignas(64) Queue{
    T* data_;
    size_t push_position_;
    size_t pop_position_;
    std::atomic<size_t> size_;
    size_t capacity_;
};

The problem with this code is that all the variables are packed together with no spacing between them; the whole structure fits on just one or two cache lines. This is perfect in a single-core environment, but separate cores access pop_position_ and push_position_, so there's high contention on these cache lines in a multicore environment.

I break the problem down into a shared read section, a shared write section, and one section for each thread. A section may be larger than a single cache line and may require two cache lines to implement; it's for this reason that I call them sections. The first step is to determine what memory belongs to which section. data_ and capacity_ are both shared but rarely written to, so they belong to the shared read section; size_ is the only variable that is a shared write; and the push and pop positions each belong to their own section, as each thread uses one of them. In this example that leaves us with four cache lines:

template<typename T>
class alignas(64) Queue{
    // Producers C-Line
    size_t push_position_;
    char pad_p[64 - sizeof(size_t)];
    // Consumers C-Line
    size_t pop_position_;
    char pad_c[64 - sizeof(size_t)];
    // Shared Read C-Line
    T* data_;
    size_t capacity_;
    char pad_sr[64 - sizeof(size_t) - sizeof(T*)];
    // Shared Write C-Line
    std::atomic<size_t> size_;
    char pad_sw[64 - sizeof(std::atomic<size_t>)];
};

Notice the alignas(n) specifier, which was added in C++11. It ensures that the structure is aligned to a multiple of n bytes in memory and therefore allows us to assume that our first variable will be placed at the start of a cache line, which is vital for our separation.

Before accounting for False Sharing, pushing and popping 1 million integers took 60ms; after accounting for it, that was reduced to 34ms on an Intel Core i5 3210M @ 2.5GHz. The majority of the remaining time comes from the atomic access, which we use to check whether there's room to push and anything to pop. You could potentially optimise the atomic access out of most of the pushes by remembering how many objects can be pushed or popped until your next size check; this way we lower the number of atomic accesses and dramatically improve performance again (a hedged sketch of this idea follows).
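Here is a minimal sketch of what that batched size check might look like on the producer side; the member names, memory orders and constructor are my assumptions, not the article's actual implementation:

#include <atomic>
#include <cstddef>
#include <vector>

template<typename T>
class alignas(64) Queue {
public:
    explicit Queue(std::size_t capacity) : data_(capacity), capacity_(capacity) {}

    bool push(const T& value) {
        if (cached_free_ == 0) {                        // only read the shared atomic when the local budget runs out
            cached_free_ = capacity_ - size_.load(std::memory_order_acquire);
            if (cached_free_ == 0) return false;        // the queue really is full
        }
        data_[push_position_++ % capacity_] = value;
        --cached_free_;
        size_.fetch_add(1, std::memory_order_release);  // still published per push so the consumer sees the element
        return true;
    }

private:
    // Producer's cache line
    std::size_t push_position_ = 0;
    std::size_t cached_free_ = 0;
    char pad_p[64 - 2 * sizeof(std::size_t)];
    // Shared Write cache line
    std::atomic<std::size_t> size_{0};
    char pad_sw[64 - sizeof(std::atomic<std::size_t>)];
    // Shared Read section
    std::vector<T> data_;
    std::size_t capacity_;
};

The key point is that size_ can only shrink from the consumer's side, so the cached free count is always a safe underestimate.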

Example Source

While on the subject of False Sharing, another example occurs when storing data within an array and having a number of threads access that array. Let's think about a pool of threads which keep count of how much work they've done and store it in an array. We need access to these variables to check how much work has been done while the threads are running.

	int work_done[n];

An easy mistake to make, but it would result in a plethora of cache misses: as each core goes to increment its work_done entry, it invalidates the other cores' caches. A solution is to turn the array into an array of pointers, each storing the address of a local variable inside its thread; this requires that we pass a pointer into work_done so the thread can populate it with the address of its local variable (see the sketch below). In a synthetic test where the worker thread only iterates on its work_done counter, over 5 seconds of iteration across 4 cores we get ~890M iterations per core with False Sharing, while once we'd accounted for it and used local variables we get ~1.8B iterations per core, a ~2x improvement on the i5 3210M @ 2.5GHz. The same test on an AMD 8150 @ 4.2GHz reached 44M iterations with False Sharing and 2.3B iterations without, a shocking ~52x improvement in speed - I had to double check this result because it left me in disbelief**! In this case we use a local variable instead of padding between all the variables to save space, but both would work equally well.
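As a rough illustration of that pointer-indirection approach, here is a minimal, hedged sketch; the thread count, names and the simple counting loop are mine, not taken from the article's example source:

#include <thread>
#include <vector>

constexpr int n = 4;
int* work_done[n];               // the shared array now holds pointers, one per worker

void worker(int** slot)
{
    int local_count = 0;         // lives on this thread's stack, away from the other counters
    *slot = &local_count;        // publish its address so progress can be read from outside
    for (int i = 0; i < 1000000; ++i)
        ++local_count;           // increments touch thread-local memory, not a shared cache line
}

int main()
{
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i)
        pool.emplace_back(worker, &work_done[i]);
    for (auto& t : pool)
        t.join();
}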

Example Source

Volatile Data


Another problem that isn't exactly cache related, although it's still important, is the use of volatiles. A register is a tiny piece of memory that's specific to each core and can typically be accessed in one cycle. The key difference between a register and cache is that a register is local memory and isn't supposed to be a fast clone of what's in RAM like cache is. A program will quite often copy a value from RAM into a register, work on it there and then write the result back to RAM. In a single-threaded program this is fine, but in a situation where two cores are editing the same integer, both may take a local copy, edit it and write back, each overwriting the other's result because it never saw the other core's work. C++ has a solution to this, assuming you've prevented both cores from reading and writing at the exact same time (they can still interfere; if that assumption isn't possible then a std::atomic is required). The volatile keyword tells the compiler to never assume the value hasn't been changed, which prevents specific optimisations that would break your code - but now I'm getting into multicore programming and not cache for multicore programs.

	volatile int x = 0;

Summary


  • Only use an Atomic when necessary, check if a volatile will meet your needs first
  • Keep classes with multicore access segmented by cache lines to eliminate False Sharing
  • Local variables are preferred over sharing data outside of the thread

Conclusion


False Sharing can be a problematic side effect of multicore programming and should be a consideration whenever two cores use data in close proximity to one another. From these tests on an Intel 3210M we can see that eliminating False Sharing gives a ~2x performance boost; obviously this will differ on other hardware. Another tool that can be useful in multicore programming is the volatile variable; although volatiles have their own issues when two cores write to the same variable, they can still be a useful replacement for an atomic in read-only situations.

Notes


* AMD 8150 is tested on Windows 8.1 with Visual Studio 2013 and Intel 3210M is tested on OSX 10.10 LLVM 6.1.0.

** After seeing such a large difference, I went looking for a cause of such dramatic performance loss; I found that the L3 cache on the Bulldozer architecture is broken into 2MB per module (2 cores) that cannot be accessed by other modules [1]. Sharing would result in a cache miss all the way to RAM, while the i5 3210M shares its L3 cache between all cores and would only need to go to the L3 cache in the case of a cache miss. This wouldn't be a problem if the operating system had placed the threads onto the two cores in a module. I kept running the tests with 2 threads until the result went from 44M to 1.2B per thread, assuming that in these cases the threads were placed on the same module and therefore shared L3 cache. This is a perfect example of the importance of testing your software on different hardware.


[1] isscc.org/doc/2011/isscc2011.advanceprogrambooklet_abstracts.pdf pg. 40

How to Pitch Angry Birds, If It Were an Unknown Indie Game

It’s hard to imagine, but at some point Angry Birds had 0 downloads. It was released by the Rovio game studio in December of 2009 as the company’s 52nd game. Since then, it has been downloaded over 1 billion times. At the Red Fox Clan, we thought it would be fun to pretend that Angry Birds was still an unknown indie game as a template for how you can promote your game (aka the next Angry Birds). We’re specifically going to write a pitch that would be emailed to reporters online. If done well, this can be a high impact, low cost solution for getting your game out there.

*This first blog post covers how to write your subject line. If you're interested in the full white paper, visit www.redfoxclan.com/resources

Type of Game

We’re going to stay as simple as possible and call it a puzzle game. For your game, I would also simplify it as much as possible. Is it a board game, racing game, sports game? (HINT: If you get stumped, visit the app store to find similar apps and use the categories they are listed under.)

< 3 Word Descriptor

We often describe other games as Angry Birds type games, but starting from scratch we have to determine what makes Angry Birds different than a “word game” puzzle or a “brain teaser” puzzle. Using our hint from before, we see that Rovio considered Angry Birds an arcade puzzle in the app store, but to get a little bit more descriptive and fun, we’ll call it a Slingshot Puzzle.

The unique benefit in < 10 Words

The unique benefit is all about storytelling. We have to get across why the game is different from the other puzzle games on their phones, and give them enough interesting details to make them open the email. This took us a while to come up with, but because we have already identified it as a slingshot puzzle, we can give more details about the characters involved and the fun we're trying to get across. We landed upon "Topples towers filled w/ greedy pigs to wreak havoc".

The finished subject line of our email is now, “Slingshot Puzzle 'Angry Birds', Topples towers filled w/ greedy pigs to wreak havoc.” By providing this, we're letting the reporter know what type of game they're about to learn about, its name, and what makes it different from the thousands of other games that come into their email.

Designing a Mobile Game Technology Stack

A lot of technology goes into developing stable, scalable mobile games. At Rumar Gaming we have invested a lot of time in building a solid platform before even thinking about game ideas. Our goal was to be able to develop a large variety of games in a short period of time by sharing the underlying technology stack.

This article describes the technology stack that we designed and that allows us to release new games rapidly without having to worry about non-gameplay functionality, databases or API hosting.

Stack overview


We separated the technology stack into three main tiers, each consisting of several sub tiers and I’ll discuss each one of them in more detail.

  1. The mobile app
  2. The back-end API
  3. The cloud hosting stack

This is a schematic overview of the stack:


Attached Image: technology_stack.jpg


1) The mobile app


When we decided to start Rumar Gaming there was no doubt that we would be using Xamarin for our mobile development. Our games need to support both iOS and Android, so using a cross-platform development environment can cut our development times significantly. In my opinion, Xamarin is by far the most mature option for cross-platform mobile development. Add the fact that I’m an expert at C# and already have experience developing games in Xamarin and it was a done deal.

The mobile app itself consists of three tiers (from the bottom up):

Rumar Framework

This is our custom framework which contains the interfaces and logic that are shared by all the games.

Aside from holding some utility classes, its main responsibility is communication with the API to handle - among other things - device registration, session management, score registration, in-app purchases and advertisements.

The framework is mainly cross-platform, but has some platform-specific functionality on top of it as well. For example, in-app purchases need to be handled differently in iOS and Android.

Game-specific logic

This tier contains cross platform, game-specific logic. We try to put as much of the game logic in here as possible so we only have to develop and manage it once for both platforms.

iOS- & Android-specific logic

You will always need to have separate projects for each of the supported platforms because of platform-specific logic that is required.

2) The back-end API


The games need a back-end that handles things like registration of devices, session management, push notifications, authentication and score tracking. Again, the goal here is to share as much logic as possible through one framework API, but some game-specific functions will get their own API.

We’ve decided to use the .NET Web API framework for this, mainly because of our long history with .NET. The main alternative for us was node.js, which would be somewhat easier to scale, but because of a limited development timeframe we decided not to take the risk of choosing a technology we are not yet comfortable with.

By hosting the API on Windows Server Core instances, we are still able to cut down on hosting costs. More on this will follow in the next section.

3) The cloud hosting stack


No one can predict if (one of) our games will become a hit, although we are certainly doing our best covering all grounds to increase our chances. If a game does become successful, the back-end must be highly scalable. It should not matter whether we have 10 users or 1 million users, the back-end should perform in the same way and it should not require a lot of effort (ideally, not any effort) to scale it up.

So we are hosting the back-end in “the cloud” and we have chosen Amazon Web Services (AWS) for this because we have been really satisfied using it in the past. I’m definitely not suggesting you should not look into other services like Microsoft Azure or Google Cloud Services! AWS was the best fit for us, but it may be different in your situation, so I encourage you to do a comparison yourself first.

Let’s take a look at the moving parts of our cloud hosting stack.

EC2

The API will be hosted on EC2 instances. EC2 stands for Elastic Computing Cloud and offers you virtual machines to host your application. They can be automatically scaled up and down depending on traffic and performance requirements, meaning that – during upscaling - new instances are automatically deployed and added to the load balancer.

By hosting our API on Windows Server Core instances we save money on both licensing and computing. Core instances require less system resources, are deployed faster and because you don’t get a full Windows interface you pay less money for their license (which is integrated in the costs per hour).

Cognito

Cognito is used for authentication and user management. It offers several authentication providers and out of the box functionality for user data synchronization.

When a player starts a game session, we can start storing user data (such as game preferences) on the device. If the player gets authenticated at some point – by creating an account or using a social login provider – the offline user data will be synchronized to the cloud.

S3

Amazon's Simple Storage Service (S3) is used when an app needs to store blob data such as images or videos. The keyword here, again, is scalability; we don't need to worry whether we store 1 asset or 1 billion assets, it will just work and we only get billed for what we use.

SNS

SNS stands for Simple Notification Service and it’s used to register devices for push notifications and to send the notifications. It supports both iOS and Android so that’s perfect. You only pay when you are sending out notifications and even then it’s free for the first million notifications.

DynamoDB

DynamoDB is AWS’s answer to No-SQL databases. It will be used to keep track of game sessions, progress and high scores. No-SQL doesn’t necessarily have to be the best choice when developing a mobile API, but it is certainly the easiest to scale and very cheap in use. So taking that into account – scalability and cost – DynamoDB seems to be the best choice for us.

Bottom up approach


When talking about developing games, most people expect you to start with a game idea and design the rest around that. The question I get the most from my acquaintances is “do you guys have any cool game ideas yet?”

Well yes, we do have some rough ideas, but that’s not what we are focusing on in the beginning. We are taking a bottom up approach, meaning that we start with setting up the cloud hosting stack, then we develop the framework API and we slowly work our way up to the game logic.

Even though we are really excited to start working on the actual games – which is definitely the most fun thing to work on - in order to create something that will scale and is future proof, we must start at the bottom.

Cost estimation


The monthly cost of this stack depends heavily on the size of your userbase and the requirements of your API, but I have worked out an estimation based on some assumptions.

Development
During development you're good to go with the Free Tier; the free tier is available for the first 12 months.
The numbers below are based on monthly use.

EC2
  • 750 hours per month of EC2 time (so that's one instance for a full month)
  • t2.micro instance: 1 vCPU with 1 GB RAM (you can run Windows Core on this)

S3
  • 5 GB storage
  • 15 GB data traffic
  • 20.000 Get requests
  • 2.000 Put requests

SNS
  • Up to 1 million push notifications sent.

Cognito
  • 1 million sync requests per month
  • 10 GB sync storage

Production scenario

I'm making the following assumptions:
  • You have 1.000 daily users.
  • Users generate 10.000 sessions daily.
  • Users generate 500.000 API requests daily.
  • Each daily user is unique (for sake of Cognito calculation)
  • A user profile contains 100kb of data.
  • Each user received 10 push notifications daily.
  • Each API request triggers 5 database requests.


EC2
  • 500.000 API requests daily will average to around 6 requests per second.
  • Your API can run fine on a 1 vCPU / 4GB RAM system.
  • In that case a t2.medium instance will suffice, total costs: $56
  • Scaling works by adding more machines, thus multiplying these costs.

Cognito
  • Monthly sync operations: 10.000 sessions * 31 days * 2 syncs per session = 620.000
  • Monthly charged: (620.000 / 10.000) * $0.15 = $9.30
  • Profile storage: 31.000 users * 100kb = 3.1 GB * $0.15 = $0.47
  • Free tier for the first 12 months!
  • After that: $9.80

SNS
  • Monthly notifications: 31.000 users * 31 days * 10 notifications = 9.61 million
  • Monthly charge = 9.61 * $0.50 = $4.80

DynamoDB
  • Total requests: 500.000 API requests * 5 databases requests * 31 days = 77.5 million
  • Total storage: 10 GB
  • Free tier for the first 12 months!
  • After that: $10 (very hard to estimate, but it's on the high end)

MONTHLY COST: $60.80 (after free tier: $80.60)
If your API requires redundant servers: $116.80 (after free tier: $136.60)


Additional

This is assuming you need blob storage for your game:
  • You need 1 TB of blob storage, each user does 100 put+get requests daily.
  • You want full backup of blob storage (highest price).

S3
  • Storage: 1 TB = $30
  • Requests (100 GET + 100 PUT per user daily): 3.1 million GET + 3.1 million PUT = $31
  • Total: $61



I hope you enjoyed this article and can put it to good use during your own mobile game projects. As development at Rumar Gaming progresses, I will regularly write new articles on what we are doing and how we are solving common issues you run into during (mobile) game development. Any comments and questions are more than welcome and I'll be happy to give you my feedback on them!


This article was originally posted, in slightly different form, on the Rumar Gaming Blog.

Designing for multi-age coop play


Over the course of my time developing Disney Infinity, I ended up creating a list of player types that we regularly observed when conducting coop playtests.



The key idea that I learned to push on, was attempting to enable different playstyles to play together. It is not about forcing people to play how you want them to play, nor is it the attempt to persuade them to change their play styles, but about facilitating different types to be able to cooperate through your systems.



One note - through the course of this article I refer to the more dominant, more skilled or more experienced player as the 'leading player'.



The Types


The Trailblazer Most commonly male, ages 12 - 15.

This type is the kind of player who wants to charge ahead. These players often accentuate the gap between player types as they race to the next cutscene or mission area, and in a lot of cases, end up stopping the other person from doing what they are doing (usually due to the design of the game systems).



The Easily Distracted Most commonly male, ages 6 - 10, female 8 - 12.

This type is the kind of person who likes to collect and explore. They often wander off and just go about their own business, enjoying playing with the game’s systems as opposed to charging towards an objective.



The Support Most commonly younger siblings or related older coop partners.
Those who enjoy helping the more skilled player by assisting in secondary ways. This seems to be mostly based on self-doubt about their own abilities - many quite literally feel that being too close to the action spatially is too stressful.



The Dutiful Helper Most commonly female parents.
This is a player who happily tries to aid the more dominant player. Often a parent or younger sibling, or possibly someone who doesn’t play games often. They derive their satisfaction from helping the leading player have fun.



The Left Behind Most commonly much younger female siblings, aged < 10.
Often the result of The Trailblazer, I most witnessed this with very young siblings playing with young teenagers. They mostly want to play with their coop partner but may be unable to follow due to simply not knowing where to go or not being able to overcome obstacles blocking the path there.



The Unskilled No common gender or age.

A player who simply lacks the skills to engage meaningfully with the main game. These are players who are not satisfied with secondary gameplay and actually want to play the main game, but feel frustrated that they can't keep up with the leading player.



Those less able to use a traditional gamepad No common gender or age.
Some disabled people need to be catered for so that they can keep pace with other players. This is solely about designing controls for people less able to use gamepads traditionally or with slower reflexes than other players.



The Prankster Older male siblings.
One of my favourites. The person who gets most joy from disturbing the play of others. This is almost always not meant in a nasty or negative way, but if we’re not careful our game design can translate their actions into that.



Designing for each type

So now we have our types, let's look at how we can enable each one to play how they want to while simultaneously cooperating with other player types. Again, it's not about forcing people together, but allowing that to happen naturally.



You'll notice that none of this actually covers coop gameplay mechanics, but rather closing the gap between player types so that true coop gameplay can occur.



Again, this list is by no means definitive and just represents some of the things I've used on previous projects.



How to design for: The Trailblazer

Some games actually tie the second player to the first, or reset the trailing player when they go off screen. These solutions are exactly what makes these players no fun to play with - no one wants to be continually pulled along.



The ideal solution allows these players to forge ahead in ways that don't create an enormous distance, figurative or literal, between them and the other players.



The interesting thing is that their motivation is almost always to follow what they perceive as the most important rules set out by the game. When you de-emphasise the so-called 'mainline', the results can be very surprising. Presenting the same scenario in playtests but providing different mission text had little effect on other types, but changed what these players did enormously.



Sections of the game where players have to complete a certain amount of activities in order to progress, or push towards some kind of common goal independently, let them forge ahead and show their skill, while still allowing other players to catch up and interact with them.



We also found that unlocking collectibles, side missions and any number of other activities to 'complete' in an area can stall these players until the others catch up IF the completion aspect is emphasised enough. (Think end-of-segment scoreboards, player comparison charts and anything that specifically pushes on these things.)



Examples include Left 4 Dead that gives counts at the end of each level - zombies killed, allies helped and so on - all things that require them to stick around their allies. Mario does something similar - the player who did 'best' gets a visible crown, a badge of honour. This is again based on score gained from clearing up the level - an act which often slows this player type down.



Other games, such as Assassin's Creed: Unity present platforming challenges that open up shortcuts at key points in the progression - useful if you fall, and importantly, for helping straggling players to catch up.


How to design for: The Easily Distracted


This type can be facilitated by closely tying alternative activities in your game to the main progression. Making all the sub-objectives count towards a greater goal. In Disney Infinity: Pirates of the Caribbean we had a moment early on, where the players are told they have to earn money to buy their ship - the island they're on then opens up to them, and any and all activities give a payout: from smashing crates and fighting enemies to completing missions.



Of course, this is a very specific example. In more general terms, curiosities and charming pieces of content can really keep these players happy while the others push on, and again, if these can benefit the mainline then you've really succeeded. Things like the collectible story fragments in The Last of Us are a good example of the kind of content this kind of player enjoys seeking out; the only change that needs to be made is for that information to open up more content for both players to benefit from - a keycode to a weapons vault or a new mission, for example.



The really interesting thing about these players is that we came to realise they are simply less interested in following the rules set out by the game. They like experimenting, exploring, and playing with and breaking systems. I began to think of these players as the anarchic mirror to the Trailblazer's very conservative, rule-following style, and that distinction really helps in understanding how to design for them.



Essentially the solution is to plan for those distractions and have them ultimately contribute to the mainline in a significant and noticeable way.



How to design for: The Support


This one is a lot more specific than the previous types. Often unsure of their own abilities they prefer to participate from a safe distance.


Solutions to this were pretty obvious and very successful.



  • Different layers in the level that provide access to different kind of support equipment from a safe vantage point.
  • Secondary objectives that take time, but less skill to complete and result in a benefit to the main mission. (for example, something that will power down enemy defences around objectives)
  • Again being able to do things like collect resources that aid the overall macro objective

This is really all about allowing players things to do away from the heat of the action and is a relatively well explored gameplay type.



How to design for: The Dutiful Helper

This is usually the role of a parent or partner. The obvious answer is to actually provide ways for people to help, bringing resources to the player for example, or many of the aspects from the Support role. Mario Galaxy does this by letting the second player collect currency and subdue enemies for player 1, though I wouldn’t exactly call it a brilliant coop experience.



The real skill here is to trick these players into having fun. The best example of this I can think of is the original Pirates of the Caribbean ride at Disneyland. They iterated again and again until they had something all the members of the family could enjoy. The role for the parents ended up being that of the navigator - they stood in a place where they could see the whole family, were the only ones who could see an overview of the game, and so could give tactical advice. While the children were placed front-and-centre, the parents derived satisfaction at first from helping their kids have fun, but ultimately from the belief that their guidance actually helped the family win the game.



The real art is to trick the person who was planning to get all their satisfaction from helping others into actually enjoying the gameplay you design for them.



Unlike the Support, these players don't mind being placed in a role that is quite unlike the original game - hence assuming different points of view (for example looking at a map and tagging enemies, or using a second-screen app to help) is something that often works extremely well.



How to design for: The Left Behind


Often players who find the game too confusing to know where to go next, or too hard to get there. These players differ from the Easily Distracted in that their priority is to be near other players.



Solutions include always showing a marker of where the other player is and, at large distances, indicating how the player can get to them. As with the Trailblazer, here we see the same solution for the opposite reason: Other players opening shortcuts that the second player can use to follow more easily.



In more linear games, time buffer activities that occur before a moment of progression, such as an area transition, can work extremely well. For example, a brawl with a bunch of enemies that only stop respawning when the other player arrives OR after a few minutes have passed gives the second player time to catch up.



Though quite specific to a certain type of game, we found that spawning power-ups outside of these areas as a bonus for the slower player really let them jump into the action with a satisfying punch when they did arrive.



Note: we found, although at first counterintuitive, you almost never want to do the opposite and have points where the two players HAVE to be together, because the less skilled player types can often just make it completely impossible to progress.



How to design for: The Unskilled


We make a distinction here because these players don't want to play a secondary element of the game like the Support and Helper roles often do. These are players who are simply not as skilled at the same activities as the leading player.



Solutions are subtler but actually along the same lines as other types - mechanics that allow one player to be seemingly close to the action while pushing on less skill-based play simultaneously. For example, sniping in a traditional shooter (you are more removed from the action).



Also, provide score metrics for other achievements in combat (or, alternatively, simply for other things that aren't your primary gameplay mechanic). I believe the 'assist' reward in many games caters to this - though we are essentially catering for a different role, it is so closely tied to the core mechanics of the game that players can feel they are as necessary as the leading player.



How to design for: Those less able to use a traditional gamepad


The reason this player type is listed here is simply because many of the control schemes that enable disabled people to play computer games have inherent disadvantages compared to default schemes. We should try and compensate so as to allow both parties to play on similar levels.



In very basic terms, often these schemes involve using just half of the control pad and only one set, or none of the triggers. Because of this, the schemes often involve characters automatically using some actions when in the right range, or allowing the user to switch between sets of actions, such as moving or shooting.



The important thing here is to compensate for these restrictions so that the game can be fun. This can involve things such as:



  • Putting in code for things like making characters automatically jump when reaching the edge of platforms as it may be difficult to complete actions with tight time restrictions.
  • Modifications for boss fight phases: if you have boss fights with phases that have timeouts then remove or lengthen the timeout.
  • Enemy AI modifying their behaviour based on which target they are focusing on, or preferring different targets all together (not necessarily the other player). Note here that in order to try and keep this as subtle as possible, lean towards altering less noticeable aspects like reaction times or how they reposition rather than simply not engaging.
  • For schemes that switch between control methods - noticing if a certain control is active, such as aiming, and changing the enemies behaviour based on that.

How to design for: The Prankster


The trick here is to provide non-lethal ways to annoy the other players. The latest Rayman games are a great example of this - players can hit each other about but this does no damage.



Of course, this can still be used maliciously. We found that a nice compromise was to monitor whether these actions directly resulted in a player death, and if so, provide a time-based advantage to the other player to either get past the griefing player or get their revenge.



So...


I hope this helps. I'm certain this list is incomplete, but we found that by catering for each of these schemas we noticeably improved how well opposing player types play together.



I do feel that, at least for the type of game I experienced this most with - large, open-world action RPGs - this list is very relevant, but from my own experience I have noticed these types in many other genres. I feel fairly confident that this list represents something close to complete coverage, though I'm sure it is not without many needed amendments and additions.



And as a final thought, and to perhaps pre-empt a criticism - this is not about making a game for everyone - but about designing for everyone that would want to play your particular game.

Coverage Buffer as main Occlusion Culling technique


Introduction


Recently I came across an awesome presentation from Crytek named Secrets of CryENGINE 3 Graphics Technology, authored by Nickolay Kasyan, Nicolas Schulz and Tiago Sousa. In this paper I found a brief description of a technique called the Coverage Buffer.
You can find the whole presentation HERE.
This technique was presented as the main occlusion culling method, actively used since Crysis 2. And, since there was no detailed paper about it, I decided to dig into the matter myself.
 


Coverage Buffer - Occlusion Culling technique


Overview


The main idea of the method is clearly stated in the Crytek presentation I mentioned before:
  • Get the depth buffer from the previous frame
  • Reproject it to the current frame
  • Software-rasterize the BBoxes of objects to check whether they can be seen from the camera's perspective, and based on this test decide whether or not to draw them.

There's, of course, nothing revolutionary about this concept. There's another very similar method called Software Occlusion Culling (SOC). But there are a few differences between the methods, and a crucial one is that in SOC we must separate objects into two categories - occluders and occludees - which cannot always be done.

Let's see some examples.
If we have an FPS game level, like Doom 3, we have corridors, which are perfect occluders, and objects - barrels, ammo, characters - which are, in turn, perfect occludees. In this case we have a clear approach: test the objects' BBoxes against the corridors.
But what if we have, let's say, a massive forest? Every tree can be both an occluder - imagine a large tree right in front of the camera, occluding the rest of the world behind it - and an occludee, when some other tree occludes it. In the case of a forest we cannot use SOC in its pure form; it'd be counterproductive.

Attached Image: y4voewu.png

So, summarizing the pros and cons of the Coverage Buffer:

PROS:
  • we don't need to separate objects into occluders/occludees
  • we can reuse the already filled depth buffer from the previous frame; we don't need to rasterize large occluders' BBoxes
CONS:
  • small artifacts caused by the one-frame delay (even reprojection doesn't completely solve it)
  • a small overhead when no occlusion happens (that, I guess, is common to all OC methods I know, but still)
 


Choice


When I started to investigate this matter, it wasn't out of pure academic interest. On an existing, live project there was a particular problem that needed to be solved: a large procedural forest caused giant lags because of overdraw (the Dx9 alpha-test stage was disabled due to other issues, which are not discussed in this article, and in Dx11 alpha test kills Early-Z, which also causes massive overdraw).

Here's a short summary of the initial problem:
  • We need to draw an island full of different procedural trees (the engine used is Torque3D).

The engine by default offers a nice batching system, which batches distant trees into... well, batches, but the "draw/no draw" decision is made based on frustum culling results only. Also, distant trees are rendered as billboard impostors, which is another nice optimization.
But this approach is not so effective when we deal with a large forest with thousands of trees. In this case there's a lot of overdraw: batches behind mountains, batches behind walls, batches behind other trees and so on. All of this overdraw causes the FPS to drop gravely: even when looking through a wall towards the center of the island, drawing the invisible trees took about 20-30ms.
As a result, players got a dramatic FPS drop just by looking towards the center of the isle.


Attached Image: 2yyqxc7.jpg

Attached Image: 100951_1413924340_life_is_feudal_map_with_coordinates.jpg


To solve this particular issue it was decided to use the Coverage Buffer. I cannot say that I did not have doubts about this decision, but Crytek's recommendations overruled all my other suggestions. Besides, the CB fits this particular issue like a glove - why not try it?

Implementation


Let's proceed to technical details and code.

Obtaining Depth Buffer.

The first task was to obtain the depth buffer. In Dx11 it's not a difficult task. In Dx9 it's also not so difficult - there's a certain hack (found on Aras Pranckevičius' blog; he's the guy who runs rendering in Unity3D). Here's the link: http://aras-p.info/texts/D3D9GPUHacks.html
It appears that one CAN obtain the depth buffer, but only with a special format - INTZ. According to official NVidia and AMD papers, most video cards since 2008 support this feature. For earlier cards there's RAWZ - another hacky format.
Links to papers:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Advanced-DX9-Capabilities-for-ATI-Radeon-Cards_v2.pdf
http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf

Usage code is trivial, but I'll put it here - just in case:

#define FOURCC_INTZ ((D3DFORMAT)(MAKEFOURCC('I','N','T','Z')))

// Determine if INTZ is supported
HRESULT hr;
hr = pd3d->CheckDeviceFormat(AdapterOrdinal, DeviceType, AdapterFormat,
                             D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_TEXTURE,
                             FOURCC_INTZ);
BOOL bINTZDepthStencilTexturesSupported = (hr == D3D_OK);

// Create an INTZ depth stencil texture
IDirect3DTexture9 *pINTZDST;
pd3dDevice->CreateTexture(dwWidth, dwHeight, 1,
 D3DUSAGE_DEPTHSTENCIL, FOURCC_INTZ,
 D3DPOOL_DEFAULT, &pINTZDST,
 NULL);

// Retrieve depth buffer surface from texture interface
IDirect3DSurface9 *pINTZDSTSurface;
pINTZDST->GetSurfaceLevel(0, &pINTZDSTSurface);

// Bind depth buffer
pd3dDevice->SetDepthStencilSurface(pINTZDSTSurface);

// Bind depth buffer texture
pd3dDevice->SetTexture(0, pINTZDST);

The next step is processing the depth buffer so we can use it.

Processing depth buffer.

  • downscale to low resolution (I picked 256x128)
  • reprojection

These steps are trivial. The downscale is performed with a max operator - we keep the farthest distance to the camera, so we don't occlude any actually visible objects.
Reprojection is performed by applying the inverted ViewProjection matrix of the previous frame and then applying the ViewProjection matrix of the current frame to the result. Gaps are filled with the maximum value to prevent artificial occlusion.
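The article doesn't show its own downscale code, so here is a hedged CPU-side sketch of the conservative 2x2 max-downscale, assuming a linear float buffer where a larger value means farther from the camera:

#include <algorithm>
#include <vector>

// srcW and srcH are assumed to be even.
std::vector<float> downscaleMax(const std::vector<float>& src, int srcW, int srcH)
{
    std::vector<float> dst((srcW / 2) * (srcH / 2));
    for (int y = 0; y < srcH / 2; ++y)
    {
        for (int x = 0; x < srcW / 2; ++x)
        {
            const float a = src[(2 * y) * srcW + 2 * x];
            const float b = src[(2 * y) * srcW + 2 * x + 1];
            const float c = src[(2 * y + 1) * srcW + 2 * x];
            const float d = src[(2 * y + 1) * srcW + 2 * x + 1];
            // keep the farthest depth so the downscale never occludes something that was visible
            dst[y * (srcW / 2) + x] = std::max(std::max(a, b), std::max(c, d));
        }
    }
    return dst;
}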

Here's some useful parts of code for reprojection:
float3 reconstructPos(Texture2D depthTexture, float2 texCoord, float4x4 matrixProjectionInverted )
{
	float depth = 1-depthTexture.Sample( samplerDefault, texCoord ).r;
		
	float2 cspos = float2(texCoord.x * 2 - 1, (1-texCoord.y) * 2 - 1);
	float4 depthCoord = float4(cspos, depth, 1);
	depthCoord = mul (matrixProjectionInverted, depthCoord);
	
	return depthCoord.xyz / depthCoord.w;
}
The projection itself is performed trivially.

Software rasterization

This topic is well known and has been implemented many times. The best info I could find was here:
https://software.intel.com/en-us/blogs/2013/09/06/software-occlusion-culling-update-2

But, just to gather all the eggs in one basket, I'll provide my code, which was originally implemented in plain C++ and later translated to SSE, after which it became approximately 3 times faster.
My SSE is far from perfect, so if you find any mistakes or places for optimization - please tell me =)

static const int sBBIndexList[36] =
{
  // index for top 
  4, 8, 7,
  4, 7, 3,

  // index for bottom
  5, 1, 2,
  5, 2, 6,

  // index for left
  5, 8, 4,
  5, 4, 1,

  // index for right
  2, 3, 7,
  2, 7, 6,

  // index for back
  6, 7, 8,
  6, 8, 5,

  // index for front
  1, 4, 3,
  1, 3, 2,
};

__m128 SSETransformCoords(__m128 *v, __m128 *m)
{
  __m128 vResult = _mm_shuffle_ps(*v, *v, _MM_SHUFFLE(0,0,0,0));
  vResult = _mm_mul_ps(vResult, m[0]);

  __m128 vTemp = _mm_shuffle_ps(*v, *v, _MM_SHUFFLE(1,1,1,1));
  vTemp = _mm_mul_ps(vTemp, m[1]);

  vResult = _mm_add_ps(vResult, vTemp);
  vTemp = _mm_shuffle_ps(*v, *v, _MM_SHUFFLE(2,2,2,2));

  vTemp = _mm_mul_ps(vTemp, m[2]);
  vResult = _mm_add_ps(vResult, vTemp);

  vResult = _mm_add_ps(vResult, m[3]);
  return vResult;
}

__forceinline __m128i Min(const __m128i &v0, const __m128i &v1)
{
  __m128i tmp;
  tmp = _mm_min_epi32(v0, v1);
  return tmp;
}
__forceinline __m128i Max(const __m128i &v0, const __m128i &v1)
{
  __m128i tmp;
  tmp = _mm_max_epi32(v0, v1);
  return tmp;
}


struct SSEVFloat4
{
  __m128 X;
  __m128 Y;
  __m128 Z;
  __m128 W;
};

// get 4 triangles from vertices
void SSEGather(SSEVFloat4 pOut[3], int triId, const __m128 xformedPos[])
{
  for(int i = 0; i < 3; i++)
  {
    int ind0 = sBBIndexList[triId*3 + i + 0]-1;
    int ind1 = sBBIndexList[triId*3 + i + 3]-1;
    int ind2 = sBBIndexList[triId*3 + i + 6]-1;
    int ind3 = sBBIndexList[triId*3 + i + 9]-1;

    __m128 v0 = xformedPos[ind0];
    __m128 v1 = xformedPos[ind1];
    __m128 v2 = xformedPos[ind2];
    __m128 v3 = xformedPos[ind3];
    _MM_TRANSPOSE4_PS(v0, v1, v2, v3);
    pOut[i].X = v0;
    pOut[i].Y = v1;
    pOut[i].Z = v2;
    pOut[i].W = v3;

    //now X contains X0 x1 x2 x3, Y - Y0 Y1 Y2 Y3 and so on...
  }
}


bool RasterizeTestBBoxSSE(Box3F box, __m128* matrix, float* buffer, Point4I res)
{
  //TODO: performance
  LARGE_INTEGER frequency;        // ticks per second
  LARGE_INTEGER t1, t2;           // ticks
  double elapsedTime;

  // get ticks per second
  QueryPerformanceFrequency(&frequency);

  // start timer
  QueryPerformanceCounter(&t1);


  //verts and flags
  __m128 verticesSSE[8];
  int flags[8];
  static Point4F vertices[8];
  static Point4F xformedPos[3];
  static int flagsLoc[3];

  // Set DAZ and FZ MXCSR bits to flush denormals to zero (i.e., make it faster)
  // Denormal are zero (DAZ) is bit 6 and Flush to zero (FZ) is bit 15. 
  // so to enable the two to have to set bits 6 and 15 which 1000 0000 0100 0000 = 0x8040
  _mm_setcsr( _mm_getcsr() | 0x8040 );


  // init vertices
  Point3F center = box.getCenter();
  Point3F extent = box.getExtents();
  Point4F vCenter = Point4F(center.x, center.y, center.z, 1.0);
  Point4F vHalf   = Point4F(extent.x*0.5, extent.y*0.5, extent.z*0.5, 1.0);

  Point4F vMin    = vCenter - vHalf;
  Point4F vMax    = vCenter + vHalf;

  // fill vertices
  vertices[0] = Point4F(vMin.x, vMin.y, vMin.z, 1);
  vertices[1] = Point4F(vMax.x, vMin.y, vMin.z, 1);
  vertices[2] = Point4F(vMax.x, vMax.y, vMin.z, 1);
  vertices[3] = Point4F(vMin.x, vMax.y, vMin.z, 1);
  vertices[4] = Point4F(vMin.x, vMin.y, vMax.z, 1);
  vertices[5] = Point4F(vMax.x, vMin.y, vMax.z, 1);
  vertices[6] = Point4F(vMax.x, vMax.y, vMax.z, 1);
  vertices[7] = Point4F(vMin.x, vMax.y, vMax.z, 1);

  // transforms
  for(int i = 0; i < 8; i++)
  {
    verticesSSE[i] = _mm_loadu_ps(vertices[i]);

    verticesSSE[i] = SSETransformCoords(&verticesSSE[i], matrix);

    __m128 vertX = _mm_shuffle_ps(verticesSSE[i], verticesSSE[i], _MM_SHUFFLE(0,0,0,0)); // xxxx
    __m128 vertY = _mm_shuffle_ps(verticesSSE[i], verticesSSE[i], _MM_SHUFFLE(1,1,1,1)); // yyyy
    __m128 vertZ = _mm_shuffle_ps(verticesSSE[i], verticesSSE[i], _MM_SHUFFLE(2,2,2,2)); // zzzz
    __m128 vertW = _mm_shuffle_ps(verticesSSE[i], verticesSSE[i], _MM_SHUFFLE(3,3,3,3)); // wwww
    static const __m128 sign_mask = _mm_set1_ps(-0.f); // -0.f = 1 << 31
    vertW = _mm_andnot_ps(sign_mask, vertW); // abs
    vertW = _mm_shuffle_ps(vertW, _mm_set1_ps(1.0f), _MM_SHUFFLE(0,0,0,0)); //w,w,1,1
    vertW = _mm_shuffle_ps(vertW, vertW, _MM_SHUFFLE(3,0,0,0)); //w,w,w,1
  
    // project
    verticesSSE[i] = _mm_div_ps(verticesSSE[i], vertW);

    // now vertices are between -1 and 1
    const __m128 sadd = _mm_setr_ps(res.x*0.5, res.y*0.5, 0, 0);
    const __m128 smult = _mm_setr_ps(res.x*0.5, res.y*(-0.5), 1, 1);

    verticesSSE[i] = _mm_add_ps( sadd, _mm_mul_ps(verticesSSE[i],smult) );
  }

  // Rasterize the AABB triangles 4 at a time
  for(int i = 0; i < 12; i += 4)
  {
    SSEVFloat4 xformedPos[3];
    SSEGather(xformedPos, i, verticesSSE);

    // by 3 vertices
    // fxPtX[0] = X0 X1 X2 X3 of 1st vert in 4 triangles
    // fxPtX[1] = X0 X1 X2 X3 of 2nd vert in 4 triangles
    // and so on
    __m128i fxPtX[3], fxPtY[3];
    for(int m = 0; m < 3; m++)
    {
      fxPtX[m] = _mm_cvtps_epi32(xformedPos[m].X);
      fxPtY[m] = _mm_cvtps_epi32(xformedPos[m].Y);
    }

    // Fab(x, y) =     Ax       +       By     +      C              = 0
    // Fab(x, y) = (ya - yb)x   +   (xb - xa)y + (xa * yb - xb * ya) = 0
    // Compute A = (ya - yb) for the 3 line segments that make up each triangle
    __m128i A0 = _mm_sub_epi32(fxPtY[1], fxPtY[2]);
    __m128i A1 = _mm_sub_epi32(fxPtY[2], fxPtY[0]);
    __m128i A2 = _mm_sub_epi32(fxPtY[0], fxPtY[1]);

    // Compute B = (xb - xa) for the 3 line segments that make up each triangle
    __m128i B0 = _mm_sub_epi32(fxPtX[2], fxPtX[1]);
    __m128i B1 = _mm_sub_epi32(fxPtX[0], fxPtX[2]);
    __m128i B2 = _mm_sub_epi32(fxPtX[1], fxPtX[0]);

    // Compute C = (xa * yb - xb * ya) for the 3 line segments that make up each triangle
    __m128i C0 = _mm_sub_epi32(_mm_mullo_epi32(fxPtX[1], fxPtY[2]), _mm_mullo_epi32(fxPtX[2], fxPtY[1]));
    __m128i C1 = _mm_sub_epi32(_mm_mullo_epi32(fxPtX[2], fxPtY[0]), _mm_mullo_epi32(fxPtX[0], fxPtY[2]));
    __m128i C2 = _mm_sub_epi32(_mm_mullo_epi32(fxPtX[0], fxPtY[1]), _mm_mullo_epi32(fxPtX[1], fxPtY[0]));

    // Compute triangle area
    __m128i triArea = _mm_mullo_epi32(B2, A1);
    triArea = _mm_sub_epi32(triArea, _mm_mullo_epi32(B1, A2));
    __m128 oneOverTriArea = _mm_div_ps(_mm_set1_ps(1.0f), _mm_cvtepi32_ps(triArea));

    __m128 Z[3];
    Z[0] = xformedPos[0].W;
    Z[1] = _mm_mul_ps(_mm_sub_ps(xformedPos[1].W, Z[0]), oneOverTriArea);
    Z[2] = _mm_mul_ps(_mm_sub_ps(xformedPos[2].W, Z[0]), oneOverTriArea);

    // Use bounding box traversal strategy to determine which pixels to rasterize 
    __m128i startX =  _mm_and_si128(Max(Min(Min(fxPtX[0], fxPtX[1]), fxPtX[2]),  _mm_set1_epi32(0)), _mm_set1_epi32(~1));
    __m128i endX   = Min(Max(Max(fxPtX[0], fxPtX[1]), fxPtX[2]), _mm_set1_epi32(res.x - 1));

    __m128i startY = _mm_and_si128(Max(Min(Min(fxPtY[0], fxPtY[1]), fxPtY[2]), _mm_set1_epi32(0)), _mm_set1_epi32(~1));
    __m128i endY   = Min(Max(Max(fxPtY[0], fxPtY[1]), fxPtY[2]), _mm_set1_epi32(res.y - 1));

    // Now we have 4 triangles set up.  Rasterize them each individually.
    for(int lane=0; lane < 4; lane++)
    {
      // Skip triangle if area is zero 
      if(triArea.m128i_i32[lane] <= 0)
      {
        continue;
      }

      // Extract this triangle's properties from the SIMD versions
      __m128 zz[3];
      for(int vv = 0; vv < 3; vv++)
      {
        zz[vv] = _mm_set1_ps(Z[vv].m128_f32[lane]);
      }

      //drop culled triangle

      int startXx = startX.m128i_i32[lane];
      int endXx  = endX.m128i_i32[lane];
      int startYy = startY.m128i_i32[lane];
      int endYy  = endY.m128i_i32[lane];

      __m128i aa0 = _mm_set1_epi32(A0.m128i_i32[lane]);
      __m128i aa1 = _mm_set1_epi32(A1.m128i_i32[lane]);
      __m128i aa2 = _mm_set1_epi32(A2.m128i_i32[lane]);

      __m128i bb0 = _mm_set1_epi32(B0.m128i_i32[lane]);
      __m128i bb1 = _mm_set1_epi32(B1.m128i_i32[lane]);
      __m128i bb2 = _mm_set1_epi32(B2.m128i_i32[lane]);

      __m128i cc0 = _mm_set1_epi32(C0.m128i_i32[lane]);
      __m128i cc1 = _mm_set1_epi32(C1.m128i_i32[lane]);
      __m128i cc2 = _mm_set1_epi32(C2.m128i_i32[lane]);

      __m128i aa0Inc = _mm_mul_epi32(aa0, _mm_setr_epi32(1,2,3,4));
      __m128i aa1Inc = _mm_mul_epi32(aa1, _mm_setr_epi32(1,2,3,4));
      __m128i aa2Inc = _mm_mul_epi32(aa2, _mm_setr_epi32(1,2,3,4));

      __m128i alpha0 = _mm_add_epi32(_mm_mul_epi32(aa0, _mm_set1_epi32(startXx)), _mm_mul_epi32(bb0, _mm_set1_epi32(startYy)));
      alpha0 = _mm_add_epi32(cc0, alpha0);
      __m128i beta0 = _mm_add_epi32(_mm_mul_epi32(aa1, _mm_set1_epi32(startXx)), _mm_mul_epi32(bb1, _mm_set1_epi32(startYy)));
      beta0 = _mm_add_epi32(cc1, beta0);
      __m128i gama0 = _mm_add_epi32(_mm_mul_epi32(aa2, _mm_set1_epi32(startXx)), _mm_mul_epi32(bb2, _mm_set1_epi32(startYy)));
      gama0 = _mm_add_epi32(cc2, gama0);

      int  rowIdx = (startYy * res.x + startXx);

      __m128 zx = _mm_mul_ps(_mm_cvtepi32_ps(aa1), zz[1]);
      zx = _mm_add_ps(zx, _mm_mul_ps(_mm_cvtepi32_ps(aa2), zz[2]));
      zx = _mm_mul_ps(zx, _mm_setr_ps(1.f, 2.f, 3.f, 4.f));

      // Texels traverse
      for(int r = startYy; r < endYy; r++,
        rowIdx += res.x,
        alpha0 = _mm_add_epi32(alpha0, bb0),
        beta0 = _mm_add_epi32(beta0, bb1),
        gama0 = _mm_add_epi32(gama0, bb2))
      {
        // Compute barycentric coordinates
        // Z0 as an origin
        int index = rowIdx;
        __m128i alpha = alpha0;
        __m128i beta = beta0;
        __m128i gama = gama0;

        //Compute barycentric-interpolated depth
        __m128 depth = zz[0];
        depth = _mm_add_ps(depth, _mm_mul_ps(_mm_cvtepi32_ps(beta), zz[1]));
        depth = _mm_add_ps(depth, _mm_mul_ps(_mm_cvtepi32_ps(gama), zz[2]));
        __m128i anyOut = _mm_setzero_si128();

        __m128i mask;
        __m128 previousDepth;
        __m128 depthMask;
        __m128i finalMask;
        for(int c = startXx; c < endXx;
          c+=4,
          index+=4,
          alpha = _mm_add_epi32(alpha, aa0Inc),
          beta  = _mm_add_epi32(beta, aa1Inc),
          gama  = _mm_add_epi32(gama, aa2Inc),
          depth = _mm_add_ps(depth, zx))
        {
          mask = _mm_or_si128(_mm_or_si128(alpha, beta), gama);
          previousDepth = _mm_loadu_ps(&(buffer[index]));

          //calculate current depth
          //(log(depth) - -6.907755375) * 0.048254941;
          __m128 curdepth = _mm_mul_ps(_mm_sub_ps(log_ps(depth), _mm_set1_ps(-6.907755375f)), _mm_set1_ps(0.048254941f));
          curdepth = _mm_sub_ps(curdepth, _mm_set1_ps(0.05f));

          depthMask = _mm_cmplt_ps(curdepth, previousDepth);    
          finalMask = _mm_andnot_si128(mask, _mm_castps_si128(depthMask));
          anyOut = _mm_or_si128(anyOut, finalMask);

        }//for each column  

        if(!_mm_testz_si128(anyOut, _mm_set1_epi32(0x80000000)))
        {
          // stop timer
          QueryPerformanceCounter(&t2);

          // compute and print the elapsed time in millisec
          elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;

          RasterizationStats::RasterizeSSETimeSpent += elapsedTime;

          return true; //early exit
        }

      }// for each row

    }// for each triangle
  }// for each set of SIMD# triangles

  return false;
}

Now we have the Coverage Buffer technique up and running.


Results


Using the C-Buffer for occlusion culling in our particular case reduced frame render time by 10-20 ms (and in some cases even more). But it also added about 2 ms of overhead in the "nothing culled" case.

Attached Image: 100965_1413986120_panorama.jpg

This method was useful in our case, but that doesn't mean it can be used in all other cases. Actually, it puzzles me how Crytek used it in Crysis 2 - in my opinion, a CB-unfriendly game. Perhaps I misunderstood some of its concepts? Well, maybe =)

So, as it appears to me, the main restriction for this method would be:

Do not use it unless you want to cull something that takes forever to render (a forest with heavy overdraw, for instance). CPU rasterization is a costly matter, and it's not worth it when applied to simple, easy-to-render objects with GPU-cheap materials.

When Your Best Isn't Good Enough: A Tale of Failure (Part I)

Creating a video game where once there was nothing can be incredibly difficult. And it doesn’t end there - getting people to care about your game once it’s made can be even harder. It’s no surprise that making games is a journey with many missteps and failures along the way. In this two-part series we’ll be looking at the circumstances of our unsuccessful crowdfunding campaign for the music puzzle game Cadence (part I – skip if you aren’t interested in crowdfunding) and the emotional aftermath of this failure (part II – relevant to anyone working as a creative professional).

For a long time we were dead set against the idea of Kickstarter. We’d heard tales of how much energy and effort they take, and of course there is a huge risk factor. Regardless of how we chose to fund the game, we were ultimately convinced by the realisation that most of the effort would be spent on crafting our marketing message – something which could only ever benefit the game. Self confidence was also part of the equation, for why should we be scared of running a Kickstarter if we truly believed in the game?

We were however adamant that if we were going to do a Kickstarter, by Jove, we were going to do it right. We did our research, and then did our research some more. We knew that it would be a full time job for at least two months. We knew that Kickstarter saturation was real and that we’d have to bring our very best in order to succeed. We even went so far as to build a scraper tool to analyse public info, like reward tiers and funding goals, to make sure our own assumptions were on point.

And so we set about preparing our campaign in time for GDC 2015. Creating a great video with high production values was a top priority, as we believed that a slam dunk video had the potential to make or break the campaign. To this end we enlisted the awesome cats from Cool Your Jets and were thrilled with the final product. Many late nights were sacrificed as they honed the video and we put huge amounts of energy into crafting our page. And so it was, almost collapsing with exhaustion, we flicked the switch and sent our message out only hours before I boarded a plane headed for GDC.

Attached Image: Kickstarter Header.jpg

Initially things seemed to be following the script perfectly. Our followers and local community rallied behind us and my phone started going ballistic with notifications. It was awesome to see the outpouring of enthusiasm, and the show of faith from friends and family was beyond overwhelming! We hit our initial targets with ease and it seemed like we were on track for a great campaign. However, 48 hours after I’d landed in San Francisco it was clear that we’d already reached everyone who cared and our campaign hit a brick wall.

Whilst alarmed, we weren’t deterred - we knew about the infamous Kickstarter trough. So it meant we weren’t going to have a fairytale campaign, but that was something we were ready to accept as we settled in for the long slog. And besides, I was about to attend both GDC and SXSW – what better place to promote a flagging Kickstarter?

As it turns out GDC and SXSW were both amazing experiences that benefited Cadence immeasurably, but a miserable result for the Cadence Kickstarter itself. My press strategy of hunting down anyone with a press tag and cornering them until they’d heard about Cadence resulted only in coverage we probably could have secured anyway (thanks RPS)! Perhaps it was merely luck, but the press simply wasn’t anywhere that I was. Still, I jumped at every possible chance to recruit allies and many developers and personal heroes soon learned about Cadence and in many cases tweeted about the game.

This kept a steady trickle of new eyes coming, but still it was never more than a trickle. As it became clear the campaign was flagging, some of our supporters started to speculate as to what we were doing wrong. In particular we received some “heated” criticism for not including a demo. Originally we decided to omit a demo because we’d seen examples of this having a negative effect on sales. But we weren’t zealots and were willing to experiment. Curiously, our demo release made almost zero difference to the trajectory of our campaign. Perhaps it would have been different if it had been there from the start, but I have my doubts.

Attached Image: Kickstarter final total.JPG

Many aspects of our campaign got picked apart and analysed, from our reward tiers to the video to feedback that our message was confusing and muddled. I believe that all of these criticisms have merit: certainly we could have done a better job of explaining the game, to explain why it’s worth a backer’s money, to have compelling rewards that let backers feel like they’re getting their money’s worth. To craft a message that allows backers to feel like they are part of a movement, and something bigger than themselves.

I don’t blame people for focusing on these factors, indeed they are the very same things we focused on whilst in research mode. The common denominator all these attributes share is that they happen to be the forward facing elements publicly visible on any Kickstarter page. From this it’s easy to assume there is a correlation between getting each of these elements right and Kickstarter success. But now I believe this is a dangerous way of thinking that glosses over the most important fact: it’s all about eyeballs.

Attached Image: Kickstarter Video View Stats.JPG

At the end of the day only 10 000 odd people ever clicked play on our video – resulting in 526 backers and 37 percent of our funding goal. I imagine that if that number had been closer to 30 000, there is every reason to believe we would have been funded. Clearly, we’re not experts on how we could have changed this equation, but we must admit we dropped the ball by not focusing on one obvious area: youtubers.

Considering that a single Let’s Play by a medium-sized indie-friendly caster could have delivered those views, focusing our efforts elsewhere was a very costly mistake. Of course youtubers can be quite enigmatic in what they choose to cover, but at the very least we messed up some simple basics, like giving a two-week lead-in before the start of the campaign. This is one of those instances where I felt it would’ve been much better for me to be at home, jockeying a keyboard and sending emails, rather than navigating two conferences 10 000 miles away.

In many of my discussions at GDC, I heard other developers mention that Kickstarters often work better the second time round, and I think I can see why. Having a captive audience is perhaps the most valuable asset you can have when selling any kind of game. Building this audience however requires painstaking effort, much like a stalagmite growing one drop at a time. Starting from scratch when you launch a Kickstarter campaign is a very tall order. I think this goes a long way to explain why Kickstarters with nostalgia appeal do so well.

Looking back it would be easy to say the campaign was a disaster, but given the number of positive reactions we still believe the game deserves to be made. Of course, the money alarm bells were ringing frantically, so we spent the final days of the campaign throwing together our own tongue-in-cheek “Noodlestarter” page to try and capture Kickstarter momentum. It was refreshing to poke a bit of fun at what is essentially a pre-order page. We never had high expectations, but for a couple of days work, the month or two of funding it secured was well worth it.

In Part II I’ll retell the story, focusing on the view from my seat in the emotional rollercoaster. In particular this highlights how failure undermines your ability to be effective, and looks forward to our launch on Steam Early Access and beyond.

When Your Best Isn't Good Enough: A Tale of Failure (Part II)

This is the second of two part series about failure. In part I we analysed how the Cadence (a.k.a. a musical playground of beautiful puzzles) crowdfunding campaign fell short of its mark. In this second part, I’m going to take a deeper look at the emotional impact of enduring failure, and the subtle ways in which it undermines your ability to stay productive and share your creativity with the world.

I remember the lead-up to our Kickstarter campaign (early 2015) as being a time alive with optimism. Not only was I earning some welcome breathing room on a well-paid contract gig, but it felt like doors were finally starting to open for Cadence. In a way, it felt like the Kickstarter was going to be the thing that would break the dam wall and be the beginning of the rest of my life as game developer. Along the way we started to pick up a few award nominations, bolstering the sense we were pointing in the right direction. In short, I was flourishing and things were bright.

I mention this because it’s important to acknowledge how feeling emotionally high enabled a destructive behaviour that paved the road to emotional burnout. Unlike traditional methods of funding, Kickstarter success is cut and dried. Either you get the money, or else everyone very publicly sees you get nothing. Psychologically speaking, this makes for a very high stakes game. In fact I can remember looking at other failed Kickstarters and thinking: “thank god that isn’t going to happen to us”. But perhaps this sentiment is best described as posturing, because I was still deeply anxious about how things would pan out.

Consequently it was very easy for me to slip into a pattern of “let’s work just a little bit harder on this, you know, to make sure”. Of course, if you keep working just a little bit harder here and there, you eventually end up in a situation where you’re horrendously overcommitted, and the only fuel left for the fire is your sleep and physical well-being. As much as I’ve learnt to recognise the symptoms and swore I’d never let it happen again, I was undone by that very simple thought: “imagine how much better life will be afterwards”.

In the same breath, collaborating with a small team of awesome people with a common goal can be a wonderful feeling. In this case we’d camped out in our video guys’ home office for the final days of the lead-up to optimise communication time. The looming deadline for something we cared about created a bold sense of camaraderie and brotherhood – in fact I can remember Rodain, my development partner, saying he’d never felt indie fellowship as fiercely before. But this can seductively encourage you to push even harder, because now you don’t want to let down those around you.

When we eventually, exhaustedly, flicked the switch to go live I was totally shattered. The following hours were a hazy blur that could fit right into a drug-addled Johnny Depp biopic. I can’t remember another time I’ve so desperately wanted to sleep, only to be denied by an adrenaline hangover pushing thoughts around my head. It didn’t help that I knew my silenced phone was simultaneously blowing up with notifications.

24 hours later, a modicum of sleep acquired, the gravity of what had happened started to dawn on me. The support from friends and the local community was amazing. But I was caught off guard when some friends and family started making very sizeable contributions to our Kickstarter. My first reaction was a pang of guilt that people I care about were spending so much on my silly game – but then I realised that actually this was their way of showing me that they really believe in me. Considering game development is so often filled with self-doubt and anxiety, this was an overwhelming feeling that brought more than a tear to my eye.

There wasn’t much time to catch my breath however, as I was soon on a plane off to San Francisco for the Game Developers Conference (GDC). I was at least smart enough to plan in a few days to recover from jet lag once I arrived (I may have also nabbed an airport-priced massage during my layover that was money gladly spent). I can’t say my rest was peaceful though, as it was rapidly becoming apparent our Kickstarter was losing momentum and the path forward started to look increasingly steep. My stomach was in double and triple knots.

GDC is a massive conference: thousands of people, parties, events and meeting personal heroes by the minute – more than enough to make your first time completely overwhelming. But conferences are also what you make of them, the reason you spend so much money and travel halfway around the world is because this chaotic environment has the ability to create connections and introduce you to people you might never otherwise have access to. Of course, no one is going to hand this to you – you must be open to possibilities, network like a demon, and always be selling because you never know who is listening.

For a recovered introvert like me, this is already a tough ask, and I’m always envious of anyone who can shamelessly perform in this manner. But against the backdrop of a flagging Kickstarter, the pressure to create some “magic” was immense. Besides, I’d come too far to not give it my all, so time and again I’d throw myself into the fray and pitch Cadence to people I’d only just met. Ideally in such situations you want to be friendly, energetic, and most of all believe you’re making the other person’s life better by sharing your awesome thing with them.

But instead, being so emotionally depleted, it always felt like I was operating with a very obvious ulterior motive. That I was the noise distracting them from what they actually cared about. This felt particularly brutal when that someone was a person I deeply admired and respected. Often I left an interaction feeling like my psyche had been raked over a bed of hot coals. At its worst I remember taking a few minutes to lie spread-eagled on my hostel dorm-room floor, just trying to recapture a little bit of myself.

Even though they didn’t know exactly what I was going through, I was immensely grateful to have friends, both new and old, at GDC. By simply being around and caring about my cause they lifted my mood and I managed to leave GDC having had a good time. But I think this points to one of the most insidious things about feeling unsuccessful: it tends to make you cower, to want to hide yourself – to stop seeing opportunities where before you saw them everywhere, and to sabotage your own luck.

The remainder of our Kickstarter campaign was a slow death, but it didn’t take me long to accept the outcome. I was exhausted and simply had no more fight left in me. It seemed far wiser to simply lick our wounds and save our energy for the next scrap. Nevertheless the post mortem was really difficult. I remember people, with the best of intentions, trying to give constructive criticism – but it was almost impossible to hear it without getting very upset. Thankfully, I’m old enough to know when it’s time to go for a walk, but I think here lies another clue.

Clearly, over the hours of hard graft I had invested countless little pieces of myself. This meant that when the campaign failed to meet its goal, it felt like I had failed. Additionally, when people analysed the campaign, it felt like they were picking apart my actions and personally attacking my motivations. But I tried my best to keep the right perspective, to remember the many positives of the campaign and to try and frame the failure as a learning experience.

In the months that followed I noticed another knock-on effect. Do you know any friends who love a particular restaurant but then after a bad experience suddenly it’s the worst? The psychological term for this is ‘splitting’, which acts as a defence mechanism to protect you from unpleasantness by writing it off wholesale. In this case I found myself trying to get as much psychological distance from the Kickstarter as possible. This isn’t to say I went around trashing Kickstarter as a platform, but it made it very difficult to look at our own page or to back any other Kickstarters.

Alarmingly, this effect spilled over into Cadence as well. I found myself far less enthused about the game, despite all of the positive feedback we received during and after the campaign. This points to a crisis of confidence that makes it hard to take those half chances. Competitions don’t seem like they are worth entering anymore. Keeping your followers engaged with regular updates is a chore you can’t manage. Your productivity wavers, and it becomes hard to put your head down and focus on any one task because it’s hard to believe you’re still heading in the right direction.

Lately, I’ve been asking myself a valuable question: did we actually fail? The story of Cadence is a long way from done, so who knows what might happen when we use our hard won experiences to launch on Steam early access or do a full release. This is a curious thing I’m learning about the relationship between success and failure – in the two years I’ve been working on Cadence, my failure rate has gone up significantly. I’ve been rejected and fallen short of expectations more times than I ever did academically or as an employee. But in between, we’ve also had amazing experiences and been presented with opportunities that could only be won by walking this path.

One perfect example happened during that crazy GDC week. One of my hostel dorm mates innocuously mentioned Stugan – a non-profit accelerator for indie developers to work on their games in the Swedish countryside. And now, four months later, I’m thrilled to be writing this article from a lakeside cabin, one week into my two-month-long stay. Being surrounded by 22 other indies so far has been an incredible boost to morale. Maybe it was a mistake to try and promote a Kickstarter during GDC? But then again I wouldn’t be here, and right now the view is pretty great!

Attached Image: CINfunVWwAAzlTK.jpg

Look out for Cadence on Steam Early Access by the time Stugan ends in mid August.

How the PVS-Studio Team Improved Unreal Engine's Code

This article was originally published at Unreal Engine Blog. Republished by the editors' permission.

Our company develops, promotes, and sells the PVS-Studio static code analyzer for C/C++ programmers. However, our collaboration with customers is not limited solely to selling PVS-Studio licenses. For example, we often take on contract projects as well. Due to NDAs, we're not usually allowed to reveal details about this work, and you might not be familiar with the projects names, anyway. But this time, we think you'll be excited by our latest collaboration. Together with Epic Games, we're working on the Unreal Engine project. This is what we're going to tell you about in this article.

As a way of promoting our PVS-Studio static code analyzer, we've thought of an interesting format for our articles: We analyze open-source projects and write about the bugs we manage to find there. Take a look at this updatable list of projects we have already checked and written about. This activity benefits everyone: readers enjoy learning from others' mistakes and discover new means to avoid them through certain coding techniques and style. For us, it's a way to have more people learn about our tool. As for the project authors, they too benefit by gaining an opportunity to fix some of the bugs.

Among the articles was "A Long-Awaited Check of Unreal Engine 4". Unreal Engine's source code was extraordinarily high quality, but all software projects have defects and PVS-Studio is excellent at surfacing some of the most tricky bugs. We ran an analysis and reported our findings to Epic. The Unreal Engine team thanked us for checking their code, and quickly fixed the bugs we reported. But we didn't want to stop there, and thought we should try selling a PVS-Studio license to Epic Games.

Epic Games was very interested in using PVS-Studio to improve the engine continuously over time. They suggested we analyze and fix Unreal Engine's source code so that it was completely clear of bugs and the tool wouldn't generate any false positives in the end. Afterwards, Epic would use PVS-Studio on their code base themselves, thus making its integration into their development process as easy and smooth as possible. Epic Games promised not only to purchase a PVS-Studio license, but also to pay us for our work.

We accepted the offer. The job is done. And now you are welcome to learn about various interesting things we came across while working on Unreal Engine's source code.

Pavel Eremeev, Svyatoslav Razmyslov, and Anton Tokarev were the participants on PVS-Studio's side. On Epic Games' side, the most active participants were Andy Bayle and Dan O'Connor - it all would have been impossible without their help, so many thanks to them!

PVS-Studio integration into Unreal Engine's build process


To manage the build process, Unreal Engine employs a build system of its own - Unreal Build Tool. There is also a set of scripts to generate project files for a number of different platforms and compilers. Since PVS-Studio is first of all designed to work with the Microsoft Visual C++ compiler, we used the corresponding script to generate project files (*.vcxproj) for the Microsoft Visual Studio IDE.

PVS-Studio comes with a plugin that can integrate into the Visual Studio IDE and enables a "one-click" analysis. However, projects generated for Unreal Engine are not the "ordinary" MSBuild projects used by Visual Studio.

When compiling Unreal Engine from Visual Studio, the IDE invokes MSBuild when starting the build process, but MSBuild itself is used just as a "wrapper" to run the Unreal Build Tool program.

To analyze the source code in PVS-Studio, the tool needs a preprocessor's output - an *.i file with all the headers included and macros expanded.

Quick note. This section is only interesting if you have a customized build process like Unreal's. If you are thinking of trying PVS-Studio on a project of yours that has some intricate peculiarities in its build process, I recommend reading this section to the end. Perhaps it will be helpful for your case. But if you have an ordinary Visual Studio project or can't wait to read about the bugs we found, you can skip it.

To launch the preprocessor correctly, the tool needs information about the compilation parameters. In "ordinary" MSBuild projects, this information is stored in the project itself; the PVS-Studio plugin can "see" it and automatically preprocess all the necessary source files for the analyzer that will be called afterwards. With Unreal Engine projects, things are different.

As I've already said above, their projects are just a "wrapper" while the compiler is actually called by Unreal Build Tool. That's why compilation parameters in this case are not available for the PVS-Studio plugin for Visual Studio. You just can't run analysis "in one click", though the plugin can be used to view the analysis results.

The analyzer itself (PVS-Studio.exe) is a command-line application that resembles the C++ compiler regarding the way it is used. Just like the compiler, it has to be launched individually for every source file, passing this file's compilation parameters through the command line or response file. And the analyzer will automatically choose and call the appropriate preprocessor and then perform the analysis.

Note. There's also an alternative way. You can launch the analyzer for preprocessed files prepared in advance.

Thus, the universal solution for integrating the PVS-Studio analyzer into the build process is to call its exe-file in the same place where the compiler is called, i.e. inside the build system - Unreal Build Tool in our case. Sure, it will require modifying the current build system, which may not be desirable, as in our case. Because of that, just for cases like this, we created a compiler call "intercepting" system - Compiler Monitoring.

The Compiler Monitoring system can "intercept" compilation process launches (in the case of Visual C++, this is the cl.exe process), collecting all of the parameters necessary for successful preprocessing, and then re-launch preprocessing of the compiled files for further analysis. That's what we did.


Attached Image: image2.png
Figure 1. A scheme of the analysis process for the Unreal Engine project


Unreal Engine analysis integration comes down to launching the monitoring process (CLMonitor.exe) right before the build; it will perform all the necessary steps to preprocess the sources and will launch the analyzer once the build is finished. To run the monitoring process, we need to run a simple command:

CLMonitor.exe monitor

CLMonitor.exe will call itself in "tracking mode" and terminate. At the same time, another CLMonitor.exe process will remain running in the background "intercepting" the compiler calls. When the build process is finished, we need to run another simple command:

CLMonitor.exe analyze "UE.plog"

Please pay attention: in PVS-Studio 5.26 and above you should write:

CLMonitor.exe analyze -l "UE.plog"

Now CLMonitor.exe will launch the analysis of previously-collected source files, saving the results into the UE.plog file that can be easily handled in our IDE plugin.
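
Putting the pieces together, one full analysis run looks roughly like this (the line in the middle is a placeholder for however you normally build the engine - from Visual Studio or by invoking Unreal Build Tool directly; it is not a real command):

CLMonitor.exe monitor
  (build Unreal Engine as usual here - Visual Studio / Unreal Build Tool)
CLMonitor.exe analyze -l "UE.plog"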

We set up a nightly build of the most interesting Unreal Engine configurations, followed by their analysis, on our Continuous Integration server. It was a means for us, first, to make sure our edits hadn't broken the build and, second, to get a fresh Unreal Engine analysis log in the morning with all of the previous day's edits taken into account. So, before sending a Pull Request to submit our edits to the Unreal Engine project repository on GitHub, we could easily make sure that the current version in our repository was stable, simply by rebuilding it on the server.

Non-linear bug fixing speed


So, we have sorted out the project's build process and analysis. Now let's talk about the bug fixes we made based on the diagnostic messages output by the analyzer.

At first glance, it may seem natural that the number of warnings output by the analyzer should drop evenly from day to day: about the same number of messages is suppressed by certain PVS-Studio mechanisms as the number of fixes that are done in the code.

That is, theoretically you could expect a graph looking somewhat like this:


Attached Image: image3.png
Figure 2. A perfect graph. The number of bugs drops evenly from day to day.


In reality, however, messages are eliminated faster during the initial phase of the bug fixing process than at the later stages. First, at the initial stage we suppress warnings triggered by macros, which helps quickly reduce the overall number of issues. Second, we happened to fix the most evident issues first and put off the more intricate things until later. Let me explain why: we wanted to show the Epic Games developers that we had started working and that there was progress. It would be strange to start with the difficult issues and get stuck there, wouldn't it?

It took us 17 working days in total analyzing the Unreal Engine code and fixing bugs. Our goal was to eliminate all the general analysis messages of the first and second severity levels. Here is how the work progressed:


Attached Image: image4.png
Table 1. The number of warnings remaining on each day.


Notice the red figures. During the first two days, we were getting accustomed to the project and then suppressed warnings in some macros, thus greatly reducing the number of false positives.

Seventeen working days is quite a lot, and I'd like to explain why it required this amount of time. First, it was not the whole team that worked on the project, but only two of its members, and of course they were busy with other tasks as well during this time. Second, Unreal Engine's code was entirely unfamiliar to us, so making fixes was quite a tough job. We had to stop every now and then to figure out whether and how we should fix a certain spot.

Now, here is the same data in the form of a smoothed graph:


Attached Image: image5.png
Figure 3. A smoothed graph of the warning numbers over time.


A practical conclusion - for ourselves to remember and for others to hear: it's a bad idea to estimate the time it will take to fix all the warnings based only on the first couple of days of work. Progress is very fast at first, so the forecast may turn out too optimistic.

But we still needed to make an estimate somehow. I think there should be a magical formula for this, and hopefully we'll discover it and show it to the world someday. But presently, we are too short of statistical data to offer something reliable.

About the bugs found in the project


We have fixed quite a lot of code fragments. These fixes can be theoretically grouped into 3 categories:

  1. Real bugs. We will show you a few of these as an example.
  2. Not actually errors, yet these code fragments were confusing the analyzer and so they can confuse programmers who will study this code in the future. In other words, it was "sketchy" code that should be fixed as well. So we did.
  3. Edits made solely because of the need to "please" the analyzer that would generate false positives on those fragments. We were trying to isolate false warning suppressions in a special separate file or improve the work of the analyzer itself whenever possible. But we still had to do some refactoring in certain places to help the analyzer figure things out.

As I promised, here are some examples of the bugs. We have picked out the most interesting defects that were clear to understand.

The first interesting message by PVS-Studio: V506 Pointer to local variable 'NewBitmap' is stored outside the scope of this variable. Such a pointer will become invalid. fontcache.cpp 466

void GetRenderData(....)
{
  ....
  FT_Bitmap* Bitmap = nullptr;
  if( Slot->bitmap.pixel_mode == FT_PIXEL_MODE_MONO )
  {
    FT_Bitmap NewBitmap;
    ....
    Bitmap = &NewBitmap;
  }
  ....
  OutRenderData.RawPixels.AddUninitialized(
    Bitmap->rows * Bitmap->width );
  ....
}

The address of the NewBitmap object is saved into the Bitmap pointer. The trouble with it is that right after this, the NewBitmap object's lifetime expires and it is destroyed. So it turns out that Bitmap is pointing to an already destroyed object.

When trying to use a pointer to address a destroyed object, undefined behavior occurs. What form it will take is unknown. The program may work well for years if you are lucky enough that the data of the dead object (stored on the stack) is not overwritten by something else.

A correct way to fix this code is to move NewBitmap's declaration outside the if statement:

void GetRenderData(....)
{
  ....
  FT_Bitmap* Bitmap = nullptr;

  FT_Bitmap NewBitmap;
  if( Slot->bitmap.pixel_mode == FT_PIXEL_MODE_MONO )
  {
    FT_Bitmap_New( &NewBitmap );
    // Convert the mono font to 8bbp from 1bpp
    FT_Bitmap_Convert( FTLibrary, &Slot->bitmap, &NewBitmap, 4 );

    Bitmap = &NewBitmap;
  }
  else
  {
    Bitmap = &Slot->bitmap;
  }
  ....
  OutRenderData.RawPixels.AddUninitialized(
    Bitmap->rows * Bitmap->width );
  ....
}

The next warning by PVS-Studio: V522 Dereferencing of the null pointer 'GEngine' might take place. Check the logical condition. gameplaystatics.cpp 988

void UGameplayStatics::DeactivateReverbEffect(....)
{
  if (GEngine || !GEngine->UseSound())
  {
    return;
  }
  UWorld* ThisWorld = GEngine->GetWorldFromContextObject(....);
  ....
}

If the GEngine pointer is not null, the function returns and everything is OK. But if it is null, it gets dereferenced.

We fixed the code in the following way:

void UGameplayStatics::DeactivateReverbEffect(....)
{
  if (GEngine == nullptr || !GEngine->UseSound())
  {
    return;
  }

  UWorld* ThisWorld = GEngine->GetWorldFromContextObject(....);
  ....
}

An interesting typo is waiting for you in the next code fragment. The analyzer has detected there a meaningless function call: V530 The return value of function 'Memcmp' is required to be utilized. pathfollowingcomponent.cpp 715

int32 UPathFollowingComponent::OptimizeSegmentVisibility(
  int32 StartIndex)
{
  ....
  if (Path.IsValid())
  {
    Path->ShortcutNodeRefs.Reserve(....);
    Path->ShortcutNodeRefs.SetNumUninitialized(....);
  }
  FPlatformMemory::Memcmp(Path->ShortcutNodeRefs.GetData(),
                          RaycastResult.CorridorPolys,
                          RaycastResult.CorridorPolysCount *
                            sizeof(NavNodeRef));
  ....
}

The return result of the Memcmp function is not used. And this is what the analyzer didn't like.

The programmer actually intended to copy a region of memory through the Memcpy() function but made a typo. This is the fixed version:

int32 UPathFollowingComponent::OptimizeSegmentVisibility(
  int32 StartIndex)
{
  ....
  if (Path.IsValid())
  {
    Path->ShortcutNodeRefs.Reserve(....);
    Path->ShortcutNodeRefs.SetNumUninitialized(....);

    FPlatformMemory::Memcpy(Path->ShortcutNodeRefs.GetData(),
                            RaycastResult.CorridorPolys,
                            RaycastResult.CorridorPolysCount *
                              sizeof(NavNodeRef));
  }
  ....
}

Now let's talk about a diagnostic message you are sure to encounter in nearly every project - so common is the bug it refers to. We are talking about the V595 diagnostic. In our bug database, it is at the top of the list regarding the frequency of its occurrence in projects (see examples). At first glance, that list is not as large as, say, for the V501 diagnostic. But it's actually because V595 diagnostics are somewhat boring and we don't write out many of them from every single project. We usually just cite one example and add a note like: And 161 additional diagnostic messages. In half of the cases, these are real errors. This is what it looks like:


Attached Image: image6.png
Figure 4. The dread of V595 diagnostic.


Diagnostic rule V595 is designed to detect code fragments where a pointer is dereferenced before being checked for null. We always find some quantity of these in the projects we analyze. The pointer check and the dereferencing operation may be quite far apart within a function - tens or even hundreds of lines away - which makes it harder to fix the bug. But there are also small and very representative examples, like this function:

float SGammaUIPanel::OnGetGamma() const
{
  float DisplayGamma = GEngine->DisplayGamma;
  return GEngine ? DisplayGamma : 2.2f;
}

PVS-Studio's diagnostic message: V595 The 'GEngine' pointer was utilized before it was verified against nullptr. Check lines: 47, 48. gammauipanel.cpp 47

We fixed this in the following way:

float SGammaUIPanel::OnGetGamma() const
{
  return GEngine ? GEngine->DisplayGamma : 2.2f;
}

Moving on to the next fragment:

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 289, 299. automationreport.cpp 289

void FAutomationReport::ClustersUpdated(const int32 NumClusters)
{
  ...
  //Fixup Results array
  if( NumClusters > Results.Num() )         //<==
  {
    for( int32 ClusterIndex = Results.Num();
         ClusterIndex < NumClusters; ++ClusterIndex )
    {
      ....
      Results.Add( AutomationTestResult );
    }
  }
  else if( NumClusters > Results.Num() )    //<==
  {
    Results.RemoveAt(NumClusters, Results.Num() - NumClusters);
  }
  ....
}

In its current form, the second condition will never be true. It is logical to assume that the mistake is in the comparison sign, and that this branch was originally meant to remove unnecessary items from the Results array:

void FAutomationReport::ClustersUpdated(const int32 NumClusters)
{
  ....
  //Fixup Results array
  if( NumClusters > Results.Num() )
  {
    for( int32 ClusterIndex = Results.Num();
         ClusterIndex < NumClusters; ++ClusterIndex )
    {
      ....
      Results.Add( AutomationTestResult );
    }
  }
  else if( NumClusters < Results.Num() )
  {
    Results.RemoveAt(NumClusters, Results.Num() - NumClusters);
  }
  ....
}

And here's a code sample to test your attentiveness. The analyzer's warning: V616 The 'DT_POLYTYPE_GROUND' named constant with the value of 0 is used in the bitwise operation. pimplrecastnavmesh.cpp 2006

/// Flags representing the type of a navigation mesh polygon.
enum dtPolyTypes
{
  DT_POLYTYPE_GROUND = 0,
  DT_POLYTYPE_OFFMESH_POINT = 1,
  DT_POLYTYPE_OFFMESH_SEGMENT = 2,
};

uint8 GetValidEnds(...., const dtPoly& Poly)
{
  ....
  if ((Poly.getType() & DT_POLYTYPE_GROUND) != 0)
  {
    return false;
  }
  ....
}

Everything looks fine at first glance. You may think that some bit is allocated by a mask and its value is checked. But these are actually just named constants defined in the dtPolyTypes enumeration, and they are not meant to allocate any particular bits.

In this condition, the DT_POLYTYPE_GROUND constant equals 0, which means the condition will never be true.

The fixed code:

uint8 GetValidEnds(...., const dtPoly& Poly)
{
  ....
  if (Poly.getType() == DT_POLYTYPE_GROUND)
  {
    return false;
  }
  ....
}

A typo detected: V501 There are identical sub-expressions to the left and to the right of the '||' operator: !bc.lclusters ||!bc.lclusters detourtilecache.cpp 687

dtStatus dtTileCache::buildNavMeshTile(....)
{
  ....
  bc.lcset = dtAllocTileCacheContourSet(m_talloc);
  bc.lclusters = dtAllocTileCacheClusterSet(m_talloc);
  if (!bc.lclusters || !bc.lclusters)   //<==
    return status;
  status = dtBuildTileCacheContours(....);
  ....
}

When copy-pasting a variable, the programmer forgot to rename bc.lclusters to bc.lcset.

Regular analysis results


The examples above are by far not all the bugs found in the project, but just a small part of them. We cited them to show you what kind of bugs PVS-Studio can find, even in world-class thoroughly-tested code.

However, we'd remind you that running a single code base analysis is not the right way to use a static analyzer. Analysis needs to be performed regularly - only then will it enable you to catch a huge bulk of bugs and typos early in the coding stage, instead of the testing or maintenance stages.

The Unreal Engine project is a wonderful opportunity to prove our words with real-life examples.

Initially we fixed defects in the code without keeping track of whether they were fresh changes or old. It simply wasn't interesting in the early stages, when there were so many bugs to get through. But we did notice how the PVS-Studio analyzer started detecting bugs in freshly written or modified code after we cut the number of warnings to 0.

In fact, it took us a bit longer than 17 days to finish with this code. When we stopped making edits and got a "zero defect" message from the analyzer, we had to wait two more days for the Unreal Engine team to integrate our final Pull Request. During this time, we continually updated our copy of the code base from Epic's repository and analyzed the new code.

We could see the analyzer detect bugs in new code during those two days. Those bugs, we also fixed. This is a great example of how useful regular static analysis checks are.

In fact, the tip of the "number of warnings" graph now looked like this:


Attached Image: image8.png
Figure 5. A schematic graph representing the growth of the warning number after it was cut to 0.


Now let's see what we managed to find during those last two days, when analyzing fresh updates of the project code.

Day one

Message one: V560 A part of conditional expression is always true: FBasicToken::TOKEN_Guid. k2node_mathexpression.cpp 235

virtual FString ToString() const override
{
  if (Token.TokenType == FBasicToken::TOKEN_Identifier ||
      FBasicToken::TOKEN_Guid) //<==
  {
    ....
  }
  else if (Token.TokenType == FBasicToken::TOKEN_Const)
  {
    ....
}

The programmer forgot to write Token.TokenType ==. It will cause the condition to always be true since the named constant FBasicToken::TOKEN_Guid is not equal to 0.
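
The fix, presumably, is just to restore the missing comparison (this is our reconstruction rather than a quote from the repository):

  if (Token.TokenType == FBasicToken::TOKEN_Identifier ||
      Token.TokenType == FBasicToken::TOKEN_Guid)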

Message two: V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] CompressedDataRaw;'. crashupload.cpp 222

void FCrashUpload::CompressAndSendData()
{
  ....
  uint8* CompressedDataRaw = new uint8[BufferSize];         //<==

  int32 CompressedSize = BufferSize;
  int32 UncompressedSize = UncompressedData.Num();
  ....
  // Copy compressed data into the array.
  TArray<uint8> CompressedData;
  CompressedData.Append( CompressedDataRaw, CompressedSize );
  delete CompressedDataRaw;                                 //<==
  CompressedDataRaw = nullptr;
  ....
}

This bug doesn't always show up in practice as we are dealing with allocation of an array of items of the char type. But it is still a bug that can cause undefined behavior and must be fixed.
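
The fix is the one the warning itself suggests - pairing the array new with an array delete:

  delete [] CompressedDataRaw;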

Day two

Message one: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. unrealaudiodevicewasapi.cpp 128

static void GetArrayOfSpeakers(....)
{
  Speakers.Reset();
  uint32 ChanCount = 0;
  // Build a flag field of the speaker outputs of this device
  for (uint32 SpeakerTypeIndex = 0;
       SpeakerTypeIndex < ESpeaker::SPEAKER_TYPE_COUNT,    //<==
       ChanCount < NumChannels; ++SpeakerTypeIndex)
  {
    ....
  }

  check(ChanCount == NumChannels);
}

A nice, fat bug.

The comma operator ',' executes the two expressions on either side of it in left-to-right order and returns the value of the right operand.

As a result, the loop termination condition is represented by the following check only: ChanCount < NumChannels.

The fixed condition:

static void GetArrayOfSpeakers(....)
{
  Speakers.Reset();
  uint32 ChanCount = 0;
  // Build a flag field of the speaker outputs of this device
  for (uint32 SpeakerTypeIndex = 0;
       SpeakerTypeIndex < ESpeaker::SPEAKER_TYPE_COUNT &&
       ChanCount < NumChannels; ++SpeakerTypeIndex)
  {
    ....
  }
  check(ChanCount == NumChannels);
}

Message two. V543 It is odd that value '-1' is assigned to the variable 'Result' of HRESULT type. unrealaudiodevicewasapi.cpp 568

#define S_OK       ((HRESULT)0L)
#define S_FALSE    ((HRESULT)1L)

bool
FUnrealAudioWasapi::OpenDevice(uint32 DeviceIndex,
                               EStreamType::Type StreamType)
{
  check(WasapiInfo.DeviceEnumerator);

  IMMDevice* Device = nullptr;
  IMMDeviceCollection* DeviceList = nullptr;
  WAVEFORMATEX* DeviceFormat = nullptr;
  FDeviceInfo DeviceInfo;
  HRESULT Result = S_OK;                      //<==
  ....
  if (!GetDeviceInfo(DataFlow, DeviceIndex, DeviceInfo))
  {
    Result = -1;                              //<==
    goto Cleanup;
  }
  ....
}

HRESULT is a 32-bit value split into three different fields: error severity code, device code, and error code. To work with HRESULT, special constants are used such as S_OK, E_FAIL, E_ABORT, and so on. And to check HRESULT values, such macros as SUCCEEDED and FAILED are used.

Warning V543 is output only when the programmer attempts to write values -1, true, or false into a variable of the HRESULT type.

Writing the value "-1" is incorrect. If you want to report some unknown error, you should use the value 0x80004005L (Unspecified failure). This and other similar constants are defined in "WinError.h".
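
In other words, a more appropriate fix is to assign a proper HRESULT constant rather than -1, for example the standard E_FAIL from "WinError.h" (this line is our sketch of a fix, not a quote from the engine):

  Result = E_FAIL;  // 0x80004005L, "Unspecified failure"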

Wow, this was a lot of work!


It may make some programmers and managers feel sad to learn that they need over two weeks to integrate static analysis into their project. But you don't necessarily have to go this way. You just need to understand that the Epic Games developers chose an ideal path, though not the simplest and quickest one.

Yes, the ideal scenario is to get rid of all the bugs right away and then promptly address only new messages triggered by freshly written code. But you can also start benefiting from static analysis without having to spend time up front fixing the old code.

PVS-Studio actually offers a special "message marking" mechanism for this purpose. Below is a general description of this feature:

All the messages output by the analyzer are marked in a special database as inactive. After that, the user can see only those messages which refer to freshly written or modified code. That is, you can start benefiting from static analysis right away. And then, when you have time and mood, you can gradually work on messages for the old code.

For details on this subject, see the following sources: documentation, how to quickly integrate static analysis into your project.

"Have you reported the bugs to the authors?"


After publishing every new article about checking a project, people ask us: "Have you reported the bugs to the authors?" And of course we always do! But this time, we've not only "reported the bugs to the authors" but fixed all those bugs ourselves. Everyone interested can see the results in the Unreal Engine repository on GitHub (after creating an Epic Games account and linking it to your GitHub account).

Conclusion


We hope that developers using Unreal Engine will appreciate PVS-Studio's role in improving Unreal Engine's source code, and we are looking forward to seeing many awesome new Unreal Engine-based projects!

Here are some final conclusions to draw from the results of our work:

  1. The Unreal Engine project's code is extremely high-quality. Don't mind the large number of warnings at the initial stage: it's a normal thing. Most of those warnings were eliminated through a variety of techniques and settings. The number of real bugs detected in the code is very small for such a large project.
  2. Fixing someone else's code you are not familiar with is usually very difficult. Most programmers probably have an instinctive understanding of this. We are just telling an old truth.
  3. The speed of "sorting out" analyzer warnings is not linear. It will gradually drop, and you need to keep that in mind when estimating the time it will take to finish the job.
  4. You can only get the best from static analysis when you use it regularly.

Thanks to everyone for reading this article. May your code stay bugless! Sincerely yours, developers of the PVS-Studio analyzer. It's a good time right now to download and try it on your project.

Math for Game Developers: Probability and Randomness

Math for Game Developers is exactly what it sounds like - a weekly instructional YouTube series wherein I show you how to use math to make your games. Every Thursday we'll learn how to implement one game design, starting from the underlying mathematical concept and ending with its C++ implementation. The videos will teach you everything you need to know, all you need is a basic understanding of algebra and trigonometry. If you want to follow along with the code sections, it will help to know a bit of programming already, but it's not necessary. You can download the source code that I'm using from GitHub, from the description of each video. If you have questions about the topics covered or requests for future topics, I would love to hear them! Leave a comment, or ask me on my Twitter, @VinoBS

Note:  
The video below contains the playlist for all the videos in this series, which can be accessed via the playlist icon at the top of the embedded video frame. The first video in the series is loaded automatically.


Probability and Randomness



15 Metrics All Game Developers Should Know by Heart

Mobile game analytics can feel complicated. When it comes to metrics, there are hundreds of numbers to track. On the simpler end of the spectrum, there are metrics like downloads, sessions, and DAUs. These numbers are relatively straightforward and measure concrete actions. More complicated metrics include things like churn, Average Revenue Per Paying User (ARPPU) and DAU/MAU. These are less intuitive to interpret, and they might raise more questions than answers.

“Am I waiting the right amount of time until I consider a user churned?”

“What is a good ARPPU?”

And we haven’t even introduced more advanced analytics concepts like segmentation, funnels and custom events! For now, we’ll stick to just the metrics and look at what these numbers actually tell you about your game. While there’s no one-size-fits-all policy for game analytics, there are some useful metrics that can help shed light on how you can improve your mobile game.

Daily Active Users (DAUs)


Starting with the basics, DAU is the number of unique users that start at least one session in your app on any given day. By themselves, DAU and other high level metrics don’t provide much insight into an app’s performance. However, knowing these simple metrics is a useful starting point for an educated analytics discussion.

Let’s look at an example. Take a hardcore game that has 10,000 engaged users. These users all play the game several times each day and actively monetize. Compare that to a news or messaging app that has 1,000,000 DAUs but no monetization mechanics. A third app that has poor retention might run a user acquisition campaign. Today they have 500,000 DAUs, but tomorrow they are down to 100,000. A DAU count is merely a snapshot in time, and the surrounding context can be just as important, if not more important, than a large user base.

Sessions


Every time any user, not just a unique user, opens your app, that counts as a session. Similar to DAUs, the total number of sessions requires some context to be a helpful number. Specifically, focus on the average number of sessions per DAU, as this metric can tell you about how engaged users are with your game.

An app’s genre does have an effect on Sessions/DAU, as some game styles lend themselves to more frequent sessions. However, if users are coming back five to ten times each day, it’s safe to assume they enjoy the game. If users only open an app one to two times per day, it is unlikely to keep their attention for long.

DAU/MAU


The ratio of Daily Active Users to Monthly Active Users shows how well an app retains users and is often referred to as the stickiness of a game. This metric shows you how frequently users log in to your app. This metric will be easier to discuss with an example.

Let’s say an app has 100,000 MAU and averages 15,000 DAU. Then, the DAU/MAU ratio would be 15 percent. This means that the average user logged in on roughly 15 percent of the days that month.

Since this is a ratio, the metric DAU/MAU can only be a value between zero and one. Values closer to one mean users are opening the app on a higher percentage of days. Popular social networking apps like Facebook have reported DAU/MAU ratios as high as 50 percent, but most successful gaming apps have ratios closer to 20 percent.

Retention


Retention is arguably the most important metric in a free-to-play game. Successful free-to-play games create long-term relationships with users. Users that enjoy the experience enough are willing to pay for a competitive advantage. A game needs strong retention to have time to build this relationship.

To calculate retention, separate your users into cohorts based on the day they download your app. The day that the download occurs is Day 0. If a user opens your app the next day (Day 1), they are marked as retained. If they do not open the app, they are not retained. This calculation is performed for each cohort on every day after they download the app. Common days used for retention are 1, 3, 7 and 30.
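
As a rough illustration of that cohort bookkeeping, here is a minimal C++ sketch (the types and field names are hypothetical, not taken from any particular analytics SDK) that computes Day-N retention for one install cohort:

#include <set>
#include <vector>

struct UserActivity
{
  int userId;
  std::set<int> daysPlayed; // days since install; Day 0 is the install day
};

// Fraction of the cohort that came back on day N after installing.
double DayNRetention(const std::vector<UserActivity>& cohort, int day)
{
  if (cohort.empty())
    return 0.0;

  int retained = 0;
  for (const UserActivity& user : cohort)
  {
    if (user.daysPlayed.count(day) != 0)
      ++retained;
  }
  return static_cast<double>(retained) / cohort.size();
}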


Attached Image: Skærmbillede-2015-07-08-kl.-14.53.0


Conversion Rate


Moving right along to everyone’s favorite topic: money! The above metrics focus on measuring your relationship with your users. How often do they come back to your app? But the most important metric for many indie developers is whether their game is making enough money.

The conversion rate measures the percentage of unique users that have made a purchase out of the total number of users during that time period. You can also measure the conversion rate of ads served in a free-to-play game.

Getting a user to pay real money in a game that they can play for free is a difficult assignment. But, as with many other industries, repeat purchasers generate the majority of revenue in free-to-play games. Encourage users to make that first conversion by offering them a virtual item of incredible value.

ARPDAU


The Average Revenue Per Daily Active User, or ARPDAU, is one of the most commonly discussed metrics in mobile games. ARPDAU is a useful metric because it allows you to understand how your game performs on a daily basis.

This is a great metric to track before and during user acquisition campaigns. Before acquiring users, make sure you know the range of your ARPDAU and how it fluctuates normally. During a campaign, segment your new users by source and see which networks or games perform the best in your app. We’ll discuss segmentation in a later post.

ARPPU


Average Revenue Per Paying User (ARPPU) measures only the subset of users who have completed a purchase in a game. This metric can vary dramatically based on game genre. Hardcore games tend to have higher monetization metrics like ARPPU, but they also lack the mass appeal of more casual games.
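
To make the three monetization metrics above concrete, here is a small worked sketch (the input numbers are made up purely for illustration):

#include <cstdio>

int main()
{
  double dailyRevenue = 500.0;   // revenue for one day
  int    dau          = 20000;   // daily active users
  int    payingUsers  = 300;     // unique users who made a purchase that day

  double conversionRate = 100.0 * payingUsers / dau;   // percent of users who paid
  double arpdau         = dailyRevenue / dau;          // revenue per daily active user
  double arppu          = dailyRevenue / payingUsers;  // revenue per paying user

  std::printf("Conversion: %.2f%%  ARPDAU: %.4f  ARPPU: %.2f\n",
              conversionRate, arpdau, arppu);
  return 0;
}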

Churn


Churn is roughly the opposite of retention. How many players that downloaded your game are no longer playing? The churn metric makes the most sense in a subscription business model and there are some nuances involved when applying it to free-to-play games.

The main consideration is user play style. With a subscription service, churn is black and white. Either a user is paying or they are not. In a free-to-play game some users may play multiple times per day, while more casual players log in once or twice a week. To generalize for these differences between users, we measure churn as a user who has not played in 28 days.
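
In code, the rule is as simple as it sounds - a minimal sketch, assuming you store each user's last session as a day counter (the names are hypothetical):

// A user counts as churned if they have had no session in the last 28 days.
bool IsChurned(int lastPlayedDay, int currentDay, int churnWindowDays = 28)
{
  return (currentDay - lastPlayedDay) >= churnWindowDays;
}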

In-Game Metrics


Beyond understanding user engagement, retention and monetization, it is important to measure and balance the game economy. If it is too easy to earn virtual currency, users have no reason to monetize. But users still need enough currency to enjoy and explore the game. There is a happy medium somewhere in between, and the following metrics can help find it.

Source, Sink and Flow

Sources are places where users can earn virtual currency. In the GameAnalytics dashboard, the source metric measures the amount of currency a user has earned. It also includes any currency he or she has generously been given by you, the benevolent game designer.

A sink is the opposite of a source. These are the locations in your game where users spend their precious currency. Both sources and sinks can refer to premium (hard) and secondary (soft) currencies. Keep these different types of currencies separate during your analysis.

Combining sources and sinks gives you the flow. Flow is the total balance of currency that your players have earned and spent. Generally, the flow should look stable, as in the chart below.


[Chart: daily currency flow holding roughly steady over time]


If the chart skews upward like an exponential curve, your player base will have too much currency and no need to monetize. If the chart slopes negatively to zero, players won’t have enough resources to do anything in your game.
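
Computing the flow is just a matter of netting sources against sinks per day; here is a minimal Python sketch with made-up currency events.

from collections import defaultdict
from datetime import date

# Hypothetical currency events: (date, amount, kind), where kind is
# "source" (currency earned or granted) or "sink" (currency spent).
events = [
    (date(2015, 7, 1), 100, "source"),
    (date(2015, 7, 1),  60, "sink"),
    (date(2015, 7, 2),  80, "source"),
    (date(2015, 7, 2),  90, "sink"),
]

def daily_flow(events):
    """Net currency balance per day: sources minus sinks."""
    flow = defaultdict(int)
    for day, amount, kind in events:
        flow[day] += amount if kind == "source" else -amount
    return dict(sorted(flow.items()))

for day, net in daily_flow(events).items():
    print(day, net)  # values hovering around zero suggest a stable economy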

Start, Fail and Complete

Lastly, we will look at some progression metrics. Many game types have a leveling component, whether or not the user has to explicitly start a new level. Starts measure the number of times a player starts a level.

Second in the creatively named metrics category are fails. A fail occurs when a user starts a level but does not complete it.

As you might expect, a complete counts the number of times users complete a certain level. Tying all three of these together helps you analyze the levels in your game.




Are your choke points appropriately difficult? Are users getting stuck on certain levels unexpectedly? Which levels are users having the most fun playing and repeating? Starts, fails and completes can answer these types of questions.
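
Tallying these three events per level is straightforward; the Python sketch below uses invented progression events and reports a simple completion rate for each level.

from collections import Counter

# Hypothetical progression events: (level, outcome).
events = [
    (1, "start"), (1, "complete"),
    (2, "start"), (2, "fail"),
    (2, "start"), (2, "complete"),
    (3, "start"), (3, "fail"),
]

def level_report(events):
    """Per-level starts, fails and completes, plus completes / starts."""
    counts = Counter(events)
    report = {}
    for level in sorted({lvl for lvl, _ in events}):
        starts = counts[(level, "start")]
        completes = counts[(level, "complete")]
        report[level] = {
            "starts": starts,
            "fails": counts[(level, "fail")],
            "completes": completes,
            "completion_rate": completes / starts if starts else 0.0,
        }
    return report

for level, stats in level_report(events).items():
    print(level, stats)  # a suspiciously low completion rate flags a choke point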

While there is no magic recipe for game analytics, the above metrics are standards that can help you get started in the world of analytics. The most important part of mobile game analytics is to get started and establish benchmarks for your own games. Once you understand how your users behave, you can measure things like the impact of a game update or changes to your user acquisition strategy.


Originally posted on the GameAnalytics blog.

3 Steps to Mastering Your Game’s Soft Launch

In one of the largest supermarket outlets in the whole of the UK, just down the road from where your humble blogger lives, sits a kitchen. Built in plain sight, this bizarre little pantry feels a somewhat curious companion to the seemingly endless run of aisles selling all the goods you'd expect a supermarket to sell.

Resembling something of a greenhouse – with windows looking out to the rest of the store so those inside can view all and sundry in the midst of their weekly shopping – said kitchen is actually an especially important facet of this supermarket chain's national operations. The customers may not know it, but in this kitchen, unreleased dishes – radical new ready meals, loaves of bread, cakes, soups, drinks, or indeed any foodstuff you can think of – are put through their paces.

Carefully comprised focus groups are brought into the store, sat down in the kitchen, and tasked with trying out these fresh recipes in a controlled environment. The feedback these groups give can make or break these products – most will never make it to market, relegated to the kitchen bin of history, simply because the group of people designed to be representative of the meal's target market weren't bowled over by it.

The concept of consumer testing is by no means one limited to the supermarkets or even food industry, nor is it in any way a new approach. For mobile developers, however, the idea that you could get a mass of players from a select market to give you valuable data in regard to the likely performance of your game worldwide ahead of release is something that's only come around in a practical fashion since the rise of the App Store in 2008. Apple and Google offer no official 'soft launch' programs, but the region by region nature of their stores means there's nothing to stop you launching in select territories ahead of a global debut in order to find out what works and what doesn't.

It's also something all the cool kids are doing. Think of any major free-to-play developer or publisher from the last 3-4 years, and almost every one of them will have mastered the soft launch along the way. Want to know what Supercell's most recent project is, for instance? Then you might want to hop over to Canada or down to Australia and download Smash Land. (But do it quick, because the game has already been canned, such is the harsh world of the soft launch).

As with any process, however, there's no point in rolling out your game in Australia, Canada, Ireland or New Zealand if you simply do it as a matter of course. The art of a successful soft launch requires careful planning, and the results you generate from it are only as good as the parameters you set for your game before you set out. Parameters such as:

1. Know what questions you want answered


Is there a particular question you want to answer? Is there a distinct game mode, level, or even a small facet of the game that you're not sure works? As well as any base question – which you need to set out pre-soft launch – there are other basic areas you need to measure during the soft launch to get the most out of it:

User interaction – How are players moving through your game? How often are they playing, how are they moving through the gameplay, and where are they leaving the game?
Monetisation – Are people spending money in play? Does spending cash unlock too much of the game, or is it too easy to play on without spending anything at all? Are the objects for sale of any value to the player, or are you pushing the wrong component of play?
Virality – Are players talking about your game of their own volition? Is it gaining a following on Twitter or Facebook? What are people saying? Have you got all the tools in place to make virality easy?
Retention - How many people come back to your game over a period of 1, 7 or even 30 days? Are there any posts or notifications that perform better than others?

2. Be prepared to change




It sounds obvious but, even with all the data in the world from the soft launch telling them something is wrong with their game, many developers are stubborn so-and-sos, dismissing anything negative that comes up and simply viewing the soft launch as the first stage of a staggered roll-out instead of a test designed to help them iron out the dips and troughs.

If the data from your soft launch tells you something isn't working, that isn't going to magically change once you roll the game out in other territories. Part of this comes from accepting that your game isn't finished when you soft launch. Indeed, in the current Games as a Service era, your game is never finished. Either way, a soft launch is only any use if, once the questions above have been answered, you take steps to ensure that any highlighted problems are rectified. It may even be that the soft launch highlights such fundamental issues that the game never actually sees the light of day worldwide.

This is by no means rare. German giant Wooga has built a name for itself by canning projects that the data shows just aren't up to scratch, using its aptly termed "Hit Filter". Supercell's Smash Land, polished though it was, was actually pulled during the writing of this article. Why? Because the damage a bad game does to your brand can be immeasurable, and you can't guarantee that you'll have a Rovio-like revival even if you have your Angry Birds waiting in the wings.

Successful games development means learning from your mistakes and the soft launch is an early warning system - it can tell you just what's wrong with your game before you suffer the indignity of finding out in front of the glare of the world's gamers.



3. Make sure the game is ready


This might sound like a contradiction, but while you should be prepared to make changes to your game, likewise don't send it off to soft launch with known issues.

If you launch it in a region with bugs, unfinished elements or half-baked gameplay, all your data will do is confirm the problems you already knew about, causing you to overlook other issues you might not be aware of. You have to launch the game in a state you would be comfortable launching globally, while at the same time being prepared to be flexible and make changes should the data highlight problems.

This is not a beta test – this is a consumer-ready launch, but one that reduces the risk of showcasing your game's problems to a massive audience. It's a little window into how the game would play out if you launched now, but for that to work, the game has to be in a retail-ready state to the best of your knowledge, even if the data you amass afterwards proves otherwise.

In short, don't soft launch with any known faults. Every pitfall should be a surprise.

Aside from aligning yourself with the right partners – study your chosen soft launch region carefully, and if you need to, pick a partner who has a handle on the procedure – that, in broad strokes, is all you need to know. The long and short of it is, if you want a soft launch to be of any use at all, you need to treat it with respect and take it seriously. The digital nature of mobile means we have the ability to gather essential player data that can make or break a game ahead of time. Don't waste that.

Originally posted on the GameAnalytics blog.

Get Professional - A Professional Skillset for Software Engineers


Everybody is a great coder


It's now over fifteen years since I first got money for a program that I wrote, and for over ten years I have considered myself a professional software engineer. When I started out programming, I was convinced that raw coding ability and deep, innate technical knowledge of a few programming languages would be all it would take to make it in the industry. But over the years I learned that there is more to being a professional programmer than hacking together complex code. Having worked both in academia and industry, this became most apparent to me when working with newbies fresh out of college. Often they are great programmers, but lack some of the surrounding skills that would make them look professional to me. So what distinguishes the common coder and/or hobbyist from the pros? Before I forget: this article is based purely on my personal experiences, so feel free to tell me that I'm wrong.

Diversify your knowledge


Get things done with your code


While being an able coder is not the only thing that counts, not being one immediately disqualifies you from becoming a successful programmer. Be sure to master at least one full-fledged, common programming language such as C#, Java or C++. Go into the details of the language, know the standard libraries, know how to wield the language to great effect and know how to cope with any limitations it has. As a bonus to your main language, know at least one scripting language to automate small tasks, create quick experiments and do ugly proofs of concept. Don't be slowed down because you don't know your basic tool - the programming language - well enough.

In short: Make sure you know how to get things done and get them done reliably.

Show me your mind


Getting complex things to work is only one side of the coin; the other side is that those complex structures need to be explained. Maybe your fellow code monkeys want to discuss a problem with you, maybe it's your boss trying to get a grip on your skills, maybe it's plain old documentation that needs to be written. Whatever the cause, sometimes just writing code comments is not enough. Know how to visualize (and read) software structures. Be it UML diagrams, flow charts, ERDs or simply Venn diagrams, know how to draw a diagram that others can understand. You don't need to be fully proficient in the last details of the "official" iconography of any visual language, but not being able to draw a simple class diagram or state machine will leave your thoughts behind closed doors. On the other hand, learn to recognize (and name) patterns in these charts, however badly scribbled by one of your fellow hackers.

Prove me that it works


Yay, your great ideas and concepts are approved by your team and you've written that beautiful piece of code to solve the problems of the world. But does it survive the impact of the users? Testing, testing, testing is the only answer. Know how to test and debug your program. Relying on print-debugging and testing by "it appears to run correctly" may work for small projects and in the short term, but there is a certain point where you want to guarantee that a program works in exactly the way you intend. While you don't need to embrace test-driven development in full, know at least about unit testing, know how to use a debugger efficiently, use software asserts and read up on design by contract. Learn how to do a performance analysis of your code and to check other non-functional requirements. Be able to write consistent, good tests for your software, and know what is covered by your tests and what is not.
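
To make the idea concrete, here is a toy Python example of the kind of unit test meant here; the clamp helper is purely hypothetical and only exists to have something worth testing.

import unittest

def clamp(value, low, high):
    """Keep a value inside [low, high]."""
    assert low <= high, "invalid range"  # a software assert guarding the contract
    return max(low, min(high, value))

class ClampTests(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_values_outside_range_are_clamped(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)

if __name__ == "__main__":
    unittest.main()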

Your code has a history


While well-tested code is a must, sometimes the path that led to a particular solution is also important. Sometimes you just have to see how code evolved to understand it, or to find the exact moment when a bug was introduced; luckily there are plenty of versioning tools out there which help with this task. Be sure to know about source code management and versioning systems. It doesn't matter whether this is SVN, Git, Mercurial, TFS or any of the others on the market, just be at home with at least one of them and understand the basic concepts behind them. It's not just about versioning, but also about backing up code and the traceability of features. It's astounding how many hours of productivity I've seen go down the drain because someone in a team of coders just did not use versioning software properly - be it files lost because of an accidental overwrite, being unable to merge a conflict with a team member, or messing up the workflow for everybody else by not following the basic concepts behind source code management.

Take one for the team


Now you're able to produce well-designed, peer-reviewed, tested and traceable code, which greatly increases your chances of being seen as a professional software engineer. Time to start working with the other great professionals out there. Software development is no longer an occupation for loners; nowadays it is considered a team sport by many successful companies. So be sure that you are prepared to work together with other humans and become a team player. Grasp the fundamentals of software development processes in teams, and know about iterative vs. incremental development and about defining deliverables and features. There are tons of development processes out there to read into, be it Scrum, Kanban, XP or feature-driven development - whatever floats your (or your employer's) boat. There is no need (yet) to get certified in any of these processes, but be aware that they exist and that they are used to great success by many software-producing companies. Another major point: if you want to be part of a tight-knit development team, learn to be a "good citizen" for that team. Be honest towards yourself and others, learn how to communicate on a professional level when things don't go according to plan, and, last but not least, accept that sometimes you just have to bite the bullet, swallow your pride and take one for the team.

That's it?


Am I missing something in this article? Certainly.

Of course there is more to being a successful programmer or software engineer than the five points mentioned above, and of course not all of them apply to the same extent to everyone or every company. Working at an indie startup that needs to produce something now with limited resources puts a different weight on things than being hired by an AAA company where you are just a tiny cogwheel in a great machine.

Also, some of the things mentioned above are easy to learn, while others need time and experience to grow. Learning tools is easy; learning people is hard. It's a catch-22: you need to work with other professionals to gain the skills needed to be accepted as a professional yourself. Keep an open mind for well-meant critiques, be reflective about yourself, and don't give up.

Feel free to comment on this article or reach me over PM.

Article Update Log


No updates yet