The Travel Ban and Your Studio: What You Can and Can’t Do to Protect Your Employees

February 3, 2017, 3:02 pm

Introduction

The Trump Administration’s travel ban generated considerable shock, fear, and confusion for citizens, legal residents, and refugees alike. It has also raised concerns for game studios and publishers that employ the people affected by the Executive Order (“EO”). While many are discussing the outrage and constitutionality of the EO, lawyers and immigration professionals have limited information on the future of the ban for those employers who desperately want to protect their non-citizen employees.

There is a good reason for this. The Executive Order is vague and broad in its scope and reach. There has certainly been some confused mishandling in its execution. Although there are still quite a few “indefinites”, here’s what we do know:

About the Ban

Affected Parties.

The EO applies to all nationals from Iran, Iraq, Syria, Sudan, Libya, Somalia, and Yemen. This includes nationals with existing visitor visas (B-1/B-2) and work visas (e.g. H-1B, L-1, O-1). For purposes of the EO it is best to assume that those visas are provisionally suspended until the ban is lifted. Although there has been confusion as to what extent the travel ban applies to legal permanent residents (“LPRs”, e.g. green card holders), the ban does not apply to naturalized citizens of the U.S. who originated from the referenced countries.

Additionally, all refugee applications by nationals of the referenced countries are suspended for 120 days following the EO, and Syrian refugees are banned indefinitely. The currently available information indicates that affected parties holding dual citizenship with another country are also included in the ban. However, so far (according to reports from the UK and Canada), this only seems to apply if the entrant is traveling from a referenced country.

Effect of the Executive Order.

The EO has a few notable functions. First, it imposes a 90-day ban on admission from all ports of entry into the United States for all Affected Parties, excluding LPRs, who may be subject to secondary screening. This means non-LPR nationals of the 7 referenced countries seeking entry into the U.S. will either be detained or sent back, regardless of whether they have a valid visa or not.

But there’s more: the EO requires that the Department of Homeland Security (DHS) and State Department request information concerning individuals seeking entry from the referenced countries. If the referenced countries refuse to provide that information, the ban may become permanent. The EO also orders the USCIS to immediately suspend processing for all immigration benefit applications filed by or on behalf of nationals from the seven referenced countries. This includes work visa renewals, petitions for asylum, green card applications, adjustment of status and naturalization.

Finally, the order also suspends the Visa Interview Waiver program, which allows eligible foreign nationals wishing to renew a nonimmigrant visa to request a waiver of the in-person interview requirement.

Ongoing Concerns.

As mentioned above, there is a likelihood that the ban may become permanent for those countries that do not comply with the DHS information requests. Additionally, there is a draft Executive Order circulating that may target legal immigration. Specifically, the order would require DHS to perform site visits to employers of L-1 nonimmigrant workers. Suits have been filed in several jurisdictions challenging the validity, constitutionality, and enforcement of the EO. One or more of these suits may affect the short-term and long-term ramifications of the EO.

For all practical purposes, that summarizes as much as we know about the breadth and scope of this travel ban. The confusion and fear are mostly the result of the unknown ramifications of the order. This has a particular impact on our industry for a few reasons: 1) studio work environments are often fluid, which allows studios to employ or contract foreign nationals who may telecommute from their countries of origin and only come to the U.S. for limited periods of time under a temporary work visa or visitor visa; 2) GDC is on the horizon and falls squarely in the middle of the ban period; and 3) we want to recruit the best and brightest minds regardless of their country of origin. The travel ban creates a more than inconvenient barrier to these objectives. So how can you, as a game studio, prepare yourself for what’s ahead?

Coping With the Executive Order

Step 1. Cancelling Travel Plans

If you employ non-LPRs from one of the referenced countries who are here on a work or visitor’s visa, you should cancel any international business travel plans on behalf of those individuals. Simply put, they won’t be allowed back in the country at this time, and there is no telling how long the ban will actually be in place. You can’t force them to stay, as courts tend to frown on excessive corporate paternalism over employees, but to the extent you can control the situation you should strive to do so.

For LPRs from the referenced countries, international travel may be incredibly inconvenient right now, but should still be possible if proper measures are taken. As mentioned above, LPRs with a valid green card are to a certain extent excluded from the ban. However, although the administration has confirmed that LPRs should still be able to travel, the fact remains that LPRs have in fact been detained.

If an LPR from a referenced country must travel for business, they should prepare for secondary inspection upon re-entry subject to enhanced screening. They will be asked about their religious beliefs, political views, and social media accounts. They may also be pressured by CBP agents to sign form I-407, “Record of Abandonment of Lawful Permanent Resident Status”. First and most obviously, they shouldn’t sign I-407. Additionally, LPRs departing from the US should a) consult with an immigration attorney and company counsel prior to departure and obtain both a signed form G-28 (Notice of Entry of Appearance as Attorney or Accredited Representative) and a legal opinion letter specifying the basis for re-entry; and b) remain respectful but silent concerning their religious beliefs, political views, etc.

It’s also important to advise those affected employees against personal international travel. Inform them of the risks involved (namely, not being allowed back in the country) and the threat to their personal safety and physical assets in the US if they leave right now. Once again, you cannot force them to stay if they choose to go, but they should be made aware of the consequences.

Unfortunately, this includes those employees or contractors working abroad who planned to attend DICE or GDC. If you have employees or contractors who are nationals of the referenced countries that intend to travel to San Francisco or Vegas for these events, you should contact an immigration attorney immediately to discuss your options.

Step 2. Reviewing Documentation

If you haven’t done so recently, now would be a good time to perform an I-9, visa, and immigration benefits documentation audit for any employee that may be impacted by the ban. Additionally, you should determine the status of any applicable outstanding work visas and any pending requests for immigration benefits on behalf of affected employees, including renewals. If any such benefits or visas are set to expire during the ban period, those employees should be advised of the risks of overstaying their validity period. Applications and immigration benefit matters submitted for adjudication during the ban may be returned or rejected.

Step 3. Consulting an Immigration Attorney

Now is the time to consult a specialist if you have employees that are affected by the ban. Unfortunately, contacting the USCIS, CBP or ICE right now isn’t likely to give you the specialized information you need for your studio. An immigration attorney with specialized knowledge in the area of immigration benefits and nonimmigrant work and travel visas will be able to give you an idea of the potential outcomes of the ban and how it will impact your current and future employees and contractors.

Step 4. Setting Aside or Finding Alternative Placement Solutions for Future Affected Employees

If you want to hire from the referenced countries in the future, you need to be mindful of the possibility that this ban may become permanent, unless it’s overruled by the Courts or Congress. That being said, if you offer a flexible work environment that does not require your contractor or employee to reside in the US, you should still consider retaining these individuals. As stated previously, our industry seeks out the best and brightest minds regardless of religion, race, gender, or country of origin, and I hope that is a practice we continue.

Step 5. Preparing for Delays

Remember that your studio is not the only one impacted by the ban. Your publishers, investors, distributors, and third party licensors may also experience significant burdens and set-backs because of the Executive Order. We may also face retaliation from the referenced countries, which not only may impose similar bans, but may make trade with the US prohibitive. Expect considerable delays and keep an eye on your force majeure clauses. Any ongoing negotiations should take the ban and future ramifications into consideration. Any future deals should include force majeure events reflecting the current unsettled environment.

Conclusion

Our industry strives for inclusiveness. I would be remiss if I didn’t emphasize this point. If you are minded to continue doing business with individuals and businesses based in the 7 referenced countries, I do not want this ban to discourage your determination. However, doing so will require some vigilance and safeguarding on your part. We are not yet aware of the full reach of this EO or the intent of the Administration in pursuing such action.

What Languages to Localize Your Steam Game Into?

January 5, 2017, 5:09 am

This post was originally published on Level Up Translation's blog.

As the developer or publisher of a title that took a considerable amount of time and money to develop, the localization of your game is clearly a point you should not neglect.

Localization strategies differ from one platform to another though.

Here are a few tips to help you decide what languages to localize your Steam game into.

7 languages cover 65% of Steam users

Your game is going to hit Steam and you don't even know where to start with its localization? Don't worry, we've got you covered!

Here are the 7 languages (including English) you should absolutely consider localizing your game into:

1 - Russian

10.88% of Steam users are Russian. They make up the second largest gaming population on Steam after the US.

Russian players also own a whopping 8.66% of the total games owned on Steam, and PC is by far their favourite gaming platform.

Don't think twice, localize your game into Russian!

2 - German

4.93% of Steam users are German and they account for 6.23% of the games owned on the platform (31.87 per user on average, against 20.09 for Russian players).

Germany is also the leading European country in terms of game revenue, so localizing your game into German is not only a safe bet, it's a must.

3 - Brazilian Portuguese

The share of Steam users from Brazil keeps on increasing.
4.72% of Steam users are Brazilian and they account for 3.55% of the total games owned on the platform.

Brazil is the most important market in South America and English proficiency is relatively low. Still hesitating to localize your game into Brazilian Portuguese? Think again!

4 - French

3.61% of Steam users are from France and they account for 3.49% of the games owned.

Localizing your title into French also gives you access to Quebec as well as French-speaking countries in North and West Africa. However, if you are specifically targeting French-speaking gamers located in Canada, we do recommend that you localize your game into Quebecois as well.

5 - Chinese

Chinese gamers mostly play on PC (57% of the Chinese gaming population) and with 4.86% of Steam users coming from China, your game definitely has to be localized for that market.

Chinese users own relatively few games (2.46% of total games owned on Steam) but this is probably due to the relatively low number of games available in Chinese on the platform at the moment.

Who said niche? If your game has the potential to find an audience in China, you know what to do next.

6 - Spanish, but...

Although “only” 1.43% of Steam users are from Spain, about 6% of Steam users come from Spanish-speaking countries.

Spanish is a pretty special case though. Should you decide to tackle the Latin American market (the second fastest growing region in terms of game revenues), we highly recommend that you go for specific locale versions.

Localizing your game into the above 6 languages covers roughly 35% of Steam users. Provided your game was developed in English (an additional 30%), this makes your game available to 65% of Steam users!
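Summing the shares quoted above for the six languages (using the approximate 6% for all Spanish-speaking countries) gives the coverage figure:

$$10.88 + 4.93 + 4.72 + 3.61 + 4.86 + 6 \approx 35.0\,\%$$

Adding the roughly 30% of English-speaking users brings the total to about 65%.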

Raise your hand if you would like to miss 65% of the Steam market! Anyone? No? Good...

Other languages worth considering for Steam

Italian

Looking at the numbers, the Italian gaming market is far from its days of glory. However, it would be hard to recommend ignoring the "I" in the traditional FIGS (French, Italian, German, Spanish).

Not only does Italy have relatively low English proficiency, but choosing not to localize your title into Italian might also expose you to criticism for not living up to the expectations of Italian gamers. Many consider the lack of an Italian localization a deal-breaker, and just like many French and Spanish players, Italian gamers tend to swiftly uninstall a game if it is not available in their native tongue.

Polish, Ukrainian

Given the share of Steam users speaking these two languages (respectively the 9th and 11th largest populations of Steam users), localizing your game into Polish and Ukrainian is a pretty smart move.

Localizing into these languages is also cheaper than into French, German, Italian or Spanish, so if you have the budget, go for it!

Turkish

2.04% of Steam users speak Turkish. For comparison, Swedish players represent 1.54% of Steam's audience.

On the other hand, translating from English to Turkish takes nearly 50% longer than translating into FIGS. Turkish is therefore relatively expensive when it comes to localization, and we only recommend it if your budget can handle it.

Has this post helped you clarify where your Steam games could sell best? Then gear up for your global quest and work with our game localization specialists who will pour their heart and soul (as well as a considerable amount of coffee/tea) into the localization of your game!

Contact us now!

Follow Level Up Translation on Facebook, Twitter and LinkedIn to get all our tips and insights to help you with your game localization!

Got a game that needs to be localized?
Tell us about it! We've got plenty of XP to help you level up!

Level Up Translation - Expert Video Game Localization Services - www.leveluptranslation.com

Frustum Culling

February 12, 2017, 1:37 pm

Introduction

Frustum culling is the process of discarding objects that are not visible on the screen. Since we don't see them, we don't need to spend computing resources preparing them for rendering or rendering them.

In this paper I will cover the following topics:

culling of bounding spheres, axis-aligned bounding boxes (AABB) and oriented bounding boxes (OBB)
culling of huge numbers of objects
using SSE
multithreaded culling
GPU culling
comparison of the approaches' efficiency and speed

What I will not cover:

hierarchical structures and trees. Objects can be grouped by world position so that the visibility of a whole group is checked first.
per-object optimizations, such as testing the last 'successful' culling plane first
visibility tests that take the scene depth buffer into account. An object can be inside the frustum but completely hidden by another object closer to the viewer, so it could also be discarded from rendering
software occlusion culling, i.e. determining on the CPU side whether objects are hidden by other objects
culling for shadows

Simple culling

The visible area is defined by the view frustum (the viewing pyramid with its top cut off). Objects that are not inside this area should be excluded from the rendering process. The frustum is usually defined by 6 planes.
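The culling functions in this article take `frustum_planes` as input without showing where the planes come from. A common way to obtain them, assumed here rather than taken from the article, is the Gribb-Hartmann extraction from the combined view-projection matrix. This sketch uses hypothetical `Plane` and `Mat4` types (row-major storage, column vectors, OpenGL-style clip space) and makes the normals point into the frustum, which matches the `dist <= -radius` convention of the sphere test:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types for this sketch (not the article's vec4/mat4).
struct Plane { float a, b, c, d; };  // point is inside when a*x + b*y + c*z + d >= 0
struct Mat4 { float m[4][4]; };      // m[row][col]; column vectors: clip = M * point

// Gribb-Hartmann plane extraction from M = projection * view, OpenGL clip space (-w..w).
// Output order: left, right, bottom, top, near, far; normals point into the frustum.
// With a Direct3D-style 0..w depth range the near plane would be row 2 alone.
inline void ExtractFrustumPlanes(const Mat4 &M, Plane planes[6])
{
    const float (*r)[4] = M.m;
    for (int axis = 0; axis < 3; ++axis)
    {
        // -w <= clip[axis]  <=>  (row3 + row_axis) . point >= 0
        planes[2 * axis] = {r[3][0] + r[axis][0], r[3][1] + r[axis][1],
                            r[3][2] + r[axis][2], r[3][3] + r[axis][3]};
        // clip[axis] <= w   <=>  (row3 - row_axis) . point >= 0
        planes[2 * axis + 1] = {r[3][0] - r[axis][0], r[3][1] - r[axis][1],
                                r[3][2] - r[axis][2], r[3][3] - r[axis][3]};
    }
    // Normalize so the plane equation yields true distances (the sphere test relies on this).
    for (int i = 0; i < 6; ++i)
    {
        Plane &p = planes[i];
        const float len = std::sqrt(p.a * p.a + p.b * p.b + p.c * p.c);
        assert(len > 0.0f);
        p.a /= len; p.b /= len; p.c /= len; p.d /= len;
    }
}
```

With an identity matrix this yields the planes of the cube -1..1, e.g. left = (1, 0, 0, 1), i.e. x + 1 >= 0. Normalization matters only for the sphere test; the box tests just need the sign of the plane equation.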

The scene contains objects. Each object can be approximated with simple geometry such as a sphere or a box, so that all of the object's geometry lies inside this primitive.

Visibility tests against such simple geometry are very fast; our aim is to determine whether the object is visible in the frustum.
We will consider two kinds of bounding volumes: spheres and boxes. Boxes in turn come in two kinds: aligned with the world axes (AABB) and aligned with the object's local axes (OBB). An OBB approximates the object more tightly, but its visibility test is more expensive than the AABB test.
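To make the bounding volumes concrete, here is a sketch of building them from a mesh's vertices, assuming a hypothetical `Float3` vertex type. The AABB is the per-axis min/max of the vertices; the sphere is a simple, non-minimal one centered on the box center with the farthest vertex as its radius. Computed from local-space vertices, the same box combined with the object's transform matrix gives the OBB:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Float3 { float x, y, z; };  // hypothetical point type for this sketch

struct BoundsAABB { Float3 min, max; };
struct BoundsSphere { Float3 center; float radius; };

// Tight axis-aligned box around all vertices.
inline BoundsAABB ComputeAABB(const std::vector<Float3> &verts)
{
    assert(!verts.empty());
    BoundsAABB box{verts[0], verts[0]};
    for (const Float3 &v : verts)
    {
        box.min = {std::min(box.min.x, v.x), std::min(box.min.y, v.y), std::min(box.min.z, v.z)};
        box.max = {std::max(box.max.x, v.x), std::max(box.max.y, v.y), std::max(box.max.z, v.z)};
    }
    return box;
}

// Simple (not minimal) sphere: centered on the AABB center, radius = farthest vertex.
inline BoundsSphere ComputeSphere(const std::vector<Float3> &verts)
{
    const BoundsAABB box = ComputeAABB(verts);
    const Float3 c{(box.min.x + box.max.x) * 0.5f, (box.min.y + box.max.y) * 0.5f,
                   (box.min.z + box.max.z) * 0.5f};
    float r2 = 0.0f;
    for (const Float3 &v : verts)
    {
        const float dx = v.x - c.x, dy = v.y - c.y, dz = v.z - c.z;
        r2 = std::max(r2, dx * dx + dy * dy + dz * dz);
    }
    return {c, std::sqrt(r2)};
}
```

The sphere from this construction can be noticeably larger than the minimal enclosing sphere, which is one reason spheres cull less precisely than boxes.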

Sphere-frustum

Algorithm: compute the signed distance from the sphere's center to each frustum plane. If the center is behind any plane by more than the sphere's radius, the sphere is not in the frustum, and that plane is a separating plane.
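In equation form, writing plane $i$ as $(\mathbf{n}_i, d_i)$ (the `xyz` and `w` components of `frustum_planes[i]`), a sphere with center $\mathbf{c}$ and radius $r$ is rejected when

$$\exists\, i \in \{0,\dots,5\}:\quad \mathbf{n}_i \cdot \mathbf{c} + d_i \le -r$$

The left-hand side is a true distance only if each $\mathbf{n}_i$ has unit length, so this test assumes normalized planes.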

__forceinline bool SphereInFrustum(const vec3 &pos, float radius, const vec4 *frustum_planes)
{
	bool res = true;
	//test all 6 frustum planes (plane normals point into the frustum)
	for (int i = 0; i < 6; i++)
	{
		//signed distance from the sphere center to the plane: dot(plane.xyz, pos) + plane.w
		//if the center is more than 'radius' behind the plane, the sphere is outside the frustum
		if (frustum_planes[i].x * pos.x + frustum_planes[i].y * pos.y + frustum_planes[i].z * pos.z + frustum_planes[i].w <= -radius)
			res = false;
			//return false; //early-out alternative; the flag version was faster here
	}
	return res;
}

AABB-frustum

A bounding sphere is sometimes not a great choice for approximating an object. For a more precise test, boxes are often used: world axis-aligned (AABB) or oriented (OBB) bounding boxes.
The basic idea of the box visibility test: if all 8 box points lie behind one of the frustum planes, the box is not in the frustum. The next example implements an AABB-frustum test, but if we substitute the OBB's world-space points into the equations, we get an OBB-frustum test.

__forceinline bool RightParallelepipedInFrustum2(vec4 &Min, vec4 &Max, vec4 *frustum_planes)
{
//this is just an example of the basic idea of how BOX culling works, for both AABB and OBB
//Min & Max are 2 world-space box points; for AABB-frustum culling the 8 corners are built from them
//for OBB-frustum culling, substitute the 8 box points transformed to world space (by the object matrix) into the equations instead
	//test all 6 frustum planes
	for (int i = 0; i<6; i++)
	{
	//try to find a plane for which all 8 box points are behind it (a separating plane)
	//test all 8 box points against the frustum plane:
	//calculate the distance from the point to the plane;
	//if a point is in front of the plane (dist > 0), this is not a separating plane
		if (frustum_planes[i][0] * Min[0] + frustum_planes[i][1] * Max[1] + frustum_planes[i][2] * Min[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Min[0] + frustum_planes[i][1] * Max[1] + frustum_planes[i][2] * Max[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Max[0] + frustum_planes[i][1] * Max[1] + frustum_planes[i][2] * Max[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Max[0] + frustum_planes[i][1] * Max[1] + frustum_planes[i][2] * Min[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Max[0] + frustum_planes[i][1] * Min[1] + frustum_planes[i][2] * Min[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Max[0] + frustum_planes[i][1] * Min[1] + frustum_planes[i][2] * Max[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Min[0] + frustum_planes[i][1] * Min[1] + frustum_planes[i][2] * Max[2] + frustum_planes[i][3]>0)
			continue;
		if (frustum_planes[i][0] * Min[0] + frustum_planes[i][1] * Min[1] + frustum_planes[i][2] * Min[2] + frustum_planes[i][3]>0)
			continue;
		return false;
    }
    return true;
}

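The AABB-frustum test can be implemented more efficiently.
Algorithm: of the 8 corners, take the one that lies farthest along the plane's normal (the "positive vertex") and test whether it is behind the plane. If it is, the whole box is behind the plane and the object is not in the frustum.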

__forceinline bool RightParallelepipedInFrustum(vec4 &Min, vec4 &Max, vec4 *frustum_planes)
{
	bool inside = true;
	//test all 6 frustum planes
	for (int i = 0; i<6; i++)
	{
	//pick the corner farthest along the plane normal (per-axis max of Min*n and Max*n)
	//if even this corner is behind the plane, the object is outside the frustum
		float d = max(Min.x * frustum_planes[i].x, Max.x * frustum_planes[i].x) +
				  max(Min.y * frustum_planes[i].y, Max.y * frustum_planes[i].y) +
				  max(Min.z * frustum_planes[i].z, Max.z * frustum_planes[i].z) +
				  frustum_planes[i].w;
		inside &= d > 0;
		//return false; //early-out alternative; the flag version was faster here
	}
	return inside;
}
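Why `max(Min*n, Max*n)` finds the right corner: per axis it picks `Max` where the normal component is positive and `Min` where it is negative, which is the corner farthest along the normal. If even that corner is behind the plane, every other corner is too. The sketch below, with hypothetical `Plane4` and `Box3` types, writes that choice out explicitly and checks it against the brute-force 8-corner test for a single plane:

```cpp
#include <cassert>

// Hypothetical types for this sketch.
struct Plane4 { float a, b, c, d; };    // inside when a*x + b*y + c*z + d > 0
struct Box3 { float min[3], max[3]; };  // axis-aligned box

inline float PlaneDist(const Plane4 &p, float x, float y, float z)
{
    return p.a * x + p.b * y + p.c * z + p.d;
}

// Brute force: the box is culled by this plane if no corner is in front of it.
inline bool OutsideAllCorners(const Box3 &b, const Plane4 &p)
{
    for (int i = 0; i < 8; ++i)  // bits of i choose min or max per axis
    {
        const float x = (i & 1) ? b.max[0] : b.min[0];
        const float y = (i & 2) ? b.max[1] : b.min[1];
        const float z = (i & 4) ? b.max[2] : b.min[2];
        if (PlaneDist(p, x, y, z) > 0.0f)
            return false;
    }
    return true;
}

// Positive vertex: per axis, Max where the normal component is >= 0, Min otherwise.
// This is the corner that max(Min * n, Max * n) selects term by term.
inline bool OutsidePVertex(const Box3 &b, const Plane4 &p)
{
    const float x = p.a >= 0.0f ? b.max[0] : b.min[0];
    const float y = p.b >= 0.0f ? b.max[1] : b.min[1];
    const float z = p.c >= 0.0f ? b.max[2] : b.min[2];
    return PlaneDist(p, x, y, z) <= 0.0f;
}
```

The p-vertex version evaluates one corner per plane instead of up to eight, which is where the speedup of the optimized AABB test comes from.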

OBB-frustum

Algorithm: transform the 8 box points from local space to clip space. Testing points against planes is easy in this space, because the frustum becomes the unit cube [-1..1] (note that for DirectX the Z axis has a different range, [0..1]). If, for one of the axes, all 8 vertices are < -1, or all 8 are > 1, then the box is outside the frustum. The code performs this comparison against w before the perspective divide (x > w rather than x / w > 1).

__forceinline bool OBBInFrustum(const vec3 &Min, const vec3 &Max, mat4 &obj_transform_mat, mat4 &cam_modelview_proj_mat)
{
	//transform all 8 box points to clip space,
	//where it is easy to test whether points are outside the required unit cube
	//NOTE: for DirectX z must be tested against 0..w (OpenGL uses -w..w); see the differences between the APIs' projection transforms / clipping volumes
	//matrix to transform points to clip space
	mat4 to_clip_space_mat = cam_modelview_proj_mat * obj_transform_mat;

	//transform all 8 box points to clip space
	vec4 obb_points[8];
	obb_points[0] = to_clip_space_mat * vec4(Min[0], Max[1], Min[2], 1.f);
	obb_points[1] = to_clip_space_mat * vec4(Min[0], Max[1], Max[2], 1.f);
	obb_points[2] = to_clip_space_mat * vec4(Max[0], Max[1], Max[2], 1.f);
	obb_points[3] = to_clip_space_mat * vec4(Max[0], Max[1], Min[2], 1.f);
	obb_points[4] = to_clip_space_mat * vec4(Max[0], Min[1], Min[2], 1.f);
	obb_points[5] = to_clip_space_mat * vec4(Max[0], Min[1], Max[2], 1.f);
	obb_points[6] = to_clip_space_mat * vec4(Min[0], Min[1], Max[2], 1.f);
	obb_points[7] = to_clip_space_mat * vec4(Min[0], Min[1], Min[2], 1.f);

	bool outside = false, outside_positive_plane, outside_negative_plane;

	//the 6 frustum planes form a unit cube in clip space (for GL, with range -1..1)
	for (int i = 0; i < 3; i++) //3 because we test the positive & negative plane of each axis at once
	{
	//the box is outside if all 8 points are outside the same plane
	//equivalent to normalizing the vertices (xyz / w) and checking whether all 8 point coordinates are < -1, or all are > 1
		outside_positive_plane =
			obb_points[0][i] > obb_points[0].w &&
			obb_points[1][i] > obb_points[1].w &&
			obb_points[2][i] > obb_points[2].w &&
			obb_points[3][i] > obb_points[3].w &&
			obb_points[4][i] > obb_points[4].w &&
			obb_points[5][i] > obb_points[5].w &&
			obb_points[6][i] > obb_points[6].w &&
			obb_points[7][i] > obb_points[7].w;

		outside_negative_plane =
			obb_points[0][i] < -obb_points[0].w &&
			obb_points[1][i] < -obb_points[1].w &&
			obb_points[2][i] < -obb_points[2].w &&
			obb_points[3][i] < -obb_points[3].w &&
			obb_points[4][i] < -obb_points[4].w &&
			obb_points[5][i] < -obb_points[5].w &&
			obb_points[6][i] < -obb_points[6].w &&
			obb_points[7][i] < -obb_points[7].w;

		outside = outside || outside_positive_plane || outside_negative_plane;
		//if (outside_positive_plane || outside_negative_plane)
			//return false;
	}
	return !outside;
	//return true;
}
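The code above uses the OpenGL range -w..w on all three axes. Here is a sketch of the same per-axis test with the DirectX 0..w depth range (mentioned in its comments) made selectable. `Clip4` and `DepthRange` are hypothetical types; the input is the 8 corners already transformed to clip space:

```cpp
#include <cassert>

// Hypothetical types for this sketch.
struct Clip4 { float x, y, z, w; };  // clip-space point, before the perspective divide
enum class DepthRange { NegWToW /* OpenGL */, ZeroToW /* Direct3D */ };

// The box is outside if, on some axis, all 8 corners are beyond the same clip boundary.
inline bool ClipBoxOutside(const Clip4 pts[8], DepthRange depth)
{
    for (int axis = 0; axis < 3; ++axis)
    {
        bool allAbove = true, allBelow = true;
        for (int k = 0; k < 8; ++k)
        {
            const float v = axis == 0 ? pts[k].x : (axis == 1 ? pts[k].y : pts[k].z);
            // lower bound is -w, except for z with the Direct3D depth range (0)
            const float lower = (axis == 2 && depth == DepthRange::ZeroToW) ? 0.0f : -pts[k].w;
            allAbove = allAbove && v > pts[k].w;
            allBelow = allBelow && v < lower;
        }
        if (allAbove || allBelow)
            return true;
    }
    return false;
}
```

A box whose corners all have z between -w and 0 is kept under the OpenGL convention but culled under the Direct3D one, which is exactly the difference the NOTE in the code warns about.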

Table 1: culling results for 100k objects (time in ms). Intel Core i5, single thread.

Simple culling	Sphere	AABB	OBB
Culling only	0.92	1.42	9.14
Whole frame	1.94	2.5	10.3

The results are as expected: the heavier the calculations, the slower the test. The OBB test is much slower than the sphere and AABB tests, but OBB culling is more precise.
Perhaps the optimal solution is to split objects into groups and, depending on each group's distance to the camera, use an appropriate primitive: OBBs for the closest groups, AABBs for groups at medium distance, and spheres for the rest.
Note also that the whole-frame time is larger than the culling time alone, by about 1 ms on average. Transferring data about visible objects to the GPU has a cost, as do a couple of draw calls (DIPs) and API commands, but these are necessary actions.
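The distance-based grouping suggested above could be driven by a selector like the following sketch. The `CullPrimitive` enum, the `PrimitiveThresholds` struct and its values are hypothetical and would need tuning per scene:

```cpp
#include <cassert>

enum class CullPrimitive { OBB, AABB, Sphere };

// Hypothetical distance thresholds in world units; would need tuning per scene.
struct PrimitiveThresholds { float obbMaxDist; float aabbMaxDist; };

inline CullPrimitive ChoosePrimitive(float distToCamera, const PrimitiveThresholds &t)
{
    assert(t.obbMaxDist <= t.aabbMaxDist);
    if (distToCamera < t.obbMaxDist)  return CullPrimitive::OBB;   // near: most precise test
    if (distToCamera < t.aabbMaxDist) return CullPrimitive::AABB;  // medium distance
    return CullPrimitive::Sphere;                                  // far: cheapest test
}
```

Choosing per group rather than per object keeps the branch out of the inner culling loops, so each group can still be processed by a single SSE routine.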

SSE

SSE (Streaming SIMD Extensions) performs a calculation on a group of operands with a single instruction. The SSE architecture includes eight 128-bit registers (sixteen in 64-bit mode) and a set of instructions that operate on them.

Theoretically we might speed up code execution 4 times, since we operate on 4 operands simultaneously. Of course, in practice the performance gain will be smaller because of SSE's drawbacks:

not all algorithms can easily be rewritten with SSE
data must be packed into registers in the layout SSE requires
SSE is awkward for horizontal operations such as dot products
there are no conditional branches: one uses so-called static branching, executing both sides of a condition and keeping only the result of interest
loading data into registers and storing results back to memory has a cost
don't forget about SSE data alignment and stride: aligned loads such as _mm_load_ps require 16-byte-aligned addresses
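Static branching, mentioned in the list above, replaces an `if` with masks: both results are computed and a comparison mask selects one per lane. A minimal sketch with SSE1 intrinsics; the helper names `SelectPs` and `AbsViaSelect` are illustrative:

```cpp
#include <cassert>
#include <xmmintrin.h>

// Static branching: result[i] = mask[i] ? a[i] : b[i].
// Each mask lane must be all ones or all zeros, as produced by the _mm_cmp*_ps intrinsics.
inline __m128 SelectPs(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Example: per-lane (x < 0 ? -x : x) without any branch instruction.
inline __m128 AbsViaSelect(__m128 x)
{
    const __m128 zero = _mm_setzero_ps();
    const __m128 negated = _mm_sub_ps(zero, x);      // "then" branch, computed for all lanes
    const __m128 isNegative = _mm_cmplt_ps(x, zero); // lane mask: all ones where x < 0
    return SelectPs(isNegative, negated, x);         // "else" branch is x itself
}
```

The culling loops below use the same idea in a simpler form: they never branch per object, they OR the per-plane comparison masks together.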

The SSE sphere-frustum and AABB-frustum culling algorithms are almost identical to the simple implementations, except that they perform the calculations on 4 objects simultaneously.

void sse_culling_spheres(BSphere *sphere_data, int num_objects, int *culling_res, vec4 *frustum_planes)
{
	float *sphere_data_ptr = reinterpret_cast<float*>(&sphere_data[0]);
	int *culling_res_sse = &culling_res[0];

	//to optimize calculations we gather xyzw elements in separate vectors
	__m128 zero_v = _mm_setzero_ps();
	__m128 frustum_planes_x[6];
	__m128 frustum_planes_y[6];
	__m128 frustum_planes_z[6];
	__m128 frustum_planes_d[6];

	int i, j;
	for (i = 0; i < 6; i++)
	{
		frustum_planes_x[i] = _mm_set1_ps(frustum_planes[i].x);
		frustum_planes_y[i] = _mm_set1_ps(frustum_planes[i].y);
		frustum_planes_z[i] = _mm_set1_ps(frustum_planes[i].z);
		frustum_planes_d[i] = _mm_set1_ps(frustum_planes[i].w);
	}

	//we process 4 objects per step (num_objects must be a multiple of 4 and the data 16-byte aligned)
	for (i = 0; i < num_objects; i += 4)
	{
	//load bounding sphere data
		__m128 spheres_pos_x = _mm_load_ps(sphere_data_ptr);
		__m128 spheres_pos_y = _mm_load_ps(sphere_data_ptr + 4);
		__m128 spheres_pos_z = _mm_load_ps(sphere_data_ptr + 8);
		__m128 spheres_radius = _mm_load_ps(sphere_data_ptr + 12);
		sphere_data_ptr += 16;

	//transpose the data to collect x, y, z and radius values in separate vectors
		_MM_TRANSPOSE4_PS(spheres_pos_x, spheres_pos_y, spheres_pos_z, spheres_radius);
		__m128 spheres_neg_radius = _mm_sub_ps(zero_v, spheres_radius); // negate all elements

		__m128 intersection_res = _mm_setzero_ps();
		for (j = 0; j < 6; j++) //plane index
		{
		//1. calc distance to plane: dot(sphere_pos.xyz, plane.xyz) + plane.w
		//2. if distance <= -sphere radius, the sphere is behind the plane and outside the frustum
			__m128 dot_x = _mm_mul_ps(spheres_pos_x, frustum_planes_x[j]);
			__m128 dot_y = _mm_mul_ps(spheres_pos_y, frustum_planes_y[j]);
			__m128 dot_z = _mm_mul_ps(spheres_pos_z, frustum_planes_z[j]);

			__m128 sum_xy = _mm_add_ps(dot_x, dot_y);
			__m128 sum_zw = _mm_add_ps(dot_z, frustum_planes_d[j]);
			__m128 distance_to_plane = _mm_add_ps(sum_xy, sum_zw);

			__m128 plane_res = _mm_cmple_ps(distance_to_plane, spheres_neg_radius); //dist < -sphere_r ?
			intersection_res = _mm_or_ps(intersection_res, plane_res); //if yes - sphere behind the plane & outside frustum
		}

		//store result: reinterpret the mask bits (-1 = outside, 0 = visible)
		__m128i intersection_res_i = _mm_castps_si128(intersection_res);
		_mm_store_si128((__m128i *)&culling_res_sse[i], intersection_res_i);
	}
}
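Because the loop advances 4 objects at a time and uses aligned loads and stores, both the sphere array and `culling_res` should be padded to a multiple of 4 entries and start on a 16-byte boundary. One way to do this, sketched with hypothetical helper names, uses `_mm_malloc`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // _mm_malloc / _mm_free

// Round an object count up to the next multiple of 4 (one SSE group).
inline int RoundUpToMultipleOf4(int n)
{
    assert(n >= 0);
    return (n + 3) & ~3;
}

// Hypothetical helper: storage for numSpheres spheres (x, y, z, radius = 4 floats each),
// padded to a whole number of 4-sphere groups and aligned to 16 bytes for _mm_load_ps.
// Release with _mm_free.
inline float *AllocSphereData(int numSpheres, int *paddedCount)
{
    *paddedCount = RoundUpToMultipleOf4(numSpheres);
    const std::size_t floats = static_cast<std::size_t>(*paddedCount) * 4;
    float *data = static_cast<float *>(_mm_malloc(floats * sizeof(float), 16));
    assert(data != nullptr || floats == 0);
    for (std::size_t i = 0; i < floats; ++i)
        data[i] = 0.0f;  // padding spheres are harmless; their results are ignored
    return data;
}
```

The padded count is what gets passed as `num_objects`; results for the padding entries at the end are simply ignored.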

void sse_culling_aabb(AABB *aabb_data, int num_objects, int *culling_res, vec4 *frustum_planes)
{
	float *aabb_data_ptr = reinterpret_cast<float*>(&aabb_data[0]);
	int *culling_res_sse = &culling_res[0];

	//to optimize calculations we gather xyzw elements in separate vectors
	__m128 frustum_planes_x[6];
	__m128 frustum_planes_y[6];
	__m128 frustum_planes_z[6];
	__m128 frustum_planes_d[6];

	int i, j;
	for (i = 0; i < 6; i++)
	{
		frustum_planes_x[i] = _mm_set1_ps(frustum_planes[i].x);
		frustum_planes_y[i] = _mm_set1_ps(frustum_planes[i].y);
		frustum_planes_z[i] = _mm_set1_ps(frustum_planes[i].z);
		frustum_planes_d[i] = _mm_set1_ps(frustum_planes[i].w);
	}

	__m128 zero = _mm_setzero_ps();
	//we process 4 objects per step (num_objects must be a multiple of 4 and the data 16-byte aligned)
	for (i = 0; i < num_objects; i += 4)
	{
	//load objects data (each AABB is stored as vec4 min, vec4 max: 8 floats per box)
	//load aabb min
		__m128 aabb_min_x = _mm_load_ps(aabb_data_ptr);
		__m128 aabb_min_y = _mm_load_ps(aabb_data_ptr + 8);
		__m128 aabb_min_z = _mm_load_ps(aabb_data_ptr + 16);
		__m128 aabb_min_w = _mm_load_ps(aabb_data_ptr + 24);

	//load aabb max
		__m128 aabb_max_x = _mm_load_ps(aabb_data_ptr + 4);
		__m128 aabb_max_y = _mm_load_ps(aabb_data_ptr + 12);
		__m128 aabb_max_z = _mm_load_ps(aabb_data_ptr + 20);
		__m128 aabb_max_w = _mm_load_ps(aabb_data_ptr + 28);
		aabb_data_ptr += 32;

	//for now each vector holds one box point (xyzw); for the calculations we need the xxxx yyyy zzzz representation, so just transpose the data
		_MM_TRANSPOSE4_PS(aabb_min_x, aabb_min_y, aabb_min_z, aabb_min_w);
		_MM_TRANSPOSE4_PS(aabb_max_x, aabb_max_y, aabb_max_z, aabb_max_w);

		__m128 intersection_res = _mm_setzero_ps();
		for (j = 0; j < 6; j++) //plane index
		{
		//this code is similar to the simple culling version:
		//pick the corner farthest along the plane normal and check if it is behind the plane; if yes, the object is outside the frustum
		//dot product, separately for each coordinate, for the min & max aabb points
			__m128 aabbMin_frustumPlane_x = _mm_mul_ps(aabb_min_x, frustum_planes_x[j]);
			__m128 aabbMin_frustumPlane_y = _mm_mul_ps(aabb_min_y, frustum_planes_y[j]);
			__m128 aabbMin_frustumPlane_z = _mm_mul_ps(aabb_min_z, frustum_planes_z[j]);

			__m128 aabbMax_frustumPlane_x = _mm_mul_ps(aabb_max_x, frustum_planes_x[j]);
			__m128 aabbMax_frustumPlane_y = _mm_mul_ps(aabb_max_y, frustum_planes_y[j]);
			__m128 aabbMax_frustumPlane_z = _mm_mul_ps(aabb_max_z, frustum_planes_z[j]);

		//of the 8 box points we need the one farthest along the plane normal: just take the per-axis max
			__m128 res_x = _mm_max_ps(aabbMin_frustumPlane_x, aabbMax_frustumPlane_x);
			__m128 res_y = _mm_max_ps(aabbMin_frustumPlane_y, aabbMax_frustumPlane_y);
			__m128 res_z = _mm_max_ps(aabbMin_frustumPlane_z, aabbMax_frustumPlane_z);

		//dist to plane = dot(aabb_point.xyz, plane.xyz) + plane.w
			__m128 sum_xy = _mm_add_ps(res_x, res_y);
			__m128 sum_zw = _mm_add_ps(res_z, frustum_planes_d[j]);
			__m128 distance_to_plane = _mm_add_ps(sum_xy, sum_zw);

			__m128 plane_res = _mm_cmple_ps(distance_to_plane, zero); //dist from that corner to plane <= 0 ?
			intersection_res = _mm_or_ps(intersection_res, plane_res); //if yes - aabb behind the plane & outside frustum
		}

		//store result: reinterpret the mask bits (-1 = outside, 0 = visible)
		__m128i intersection_res_i = _mm_castps_si128(intersection_res);
		_mm_store_si128((__m128i *)&culling_res_sse[i], intersection_res_i);
	}
}
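Instead of storing four 32-bit flags per group, the comparison mask (`intersection_res` in the loops above) can be compressed with `_mm_movemask_ps`, which packs the sign bit of each lane into a 4-bit integer: bit k set means object k of the group is outside. A sketch of appending visible object indices this way; the function name and the visible-list layout are assumptions, and `numObjects` keeps padding entries out of the list:

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical helper: given one group's outside mask (all ones = outside), append the
// indices of visible objects. numObjects excludes padding entries at the end.
inline int AppendVisible(__m128 outsideMask, int base, int numObjects, int *visible, int count)
{
    const int bits = _mm_movemask_ps(outsideMask);  // bit k = sign bit of lane k
    for (int k = 0; k < 4 && base + k < numObjects; ++k)
        if ((bits & (1 << k)) == 0)
            visible[count++] = base + k;
    return count;
}
```

Called once per group with `base = i`, this builds the list of visible objects directly, which is the data that later has to be sent to the GPU.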

OBB culling is a bit harder. Here we process one object at a time, but perform the calculations for the three x, y, z axes simultaneously. This is not optimal, but it reflects the basic idea of the algorithm. Besides, vector math (matrix multiplications and point transformations) runs faster with SSE.

void sse_culling_obb(int firs_processing_object, int num_objects, int *culling_res, mat4 &cam_modelview_proj_mat)
{
	mat4_sse sse_camera_mat(cam_modelview_proj_mat);
	mat4_sse sse_clip_space_mat;

	//box points in local space
	__m128 obb_points_sse[8];
	obb_points_sse[0] = _mm_set_ps(1.f, box_min[2], box_max[1], box_min[0]);
	obb_points_sse[1] = _mm_set_ps(1.f, box_max[2], box_max[1], box_min[0]);
	obb_points_sse[2] = _mm_set_ps(1.f, box_max[2], box_max[1], box_max[0]);
	obb_points_sse[3] = _mm_set_ps(1.f, box_min[2], box_max[1], box_max[0]);
	obb_points_sse[4] = _mm_set_ps(1.f, box_min[2], box_min[1], box_max[0]);
	obb_points_sse[5] = _mm_set_ps(1.f, box_max[2], box_min[1], box_max[0]);
	obb_points_sse[6] = _mm_set_ps(1.f, box_max[2], box_min[1], box_min[0]);
	obb_points_sse[7] = _mm_set_ps(1.f, box_min[2], box_min[1], box_min[0]);

	ALIGN_SSE int obj_culling_res[4];
	__m128 zero_v = _mm_setzero_ps();

	int i, j;
	//process one object per step
	for (i = firs_processing_object; i < firs_processing_object+num_objects; i++)
	{
	//clip space matrix = camera_view_proj * obj_mat
		sse_mat4_mul(sse_clip_space_mat, sse_camera_mat, sse_obj_mat[i]);
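		//all-ones lane masks; _mm_set1_ps(0xffffffff) would store the float 4294967296.0f, not an all-ones bit pattern
		__m128 outside_positive_plane = _mm_castsi128_ps(_mm_set1_epi32(-1));
		__m128 outside_negative_plane = _mm_castsi128_ps(_mm_set1_epi32(-1));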

	//for all 8 box points
		for (j = 0; j < 8; j++)
		{
		//transform point to clip space
			__m128 obb_transformed_point = sse_mat4_mul_vec4(sse_clip_space_mat, obb_points_sse[j]);

		//gather w & -w
			__m128 wwww = _mm_shuffle_ps(obb_transformed_point, obb_transformed_point, _MM_SHUFFLE(3, 3, 3, 3)); //get w
			__m128 wwww_neg = _mm_sub_ps(zero_v, wwww); // negate all elements

		//box_point.xyz >= box_point.w || box_point.xyz <= -box_point.w ?
		//for w > 0 this is the same as normalizing (point.xyz /= point.w) and testing point.xyz >= 1 || point.xyz <= -1
			__m128 outside_pos_plane = _mm_cmpge_ps(obb_transformed_point, wwww);
			__m128 outside_neg_plane = _mm_cmple_ps(obb_transformed_point, wwww_neg);

		//if at least 1 of the 8 corners is on the inner side of a plane, that lane of the outside_* flag becomes 0
			outside_positive_plane = _mm_and_ps(outside_positive_plane, outside_pos_plane);
			outside_negative_plane = _mm_and_ps(outside_negative_plane, outside_neg_plane);
		}

		//per axis: are all 8 corners beyond the +w plane, or all beyond the -w plane?
		__m128 outside = _mm_or_ps(outside_positive_plane, outside_negative_plane);

		//store result: reinterpret the mask bits as integers (-1 = outside, 0 = not outside)
		__m128i outside_res_i = _mm_castps_si128(outside);
		_mm_store_si128((__m128i *)&obj_culling_res[0], outside_res_i);

		//for now we have a separate result for each axis (lane 3, the w lane, is ignored)
		//combine the results: if the box is outside any plane, the object is outside the frustum
		culling_res[i] = (obj_culling_res[0] != 0 || obj_culling_res[1] != 0 || obj_culling_res[2] != 0) ? 1 : 0;
	}
}
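The same clip-space test without SSE, as a minimal sketch: transform the 8 local-space corners by the clip matrix and report the box as outside if all corners lie beyond one clip plane (coord >= w or coord <= -w on the same axis). It takes the combined matrix (camera_view_proj * object matrix) directly and assumes column-major storage in the OpenGL convention, `m[col][row]`; the article's `mat4` layout is not shown, and all names here are hypothetical.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix times a column vector: m[col][row].
static Vec4 transform(const float m[4][4], const Vec4 &p)
{
    Vec4 r;
    r.x = m[0][0] * p.x + m[1][0] * p.y + m[2][0] * p.z + m[3][0] * p.w;
    r.y = m[0][1] * p.x + m[1][1] * p.y + m[2][1] * p.z + m[3][1] * p.w;
    r.z = m[0][2] * p.x + m[1][2] * p.y + m[2][2] * p.z + m[3][2] * p.w;
    r.w = m[0][3] * p.x + m[1][3] * p.y + m[2][3] * p.z + m[3][3] * p.w;
    return r;
}

// Returns true if all 8 corners are beyond the same clip plane.
bool obb_outside_frustum(const float clip_mat[4][4], const float box_min[3], const float box_max[3])
{
    bool all_pos[3] = { true, true, true };   // every corner has coord >= w
    bool all_neg[3] = { true, true, true };   // every corner has coord <= -w
    for (int c = 0; c < 8; c++)
    {
        // bit k of c selects the min or max bound on axis k
        Vec4 p = { (c & 1) ? box_max[0] : box_min[0],
                   (c & 2) ? box_max[1] : box_min[1],
                   (c & 4) ? box_max[2] : box_min[2], 1.0f };
        Vec4 t = transform(clip_mat, p);
        const float v[3] = { t.x, t.y, t.z };
        for (int k = 0; k < 3; k++)
        {
            all_pos[k] = all_pos[k] && (v[k] >= t.w);
            all_neg[k] = all_neg[k] && (v[k] <= -t.w);
        }
    }
    return all_pos[0] || all_pos[1] || all_pos[2] ||
           all_neg[0] || all_neg[1] || all_neg[2];
}
```

Like the SSE version, this is conservative: a box whose corners are spread over different outside planes is kept even if it is actually invisible.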

Table 2. SSE culling of 100k objects, time in ms. Intel Core i5, single thread.

SSE culling	Sphere	AABB	OBB
Culling only	0.26	0.46	3.48
Whole frame	1.2	1.43	4.6

On average, the SSE implementation is about 3 times faster than the plain C++ one.

Multithreading

Modern processors have several cores, and calculations can be performed on all of them simultaneously.

The architecture of new games should be designed with multithreading in mind: work is split into independent parts (tasks) that run simultaneously and load all processor cores evenly. The design should be flexible. Too many small tasks add overhead for synchronization and task switching; too few large tasks load the cores unevenly. A balance is needed: current games may run from several hundred to a thousand tasks per frame.

In the case of frustum culling, each object is independent of the rest. That is why we can easily split the work into equal groups and cull them simultaneously on different processor cores. After starting the jobs, we need to wait for the threads to finish and then gather the results.
Of course, we should not ask for the results right after starting the jobs; other frame work can be done in between.
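As a portable illustration of this split-and-wait scheme (not the Win32 code the article uses), here is a minimal sketch with C++11 `std::thread`. `cull_range` stands in for the per-range culling function and `parallel_cull` is a hypothetical name. Unlike the persistent workers below, it creates threads on every call, which is simpler but more expensive per frame.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-range job: culls objects [first, first + count).
using CullRangeFn = std::function<void(int first, int count)>;

// Splits num_objects into num_workers contiguous ranges (the first
// num_objects % num_workers ranges get one extra object), runs them in
// parallel and waits for all of them before returning.
void parallel_cull(int num_objects, int num_workers, const CullRangeFn &cull_range)
{
    assert(num_workers > 0 && num_objects >= 0);
    std::vector<std::thread> threads;
    const int base = num_objects / num_workers;
    const int remainder = num_objects % num_workers;
    int first = 0;
    for (int i = 0; i < num_workers; i++)
    {
        const int count = base + (i < remainder ? 1 : 0);
        threads.emplace_back(cull_range, first, count);
        first += count;
    }
    for (std::thread &t : threads)
        t.join();   // results are valid only after every job has finished
}
```

Because the ranges are disjoint, each thread can write its results into a shared array without locking.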

Worker::Worker() : first_processing_oject(0), num_processing_ojects(0)
{
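	//create 2 auto-reset events: 1. signals that there is a job, 2. signals that the job is finished
	//note: jobs_finished_event starts signaled, so the very first wait for it returns immediately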
	has_jobs_event = CreateEvent(NULL, false, false, NULL);
	jobs_finished_event = CreateEvent(NULL, false, true, NULL);
}

void Worker::doJob()
{
	//process this worker's range of objects
	cull_objects(first_processing_oject, num_processing_ojects);
}

unsigned __stdcall thread_func(void* arguments)
{
	printf("In thread...\n");
	Worker *worker = static_cast<Worker*>(arguments);

	//each worker runs an endless loop until we signal it to quit (stop_work flag)
	while (true)
	{
	//wait for a job
	//if there is no job, just wait on has_jobs_event: a thread blocked on an event uses no CPU time
		WaitForSingleObject(worker->has_jobs_event, INFINITE);

	//if we were signaled to stop (stop_work set, then has_jobs_event signaled), exit the loop
		if (worker->stop_work)
			break;

	//do job
		worker->doJob();

	//signal that we finished the job
		SetEvent(worker->jobs_finished_event);
	}
	_endthreadex(0);
	return 0;
}

void create_threads()
{
	//create the threads
	//split the work into equal parts between threads (any remainder of MAX_SCENE_OBJECTS / num_workers is left unassigned)
	int worker_num_processing_ojects = MAX_SCENE_OBJECTS / num_workers;
	int first_processing_oject = 0;

	int i;
	for (i = 0; i < num_workers; i++)
	{
	//create each thread suspended, so its parameters are set before it starts running
		workers[i].thread_handle = (HANDLE)_beginthreadex(NULL, 0, &thread_func, &workers[i], CREATE_SUSPENDED, &workers[i].thread_id);
		thread_handles[i] = workers[i].thread_handle;

	//set threads parameters
		workers[i].first_processing_oject = first_processing_oject;
		workers[i].num_processing_ojects = worker_num_processing_ojects;
		first_processing_oject += worker_num_processing_ojects;
	}
	//run workers to do their jobs
	for (int i = 0; i < num_workers; i++)
		ResumeThread(workers[i].thread_handle);
}

void process_multithreading_culling()
{
	//signal workers that they have the job
	for (int i = 0; i < num_workers; i++)
		SetEvent(workers[i].has_jobs_event);
}

void wate_multithreading_culling_done()
{
	//wait until all threads have finished their jobs
	HANDLE wait_events[num_workers];
	for (int i = 0; i < num_workers; i++)
		wait_events[i] = workers[i].jobs_finished_event;
	WaitForMultipleObjects(num_workers, &wait_events[0], true, INFINITE);
}
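The worker loop above relies on auto-reset events: `SetEvent` signals the event, and a successful wait clears it again automatically. For readers not on Windows, here is a minimal portable sketch of that behavior built on `std::mutex` and `std::condition_variable`. `AutoResetEvent` is a hypothetical class, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical portable stand-in for CreateEvent(NULL, FALSE, initial_state, NULL).
class AutoResetEvent
{
public:
    explicit AutoResetEvent(bool initially_signaled) : signaled_(initially_signaled) {}

    // Like SetEvent: signal and wake one waiter.
    void set()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    // Like WaitForSingleObject(event, INFINITE): block until signaled, then auto-reset.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }

    // Like WaitForSingleObject with a timeout; returns false if it timed out.
    bool wait_for(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
            return false;
        signaled_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_;
};
```

An event created in the signaled state satisfies exactly one wait before any `set()` call, which is why the initial state of an event matters for when the first wait returns.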

Table 3. Culling of 100k objects, time in ms. Intel Core i5 (4 cores). Speedup relative to the plain C++ implementation in parentheses.

Method	Sphere	AABB	OBB
Simple C++	0.92 (1)	1.42 (1)	9.14 (1)
SSE	0.26 (3.54)	0.46 (3.08)	3.48 (2.62)
Simple C++, multithreaded	0.25 (3.68)	0.4 (3.55)	2.5 (3.65)
SSE, multithreaded	0.1 (9.2)	0.18 (7.89)	1 (9.14)

The multithreaded version is on average 3.6 times faster than the single-threaded one.
Using SSE gives about a 3 times speedup relative to the plain C++ implementation.
Using both SSE and multithreading gives an 8.7 times speedup on average!
In other words, we speed up the calculations by up to about 9 times, depending on the culling primitive type.
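The averages quoted above can be reproduced from Table 3, assuming they are plain arithmetic means of the per-primitive speedups: divide each plain C++ time by the corresponding time and average over sphere, AABB and OBB. The function names are illustrative.

```cpp
#include <cassert>

// Culling times from Table 3 (ms, 100k objects): rows = method, columns = sphere, AABB, OBB.
const double kTimes[4][3] = {
    { 0.92, 1.42, 9.14 },   // simple C++
    { 0.26, 0.46, 3.48 },   // SSE
    { 0.25, 0.40, 2.50 },   // simple C++, multithreaded
    { 0.10, 0.18, 1.00 },   // SSE, multithreaded
};

// Speedup of one method relative to simple C++ for one primitive type.
double speedup(int method, int primitive)
{
    return kTimes[0][primitive] / kTimes[method][primitive];
}

// Arithmetic mean of a method's speedups over the three primitive types.
double average_speedup(int method)
{
    return (speedup(method, 0) + speedup(method, 1) + speedup(method, 2)) / 3.0;
}
```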

GPU culling

GPUs are designed to perform the same operation on huge amounts of data, and they run far more parallel threads (thousands) than a CPU (2-8 cores in most desktops). But culling on the GPU is not always convenient:

it assumes a special graphics engine architecture;
there is an unpleasant detail: the dip (draw call) is still issued on the CPU side, so the CPU needs to know how many primitives the GPU generated (in our case, the number of objects visible in the frustum). That means we need feedback from the GPU; there are special commands for this purpose.
The problem is that if we want the result in the same frame as culling and rendering, we get a GPU stall, because we have to wait for the result, which is bad for performance. If we read the result from the previous frame instead, we get visual errors, because the count lags one frame behind. The complete solution is to use DrawIndirect commands and prepare the dip information on the GPU side. This is available since DirectX 11 and OpenGL 4.

GPU culling is implemented in the following steps:

Pack all instance data into a vertex buffer, so that one vertex is one object to cull. The number of vertex attributes equals the amount of data per object.
Enable transform feedback (with rasterization disabled, since nothing needs to be drawn) and send the prepared vertex buffer for rendering. All results are redirected to another vertex buffer that will hold the visible instances' data.
In the vertex shader, check the visibility of the object.
In the geometry shader, discard the object (kill the vertex) if the instance is not visible in the frustum.
Thus, we get a buffer containing only the visible instances' data.
Now the CPU needs the number of visible objects to issue the dip. In this example we read it with a transform feedback query from the previous frame (just for code simplicity).

void do_gpu_culling()
{
	culling_shader.bind();

	int cur_frame = frame_index % 2;
	int prev_frame = (frame_index + 1) % 2;

	//disable rasterization (only the transform feedback output is needed), enable transform feedback & query
	glEnable(GL_RASTERIZER_DISCARD);
	glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, dips_texture_buffer);
	glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, num_visible_instances_query[cur_frame]);
	glBeginTransformFeedback(GL_POINTS);

	//render a cloud of points which we interpret as object data
	glBindVertexArray(all_instances_data_vao);
	glDrawArrays(GL_POINTS, 0, MAX_SCENE_OBJECTS);
	glBindVertexArray(0);

	//disable all
	glEndTransformFeedback();
	glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
	glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);
	glDisable(GL_RASTERIZER_DISCARD);

	//get feedback from the previous frame (on the very first frame that query has not been issued yet)
	num_visible_instances = 0;
	if (frame_index > 0)
		glGetQueryObjectiv(num_visible_instances_query[prev_frame], GL_QUERY_RESULT, &num_visible_instances);

	//next frame
	frame_index++;
}

Vertex shader from 3rd step:

#version 330 core
in vec4 s_attribute_0;
in vec4 s_attribute_1;

out vec4 instance_data1;
out vec4 instance_data2;
out int visible;

uniform mat4 ModelViewProjectionMatrix;
uniform vec4 frustum_planes[6];

int InstanceCloudReduction()
{
	//sphere - frustum test
	bool inside = true;
	for (int i = 0; i < 6; i++)
	{
		if (dot(frustum_planes[i].xyz, s_attribute_0.xyz) + frustum_planes[i].w <= -s_attribute_0.w)
			inside = false;
	}
	return inside ? 1 : 0;
}

void main()
{
//read instance data
	instance_data1 = s_attribute_0;
	instance_data2 = s_attribute_1;

//visibility
	visible = InstanceCloudReduction();
	gl_Position = ModelViewProjectionMatrix * vec4(s_attribute_0.xyz,1);
}

Geometry shader from 4th step:

#version 400 core
layout (points) in;
layout (points, max_vertices = 1) out;

in vec4 instance_data1[];
in vec4 instance_data2[];
in int visible[];

out vec4 output_instance_data1;
out vec4 output_instance_data2;

void main( )
{
	//if primitive is not visible - discard it !
	//visible comes from vertex shader
	if (visible[0] == 1)
	{
	//just transfer data
		output_instance_data1 = instance_data1[0];
		output_instance_data2 = instance_data2[0];
		gl_Position = vec4(0.0, 0.0, 0.0, 1.0);
		EmitVertex();
		EndPrimitive();
	}
}
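To make the data flow concrete, here is a CPU-side sketch of what the pipeline computes. Each input "vertex" is one instance, the vertex-shader sphere test is ported to C++, the geometry-shader discard becomes "append only visible instances", and the appended count plays the role of the GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query. The `Instance` layout (sphere center and radius in the first vec4, extra data in the second) follows the shaders above; all names are illustrative.

```cpp
#include <cassert>
#include <vector>

struct Vec4 { float x, y, z, w; };

// One culling "vertex": attribute 0 = sphere (xyz center, w radius), attribute 1 = extra data.
struct Instance { Vec4 attribute_0, attribute_1; };

// Port of InstanceCloudReduction(): planes are (xyz normal pointing inside, w offset).
static bool sphere_visible(const Vec4 &sphere, const Vec4 planes[6])
{
    for (int i = 0; i < 6; i++)
    {
        float dist = planes[i].x * sphere.x + planes[i].y * sphere.y +
                     planes[i].z * sphere.z + planes[i].w;
        if (dist <= -sphere.w)
            return false;   // the sphere is entirely behind this plane
    }
    return true;
}

// Emulates vertex shader + geometry-shader discard + transform feedback:
// visible instances are appended to 'out'; the return value is what the
// primitives-written query would report.
int cull_instances(const std::vector<Instance> &in, const Vec4 planes[6], std::vector<Instance> &out)
{
    out.clear();
    for (const Instance &inst : in)
        if (sphere_visible(inst.attribute_0, planes))
            out.push_back(inst);
    return static_cast<int>(out.size());
}
```

The output keeps the input order, and only visible instances reach the output buffer, which is what lets the subsequent instanced dip read a tightly packed array.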

The whole frame time with GPU culling is about 1.19 ms, almost the same as with the fastest CPU variant, SSE multithreaded sphere-frustum culling.

Performance comparison of all methods

So far we compared the CPU methods by culling time only. Now we measure the whole frame time, so the numbers differ a bit, because data transfer and some other work are taken into account.

Table 4. Performance comparison of all culling methods. Whole frame time in ms.

Method	Sphere	AABB	OBB
Simple C++	1.94	2.46	10.3
SSE	1.2	1.45	4.63
Simple C++, multithreaded	1.23	1.42	3.6
SSE, multithreaded	1.18	1.2	2.02
GPU	1.19	-	-

The measurements may be inaccurate in the second decimal place: even averaged times vary from frame to frame.

Conclusion

Using both SSE and multithreading speeds up culling by up to about 9 times compared with the plain C++ implementation. If OpenGL 4.4 is available (buffers mapped with the GL_MAP_PERSISTENT_BIT and GL_MAP_COHERENT_BIT flags), transferring data to the GPU becomes really fast. In addition, culling can be optimized further with hierarchical structures and various tricks, such as remembering the last frustum plane that culled the object.
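The last trick, remembering which frustum plane culled an object, can be sketched as follows: test the cached plane first, since an object that was culled usually stays behind the same plane in the next frame, and update the cache when a different plane rejects it. The `Sphere`/`Plane` types and the per-object `last_plane` field are assumptions for illustration, not the article's code.

```cpp
#include <cassert>

struct Plane  { float a, b, c, d; };   // inside if a*x + b*y + c*z + d > 0
struct Sphere { float x, y, z, r; };

// Per-object cache: index (0..5) of the plane that culled this object last time.
struct CulledObject { Sphere bounds; int last_plane = 0; };

static bool behind(const Plane &p, const Sphere &s)
{
    return p.a * s.x + p.b * s.y + p.c * s.z + p.d <= -s.r;
}

// Returns true if the object is outside the frustum. The cached plane is tried
// first; on a hit the other five planes are skipped.
bool cull_with_plane_cache(CulledObject &obj, const Plane planes[6])
{
    if (behind(planes[obj.last_plane], obj.bounds))
        return true;
    for (int i = 0; i < 6; i++)
    {
        if (i == obj.last_plane)
            continue;   // already tested above
        if (behind(planes[i], obj.bounds))
        {
            obj.last_plane = i;   // remember for the next frame
            return true;
        }
    }
    return false;
}
```

Visible objects still pay for all six tests; the gain comes from culled objects, which typically make up most of a large scene.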

GPU culling works as fast as the most optimized CPU method, and it has a couple of advantages:

there is no need to transfer the visible instances' data to the GPU; it is already there. With DrawIndirect (DX11+, OpenGL 4+, and the new APIs: DX12, Vulkan, Metal) and the dip information prepared on the GPU side, performance can be even better.
it is relatively easy to add further culling that accounts for objects hiding each other (occlusion culling based on a Hierarchical-Z map).

So which should you use, CPU or GPU culling?
It depends on several things:

the project, and how many changes GPU culling would require;
whether DrawIndirect is available;
what the bottleneck is: CPU or GPU;
whether we want additional culling based on the depth complexity of the scene; on the GPU side it is much easier and faster.

If there is such an opportunity, GPU culling is the better choice.


Gerlits Anatoliy.
February 2017.

7 Effective Tips To Create an Engaging Game Level Design

November 28, 2016, 10:20 pm

A good game speaks for itself; the design, gameplay, plot, user interface and every other aspect of the game are well thought out and planned to complement each other. There are multiple factors to balance in order to develop an enjoyable and engaging game, and that’s what mobile game development is all about.

Most popular games have a very clear storyline, making it easy for the players to understand what their role is and what they are expected to do. The best games are intuitive and direct the player with their designs. This article is all about creating exciting and engaging game level designs that are sure to be popular.

Below are the top tips for engaging game level design

Storyline

Every game has a story to tell; even the oldest games, like Mario, had a simple story associated with them. It is important to include a story narrative and present it in such a way that the player fully understands it. Once the storyline is clear, it becomes easier to plan the various levels, and the player can figure out what to do. An interesting story will have the player immersed in the game in no time.

Engaging Challenges

Challenges are the single most important factor that motivates the player to keep playing. A classic example is ‘Flappy Bird’, a side-scrolling 2D game with retro-style graphics that had just one challenge (getting the bird to fly between the pipes that come its way) and was a major hit. Challenges immerse the player in the game by testing his/her skills. You can begin with simple tasks that become more complex as the game proceeds. It also helps to include instructions along the way so that the player has a clear picture of what has to be done; the player can then react accordingly and solve the various challenges as the game progresses. Another way to keep gamers engaged in a challenging game is to let them ask for clues or help. Trivia, crime, and mystery games often use this trick to keep the player engaged even as the challenges get tougher.

Planned Levels

During mobile game development, levels are not about randomly placing buildings, walls, rooms and other objects from the built-in library. Every level has to be planned so that it complements the story narrative and is intuitive enough to keep the player hooked. You will have to chalk out how each level unfolds to take your game forward. From a design point of view, it helps to create a focal point that catches the player’s eye and guides them in the right direction or shows what to do next.

Scenery that unifies the game

Varying the scenery improves engagement as it would be a refreshing deviation from the mundane scenes that are typical of your game. However, while varying the scenes, it is essential to maintain rhythm and unity so that the player isn’t confused. This can be achieved by keeping some aspects of the scene common such as color, furniture, architectural aspects, etc. This way the new scene would still feel and look like it is an integral part of the game. Other aspects that unify and give rhythm are repetitive motifs, music beats, textures, patterns, etc.

Perfect balance between player discovery and design influence

It is not fun to always have everything spelled out in black and white. Interesting gameplay leads players without spoon-feeding them. Gamers love figuring things out on their own, and the level design should lead the player in the right direction. An experienced mobile game development company will be able to strike the right balance between design influence and player discovery.

Player Incentives

Rewards are a major reason gamers keep coming back to play. You should reward the player sufficiently for the various tasks achieved. Even small achievements like collecting coins are rewards, which can be used to buy ammunition or other gear that helps the gamer tackle tougher challenges.

Design Aspects

Besides the above factors, a good game development company would also focus on the various design aspects to ensure that the gamer finds the game interesting and enjoyable. Some of these aspects are balance, scale, proportion and the uniformity in theme.

Mobile game development is a highly competitive market, and you have to be the best to get the most out of your creation. A bad game level can destroy your chances of making it big, even if you have the fanciest graphics and detailed models. It’s these simple points that set an experienced game development company apart from the crowd.

Image Credit:- http://www.gamedev.net/uploads/af27cb9d0bb4173bd91ac4a72fea8dd0.jpg

Opengl API overhead

February 13, 2017, 12:37 am

Introduction

In modern projects, to get a nice-looking picture the engine renders thousands of different objects: characters, buildings, landscape, nature, effects and more.
Of course, there are several ways to render geometry on the screen. In this article, we consider how to do that efficiently, and measure and compare the cost of rendering API calls.
We consider:

the cost of state changes (frame buffers, vertex buffers, shaders, constants, textures);
different types of geometry instancing and their performance;
several practical examples of how to optimize geometry rendering in projects.

I will cover only the OpenGL API. I will not describe details, parameters and variations of each API call. There are reference books and manuals for this purpose.
Computer configuration for all tests: Intel Core i5-4460 3.2 GHz, Radeon R9 380. All times are in ms.

States changing

We want to see a 'rich' picture on the screen: a lot of unique objects with a lot of detail. For this purpose the engine takes all objects visible to the camera, sets their parameters (vertex buffers, shaders, material parameters, textures, etc.) and sends them to render. All these actions are performed with special API commands. Let's consider them and run some tests to understand how to organize the rendering process optimally.

Let's measure the cost of different OpenGL calls: dip (draw indexed primitive), and changes of shaders, vertex buffers, textures and shader parameters.

Dips

Dip (draw indexed primitive): a command to the GPU to render a bunch of geometry, most often triangles. Of course, first we need to specify what geometry we want to show, with which shader, and set some options. But only the dip renders geometry; all other commands just describe the parameters of what we want to show. The price of a dip usually includes all related state changes, not only the one command, so it depends on the number of state changes.
First, consider the simplest case: the cost of simple dips, without state changes.

void simple_dips()
{
    glBindVertexArray(ws_complex_geometry_vao_id); //what geometry to render
    simple_geometry_shader.bind(); //with what shader

    //a lot of simple dips
    for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
        glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i+1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES*sizeof(int))); //simple dip
}

Table 1. Cost of simple dips depending on the number of dips (top: number of dips; time in ms)

2000	1000	500	100
0.4	0.21	0.107	0.0255

The whole frame time is a bit larger than the test time: on average, about 0.2 ms larger for all tests.
Here and below, the numbers in the tables are the test time only. The cost of each API call is calculated at the end of the article.

Frame buffer change

FBO (frame buffer object) is an object that allows rendering an image not to the screen but to another surface, which can later be used as a texture in shaders. FBOs are changed less often than other elements, but each change is quite expensive for the CPU.

void fbo_change_test()
{
//clear FBO
    glViewport(0, 0, window_width, window_height);
    glClearColor(0.0f / 255.0f, 0.0f / 255.0f, 0.0f / 255.0f, 0.0);
    for (int i = 0; i < NUM_DIFFERENT_FBOS; i++)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo_buffer[i % NUM_DIFFERENT_FBOS]);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    }

//prepare dip
    glBindVertexArray(ws_complex_geometry_vao_id);
    simple_geometry_shader.bind();

//bind FBO, render one object... repeat N times
    for (int i = 0; i < NUM_FBO_CHANGES; i++)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo_buffer[i % NUM_DIFFERENT_FBOS]); //bind fbo
        glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
    }
    glBindFramebuffer(GL_FRAMEBUFFER, 0); //set rendering to the 'screen'
}

Table 2. FBO change test (top: number of FBO changes; time in ms).

400	200	100	25
2.72	1.42	0.73	0.257

FBOs usually need to be changed for post effects and different passes, such as reflections, rendering into a cubemap, creating virtual textures, etc. Many things, like virtual textures, can be organized as atlases, so that the FBO is set only once and, for example, only the viewport changes. Rendering into a cubemap might be replaced by another technique, for example dual paraboloid rendering. Of course, the issue is not only FBO changes but also the number of scene rendering passes, material changes, etc. In general, the fewer state changes, the better.

Shader changes

Shaders usually describe one of the scene's materials or effect techniques. The more materials and kinds of surfaces, the more shaders. Materials that differ only slightly should be combined into one shader, with the switch between them made as a condition in the shader. The number of materials directly influences the number of dips.

void shaders_change_test()
{
    glBindVertexArray(ws_complex_geometry_vao_id);

    for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
    {
        simple_color_shader[i%NUM_DIFFERENT_SIMPLE_SHADERS].bind(); //bind certain shader
        glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
    }
}

Table 3. Shader change test (top: number of shader changes; time in ms).

2000	1000	500	100
5.16	2.6	1.28	0.257

Changing a shader here also includes transferring the world-view-projection matrix as a parameter; otherwise we could not render anything. The cost of changing parameters is measured in the next test.

Shader parameters changing

Materials are often made universal, with a lot of options to produce different kinds of materials. This is an easy way to add variety to the picture and make each character/object unique.
These parameters need to be passed to the shader somehow. This can be done with the glUniform* API commands.

uniforms_changes_test_shader.bind();
glBindVertexArray(ws_complex_geometry_vao_id);

for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
{
    //set uniforms for this dip
    for (int j = 0; j < NUM_UNIFORM_CHANGES_PER_DIP; j++)
        glUniform4fv(ColorShader_uniformLocation[j], 1, &randomColors[(i*NUM_UNIFORM_CHANGES_PER_DIP + j) % MAX_RANDOM_COLORS].x);

    glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
}

It is not optimal to set parameters individually for each instance/object. Usually all instance data can be packed into one large buffer and transferred to the GPU with one command. Then for each object it only remains to set an offset telling where its data is located.

//copy data to the SSBO buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, instances_uniforms_ssbo);
float *gpu_data = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, CURRENT_NUM_INSTANCES * NUM_UNIFORM_CHANGES_PER_DIP * sizeof(vec4), GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
memcpy(gpu_data, &all_instances_uniform_data[0], CURRENT_NUM_INSTANCES * NUM_UNIFORM_CHANGES_PER_DIP * sizeof(vec4)); //copy instances data
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);

//bind the buffer to binding point 0 (the shader reads data from this binding point)
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, instances_uniforms_ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);

//render
uniforms_changes_ssbo_shader.bind();
glBindVertexArray(ws_complex_geometry_vao_id);
static int uniformsInstancing_data_varLocation = glGetUniformLocation(uniforms_changes_ssbo_shader.programm_id, "instance_data_location");

for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
{
    //tell the shader where this object's data is located
    glUniform1i(uniformsInstancing_data_varLocation, i*NUM_UNIFORM_CHANGES_PER_DIP);
    glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
}

Table 4. Shader parameter change tests (top: number of dips; time in ms).

Test type	2000	1000	500	100
UNIFORMS_SIMPLE_CHANGE_TEST	2.25	1.1	0.54	0.1145
UNIFORMS_SSBO_TEST	1.3	0.628	0.32	0.0725

Using glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY) causes CPU-GPU synchronization, which should be avoided. Use glMapBufferRange with the GL_MAP_UNSYNCHRONIZED_BIT flag to prevent synchronization. But then the programmer must guarantee that the data being overwritten is not in use by the GPU at that moment; otherwise we get bugs, because we rewrite data the GPU is currently reading. To solve this problem completely, use triple buffering: while the CPU writes to the current buffer, the GPU uses the other two. There is also a more efficient way to map buffers, with the GL_MAP_PERSISTENT_BIT and GL_MAP_COHERENT_BIT flags.
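The packing and triple-buffering advice above can be sketched on the CPU side. All per-instance vec4s go into one contiguous array, each dip only receives its offset (like the `instance_data_location` uniform in the code above), and the mapped buffer is split into three regions used round-robin, so the CPU writes one region while the GPU may still read the other two. This sketch only computes offsets over a plain `std::vector`; `InstanceRing` is a hypothetical name, and it is an assumption that real code would also guard each region with a fence (for example `glFenceSync`/`glClientWaitSync`) before reusing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct vec4 { float x, y, z, w; };

// Hypothetical CPU-side model of a mapped buffer split into 3 regions.
class InstanceRing
{
public:
    InstanceRing(std::size_t max_instances, std::size_t vecs_per_instance)
        : region_size_(max_instances * vecs_per_instance),
          vecs_per_instance_(vecs_per_instance),
          storage_(3 * region_size_) {}

    // Start of frame: the region written two frames ago is the one to reuse.
    void begin_frame(unsigned frame_index) { region_ = frame_index % 3; }

    // Copies all instance data for this frame at once; returns the base
    // offset (in vec4s) of the current region inside the whole buffer.
    std::size_t upload(const std::vector<vec4> &all_instances_data)
    {
        assert(all_instances_data.size() <= region_size_);
        std::size_t base = region_ * region_size_;
        if (!all_instances_data.empty())
            std::memcpy(&storage_[base], all_instances_data.data(),
                        all_instances_data.size() * sizeof(vec4));
        return base;
    }

    // Value a dip would pass to the shader: where instance i's data starts.
    std::size_t instance_offset(std::size_t base, std::size_t i) const
    {
        return base + i * vecs_per_instance_;
    }

    const vec4 &at(std::size_t offset) const { return storage_[offset]; }

private:
    std::size_t region_size_, vecs_per_instance_;
    unsigned region_ = 0;
    std::vector<vec4> storage_;
};
```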

Changing vertex buffers

There are a lot of objects with different geometry in the scene, and this geometry is usually placed in different vertex buffers. To render another object with different geometry, even with the same material, we need to change the vertex buffer. There are techniques that allow rendering different geometry with the same material efficiently in one dip: MultiDrawIndirect and dynamic vertex pulling. For these, the geometry should be placed in one buffer.

void vbo_change_test()
{
    simple_geometry_shader.bind();

    for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
    {
        glBindVertexArray(separate_geometry_vao_id[i % NUM_SIMPLE_VERTEX_BUFFERS]); //change vbo
        glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
    }
}

Table 5. VBO change test (top: number of VBO changes; time in ms)

2048	1024	512	128
1.6	0.785	0.396	0.086
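When many meshes share one vertex/index buffer, each draw only needs to know where its indices and vertices start. As a sketch of how that layout feeds MultiDrawIndirect, the struct below mirrors the five-field `DrawElementsIndirectCommand` layout from the OpenGL specification (count, instanceCount, firstIndex, baseVertex, baseInstance). `MeshInfo` and `build_indirect_commands` are hypothetical helpers; in real code the resulting array would be uploaded to a GL_DRAW_INDIRECT_BUFFER and drawn with `glMultiDrawElementsIndirect` (OpenGL 4.3).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as DrawElementsIndirectCommand in the OpenGL spec.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // start in the shared index buffer, in indices
    std::int32_t  baseVertex;     // added to every index: where this mesh's vertices start
    std::uint32_t baseInstance;   // first instance, usable to locate per-instance data
};

// Hypothetical description of one mesh packed into the shared buffers.
struct MeshInfo { std::uint32_t num_vertices, num_indices, num_instances; };

// Meshes are assumed to be stored back to back in the vertex and index buffers.
std::vector<DrawElementsIndirectCommand> build_indirect_commands(const std::vector<MeshInfo> &meshes)
{
    std::vector<DrawElementsIndirectCommand> cmds;
    std::uint32_t first_index = 0, first_instance = 0;
    std::int32_t base_vertex = 0;
    for (const MeshInfo &m : meshes)
    {
        cmds.push_back({ m.num_indices, m.num_instances, first_index, base_vertex, first_instance });
        first_index += m.num_indices;
        base_vertex += static_cast<std::int32_t>(m.num_vertices);
        first_instance += m.num_instances;
    }
    return cmds;
}
```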

Textures changes

Textures give surfaces detail. You can get a very large variety in the picture simply by changing textures and blending different textures in the shader. Textures have to be changed frequently, but you can put them into a so-called texture array, bind it only once for many dips, and access individual textures through an index in the shader. The same geometry with different textures can be rendered using instancing.

void textures_change_test()
{
    glBindVertexArray(ws_complex_geometry_vao_id);
    int counter = 0;

    //switch between tests
    if (test_type == ARRAY_OF_TEXTURES_TEST)
    {
        array_of_textures_shader.bind();

        for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
        {
            //bind textures for this dip
            for (int j = 0; j < NUM_TEXTURES_IN_COMPLEX_MATERIAL; j++)
            {
                glActiveTexture(GL_TEXTURE0 + j);
                glBindTexture(GL_TEXTURE_2D, array_of_textures[counter % TEX_ARRAY_SIZE]);
                glBindSampler(j, Sampler_linear);
                counter++;
            }
            glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
        }
    }
    else
    if (test_type == TEXTURES_ARRAY_TEST)
    {
        //bind the texture array for all dips
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D_ARRAY, texture_array_id);
        glBindSampler(0, Sampler_linear);

        //variable that tells the shader which textures this dip uses
        static int textureArray_usedTex_varLocation = glGetUniformLocation(textureArray_shader.programm_id, "used_textures_i");
        textureArray_shader.bind();

        float used_textures_i[6];
        for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
        {
            //fill data: which textures this dip uses
            for (int j = 0; j < 6; j++)
            {
                used_textures_i[j] = counter % TEX_ARRAY_SIZE;
                counter++;
            }
            glUniform1fv(textureArray_usedTex_varLocation, 6, &used_textures_i[0]); //transfer to shader, tell what textures this material uses
            glDrawRangeElements(GL_TRIANGLES, i * BOX_NUM_VERTS, (i + 1) * BOX_NUM_VERTS, BOX_NUM_INDICES, GL_UNSIGNED_INT, (GLvoid*)(i*BOX_NUM_INDICES * sizeof(int))); //simple dip
        }
    }
}

Table 6. Texture change test (top: number of dips; time in ms)

Test type	2048	1024	512	128
ARRAY_OF_TEXTURES_TEST	6.2	3.12	1.577	0.315
TEXTURES_ARRAY_TEST	1.42	0.7	0.35	0.08

Comparative estimation of state changes

Below is a table with the execution cost/time of all performed tests.

Table 7. State change tests (top: number of dips; time in ms)

Test type	2048	1024	512	128
SIMPLE_DIPS_TEST	0.4	0.21	0.107	0.0255
FBO_CHANGE_TEST	2.72	1.42	0.73	0.257
SHADERS_CHANGE_TEST	5.16	2.6	1.28	0.257
UNIFORMS_SIMPLE_CHANGE_TEST	2.25	1.1	0.54	0.1145
UNIFORMS_SSBO_CHANGE_TEST	1.3	0.628	0.32	0.0725
VBO_CHANGE_TEST	1.6	0.785	0.396	0.086
ARRAY_OF_TEXTURES_TEST	6.2	3.12	1.577	0.315
TEXTURES_ARRAY_TEST	1.42	0.7	0.35	0.08

Using these results, we can calculate the cost of each API call: the absolute cost is given per 1000 API calls, and the relative cost is calculated relative to the simple dip call (glDrawRangeElements).

Table 8. API call cost. Intel Core i5-4460 3.2GHz. Time in ms. per 1k calls.

API call	Absolute cost	Relative cost %
glBindFramebuffer	7.1	3550%
glUseProgram	2.04	1020%
glBindVertexArray	0.765	382%
glBindTexture	0.584	292%
glDrawRangeElements	0.2	100%
glUniform4fv	0.09	45%

Of course, these measurements should be treated with caution, as they change with the driver version and hardware.
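The relative column of Table 8 is simply the absolute cost divided by the cost of a simple dip (0.2 ms per 1k calls). The same numbers also give a rough per-frame budget estimate; the sketch below is only valid for the test machine and driver, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Cost of glDrawRangeElements from Table 8, ms per 1000 calls.
const double kDipCost = 0.2;

// Relative cost in percent of one simple dip.
double relative_cost_percent(double absolute_ms_per_1k)
{
    return absolute_ms_per_1k / kDipCost * 100.0;
}

// Estimated CPU time (ms) for 'calls' calls of a given absolute cost.
double estimated_ms(double absolute_ms_per_1k, int calls)
{
    return absolute_ms_per_1k * calls / 1000.0;
}
```

For example, 500 glUseProgram calls per frame (2.04 ms per 1k) come to about 1 ms of CPU time on this configuration.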

Instancing

Instancing was invented to render the same geometry quickly with different parameters. Each object has a unique index, which the shader uses to fetch this object's parameters, vary some options, etc. The main advantage of instancing is that it can greatly reduce the number of dips.

We can pack all instance parameters into a buffer, transfer them to the GPU and make just one dip. Storing data in a buffer is a good optimization in itself, because we no longer need to constantly change shader parameters. Also, if the instance data does not change (for example, when we know for sure that the geometry is static), we do not need to transfer it to the GPU every frame, just once at program or level start. In general, for optimal rendering we should first pack all instance data into one buffer and transfer it to the GPU with one command. Then for each dip (type of geometry) we just set the offset where its instance data starts. Using the instance index (gl_InstanceID in OpenGL), we can fetch the data for a particular instance/object.
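The "one buffer, one offset per dip" scheme amounts to an exclusive prefix sum: instances are grouped by geometry type, each type's dip gets the start of its group as an offset (for example through a uniform, as in the SSBO test earlier), and the shader reads element `offset + gl_InstanceID`. A minimal sketch of that index arithmetic; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-geometry-type instance counts -> start offset of each group
// inside one shared instance-data buffer (exclusive prefix sum).
std::vector<int> group_offsets(const std::vector<int> &instances_per_type)
{
    std::vector<int> offsets(instances_per_type.size());
    int running = 0;
    for (std::size_t t = 0; t < instances_per_type.size(); t++)
    {
        offsets[t] = running;
        running += instances_per_type[t];
    }
    return offsets;
}

// Element the shader reads for one instance: (dip offset + gl_InstanceID),
// scaled by the number of data vectors stored per instance.
int instance_data_index(int dip_offset, int instance_id, int vectors_per_instance)
{
    return (dip_offset + instance_id) * vectors_per_instance;
}
```

With counts {100, 20, 5}, the third dip starts at instance 120, so its instance 3 with two vec4s per instance starts at element 246.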

There are many ways to store data in OpenGL: vertex buffers (VBO), uniform buffers (UBO), texture buffers (TBO), shader storage buffers (SSBO), and textures. Each type has its own features, which we consider below.

Texture instancing

All instance data is stored in a texture. To update texture data efficiently, use Pixel Buffer Objects (PBOs), which allow data to be transferred asynchronously from the CPU to the GPU: the CPU does not wait for the transfer to finish and continues working.

Creation code:

glGenBuffers(2, textureInstancingPBO);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, textureInstancingPBO[0]);
//GL_STREAM_DRAW means that we will change the data every frame
glBufferData(GL_PIXEL_UNPACK_BUFFER, INSTANCES_DATA_SIZE, 0, GL_STREAM_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, textureInstancingPBO[1]);
glBufferData(GL_PIXEL_UNPACK_BUFFER, INSTANCES_DATA_SIZE, 0, GL_STREAM_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

//create the texture that stores instance data on the GPU
glGenTextures(1, &textureInstancingDataTex);
glBindTexture(GL_TEXTURE_2D, textureInstancingDataTex);
//nearest filtering: every texel is raw data and must not be interpolated
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_R, GL_REPEAT);
//each row stores the data of NUM_INSTANCES_PER_LINE objects (128 in our case)
//each object stores PER_INSTANCE_DATA_VECTORS data vectors (2 in our case)
//GL_RGBA32F - we store float32 data
//complex_mesh_instances_data - initial instance data (enough on its own if we are not going to update the texture)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, NUM_INSTANCES_PER_LINE * PER_INSTANCE_DATA_VECTORS, MAX_INSTANCES / NUM_INSTANCES_PER_LINE, 0, GL_RGBA, GL_FLOAT, &complex_mesh_instances_data[0]);
glBindTexture(GL_TEXTURE_2D, 0);

Texture update:

glBindTexture(GL_TEXTURE_2D, textureInstancingDataTex);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, textureInstancingPBO[current_frame_index]);

// copy pixels from the PBO to the texture object (the last argument is an offset into the bound PBO)
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, NUM_INSTANCES_PER_LINE * PER_INSTANCE_DATA_VECTORS, MAX_INSTANCES / NUM_INSTANCES_PER_LINE, GL_RGBA, GL_FLOAT, 0);

// bind the other PBO to update pixel values for the next frame
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, textureInstancingPBO[next_frame_index]);

//http://www.songho.ca/opengl/gl_pbo.html
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// until GPU to finish its job. To avoid waiting (idle), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
glBufferData(GL_PIXEL_UNPACK_BUFFER, INSTANCES_DATA_SIZE, 0, GL_STREAM_DRAW);
gpu_data = (float*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (gpu_data)
{
    memcpy(gpu_data, &complex_mesh_instances_data[0], INSTANCES_DATA_SIZE); // update data
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER); //release the pointer to the mapped buffer
}
// unbind the PBO so that later pixel uploads read from client memory again
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
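The update code assumes that current_frame_index and next_frame_index alternate between the two PBOs every frame. The original does not show how they are computed. Here is a minimal sketch under that assumption, with PboPingPong as a hypothetical helper. With this scheme, data written into the "next" PBO in one frame reaches the texture in the following frame, so instance data lags by one frame.

```cpp
#include <cassert>

// Two PBOs used in ping-pong fashion (textureInstancingPBO[0] and [1]).
// Each frame, the texture is filled from the "current" PBO while the CPU
// writes the next frame's data into the "next" one; then the roles swap.
struct PboPingPong {
    static constexpr unsigned kCount = 2u;
    unsigned frame = 0u; // unsigned wrap-around keeps the parity consistent

    int current_frame_index() const { return static_cast<int>(frame % kCount); }
    int next_frame_index() const { return static_cast<int>((frame + 1u) % kCount); }
    void advance() { ++frame; }
};
```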

Rendering using texture instancing:

//bind the texture with instance data
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, textureInstancingDataTex);
glBindSampler(0, Sampler_nearest);

glBindVertexArray(geometry_vao_id); //what geometry to render
tex_instancing_shader.bind(); //with which shader

//tell the shader which texture unit holds the data texture (looked up by its uniform name)
static GLint location = glGetUniformLocation(tex_instancing_shader.programm_id, "s_texture_0");
if (location >= 0)
    glUniform1i(location, 0);

//render the group of objects
glDrawElementsInstanced(GL_TRIANGLES, BOX_NUM_INDICES, GL_UNSIGNED_INT, NULL, CURRENT_NUM_INSTANCES);

Vertex shader to access the data:

#version 150 core

in vec3 s_pos;
in vec3 s_normal;
in vec2 s_uv;

uniform mat4 ModelViewProjectionMatrix;

uniform sampler2D s_texture_0;

out vec2 uv;
out vec3 instance_color;

void main()
{
    const vec2 texel_size = vec2(1.0 / 256.0, 1.0 / 16.0);
    const int objects_per_row = 128;
    const vec2 half_texel = vec2(0.5, 0.5);

    //calc texture coordinates - where our instance data is located
    //gl_InstanceID % objects_per_row - index of the object within its row
    //multiply by 2 as each object has 2 vectors of data
    //gl_InstanceID / objects_per_row - the row our data is located in
    //adding half a texel targets the texel center; multiplying by texel_size turns the integer texel id into a 0..1 uv
    vec2 texel_uv = (vec2((gl_InstanceID % objects_per_row) * 2, float(gl_InstanceID / objects_per_row)) + half_texel) * texel_size;

    vec4 instance_pos = textureLod(s_texture_0, texel_uv, 0);
    instance_color = textureLod(s_texture_0, texel_uv + vec2(texel_size.x, 0.0), 0).xyz;

    uv = s_uv;

    gl_Position = ModelViewProjectionMatrix * vec4(s_pos + instance_pos.xyz, 1.0);
}
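To check the address math, here is the same computation done on the CPU. This sketch assumes the layout stated in the code above: 128 objects per row, 2 texels per object, and a 256 x 16 texture. That size implies MAX_INSTANCES = 2048, which matches texel_size. The names are illustrative.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Layout assumed from the shader: 128 objects per row, 2 RGBA32F texels
// per object, texture of 256 x 16 texels.
constexpr int kObjectsPerRow = 128;
constexpr int kVectorsPerObject = 2;
constexpr int kTexWidth = kObjectsPerRow * kVectorsPerObject; // 256
constexpr int kTexHeight = 16;

// Normalized UV of the texel center holding the first data vector
// (the position) of the given instance.
Vec2 instance_texel_uv(int instance_id) {
    assert(instance_id >= 0 && instance_id < kObjectsPerRow * kTexHeight);
    const int column = (instance_id % kObjectsPerRow) * kVectorsPerObject;
    const int row = instance_id / kObjectsPerRow;
    // +0.5 moves to the texel center, so GL_NEAREST sampling hits exactly one texel.
    return { (column + 0.5f) / kTexWidth, (row + 0.5f) / kTexHeight };
}
```

GLSL 1.30 and later also provide texelFetch. Calling texelFetch(s_texture_0, ivec2(column, row), 0) reads the same texel by integer coordinates and avoids the half-texel arithmetic altogether.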

Instancing through vertex buffer

The idea is to keep instance data in a separate vertex buffer and access it in the shader through vertex attributes.
The code that creates the buffer and fills it with data is trivial. Our main task is to modify the vertex information passed to the shader (the vertex declaration, or vdecl).

//...code of base vertex declaration creation (with the instancing VAO bound)
//special attribute binding
glBindBuffer(GL_ARRAY_BUFFER, all_instances_data_vbo);
//size of per-instance data (PER_INSTANCE_DATA_VECTORS = 2, so we create 2 additional attributes to transfer the data)
const int per_instance_data_size = sizeof(vec4) * PER_INSTANCE_DATA_VECTORS;
glEnableVertexAttribArray(4);
//attribute 4: 4 floats, data offset 0
glVertexAttribPointer((GLuint)4, 4, GL_FLOAT, GL_FALSE, per_instance_data_size, (GLvoid*)(0));
//this attribute advances once per instance, not per vertex
glVertexAttribDivisor(4, 1);

glEnableVertexAttribArray(5);
//attribute 5: 4 floats, data offset sizeof(vec4)
glVertexAttribPointer((GLuint)5, 4, GL_FLOAT, GL_FALSE, per_instance_data_size, (GLvoid*)(sizeof(vec4)));
//this attribute advances once per instance, not per vertex
glVertexAttribDivisor(5, 1);
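The stride and offsets above must match the memory layout of the per-instance records in all_instances_data_vbo. The sketch below assumes a hypothetical InstanceData struct of two tightly packed vec4s. It also shows the rule OpenGL uses to pick the element an attribute reads: with a divisor of N, the attribute advances once every N instances, starting from the base instance.

```cpp
#include <cassert>
#include <cstddef>

struct Vec4 { float x, y, z, w; };

// Hypothetical element of all_instances_data_vbo.
struct InstanceData {
    Vec4 pos;   // attribute 4: offset 0
    Vec4 color; // attribute 5: offset sizeof(Vec4)
};

// Stride passed to glVertexAttribPointer (per_instance_data_size).
constexpr std::size_t kPerInstanceDataSize = sizeof(InstanceData);
static_assert(kPerInstanceDataSize == 2 * sizeof(Vec4), "no padding expected");
static_assert(offsetof(InstanceData, color) == sizeof(Vec4), "color follows pos");

// Element of an attribute's buffer read by one vertex-shader invocation.
// divisor == 0: per-vertex data, indexed by the vertex index.
// divisor == N > 0: per-instance data, advancing once every N instances,
// starting at base_instance (0 for glDrawElementsInstanced).
unsigned attribute_element(unsigned divisor, unsigned vertex_index,
                           unsigned instance_id, unsigned base_instance) {
    if (divisor == 0) {
        return vertex_index;
    }
    return instance_id / divisor + base_instance;
}
```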

Rendering code:

vbo_instancing_shader.bind();
//our VAO with the modified vertex declaration (vdecl)
glBindVertexArray(geometry_vao_vbo_instancing_id);
glDrawElementsInstanced(GL_TRIANGLES, BOX_NUM_INDICES, GL_UNSIGNED_INT, NULL, CURRENT_NUM_INSTANCES);

Vertex shader to access data:
#version 150 core

in vec3 s_pos;
in vec3 s_normal;
in vec2 s_uv;
in vec4 s_attribute_3; //some_data

in vec4 s_attribute_4; //instance pos (attribute 4, divisor 1)
in vec4 s_attribute_5; //instance color (attribute 5, divisor 1)

uniform mat4 ModelViewProjectionMatrix;

out vec3 instance_color;

void main()
{
    instance_color = s_attribute_5.xyz;
    gl_Position = ModelViewProjectionMatrix * vec4(s_pos + s_attribute_4.xyz, 1.0);
}

Uniform buffer instancing, Texture buffer instancing, Shader Storage buffer instancing

These three methods are very similar to each other; they differ mostly in buffer type.
A uniform buffer (UBO) is small, but it should theoretically be faster than the others.
A texture buffer (TBO) can be very large: we can store the instance data of the whole scene in it, as well as skeletal transformations.
A shader storage buffer (SSBO) has both properties: it is fast and large. We can also write data to it. The only drawback is that it is a newer feature (core since OpenGL 4.3), and old hardware does not support it.
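Which of these buffers is big enough depends on the instance count. Below is a small sketch of the capacity arithmetic. It assumes 2 vec4s (32 bytes) per instance, as in this article. The byte limit would be queried with glGetIntegerv, for example GL_MAX_UNIFORM_BLOCK_SIZE for a core UBO, which is guaranteed to be at least 16 KB. Note that GL_MAX_TEXTURE_BUFFER_SIZE is expressed in texels rather than bytes. For an RGBA32F TBO, divide it by the number of vec4s per instance instead. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>

// How many instances fit into a buffer of `max_bytes`, when each instance
// uses `vectors_per_instance` vec4s of 4 floats each.
std::size_t instances_that_fit(std::size_t max_bytes, std::size_t vectors_per_instance) {
    assert(vectors_per_instance > 0);
    const std::size_t bytes_per_instance = vectors_per_instance * 4 * sizeof(float);
    return max_bytes / bytes_per_instance;
}
```

For example, the vec4 instance_data[4096] arrays used in the shaders below occupy 64 KB, which is enough for 2048 instances.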

Uniform buffer

Creation code:

glGenBuffers(1, &dips_uniform_buffer);
glBindBuffer(GL_UNIFORM_BUFFER, dips_uniform_buffer);
glBufferData(GL_UNIFORM_BUFFER, INSTANCES_DATA_SIZE, &complex_mesh_instances_data[0], GL_STATIC_DRAW); //uniform_buffer_data
glBindBuffer(GL_UNIFORM_BUFFER, 0);

//bind the uniform buffer with instance data to the shader
//(this uses the GL_EXT_bindable_uniform extension; core OpenGL 3.1+ uses uniform blocks instead)
ubo_instancing_shader.bind(true);
GLint instanceData_location3 = glGetUniformLocation(ubo_instancing_shader.programm_id, "instance_data"); //link to shader
glUniformBufferEXT(ubo_instancing_shader.programm_id, instanceData_location3, dips_uniform_buffer); //actual binding

Instancing vertex shader with uniform buffer:

#version 150 core
#extension GL_EXT_bindable_uniform : enable
#extension GL_EXT_gpu_shader4 : enable

in vec3 s_pos;
in vec3 s_normal;
in vec2 s_uv;

uniform mat4 ModelViewProjectionMatrix;
bindable uniform vec4 instance_data[4096]; //our uniform buffer with instance data

out vec3 instance_color;

void main()
{
    vec4 instance_pos = instance_data[gl_InstanceID*2];
    instance_color = instance_data[gl_InstanceID*2+1].xyz;
    gl_Position = ModelViewProjectionMatrix * vec4(s_pos + instance_pos.xyz, 1.0);
}

Texture Buffer

Binding and rendering code:

tbo_instancing_shader.bind();

//bind the buffer to the shader as a special texture
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_BUFFER, dips_texture_buffer_tex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, dips_texture_buffer);

glBindVertexArray(geometry_vao_id);
glDrawElementsInstanced(GL_TRIANGLES, BOX_NUM_INDICES, GL_UNSIGNED_INT, NULL, CURRENT_NUM_INSTANCES);

Vertex shader:

#version 150 core
#extension GL_EXT_bindable_uniform : enable
#extension GL_EXT_gpu_shader4 : enable

in vec3 s_pos;
in vec3 s_normal;
in vec2 s_uv;

uniform mat4 ModelViewProjectionMatrix;
uniform samplerBuffer s_texture_0; //our TBO texture buffer

out vec3 instance_color;

void main()
{
    //sample data from TBO
    vec4 instance_pos = texelFetch(s_texture_0, gl_InstanceID*2);
    instance_color = texelFetch(s_texture_0, gl_InstanceID*2+1).xyz;
    gl_Position = ModelViewProjectionMatrix * vec4(s_pos + instance_pos.xyz, 1.0);
}

SSBO

Creation code:

glGenBuffers(1, &ssbo_instances_data);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo_instances_data);
glBufferData(GL_SHADER_STORAGE_BUFFER, INSTANCES_DATA_SIZE, &complex_mesh_instances_data[0], GL_STATIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo_instances_data);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind

Render:

//bind ssbo_instances_data, link to shader at '0 binding point'
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo_instances_data);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo_instances_data);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);

ssbo_instancing_shader.bind();
glBindVertexArray(geometry_vao_id);
glDrawElementsInstanced(GL_TRIANGLES, BOX_NUM_INDICES, GL_UNSIGNED_INT, NULL, CURRENT_NUM_INSTANCES);
glBindVertexArray(0);

Vertex shader:

#version 430
#extension GL_ARB_shader_storage_buffer_object : require

in vec3 s_pos;
in vec3 s_normal;
in vec2 s_uv;

uniform mat4 ModelViewProjectionMatrix;

//the SSBO must be bound to binding point 0
layout(std430, binding = 0) buffer ssboData
{
    vec4 instance_data[4096];
};

out vec3 instance_color;

void main()
{
    //gl_InstanceID is unique for each instance, so we can fetch per-instance data with it
    vec4 instance_pos = instance_data[gl_InstanceID*2];
    instance_color = instance_data[gl_InstanceID*2+1].xyz;
    gl_Position = ModelViewProjectionMatrix * vec4(s_pos + instance_pos.xyz, 1.0);
}

Uniforms instancing

This one is pretty simple. Special commands (glUniform*) let us pass several data vectors to the shader. The maximum number depends on the video card. You can query it by calling glGetIntegerv with the GL_MAX_VERTEX_UNIFORM_VECTORS parameter. For an R9 380, it returns 4096; the minimum value is 256.

uniforms_instancing_shader.bind();
glBindVertexArray(geometry_vao_id);

//location of our uniform array in the shader; we will write data to this array
static int uniformsInstancing_data_varLocation = glGetUniformLocation(uniforms_instancing_shader.programm_id, "instance_data");

//instance data could be written with just one call if there were enough vectors.
//For clarity we divide it into groups, because usually there is much more data than available uniforms.
for (int i = 0; i < UNIFORMS_INSTANCING_NUM_GROUPS; i++)
{
    //write data to uniforms
    glUniform4fv(uniformsInstancing_data_varLocation, UNIFORMS_INSTANCING_MAX_CONSTANTS_FOR_INSTANCING, &complex_mesh_instances_data[i*UNIFORMS_INSTANCING_MAX_CONSTANTS_FOR_INSTANCING].x);

    glDrawElementsInstanced(GL_TRIANGLES, BOX_NUM_INDICES, GL_UNSIGNED_INT, NULL, UNIFORMS_INSTANCING_OBJECTS_PER_DIP);
}
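The loop relies on three group constants that the snippet does not define: UNIFORMS_INSTANCING_NUM_GROUPS, UNIFORMS_INSTANCING_MAX_CONSTANTS_FOR_INSTANCING and UNIFORMS_INSTANCING_OBJECTS_PER_DIP. The minimal sketch below shows how they could be derived. It also handles a final, partially filled group. The helper and struct names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One glUniform4fv upload followed by one glDrawElementsInstanced call.
struct UniformGroup {
    int first_vector;   // offset into complex_mesh_instances_data, in vec4s
    int vector_count;   // count passed to glUniform4fv
    int instance_count; // instance count passed to glDrawElementsInstanced
};

// Splits total_instances into groups that fit into max_vectors uniform
// vectors; the last group may be smaller than the others.
std::vector<UniformGroup> split_into_uniform_groups(int total_instances,
                                                    int max_vectors,
                                                    int vectors_per_instance) {
    assert(vectors_per_instance > 0 && max_vectors >= vectors_per_instance);
    const int instances_per_dip = max_vectors / vectors_per_instance;
    std::vector<UniformGroup> groups;
    for (int first = 0; first < total_instances; first += instances_per_dip) {
        int count = total_instances - first;
        if (count > instances_per_dip) {
            count = instances_per_dip;
        }
        groups.push_back({first * vectors_per_instance,
                          count * vectors_per_instance, count});
    }
    return groups;
}
```

With 2000 instances, a budget of 256 vectors and 2 vectors per instance, this yields 16 groups of 128 instances, with the last one holding 80. In practice, the budget must also leave room for the shader's other uniforms, such as the 4 vectors of ModelViewProjectionMatrix.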

Multi draw indirect

Let us separately consider a command that draws a huge number of dips with one call. This is a very useful command: it can render groups of instances with different geometry, even thousands of different groups, with one call. As input, it receives an array that describes the parameters of the dips: the number of indices, the offsets into the vertex/index buffers, and the number of instances per group. The restriction is that all the geometry must be placed in one vertex buffer and rendered with one shader. An additional advantage is that the dip information for the MultiDraw command can be filled on the GPU side, which is very useful, for example, for GPU frustum culling.

//fill the indirect buffer with dip information. Just a simple array
for (int i = 0; i < CURRENT_NUM_INSTANCES; i++)
{
    multi_draw_indirect_buffer[i].vertexCount = BOX_NUM_INDICES;   //number of indices in this dip
    multi_draw_indirect_buffer[i].instanceCount = 1;
    multi_draw_indirect_buffer[i].firstVertex = i*BOX_NUM_INDICES; //first index in the index buffer
    multi_draw_indirect_buffer[i].baseVertex = 0;
    multi_draw_indirect_buffer[i].baseInstance = 0;
}

glBindVertexArray(ws_complex_geometry_vao_id);
simple_geometry_shader.bind();

//in a core profile context the commands must come from a buffer bound to
//GL_DRAW_INDIRECT_BUFFER, and this argument becomes a byte offset into it
glMultiDrawElementsIndirect(GL_TRIANGLES,
    GL_UNSIGNED_INT,
    (GLvoid*)&multi_draw_indirect_buffer[0], //our information about dips
    CURRENT_NUM_INSTANCES, //number of dips
    0); //stride 0: commands are tightly packed

The glMultiDrawElementsIndirect command performs several glDrawElementsIndirect draws in one call. This command has one unpleasant feature: each such draw gets its own independent gl_InstanceID, i.e. it restarts from 0 with every new draw. This makes it difficult to access the required per-instance data. The problem can be solved by modifying the vertex declaration of each type of object sent to the renderer. You can read about this in the article "Surviving without gl_DrawID".
It is worth noting that glMultiDrawElementsIndirect performs a huge number of dips with a single command, so it should not be compared directly with the other types of instancing.
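For reference, the OpenGL specification defines each indirect command as five 32-bit values: count, instanceCount, firstIndex, baseVertex and baseInstance. The article's vertexCount and firstVertex fields play the roles of count and firstIndex. The sketch below mirrors that layout and shows one common way to give every draw its own index. You set baseInstance to the draw number and read per-draw data through a vertex attribute with divisor 1. This works because instanced attributes honor baseInstance, even though gl_InstanceID restarts at 0. It is an illustration, not necessarily the exact approach of the article cited above. The helper name is hypothetical, and a non-zero baseInstance requires OpenGL 4.2 or ARB_base_instance.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout of one command for glDrawElementsIndirect / glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices ("vertexCount" in the article)
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;    // offset into the index buffer, in indices
    std::int32_t  baseVertex;    // added to every fetched index
    std::uint32_t baseInstance;  // first element read by per-instance attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must match the GL layout");

// One command per object; baseInstance = i lets an attribute with
// divisor 1 read element i during draw i.
std::vector<DrawElementsIndirectCommand> build_commands(std::uint32_t num_objects,
                                                        std::uint32_t indices_per_object) {
    assert(indices_per_object > 0);
    std::vector<DrawElementsIndirectCommand> cmds(num_objects);
    for (std::uint32_t i = 0; i < num_objects; ++i) {
        cmds[i] = {indices_per_object, 1u, i * indices_per_object, 0, i};
    }
    return cmds;
}
```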

Performance comparison of different types of instancing

Table 9. Instancing test performance. Number of instances = 2000 (column headers give the number of test iterations/repetitions).

Instancing type	x1	x10	x100
UBO_INSTANCING	0.0067	0.02	0.15
TBO_INSTANCING	0.0245	0.06	0.49
SSBO_INSTANCING	0.009	0.0225	0.17
VBO_INSTANCING	0.01	0.0213	0.155
TEXTURE_INSTANCING	0.018	0.0262	0.183
UNIFORMS_INSTANCING	0.058	0.58	6.03
MULTI_DRAW_INDIRECT	0.136	1.33	13.53

As can be seen, UBO is faster than TBO; in fact, it is the fastest method. TBO instancing can store a huge amount of information, but it is slow in comparison with UBO. If possible, use SSBO storage: it is fast, convenient, and very large.

Texture instancing is also a good alternative to UBO. It is supported by old hardware and can store a large amount of information, though it is somewhat inconvenient to update.

Transferring data every frame through glUniform* is, unsurprisingly, the slowest instancing method.

In the tests, glMultiDrawElementsIndirect performed 2k, 20k and 200k dips! However, we measured repetitions of the test, and that many dips could be issued with just one call. The only catch is that with so many dips, the array describing them becomes quite large (it is better to generate it on the GPU).

Recommendations for optimization and conclusions

In this article we analyzed the cost of API calls and measured the performance of different types of instancing. In general, the fewer state switches, the better. Use the newest features of the latest API versions: texture arrays, SSBOs, Draw Indirect, and buffer mapping with the GL_MAP_PERSISTENT_BIT and GL_MAP_COHERENT_BIT flags for fast data transfer.
Recommendations:

The fewer state changes, the better. Group objects by material.
Wrap state changes (textures, buffers, shaders and other states) and check whether the state has really changed before making the API call, because the call is much slower than a simple flag/index check.
Combine geometry into one buffer.
Use texture arrays.
Store data in large buffers and textures.
Use as few shaders as possible. But an overly complicated/universal shader with many branches will obviously be a problem, especially on older video cards where branching is expensive.
Use instancing.
Use Draw Indirect where possible, and generate the dip information on the GPU side.
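The first two recommendations can be combined: sort the draw list by material, then send state changes through a small cache that skips redundant calls. The sketch below assumes a hypothetical CachedBinding wrapper. It receives the real API call (such as glUseProgram or glBindTexture) as a callback, so the logic can be shown without a GL context. DrawItem is likewise illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Forwards a bind to the real API call only when the value actually
// changes; comparing two integers is far cheaper than the call itself.
class CachedBinding {
public:
    explicit CachedBinding(std::function<void(unsigned)> apply)
        : apply_(std::move(apply)) {}

    void bind(unsigned id) {
        if (valid_ && id == current_) return; // redundant change: skip the call
        apply_(id);
        current_ = id;
        valid_ = true;
    }

    // Call when code outside the cache may have changed the state.
    void invalidate() { valid_ = false; }

private:
    std::function<void(unsigned)> apply_;
    unsigned current_ = 0;
    bool valid_ = false;
};

// Draw record; sorting by (shader, texture) groups objects by material.
struct DrawItem { unsigned shader; unsigned texture; };

void sort_by_material(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return a.shader != b.shader ? a.shader < b.shader : a.texture < b.texture;
    });
}
```

Consider five draws given as (shader, texture) pairs: (1,10), (2,20), (1,10), (2,20), (1,11). Without sorting, they need 5 shader binds and 5 texture binds. Sorting reduces this to 2 and 3.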

Some general advice:

Find the bottlenecks and optimize them first.
Know what limits performance, the CPU or the GPU, and optimize that.
Don't do the same work twice. Reuse the results of different passes and of previous frames (reprojection techniques, sorting, tracing, anything).
Expensive calculations can often be precomputed.
The best optimization is not doing the work at all.
Use parallel computation: split the work into parts and run them on parallel threads.

Source code of all examples: GL_API_overhead.rar

Anatoliy Gerlits
February 2017

Faking 2D Shadows

February 1, 2017, 12:43 pm

Introduction

The technique presented here runs at 60 fps on a decent Android phone (OnePlus 3) and at more than 30 fps on my old Nexus 7 (2012). Its main advantage is great-looking soft shadows with minimal development effort.

The code was written for Android and RenderScript, but the concepts are so simple that it will be easy to port to anything else.

Just fake it!

The idea here is that we're not going to get into any complex calculations or mind-bending
projection of our game world; we're just going to fake it. The algorithm isn't "pixel perfect":
it's essentially a hack, and it's probably not the fastest technique, but the advantages definitely
outweigh the drawbacks for quick, small projects.

Base concept

The concept is to paint every shadow-caster in black several times with increasing scale, apply some blur, and you're done.
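Here is a minimal CPU sketch of this base concept on a binary mask (1 = shadow-caster). The article does not say where the scaling is centered. This sketch assumes that each pass scales the casters about the light position, so the enlarged copies extend away from the light. Mask and scaled_shadow_passes are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Mask = std::vector<std::vector<int>>; // 1 = shadow-caster / shadow

// Paints the casters `passes` extra times, each time scaled by
// 1 + pass * scale_step about the light (an assumption of this sketch).
Mask scaled_shadow_passes(const Mask& casters, float light_x, float light_y,
                          int passes, float scale_step) {
    assert(!casters.empty() && passes >= 0 && scale_step > 0.f);
    const int h = static_cast<int>(casters.size());
    const int w = static_cast<int>(casters[0].size());
    Mask shadow = casters; // pass 0: the casters themselves
    for (int p = 1; p <= passes; ++p) {
        const float s = 1.f + p * scale_step;
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                // Inverse-map this pixel into the unscaled caster image.
                const int sx = static_cast<int>(std::floor(light_x + (x - light_x) / s + 0.5f));
                const int sy = static_cast<int>(std::floor(light_y + (y - light_y) / s + 0.5f));
                if (sx >= 0 && sy >= 0 && sx < w && sy < h && casters[sy][sx]) {
                    shadow[y][x] = 1;
                }
            }
        }
    }
    return shadow; // a blur pass would follow to soften the edges
}
```

Take a single caster pixel at x = 2, the light at the origin, and two passes with a scale step of 0.5. The shadow then covers x = 2..4. More passes or a bigger step make it longer, which is the trade-off discussed below.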

By tweaking the blurring and the scaling steps, you get something like the pictures below.

This looks pretty good when the light-source is far away from the shadow-caster, but when it's close to it, the shadows don't extend very far. We can extend the shadows further by adding more shadow-painting passes, but then the algorithm becomes costly; if we increase the scaling step instead, artifacts become apparent.

Still, this approach has great looking soft shadows and the shadows are actually correct for a
light-source floating a little above the screen. The other advantage of the approach is that it's
independent of the number of shadow-caster objects you have on screen, so unlike a ray-casting
approach, you don't need any spatial indexing or collision detection work.

An improved approach

The problem with the algorithm above is that we need multiple render passes for longer shadows,
making things really slow, especially on a phone or tablet.

The idea is to paint the shadow-casters once in black and then, for each pixel, advance in steps towards the light until a black pixel is encountered. In the picture below, pixel A will be in the shade while pixel B will be in the light.

The risk is that by using a large step size, we may step over a shadow caster, so we'll need to
adapt our step-size depending on the size of shadow-casting objects.
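The stepping idea can also be sketched on the CPU, for example to prototype it before porting to a GPU. This minimal sketch works on a binary caster mask and mirrors the loop of the RenderScript kernel shown later. Mask and march_shadows are illustrative names. It also shows the step-size risk: with a 1-pixel caster, a step of 2 pixels can jump right over it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Mask = std::vector<std::vector<int>>; // 1 = shadow-caster / shadow

// For each empty pixel, march towards the light in steps of `step_size`
// pixels; if a caster is hit before reaching the light, shade the pixel.
Mask march_shadows(const Mask& casters, float light_x, float light_y, float step_size) {
    assert(!casters.empty() && step_size > 0.f);
    const int h = static_cast<int>(casters.size());
    const int w = static_cast<int>(casters[0].size());
    Mask out = casters;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (casters[y][x]) continue; // casters stay as they are
            const float dx = x - light_x;
            const float dy = y - light_y;
            const float dist = std::sqrt(dx * dx + dy * dy);
            for (float t = step_size; t < dist; t += step_size) {
                const int sx = static_cast<int>(std::floor(x - dx / dist * t + 0.5f));
                const int sy = static_cast<int>(std::floor(y - dy / dist * t + 0.5f));
                if (sx < 0 || sy < 0 || sx >= w || sy >= h) break; // left the image
                if (casters[sy][sx]) { out[y][x] = 1; break; }
            }
        }
    }
    return out;
}
```

Take a caster at x = 3 with the light at the origin. A step of 1 shades every pixel past the caster. A step of 2 leaves x = 6 lit, because its samples land on x = 4 and x = 2.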

Optimization

The most important optimization to keep in mind is that the shadow computing pass and the blurring
pass should always be done on a scaled down image.

We can also hardware-accelerate the work. The following RenderScript script (Android only) runs on every pixel of an image in which shadow-casters are painted black and every other pixel is transparent.

#pragma version(1)
#pragma rs java_package_name(com.olgames.hideandeatbrains.painting)

//these variables are set directly from java code.
float light_x;
float light_y;
float stepSize;

// The input allocation is a copy of the Bitmap, giving us access to all the pixels of the bitmap.
rs_allocation input;

// out contains the currently processed pixel while x and y contain the indices of that pixel.
void root(uchar4 *out, uint32_t x, uint32_t y) {

    // First get the current pixel's value in RGBA.
    float4 pixel = convert_float4(out[0]).rgba;

    // if the current pixel is already drawn to, we can early exit.
    if(pixel.a>0.f) {
        return;
    }

    // We'll increment this variable as we go.
    float currentStep = stepSize;

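    // We need to calculate the distance to the light in order not to step past it.
    float dX = x-light_x;
    float dY = y-light_y;
    float dist = sqrt(dX*dX + dY*dY);

    // Image size, used to keep the samples inside the input allocation
    // (needed when the light is placed outside the image).
    float width = rsAllocationGetDimX(input);
    float height = rsAllocationGetDimY(input);

    // we loop and check for "collision" until either we collide or until we step past the light.
    // if we collide, we set the current pixel to black.
    while(dist>currentStep) {
        float sampleX = floor(x-((dX/dist)*currentStep)+0.5f);
        float sampleY = floor(y-((dY/dist)*currentStep)+0.5f);

        // Once the ray leaves the image it cannot re-enter, so no caster can be hit anymore.
        if (sampleX < 0.f || sampleY < 0.f || sampleX >= width || sampleY >= height) {
            break;
        }

        float4 in = convert_float4(
            rsGetElementAt_uchar4(input, (uint32_t)sampleX, (uint32_t)sampleY)).rgba;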

        if (in.a >0.f) {
            pixel.r=0;
            pixel.g=0;
            pixel.b=0;
            pixel.a = 255;
            out->xyzw = convert_uchar4(pixel);
            break;
        }

        currentStep+= stepSize;
    }
}

We'll still need to apply blurring; on Android, RenderScript offers a very efficient API for that purpose.
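On Android this would be RenderScript's ScriptIntrinsicBlur. For other platforms, a minimal portable stand-in is a separable box blur over the shadow alpha, sketched below. Note that a box blur is a substitute for the Gaussian blur RenderScript provides, not the same filter, although a few box passes in a row come close to a Gaussian. box_blur is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<float>>; // shadow alpha, 0..1

// Horizontal box blur with the given radius, clamping at the borders.
static Image blur_rows(const Image& src, int radius) {
    Image dst = src;
    for (std::size_t y = 0; y < src.size(); ++y) {
        const int w = static_cast<int>(src[y].size());
        for (int x = 0; x < w; ++x) {
            float sum = 0.f;
            for (int k = -radius; k <= radius; ++k) {
                sum += src[y][std::min(std::max(x + k, 0), w - 1)];
            }
            dst[y][x] = sum / (2 * radius + 1);
        }
    }
    return dst;
}

static Image transpose(const Image& src) {
    Image dst(src[0].size(), std::vector<float>(src.size()));
    for (std::size_t y = 0; y < src.size(); ++y)
        for (std::size_t x = 0; x < src[y].size(); ++x) dst[x][y] = src[y][x];
    return dst;
}

// Separable blur: horizontal pass, then vertical pass via transpose.
Image box_blur(const Image& src, int radius) {
    assert(!src.empty() && !src[0].empty() && radius >= 0);
    return transpose(blur_rows(transpose(blur_rows(src, radius)), radius));
}
```

Because it is separable, the cost per pixel grows with the radius rather than its square. Running it on the downscaled shadow image, as recommended above, keeps it cheap.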

Conclusion

I almost gave up on implementing shadows for my Android game because I simply did not have the time
to implement more complex algorithms. This technique can be implemented in a few hours.
I sincerely hope this will be useful to some of you.

Article Update Log

1 Feb 2017: Initial release

How to Make Effective Official Game Websites Really Fast

February 28, 2017, 11:43 am

What should you do for your official game websites? What tools should you use, how should you design them, and where should you start? I can't throw a rock on social media without hitting a group of indie game developers asking these types of questions, and pitching in to help each other out.

There are a hundred ways you could make a website for your game, but I believe I've got the absolute best.

And the only reason I believe that is because I've spent most of my career building official game websites, as well as coaching indie game developers on how to optimize their websites to sell more games.

This is my signature blueprint for how to make the most effective official game websites as fast as possible. And bonus fry, it's cheaper too.

Official Game Websites Are Kind of a Pain in the Ass

Your game needs a website. That, we know. And getting one is relatively easy, right? You design and develop games, surely you can throw up a website. And since you're just getting started, there's no way you're going to spend thousands of dollars to hire someone else to do it.

And if your games aren't generating revenue, I don't think you should either.

But what starts to happen after you throw up a website is you start diving into customizations, tweaks, and trying a bunch of things. An hour turns into a day, a day into a few days, a few days into a few weeks. Creating your own website starts to become a bit of a pain in the ass. Still, it's not enough to make you pay someone else to do it, but enough to send you to game developer forums asking other people what they think.

So technically, you can make a website. What slows you down, what turns building one into kind of a disaster, is not knowing exactly what you should do, if what you've done is good, or if it's even going to work.

And let's face it, time is money. Every hour you spend on your website is another hour you're not spending to finish your game.

Official Game Websites Should Make You Money, Not Just Make You Look Good

I was helping a small indie game studio head optimize their marketing. They had launched a handful of games in the past year, each with its own website. When I asked what those websites cost and how well they performed, they clammed up. And when pressed, they finally admitted they felt websites were worthless.

But the one thing they did stress over and over was that every time they made a game website, it didn't "look like every other game website out there." And these custom-designed websites sometimes took weeks to complete.

So that's where all their time and money went, on making sure their websites looked cool, because they genuinely believed that effort would help them sell more games.

Unfortunately, they were wrong.

The truth is official game websites are just marketing pieces in a bigger marketing puzzle. They should be pointed towards a certain set of goals, and work in tandem with other pieces of game marketing, such as social media or public relations, to accomplish those goals.

But don't get me wrong, I do fancy a beautifully designed game website. After all, I did help design the official websites, microsites, and experiences for many blockbuster films and AAA games.

However, even I admit looks alone don't sell games.

The "One and Done" Website Strategy

So your games need websites, but you can't afford to waste time and money on what doesn't work. You need an effective game website that you can stand up relatively fast, and with a relatively low cost of ownership. Furthermore, it's to your benefit to create a system out of what you do, so that building game websites becomes easier and faster, which saves you money in the long term.

Here's my signature system for building game websites that way:

1. Always Start with a Goal

Before building a game website, it's important you ask yourself what's the one thing, above all else, that you want potential players to do.

That's your game website's primary goal.

If you've ever built a game website without first asking yourself that question, you probably just copied what other game developers were doing. And while that's not necessarily a bad thing, their goals might be based on undisclosed business initiatives that don't really make sense for you.

A good rule of thumb for figuring out what your primary goal is when you're just starting out, is whether or not you've released your game. If you haven't released your game, then your goal should probably be to build up your audience before launch. And if you've already released your game, then your primary goal is simply getting traffic to your marketplace pages.

As your game grows with DLC, sequels, or spinoffs, that rule goes out the window. But it's the right place to start for now.

2. Give Your Game Websites Their Own Domain Names

The overarching goal of a game website is to sell your game to players. And the overarching goal of a game development studio website is to sell your company to business partners, employees, and the media.

That's why I never recommend using your company website as your game website, because it muddies the waters.

When you're just starting out, potential players don't care who you are. And so they're far more likely to follow your games. In fact, even when you're a popular developer, most players still don't care.

Infinity Ward and Treyarch are the two studios who develop games in Activision's Call of Duty franchise, but when asked who makes the games, most players will probably say Activision.

We would know to say the development studio, but that's because we live in the game industry bubble and speak developer.

Give your game websites their own domain names (I use Namecheap).

Launching game websites on their own domain helps keep your messaging and goals laser-focused, it helps to attract more players with entertainment, it gives games their own soil to grow into brands, and all that combined helps you sell more games in the long run.

The only time I would likely deviate from this strategy is when a game brand becomes pop culture, and needs a hub website.

3. Use a Website Builder

This is how most indie game developers build game websites:

They buy their own hosting, install the WordPress software, spend a few hours hunting down and buying a theme, and then they spend the following days and weeks mulling over custom design and content creation.

By the time it's all said and done, it's taken weeks to build what they consider a basic game website, not to mention all the hours they'll continue to spend with tweaks, plugins, and so on.

And I totally get it, because I've done that too.

The reason we've all done this is because we believed that it was the fastest, cheapest way to build a website that would give us the greatest amount of control. Basically, we did that because it's "god mode" for websites. But it's also cost us countless hours and dollars that would have been better spent just working on our games.

So it's time to try something new.

I highly recommend using Squarespace to build your game websites. In any case, use a website builder. The reason I recommend Squarespace is because for $12 a month you get a website that would cost upwards of $50,000 to build from scratch. I would know, because I've built those.

The objection is always that the templates look the same. I'm a designer and I don't think so, but even if you do, drag and drop functionality gives you the power to change your layout in seconds.

A good rule of thumb is to spend a maximum of 1 day building an official game website, especially if you're just starting out.

In fact, you could build the official website for Hay Day in just 1 hour using a website builder, and that's a million-dollar game. If Supercell wanted to waste $50,000 on the Hay Day website, I'd actually be okay with that (maybe more) considering how many sales it drives, but they obviously haven't.

And neither should you.

4. You Only Need One Page with a "Buy Now" Button

Style alone doesn't sell games, but design does. You can have a really crappy game website, and still make money. In other words, a great-looking website is a want, not a need.

I've planned, designed, developed, optimized, and measured the performance of official game websites in all shapes and sizes. The only thing your game needs is a one-page, responsive website with a "buy now" button, very similar to what Supercell, and other popular indie game studios, do for most of their games.

And here's my wireframe for how I design those pages:

A wireframe for an official game website design.

Overall, this is pretty self-explanatory. But there are a few things I want to point out. Everything I've done here (and didn't do) is by design, so let's walk through why this works so well.

1. HEADER

First of all, notice that there's no navigation, and very few outgoing links overall. This is an attempt to remove all distraction from your primary goal. The last thing we want is to get players ready to buy, have them click away somewhere else, lose that desire, and never come back. All we really need up top is the one-liner that makes them want to click the "buy now" button, and the button itself.

Once the potential player clicks to buy the game, it should open a new tab to your game marketplace, if not add it to their cart automatically.

If the potential player cannot be convinced, they will begin to scroll, if they haven't already by instinct.

2. TRAILER

The next step is to have potential players experience your game through its trailer, the next best way aside from playing the game itself. Do your best to keep trailers fast-paced, hitting all the beats, effectively communicating it's best features, and ending on a CTA to reinforce purchase.

This section could also have pagination if you're cutting teaser, gameplay, and official trailers.

3. DESCRIPTION

The next section is basically the description. If they can't gather what your game is about from the trailer (they should have), then they'll literally read that here. Keep this section relatively short, as free from development language as possible, and highlight a handful of your game's best features.

Complete this section with a beautiful character shot, or some other primary break-out visual from your game.

4. SCREENSHOTS

Screenshots are pretty straightforward, which also means they can be boring if you're not careful. Select the best screenshots to sell the game, not necessarily for the development of the game (there's a difference). Also, feel free to experiment with how these might look if you used treated type or captions to highlight features.

I recommend a minimum of 3 screenshots, but 5-10 (max) with pagination or modal overlays is optimal.

5. NEWSLETTER

If at this point they're still not convinced to buy, they're probably very interested. And that's a great thing. So we're going to ask them to sign up for our newsletter. This enables us to (1) build our email list, and (2) win the opportunity to nurture them towards a purchase. And we're going to entice them to opt in by offering them a highly valuable, free incentive (e.g. comic book, strategy guide, playable level).

I've seen the best results with one-click for the "subscribe" button and then a short form (name and email address) in a popup or modal overlay. Otherwise, putting form fields right on the page works too.

6. FOOTER

Finally, the footer should stay as minimal as possible, only consisting of a few social links, a press kit link, and your studio's logo (unlinked). Again, we don't want users clicking away and never coming back.

Unfortunately, analytics prove time and again that once a potential player clicks away from your game website, they're probably not coming back.

THIS IS GOOD ENOUGH FOR NOW

In conclusion, this is the 20% of work that yields the 80% of results. The big design idea is to focus entirely on the primary goal, keep distractions to almost zero, and hit the secondary goal if we can't nail the primary one.

Don't get me wrong here, there's a time and place for a multi-page website experience with all the bells and whistles, but not right now. And I know exactly how fun it is to work on those projects. But right now we need a website that's 100% focused on driving sales, so that you can keep making your own games.

You need a good enough website, and this is good enough for now.

5. Install analytics and tracking

Every visitor that comes to your website tells a story with their actions. And those actions can be measured with analytics: where did they come from, how long did they stay, and did they click to buy your game?

Analyzing all that data for trends will help you optimize and improve your game websites moving forward.

For example, if most of your traffic is coming from Twitter, then investing more time and money in Twitter could be a great idea. Or if most of your conversions happen on mobile, then investing more time and money on optimizing the mobile version of your website could be lucrative.
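If you ever export raw visit data, spotting those trends is mostly a matter of grouping and dividing. Here's a minimal Python sketch of the idea, assuming hypothetical visit records that note a source, a device, and whether the visitor clicked "buy now". It isn't how Google Analytics computes its reports, just the arithmetic behind questions like the Twitter and mobile examples above.

```python
# Sketch with hypothetical record fields and helper names; not an analytics API.
from collections import Counter
from typing import NamedTuple


class Visit(NamedTuple):
    source: str   # where the visitor came from, e.g. "twitter"
    device: str   # "mobile" or "desktop"
    bought: bool  # did they click the "buy now" button?


def share_of_traffic(visits, field):
    """Fraction of all visits attributed to each value of `field`."""
    counts = Counter(getattr(v, field) for v in visits)
    return {key: n / len(visits) for key, n in counts.items()}


def conversion_rate(visits, field):
    """Fraction of visits for each value of `field` that clicked to buy."""
    totals = Counter(getattr(v, field) for v in visits)
    buys = Counter(getattr(v, field) for v in visits if v.bought)
    return {key: buys[key] / n for key, n in totals.items()}


# Hypothetical sample data, for illustration only.
visits = [
    Visit("twitter", "mobile", True),
    Visit("twitter", "desktop", False),
    Visit("twitter", "mobile", True),
    Visit("reddit", "desktop", False),
    Visit("reddit", "mobile", False),
    Visit("newsletter", "mobile", True),
]

print(share_of_traffic(visits, "source"))  # Twitter sends half the traffic
print(conversion_rate(visits, "device"))   # mobile converts 3 of 4 visits
```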

To start taking advantage of all that data, I recommend installing Google Analytics, even if your hosting plan or website builder comes with its own set of analytics. In fact, I would turn those native analytics off if possible, and only use what you get through Google.

This is what a Google Analytics dashboard looks like.

In addition, I recommend installing tracking pixels for Facebook and Twitter. Tracking pixels will help you begin profiling an audience for use with social advertising experiments in the future.

Making Game Websites Should Be Low Cost and High Value

You now have a foundational, replicable system for building effective game websites as fast and as cheaply as possible. This is going to help automate standing up game websites, and start giving you back the hours you've spent away from working on your games.

But most importantly, they're going to perform far better.

You're going to see more traffic going to your marketplace pages, and more email opt-ins now that you're laser-focused on hitting goals. And that means, maybe for the first time ever, your game websites will actually be a valuable marketing asset, rather than a visually-stunning nebula that potential players seem to go missing in.

How has this strategy helped you think differently about your own game websites? Or what have you been doing beyond this strategy that seems to be really making a difference? Post a link to your own game websites in the comments, I'd love to see what you've done.

↧

How to Double Your Devblog Traffic with Republishing

March 7, 2017, 7:34 am

≫ Next: Mobile Gaming Industry Introduction Part 1: User Acquisition

≪ Previous: How to Make Effective Official Game Websites Really Fast

Back in the early 2000s, Bungie kept fans of Halo informed with news, updates, and new feature developments. Players couldn't get enough, and a fervent community blossomed. Some of my friends still talk about how following that devblog inspired their careers.

Sure, other game developers had been doing that since the beginning of the internet, but Bungie somehow managed to get regular fans interested in the game development process, which, if you've been writing your own devblog, you know is no easy task.

Since that time, practically every indie game developer I know starts and keeps a devblog for every new game they make.

But unfortunately, it doesn't typically go as well as Bungie's did.

Most devblogs start really strong, with great intention. And then it happens — they slow down, and often grind to a halt. Which is a real shame because devblogs can be a powerful marketing tool for promoting games. Well, if they're done the right way.

I'm going to explain exactly why devblogs typically die, offer you a new way of writing them, and then show you how to double your traffic (maybe overnight) with what I call The Republishing Strategy.

Why All Devblogs Eventually Slow Down and Then Die

Although seemingly trivial, blogging is time-consuming. Even the shortest of posts require gathering facts, writing drafts, getting approvals, formatting, photo creation, and more. Not to mention those of you who are naturally long-winded (like me), or produce even more time-consuming content such as video.

After all is said and done, the average devblog post takes about a full 8-hour day to produce, publish, and promote.

But although the work sucks, it's not really the problem.

Devblogs are typically about explaining feature design, development, and the deeper intricacies of overcoming programming challenges. And while that can be fascinating to the game development community, game developers make up a relatively small part of your actual playing audience.

In other words, most players don't really care how you make games, they just want to play them.

So there you are, spending a lot of time producing content for only a fraction of your audience. And that means your devblog is more about preaching to the choir than it is about acquisition, which in turn means your traffic probably hasn't increased much since you started.

I see this all the time.

To add insult to injury, as you get closer to launching your game, you have less and less time to put towards marketing efforts that don't generate a reasonable return on investment, time that would be much better spent just getting your game out the door.

The Republishing Strategy

Like I said before, most of your players probably won't be game developers. But those are the people who get the most out of your devblog, and leveraging that community can be a powerful catalyst for "starting the fire," so to speak.

So instead of always riding the line between general posts and development talk, we're going to just draw a line in the sand and decide that at least 50% of your devblog will be for game developers and the other 50% will be for everyone.

I'll unpack this in much greater detail, and even leave you with a list of things to do along the way, but here's a quick overview:

You're going to write in-depth game development tutorials using your games as the illustration, so that the meatier posts on your devblog transform into timeless resources for other game developers
Then you're going to republish those tutorials on a bunch of other game development websites, forums, and groups, so that your devblog traffic doubles
And finally, you're going to measure incoming traffic for clues as to how you can improve results moving forward

I call this The Republishing Strategy. And here's exactly how I recommend you execute on it:

1. Write In-Depth Game Development Tutorials Using Your Game as a Case Study

The first thing you're going to do is write an in-depth game development tutorial using your game as an example. It should be highly desirable for like-minded game developers, something they would typically search for on Google. And it should clock in around 2,000 words.

To figure out what you should write about, simply ask yourself this question: What have you accomplished in making your game that other game developers could learn from?

Here are a few title examples for articles that you might write...

How to Build a Match-3 Puzzle Game in 1 Day with Unity
The Beginner's Guide to Writing Roguelike Games in HTML5
The Ultimate Guide to Making IO Games Fast and Cheap
How to Design a Strategy Game Based on Fun, Not Timers
How to Design a Procedural Game That Feels Linear

You get the idea. They don't all have to be "how to" articles, but they should be very crunchy so that game developers want to read them.

Devblogs are typically locked in time, meaning once your game launches the content isn't relevant anymore. The Republishing Strategy turns all that on its head, making your devblog a timeless resource for aspiring game developers.

Finally, publish one of these articles every other week (or once a week if you're up for the challenge), first thing in the morning on a Tuesday or Wednesday. And promote each one in any other marketing channels you're using, such as social media.

Action Steps

Write 2,000-word game development tutorials using your game as the sole illustration (the longer the better)
Use at least 5 images throughout the post
Link to at least 3 major websites (e.g. Unity, other development resources, Amazon books)
Go heavy on the development language and buzzwords that other game developers are likely to search for on the internet (e.g. Unity, hierarchy tree traversal, procedural)
Always include a link to your game's website, newsletter, or marketplace page
Publish at least 3 of these articles a month
Always publish on Tuesdays or Wednesdays
Always publish first thing in the morning
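If you want to hold yourself to these action steps, a small script can check a draft before you publish it. Here's a rough Python sketch, assuming your draft is saved as HTML. `check_draft` and `DraftStats` are hypothetical helpers, the thresholds come straight from the list above, and treating "first thing in the morning" as before 10:00 is an assumption, not part of the list.

```python
# Hypothetical draft checker; thresholds mirror the action steps above.
from datetime import datetime
from html.parser import HTMLParser
from urllib.parse import urlparse


class DraftStats(HTMLParser):
    """Counts words, images, and distinct linked websites in an HTML draft."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.images = 0
        self.linked_sites = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.images += 1
        elif tag == "a" and attrs.get("href"):
            site = urlparse(attrs["href"]).netloc
            if site:  # skip relative links like "#section"
                self.linked_sites.add(site)

    def handle_data(self, data):
        self.words += len(data.split())


def check_draft(html, publish_at):
    """Return the action steps the draft doesn't meet yet (empty list = good to go)."""
    stats = DraftStats()
    stats.feed(html)
    stats.close()
    problems = []
    if stats.words < 2000:
        problems.append(f"only {stats.words} words (aim for 2,000+)")
    if stats.images < 5:
        problems.append(f"only {stats.images} images (use at least 5)")
    if len(stats.linked_sites) < 3:
        problems.append("link to at least 3 major websites")
    if publish_at.weekday() not in (1, 2):  # Monday is 0, so 1 = Tuesday, 2 = Wednesday
        problems.append("publish on a Tuesday or Wednesday")
    if publish_at.hour >= 10:  # assumed cutoff for "first thing in the morning"
        problems.append("publish first thing in the morning")
    return problems
```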

And one more thing...

I understand that this whole writing thing might not come naturally for you. But I highly encourage you to stick with it, or at least give it your best try. The more you write, the easier it will get. Always start with an outline, and use blog post formulas to expedite production time.

I promise that if you trust the process you're going to see a much bigger return on your devblog investment than you ever have before.

2. Republish Your Articles on Popular Game Development Websites, Forums, and Groups

After your article has been published on your own devblog for about a week, it's time to republish it on various other game development websites and forums. But there are a few things we need to do to the article before that happens.

First, if you're not already using Google Analytics with your devblog, please install it now (it's free) and make sure it's working. I'll explain why that's necessary in just a minute.

Next, insert a link back to your original post right after the first paragraph in your article. It should read something like this:

This article was originally published on My Nintendo News.

And finally, insert a short bio at the end of your article and keep it under 300 words. It should read something like this:

Shigeru Miyamoto is a game designer at Nintendo. He's currently working on Super Mario Bros. 4 for Nintendo Switch. For more great articles like this, subscribe to My Nintendo News.

No, Super Mario Bros. 4 doesn't exist. Just let me dream.

Again, you get the idea. Take this short template and put your own words to it. The goal here is to get people to click back to your studio, game, or devblog. And keep this as timeless as possible: don't say or link to anything that will need to be updated in the future. The last thing you want to do is have to go back to every republished post and edit it. That would be an epic waste of time.
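If you republish often, it's worth scripting these two additions so every copy gets them the same way. Here's a minimal Python sketch, assuming your article is stored as a list of plain-text paragraphs. `prepare_republish` is a hypothetical helper, and in a real post the blog name in the attribution line would be a link back to the original.

```python
# Hypothetical helpers for preparing a republished copy of an article.
MAX_BIO_WORDS = 300  # keep the bio under 300 words


def attribution_line(blog_name: str) -> str:
    """The line that goes right after the first paragraph."""
    return f"This article was originally published on {blog_name}."


def prepare_republish(paragraphs: list[str], blog_name: str, bio: str) -> list[str]:
    """Return a copy of the article with the attribution line and closing bio added."""
    if not paragraphs:
        raise ValueError("article has no paragraphs")
    if len(bio.split()) >= MAX_BIO_WORDS:
        raise ValueError(f"bio should stay under {MAX_BIO_WORDS} words")
    return [paragraphs[0], attribution_line(blog_name), *paragraphs[1:], bio]


article = ["Intro paragraph.", "Main content.", "Conclusion."]
bio = ("Shigeru Miyamoto is a game designer at Nintendo. For more great "
       "articles like this, subscribe to My Nintendo News.")
for paragraph in prepare_republish(article, "My Nintendo News", bio):
    print(paragraph)
```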

One more thing to note here, if you're republishing your article on a syndicated blogging platform such as Gamasutra, then these new additions to your article make perfect sense and you should do them. But on certain forums such as Reddit, these added elements will feel inorganic. Be self-aware about where you're posting, follow community rules, and always present your work appropriately for the audience.

Now, the moment we've all been waiting for. It's finally time to republish the article...

Here we're looking for highly trafficked game development websites, forums, and groups that make sense for what you've written. Here's a short list of game development communities that I know work really well for this:

Front page of Gamasutra where your post could be featured.

Obviously this isn't a comprehensive list. There are also framework forums, platform blogs, and Facebook groups to consider. Use your inside knowledge, experience, and gut instinct to put together your own list of websites, forums, and groups where you feel the community could benefit most from what you've written.

The worst thing you could do is blindly republish everywhere.

The Republishing Strategy should be a two-way street. You're helping fellow game developers learn from how you make your games, and in return they're helping you by checking out your games. There are a few staple communities you can rely on for every post (such as Gamasutra or Reddit), but beyond that, the websites and forums you republish on should vary based on what you write.

Action Steps

Install Google Analytics and make sure it's working properly
Insert a link back to your devblog after the first paragraph of each article (where appropriate)
Insert a bio of under 300 words at the end of each article, complete with links back to your devblog or more (where appropriate)
Republish articles on highly trafficked websites, forums, and groups that make the most sense for what you've written

That's the bulk of The Republishing Strategy, but you know how I roll: we need to measure and optimize what we've done as a last step.

3. Use Analytics to Measure and Optimize Your Republishing Performance

The day after you republish your first article, open your Google Analytics dashboard, and if you're a relatively low traffic or new devblog, you should see that your traffic has doubled overnight. I'm going to be honest, it's pretty awesome when you see that happen for the first time.

Take a moment to enjoy the fruits of your labor.

After all is said and done your dashboard should look something like this...


Here's a screenshot of me using the republishing strategy to increase my traffic by 6,066% on day 1, and by 533% by day 7.
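In case those percentages look like magic, they're just the standard percent-change formula. Here's a quick Python sketch; the visitor counts in it are made up for illustration and are not the numbers behind the screenshot.

```python
# Percent-change arithmetic; sample numbers are hypothetical.
def percent_increase(before: float, after: float) -> float:
    """Percent change from a baseline; 100% means traffic doubled."""
    return (after - before) / before * 100


def multiplier(percent: float) -> float:
    """How many times the baseline a given percent increase represents."""
    return 1 + percent / 100


print(percent_increase(50, 100))  # 100.0, i.e. traffic doubled
print(percent_increase(20, 100))  # 400.0
print(multiplier(6066))           # about 61.7x the usual daily traffic
print(multiplier(533))            # about 6.3x
```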

You should see about 500 new visitors from a week of republishing.

Obviously it's going to be different for everyone, and it depends on a lot of factors. Suffice it to say, this is attainable fuzzy math. In any case, over time you're definitely going to see direct traffic rise slowly.

If you've already got thousands of people reading your devblog each day, it could take around 1 month for you to see your traffic double from these efforts. If you're a hit game with tens of thousands of visitors each day, then you're going to see great percentage bumps for your efforts, but you probably won't double your traffic from this strategy alone, at least not quickly.

After a full week of republishing, it's time to analyze your traffic for performance. Inside the Google Analytics dashboard navigate to Acquisition > Overview.


Where to go in Google Analytics to see your acquisition channels and traffic.

The tabled data at the bottom of the report is showing you which channels sent the most traffic to your devblog in the last month. Click on Social and Referral for a detailed description of exactly which websites sent you the most traffic in those channels.

Pay special attention to the percentage of sessions, and look for traffic sources that aren't performing well. Consider how you might improve them, or whether you should keep republishing there at all.
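If you export the referral table, the same check is easy to automate. Here's a minimal Python sketch, assuming hypothetical session counts per referrer and an arbitrary 5% cutoff for "not performing well"; pick whatever threshold makes sense for your own traffic.

```python
# Hypothetical referral data; the 5% cutoff is an assumption, not a rule.
def session_shares(sessions_by_referrer: dict[str, int]) -> dict[str, float]:
    """Percentage of sessions contributed by each referrer."""
    total = sum(sessions_by_referrer.values())
    return {ref: 100 * n / total for ref, n in sessions_by_referrer.items()}


def underperformers(sessions_by_referrer: dict[str, int], min_share_pct: float = 5.0) -> list[str]:
    """Referrers below the cutoff: candidates to improve or stop republishing on."""
    shares = session_shares(sessions_by_referrer)
    return sorted(ref for ref, share in shares.items() if share < min_share_pct)


# Hypothetical week of referral sessions.
week = {"gamasutra.com": 300, "reddit.com": 150, "facebook.com": 40, "forum.example.org": 10}
print(session_shares(week))   # gamasutra.com 60%, reddit.com 30%, facebook.com 8%, forum 2%
print(underperformers(week))  # ['forum.example.org']
```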

Also, don't rule out that your content may have missed the mark, or that it might suck. No offense, but that can happen. Stay self-aware, question your content, and experiment with change before completely abandoning a certain website or forum.

Analyzing your referral traffic once a week is going to help you stop wasting time on what's not working, and spend more time on what does.

Action Steps

Measure referral performance weekly
Note referrals with poor performance, and consider optimizations
Experiment with new websites, forums, or content
Rinse and repeat

And for those of you number-junkies out there, you can also use Google Analytics to figure out which referrals send the most traffic to your website or marketplace pages. It's possible that the website or forum you're getting the most traffic from isn't the same one that drives the most sales. How to do that is way beyond the scope of this article; I'm mentioning it only to show how important measuring your marketing can be.

So there you have it, that's my republishing strategy for devblogs.

Your Devblog Probably Won't Be an Overnight Success

I hope you're as successful as Bungie someday, but let's be real here...

Your devblog probably won't be the overnight success that theirs was back in the day. It could happen, but it probably won't. However, this is the best way I know to improve what you're already doing, double your traffic within a month's time, and convert your devblog into a selling tool that pays dividends long after your games have launched.

That's a pretty big win.

Game marketing is a process, and overnight successes are few and far between. But I know that if you implement this strategy you're going to see your traffic numbers double, and climb from there.

That said, I'd love to hear from you...

How did you feel the first time you saw your traffic double from implementing this strategy, what types of results have you seen since, and what have you done to optimize your devblog each month?

↧

Mobile Gaming Industry Introduction Part 1: User Acquisition

March 7, 2017, 7:08 pm

≫ Next: GPU Performance for Game Artists

≪ Previous: How to Double Your Devblog Traffic with Republishing

User Acquisition or "UA" is now the dominant marketing paradigm in mobile gaming. App Stores are crowded and the chance of users finding your game organically is minimal (unless you're lucky enough to be featured). For many mobile games, advertising is their only hope of growing their user base.

Important Keywords and Acronyms:

CTR - Click Through Rate of an advertisement. Number of Clicks/Number of Impressions. A CTR of 2% would be considered very good.
IR - Install Rate. After a user clicks your ad and lands on the App Store landing page, how many of them will go on to install your game?
CPI - Cost per Install. Amount of money you spend on advertising/Number of Installs. This is essentially how much you pay for each player.
A/B Test - Test 2 different versions of creative content such as your app's icon to see which delivers better results.
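Those three ratios chain together into the basic UA funnel. Here's a minimal Python sketch; the campaign numbers are hypothetical, and install rate is computed against clicks on the assumption that every click lands on your store page.

```python
# Funnel ratios from the definitions above; campaign numbers are hypothetical.
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks / impressions."""
    return clicks / impressions


def install_rate(installs: int, clicks: int) -> float:
    """Share of store-page visitors (here: ad clicks) who install."""
    return installs / clicks


def cpi(ad_spend: float, installs: int) -> float:
    """Cost per install: ad spend / installs."""
    return ad_spend / installs


impressions, clicks, installs, spend = 100_000, 2_000, 600, 900.0
print(f"CTR {ctr(clicks, impressions):.1%}")        # CTR 2.0%
print(f"IR  {install_rate(installs, clicks):.1%}")  # IR  30.0%
print(f"CPI ${cpi(spend, installs):.2f}")           # CPI $1.50
```

Because installs = impressions × CTR × IR, raising either rate lowers your CPI for the same spend, which is why better creative and a better store page pay off across the board.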

Like what you're reading? This article was originally posted on my gaming industry blog: From Game To Brain.

Success Factors
Target the right players (the type who will be playing your game) and TEST if the following materials are appealing to them:

Name: The name should be memorable and stand out compared to other names. It should match the game's theme and give players an idea about your game. Make sure to utilize any IP or popular trending elements of your game. Test the name and refine it until you have a name that your target audience will be drawn to.

Icon: Your game's icon is the #1 most important factor in determining whether a player will decide to download it. Like the name it must be tested and refined. Simple designs work best because icons are small. High contrast will make your icon stand out. Utilize A/B testing to see what works best for your game.
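The article doesn't say how to decide which variant "works best". One common approach, swapped in here as an assumption rather than the author's method, is a two-proportion z-test on install rates: it tells you whether the gap between icon A and icon B is bigger than you'd expect from random noise. A minimal Python sketch with hypothetical numbers:

```python
# Two-proportion z-test (a standard statistical test, not a store feature).
import math


def two_proportion_z_test(installs_a, visits_a, installs_b, visits_b):
    """Return (z, two-sided p-value) for the difference in install rates of A and B."""
    rate_a = installs_a / visits_a
    rate_b = installs_b / visits_b
    pooled = (installs_a + installs_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        raise ValueError("need at least one install and one non-install overall")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: icon A converts 300 of 10,000 store visits, icon B 360 of 10,000.
z, p = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p below 0.05: B's lead is unlikely to be pure chance
```

Decide on the sample size before the test starts, rather than stopping the moment the numbers look good.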

After this, it's very important that your game is easy to download. Large games are often abandoned and forgotten about mid-download. Take a page from Nintendo with Super Mario Run - let the user download the app, play the first level, and then ask them to download the rest of the game.

After you've optimized your name, icon, and file size it's time to look at the rest of your Store Page's contents.

Reviews: These are key, as players will look to the reviews of other players to determine your game's quality. Make sure to incentivize your players to leave your game a good review. Don't forget reviews are reset with every update.

Screenshots, Description, SEO: Several articles can be written about each of these. Test, test, and test again to make sure everything on your Store Landing Page is optimized for the highest Install Rate. Optimize your SEO to maximize organic search traffic.

Being Featured: Although this should be at the top, it is so difficult to do nowadays that you certainly can't rely on it. Top 5 featuring on the App Store will yield you 1 million free downloads a day from players who just open their App Store. Featuring depends on your game's uniqueness among its competition, the "fun factor", and having a full feature-set that fully utilizes the device (for example if you release special versions at iPad Pro resolution you have a shot at being featured in the iPad Pro apps section). Finally: Platform matters. Being featured on Amazon is far easier than being featured on Apple's App Store.

Invest in Good Creative: Having a constant source of good banner art and videos is one of the cheapest ways to reduce your CPI. Good creative makes everything in the process cheaper, because it makes players more excited to try your game. If you're doing targeted advertising you should not show a player the same piece of creative more than 3-5 times. So every 1-5 weeks you'll need a completely new set of artwork and creative to stay fresh. Creative is the best ROI because it lowers your CPI across the board.
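Many ad platforms let you set a frequency cap directly, but the idea is simple enough to sketch. This hypothetical `CreativeRotator` in Python shows each player their least-seen creative and stops at a cap (3 here, the low end of the 3-5 range above); once it returns `None`, that player has seen everything and it's time for a fresh set.

```python
# Hypothetical frequency-capped rotation; not any ad network's API.
from collections import defaultdict


class CreativeRotator:
    """Rotates ad creatives per player, never showing one more than `max_views` times."""

    def __init__(self, creatives, max_views=3):
        self.creatives = list(creatives)
        self.max_views = max_views
        self.views = defaultdict(int)  # (player_id, creative) -> times shown

    def next_creative(self, player_id):
        fresh = [c for c in self.creatives if self.views[(player_id, c)] < self.max_views]
        if not fresh:
            return None  # this player has seen every creative max_views times
        creative = min(fresh, key=lambda c: self.views[(player_id, c)])  # least seen first
        self.views[(player_id, creative)] += 1
        return creative


rotator = CreativeRotator(["banner_a", "video_b"], max_views=3)
print([rotator.next_creative("player_1") for _ in range(7)])
# ['banner_a', 'video_b', 'banner_a', 'video_b', 'banner_a', 'video_b', None]
```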

Bottom Line:
User Acquisition is the most expensive part of the game monetization funnel. Reduce your costs here by investing in good creative content for advertisement and your store landing page. Test everything to find out what works and what doesn't, and use your test data to refine and improve your creative.

Next week we will be covering Part 2 of this 5-part series: Retention. Learn the most cost-effective ways to keep players playing your game for a long time. If you find this information useful, please subscribe or bookmark and check back next week.

↧

GPU Performance for Game Artists

March 15, 2017, 4:50 am

≫ Next: Latency Matters in Online Gaming

≪ Previous: Mobile Gaming Industry Introduction Part 1: User Acquisition

Performance is everybody’s responsibility, no matter what your role. When it comes to the GPU, 3D programmers have a lot of control over performance; we can optimize shaders, trade image quality for performance, use smarter rendering techniques… we have plenty of tricks up our sleeves. But there’s one thing we don’t have direct control over, and that’s the game’s art.

We rely on artists to produce assets that not only look good but are also efficient to render. For artists, a little knowledge of what goes on under the hood can make a big impact on a game’s framerate. If you’re an artist and want to understand why things like draw calls, LODs, and mipmaps are important for performance, read on!

To appreciate the impact that your art has on the game’s performance, you need to know how a mesh makes its way from your modelling package onto the screen in the game. That means having an understanding of the GPU – the chip that powers your graphics card and makes real-time 3D rendering possible in the first place. Armed with that knowledge, we’ll look at some common art-related performance issues, why they’re a problem, and what you can do about it. Things are quickly going to get pretty technical, but if anything is unclear I’ll be more than happy to answer questions in the comments section.

Before we start, I should point out that I am going to deliberately simplify a lot of things for the sake of brevity and clarity. In many cases I’m generalizing, describing only the typical case, or just straight up leaving things out. In particular, for the sake of simplicity the idealized version of the GPU I describe below more closely matches that of the previous (DX9-era) generation. However, when it comes to performance, all of the considerations below still apply to the latest PC & console hardware (although not necessarily all mobile GPUs). Once you understand everything described here, it will be much easier to get to grips with the variations and complexities you’ll encounter later, if and when you start to dig deeper.

Part 1: The rendering pipeline from 10,000 feet

For a mesh to be displayed on the screen, it must pass through the GPU to be processed and rendered. Conceptually, this path is very simple: the mesh is loaded, vertices are grouped together as triangles, the triangles are converted into pixels, each pixel is given a colour, and that’s the final image. Let’s look a little closer at what happens at each stage.

After you export a mesh from your DCC tool of choice (Digital Content Creation – Maya, Max, etc.), the geometry is typically loaded into the game engine in two pieces: a Vertex Buffer (VB) that contains a list of the mesh’s vertices and their associated properties (position, UV coordinates, normal, color etc.), and an Index Buffer (IB) that lists which vertices in the VB are connected to form triangles.
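To make that concrete, here's a small Python sketch of the two buffers for a square made of two triangles. Real engines store these as tightly packed binary data, so treat the `Vertex` class and `assemble_triangles` function as hypothetical, illustrative stand-ins; the point is that the index buffer lets both triangles share vertices instead of duplicating them.

```python
# Illustrative stand-ins for GPU buffers; names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    position: tuple[float, float, float]
    uv: tuple[float, float]


# Vertex Buffer: each unique corner of the square is stored once.
vertex_buffer = [
    Vertex((0.0, 0.0, 0.0), (0.0, 1.0)),  # 0: bottom-left
    Vertex((1.0, 0.0, 0.0), (1.0, 1.0)),  # 1: bottom-right
    Vertex((1.0, 1.0, 0.0), (1.0, 0.0)),  # 2: top-right
    Vertex((0.0, 1.0, 0.0), (0.0, 0.0)),  # 3: top-left
]

# Index Buffer: every 3 indices form one triangle; vertices 0 and 2 are shared.
index_buffer = [0, 1, 2, 0, 2, 3]


def assemble_triangles(vb, ib):
    """Group indices into triples and look up their vertices (a triangle list)."""
    return [tuple(vb[i] for i in ib[k:k + 3]) for k in range(0, len(ib), 3)]


triangles = assemble_triangles(vertex_buffer, index_buffer)
print(len(vertex_buffer), "vertices,", len(triangles), "triangles")  # 4 vertices, 2 triangles
```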

Along with these geometry buffers, the mesh will also have been assigned a material to determine what it looks like and how it behaves under different lighting conditions. To the GPU this material takes the form of custom-written shaders – programs that determine how the vertices are processed, and what colour the resulting pixels will be. When choosing the material for the mesh, you will have set various material parameters (e.g. setting a base color value or picking a texture for various maps like albedo, roughness, normal etc.) – these are passed to the shader programs as inputs.

The mesh and material data get processed by various stages of the GPU pipeline in order to produce pixels in the final render target (an image to which the GPU writes). That render target can then be used as a texture in subsequent shader programs and/or displayed on screen as the final image for the frame.

For the purposes of this article, here are the important parts of the GPU pipeline from top to bottom:

Input Assembly. The GPU reads the vertex and index buffers from memory, determines how the vertices are connected to form triangles, and feeds the rest of the pipeline.
Vertex Shading. The vertex shader gets executed once for every vertex in the mesh, running on a single vertex at a time. Its main purpose is to transform the vertex, taking its position and using the current camera and viewport settings to calculate where it will end up on the screen.
Rasterization. Once the vertex shader has been run on each vertex of a triangle and the GPU knows where it will appear on screen, the triangle is rasterized – converted into a collection of individual pixels. Per-vertex values – UV coordinates, vertex color, normal, etc. – are interpolated across the triangle’s pixels. So if one vertex of a triangle has a black vertex color and another has white, a pixel rasterized in the middle of the two will get the interpolated vertex color grey.
Pixel Shading. Each rasterized pixel is then run through the pixel shader (although technically at this stage it’s not yet a pixel but a ‘fragment’, which is why you’ll see the pixel shader sometimes called a fragment shader). This gives the pixel a color by combining material properties, textures, lights, and other parameters in the programmed way to get a particular look. Since there are so many pixels (a 1080p render target has over two million) and each one needs to be shaded at least once, the pixel shader is usually where the GPU spends a lot of its time.
Render Target Output. Finally the pixel is written to the render target – but not before undergoing some tests to make sure it’s valid. For example in normal rendering you want closer objects to appear in front of farther objects; the depth test can reject pixels that are further away than the pixel already in the render target. But if the pixel passes all the tests (depth, alpha, stencil etc.), it gets written to the render target in memory.
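Two of those stages are easy to mimic on the CPU. The Python sketch below interpolates a vertex color across a triangle using barycentric weights (reproducing the black-and-white-to-grey example) and applies a "closer wins" depth test before writing to a tiny render target. It's a simplification: real GPUs use perspective-correct interpolation and dedicated hardware for both steps, and the function names here are hypothetical.

```python
# CPU illustration of interpolation and depth testing; helper names are hypothetical.
def barycentric(p, a, b, c):
    """Weights (wa, wb, wc) of 2D point p relative to triangle abc; they sum to 1."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    wb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    wc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return 1.0 - wb - wc, wb, wc


def interpolate(weights, values):
    """Blend per-vertex values (e.g. RGB colors) with barycentric weights."""
    return tuple(sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0])))


def depth_test_and_write(depth_buf, color_buf, x, y, depth, color):
    """Write the pixel only if it's closer than what the render target already holds."""
    if depth < depth_buf[y][x]:
        depth_buf[y][x] = depth
        color_buf[y][x] = color
        return True
    return False  # rejected: something nearer was already drawn here


# A pixel halfway between a black vertex and a white vertex comes out grey.
corners = [(0, 0), (4, 0), (0, 4)]
colors = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
grey = interpolate(barycentric((2, 0), *corners), colors)
print(grey)  # (0.5, 0.5, 0.5)

# A 2x2 render target: depth starts "infinitely far", colors start black.
depth_buf = [[float("inf")] * 2 for _ in range(2)]
color_buf = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]
depth_test_and_write(depth_buf, color_buf, 0, 0, 0.5, grey)             # written
depth_test_and_write(depth_buf, color_buf, 0, 0, 0.8, (1.0, 0.0, 0.0))  # rejected: farther
```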

There’s much more to it, but that’s the basic flow: the vertex shader is executed on each vertex in the mesh, each 3-vertex triangle is rasterized into pixels, the pixel shader is executed on each rasterized pixel, and the resulting colors are written to a render target.

Under the hood, the shader programs that represent the material are written in a shader programming language such as HLSL. These shaders run on the GPU in much the same way that regular programs run on the CPU – taking in data, running a bunch of simple instructions to change the data, and outputting the result. But while CPU programs are generalized to work on any type of data, shader programs are specifically designed to work on vertices and pixels. These programs are written to give the rendered object the look of the desired material – plastic, metal, velvet, leather, etc.

To give you a concrete example, here’s a simple pixel shader that does Lambertian lighting (i.e. simple diffuse-only, no specular highlights) with a material color and a texture. As shaders go it’s one of the most basic, but you don’t need to understand it – it just helps to see what shaders can look like in general.

float3    MaterialColor;
Texture2D MaterialTexture;
SamplerState TexSampler;

float3 LightDirection;
float3 LightColor;

float4 MyPixelShader( float2 vUV : TEXCOORD0, float3 vNorm : NORMAL0 ) : SV_Target
{
    float3 vertexNormal = normalize(vNorm);
    float3 lighting = LightColor * dot( vertexNormal, LightDirection );
    float3 material = MaterialColor * MaterialTexture.Sample( TexSampler, vUV ).rgb;

    float3 color = material * lighting;
    float alpha = 1;

    return float4(color, alpha);
}

A simple pixel shader that does basic lighting. The inputs at the top like MaterialTexture and LightColor are filled in by the CPU, while vUV and vNorm are both vertex properties that were interpolated across the triangle during rasterization.

And the generated shader instructions:

dp3 r0.x, v1.xyzx, v1.xyzx
rsq r0.x, r0.x
mul r0.xyz, r0.xxxx, v1.xyzx
dp3 r0.x, r0.xyzx, cb0[1].xyzx
mul r0.xyz, r0.xxxx, cb0[2].xyzx
sample_indexable(texture2d)(float,float,float,float) r1.xyz, v0.xyxx, t0.xyzw, s0
mul r1.xyz, r1.xyzx, cb0[0].xyzx
mul o0.xyz, r0.xyzx, r1.xyzx
mov o0.w, l(1.000000)
ret

The shader compiler takes the above program and generates these instructions which are run on the GPU; a longer program produces more instructions which means more work for the GPU to do.

As an aside, you might notice how isolated the shader steps are – each shader works on a single vertex or pixel without needing to know anything about the surrounding vertices/pixels. This is intentional and allows the GPU to process huge numbers of independent vertices and pixels in parallel, which is part of what makes GPUs so fast at doing graphics work compared to CPUs.

We’ll return to the pipeline shortly to see where things might slow down, but first we need to back up a bit and look at how the mesh and material got to the GPU in the first place. This is also where we meet our first performance hurdle – the draw call.

The CPU and Draw Calls

The GPU cannot work alone; it relies on the game code running on the machine’s main processor – the CPU – to tell it what to render and how. The CPU and GPU are (usually) separate chips, running independently and in parallel. To hit our target frame rate – most commonly 30 frames per second – both the CPU and GPU have to do all the work to produce a single frame within the time allowed (at 30fps that’s just 33 milliseconds per frame).

To achieve this, frames are often pipelined; the CPU will take the whole frame to do its work (process AI, physics, input, animation etc.) and then send instructions to the GPU at the end of the frame, so that the GPU can render that frame during the following frame interval while the CPU moves on to the next one. This gives each processor a full 33ms to do its work at the expense of introducing a frame’s worth of latency (delay). This may be an issue for extremely time-sensitive twitchy games like first-person shooters – the Call of Duty series, for example, runs at 60fps to reduce the latency between player input and rendering – but in general the extra frame is not noticeable to the player.

Every 33ms the final render target is copied and displayed on the screen at VSync – the interval during which the monitor looks for a new frame to display. But if the GPU takes longer than 33ms to finish rendering the frame, it will miss this window of opportunity and the monitor won’t have any new frame to display. That results in either screen tearing or stuttering and an uneven framerate that we really want to avoid. We also get the same result if the CPU takes too long – it has a knock-on effect since the GPU doesn’t get commands quickly enough to do its job in the time allowed. In short, a solid framerate relies on both the CPU and GPU performing well.

Here the CPU takes too long to produce rendering commands for the second frame, so the GPU starts rendering late and thus misses VSync.
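To make the timing concrete, here is a simplified Python model of the pipelined CPU/GPU frame described above. It is a sketch under assumptions of our own, not how any particular engine schedules work: the CPU starts each frame at its VSync slot (or later if it is still busy), the GPU starts a frame once the CPU has submitted it and the GPU has finished the previous one, and each frame is due at the VSync two intervals after its CPU slot. The frame durations are made-up numbers chosen to reproduce the situation in the figure.

```python
FRAME_MS = 1000.0 / 30.0  # 33.3 ms budget per frame at 30 fps

def simulate(cpu_ms, gpu_ms, period=FRAME_MS):
    """Return, for each frame, whether it missed its VSync deadline."""
    missed = []
    cpu_end = 0.0
    gpu_end = 0.0
    for i, (cpu_cost, gpu_cost) in enumerate(zip(cpu_ms, gpu_ms)):
        # The CPU starts frame i at its VSync slot, or later if still busy.
        cpu_start = max(i * period, cpu_end)
        cpu_end = cpu_start + cpu_cost
        # The GPU needs the submitted commands AND must have finished the previous frame.
        gpu_start = max(cpu_end, gpu_end)
        gpu_end = gpu_start + gpu_cost
        # Pipelined: one interval of CPU work plus one interval of GPU work.
        deadline = (i + 2) * period
        missed.append(gpu_end > deadline)
    return missed

# The second frame's CPU work overruns (40 ms), so the GPU starts late.
print(simulate([30, 40, 25], [30, 30, 25]))  # [False, True, False]
```

Even though the GPU cost of the second frame is within budget, it misses VSync because the CPU handed it over too late, which is the knock-on effect described above.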

To display a mesh, the CPU issues a draw call which is simply a series of commands that tells the GPU what to draw and how to draw it. As the draw call goes through the GPU pipeline, it uses the various configurable settings specified in the draw call – mostly determined by the mesh’s material and its parameters – to decide how the mesh is rendered. These settings, called GPU state, affect all aspects of rendering, and consist of everything the GPU needs to know in order to render an object. Most significantly for us, GPU state includes the current vertex/index buffers, the current vertex/pixel shader programs, and all the shader inputs (eg. MaterialTexture or LightColor in the above shader code example).

This means that to change a piece of GPU state (for example changing a texture or switching shaders), a new draw call must be issued. This matters because these draw calls are not free for the CPU. It costs a certain amount of time to set up the desired GPU state changes and then issue the draw call. Beyond whatever work the game engine needs to do for each call, extra error checking and bookkeeping cost is introduced by the graphics driver, an intermediate layer of code written by the GPU vendor (NVIDIA, AMD etc.) that translates the draw call into low-level hardware instructions. Too many draw calls can put too much of a burden on the CPU and cause serious performance problems.

Due to this overhead, we generally set an upper limit to the number of draw calls that are acceptable per frame. If this limit is exceeded during gameplay testing, steps must be taken such as reducing the number of objects, reducing draw distance, etc. Console games will typically try to keep draw calls in the 2000-3000 range (eg. on Far Cry Primal we tried to keep it below 2500 per frame). That might sound like a lot, but it also includes any special rendering techniques that might be employed – cascaded shadows for example can easily double the number of draw calls in a frame.

As mentioned above, GPU state can only be changed by issuing a new draw call. This means that although you may have created a single mesh in your modelling package, if one half of the mesh uses one texture for the albedo map and the other half uses a different texture, it will be rendered as two separate draw calls. The same goes if the mesh is made up of multiple materials; different shaders need to be set, so multiple draw calls must be issued.

In practice, a very common source of state change – and therefore extra draw calls – is switching texture maps. Typically the whole mesh will use the same material (and therefore the same shaders), but different parts of the mesh will use different sets of albedo/normal/roughness maps. With a scene of hundreds or even thousands of objects, using many draw calls for each object will cost a considerable amount of CPU time and so will have a noticeable impact on the framerate of the game.
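As a rough illustration of how texture sets multiply draw calls, the sketch below groups a mesh's triangles by the GPU state they need, one group per draw call. The character mesh, its shader and texture-set names, and the crowd of 1000 characters are all hypothetical; the 2500 budget is the Far Cry Primal figure quoted above.

```python
from collections import defaultdict

# Per-frame budget, taken from the Far Cry Primal figure quoted in the text.
DRAW_CALL_BUDGET = 2500

def split_into_draws(triangle_states):
    """Group a mesh's triangles by GPU state; each group needs its own draw call.

    triangle_states[i] is the (shader, texture_set) pair used by triangle i.
    """
    draws = defaultdict(list)
    for tri_index, state in enumerate(triangle_states):
        draws[state].append(tri_index)
    return dict(draws)

# A hypothetical character: one material/shader, but three different texture sets.
character = [
    ("skin_shader", "body_maps"), ("skin_shader", "body_maps"),
    ("skin_shader", "head_maps"), ("skin_shader", "head_maps"),
    ("skin_shader", "armor_maps"), ("skin_shader", "armor_maps"),
]
draws = split_into_draws(character)
print(len(draws))  # 3 draw calls for what the artist sees as one mesh

crowd_draw_calls = 1000 * len(draws)
print(crowd_draw_calls, crowd_draw_calls <= DRAW_CALL_BUDGET)  # 3000 False
```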

To avoid this, a common solution is to combine all the different texture maps used on a mesh into a single big texture, often called an atlas. The UVs of the mesh are then adjusted to look up the right part of the atlas, and the entire mesh (or even multiple meshes) can be rendered in a single draw call. Care must be taken when constructing the atlas so that adjacent textures don’t bleed into each other at lower mips, but these problems are relatively minor compared to the gains that can be had in terms of performance.

A texture atlas from Unreal Engine’s Infiltrator demo
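The UV adjustment for an atlas is a simple scale and offset into the tile's rectangle. The sketch below is a hypothetical layout, not any specific engine's atlas tool: it places square tiles in a grid, surrounds each with a gutter of padding texels to limit bleeding at lower mips, and remaps a UV from the original texture into the atlas.

```python
def grid_atlas_rects(tile_size, tiles_per_row, padding):
    """Lay out square tiles in a grid, each surrounded by a `padding`-texel gutter.

    Returns (rects, atlas_size), where each rect is (x, y, width, height) in texels.
    """
    cell = tile_size + 2 * padding
    rects = []
    for i in range(tiles_per_row * tiles_per_row):
        row, col = divmod(i, tiles_per_row)
        rects.append((col * cell + padding, row * cell + padding, tile_size, tile_size))
    return rects, tiles_per_row * cell

def remap_uv(u, v, rect, atlas_size):
    """Map a UV in [0, 1] on the original texture to the matching UV in the atlas."""
    x, y, w, h = rect
    return ((x + u * w) / atlas_size, (y + v * h) / atlas_size)

# Four hypothetical 248x248 textures with a 4-texel gutter fit a 512x512 atlas.
rects, atlas_size = grid_atlas_rects(tile_size=248, tiles_per_row=2, padding=4)
print(atlas_size)                                # 512
print(remap_uv(0.5, 0.5, rects[3], atlas_size))  # (0.75, 0.75)
```

Note that a gutter only protects the first few mips: each mip level halves it, so a 4-texel gutter is down to a single texel at mip 2. The gutter is usually filled by extending the tile's edge texels rather than left empty.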

Many engines also support instancing, also known as batching or clustering. This is the ability to use a single draw call to render multiple objects that are mostly identical in terms of shaders and state, and only differ in a restricted set of ways (typically their position and rotation in the world). The engine will usually recognize when multiple identical objects can be rendered using instancing, so it’s always preferable to use the same object multiple times in a scene when possible, instead of multiple different objects that will need to be rendered with separate draw calls.
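A hypothetical sketch of the grouping an engine might perform before instancing: objects sharing the same mesh and material collapse into one draw, and only the small per-instance data (here position and rotation, as in the text) is kept per object. The `SceneObject` type and the scene contents are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneObject:
    mesh: str
    material: str
    position: tuple      # (x, y, z) in world space
    yaw_degrees: float   # rotation around the vertical axis

def build_instanced_draws(objects):
    """Group objects that share mesh and material into one instanced draw each.

    Returns {(mesh, material): [(position, yaw), ...]}: one entry per draw call,
    holding only the per-instance data that differs between copies.
    """
    batches = defaultdict(list)
    for obj in objects:
        batches[(obj.mesh, obj.material)].append((obj.position, obj.yaw_degrees))
    return dict(batches)

# 100 copies of one rock mesh plus a single tree: 101 objects, 2 draw calls.
scene = [SceneObject("rock_a", "stone", (float(i), 0.0, 0.0), 15.0 * i) for i in range(100)]
scene.append(SceneObject("oak", "bark_leaves", (5.0, 0.0, 3.0), 0.0))
draws = build_instanced_draws(scene)
print(len(scene), len(draws))  # 101 2
```

Had the 100 rocks been 100 slightly different meshes, the same scene would need 101 draw calls, which is why reusing the same object is preferable.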

Another common technique for reducing draw calls is manually merging many different objects that share the same material into a single mesh. This can be effective, but care must be taken to avoid excessive merging, which can actually worsen performance by increasing the amount of work for the GPU. Before any draw call gets issued, the engine’s visibility system will determine whether or not the object will even appear on screen. If not, it’s very cheap to just ignore the object at this early stage and not pay for any draw call or GPU work; this is known as visibility culling. It is usually done by checking if the object’s bounding volume is visible from the camera’s point of view, and that it is not completely blocked from view (occluded) by any other objects.

However, when multiple meshes are merged into a single object, their individual bounding volumes must be combined into a single large volume that is big enough to enclose every mesh. This increases the likelihood that the visibility system will be able to see some part of the volume, and so will consider the entire collection visible. That means that it becomes a draw call, and so the vertex shader must be executed on every vertex in the object – even if very few of those vertices actually appear on the screen. This can lead to a lot of GPU time being wasted because the vertices end up not contributing anything to the final image. For these reasons, mesh merging is most effective when it is done on groups of small objects that are close to each other, as they will probably be on-screen at the same time anyway.
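The cost of a merged bounding volume can be shown with a small sketch. To keep it short it makes simplifying assumptions of our own: bounds are 2D axis-aligned boxes seen from above, the camera frustum is replaced by a rectangle, there is no occlusion test, and the two bushes and their vertex counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box, in 2D (top-down) to keep the sketch short."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def union(self, other):
        """Smallest box enclosing both boxes, as used for a merged mesh."""
        return Box(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                   max(self.max_x, other.max_x), max(self.max_y, other.max_y))

    def overlaps(self, other):
        return (self.min_x <= other.max_x and other.min_x <= self.max_x and
                self.min_y <= other.max_y and other.min_y <= self.max_y)

def vertices_submitted(objects, view):
    """Sum the vertex counts of every object whose bounds pass the visibility test."""
    return sum(vertex_count for bounds, vertex_count in objects if bounds.overlaps(view))

view = Box(0, 0, 10, 10)               # stand-in for the camera frustum
bush_near = (Box(2, 2, 3, 3), 500)     # on screen
bush_far = (Box(40, 40, 41, 41), 500)  # well outside the view

separate = vertices_submitted([bush_near, bush_far], view)
merged_bounds = bush_near[0].union(bush_far[0])
merged = vertices_submitted([(merged_bounds, 1000)], view)
print(separate, merged)  # 500 1000
```

Kept separate, the far bush is culled for free; merged, its 500 vertices go through the vertex shader only to be thrown away.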

A frame from XCOM 2 as captured with RenderDoc. The wireframe (bottom) shows in grey all the extra geometry submitted to the GPU that is outside the view of the in-game camera.

As an illustrative example take the above capture of XCOM 2, one of my favourite games of the last couple of years. The wireframe shows the entire scene as submitted to the GPU by the engine, with the black area in the middle being the geometry that’s actually visible to the game camera. All the surrounding geometry in grey is not visible and will be culled after the vertex shader is executed, which is all wasted GPU time. In particular, note the highlighted red geometry which is a series of bush meshes, combined and rendered in just a few draw calls. Since the visibility system determined that at least some of the bushes are visible on the screen, they are all rendered and so must all have their vertex shader executed before determining which can be culled… which turns out to be most of them.

Please note this isn’t an indictment of XCOM 2 in particular; I just happened to be playing it while writing this article! Every game has this problem, and it’s a constant battle to balance the CPU cost of doing more accurate visibility tests, the GPU cost of culling the invisible geometry, and the CPU cost of having more draw calls.

Things are changing when it comes to the cost of draw calls, however. As mentioned above, a significant reason for their expense is the overhead of the driver doing translation and error checking. This has long been the case, but the most modern graphics APIs (eg. Direct3D 12 and Vulkan) have been restructured in order to avoid most of this overhead. While this does introduce extra complexity to the game’s rendering engine, it can also result in cheaper draw calls, allowing us to render many more objects than was previously possible. Some engines (most notably the latest version used by Assassin’s Creed) have even gone in a radically different direction, using the capabilities of the latest GPUs to drive rendering and effectively doing away with draw calls altogether.

The performance impact of having too many draw calls is mostly on the CPU; pretty much all other performance issues related to art assets are on the GPU. We’ll now look at what a bottleneck is, where they can happen, and what we can do about them.

Part 2: Common GPU bottlenecks

The very first step in optimization is to identify the current bottleneck so you can take steps to reduce or eliminate it. A bottleneck refers to the section of the pipeline that is slowing everything else down. In the above case where too many draw calls are costing too much, the CPU is the bottleneck. Even if we performed other optimizations that made the GPU faster, it wouldn’t matter to the framerate because the CPU is still running too slowly to produce a frame in the required amount of time.

4 draw calls going through the pipeline, each being the rendering of a full mesh containing many triangles. The stages overlap because as soon as one piece of work is finished it can be immediately passed to the next stage (eg. when three vertices are processed by the vertex shader then the triangle can proceed to be rasterized).

You can think of the GPU pipeline as an assembly line. As each stage finishes with its data, it forwards the results to the following stage and proceeds with the next piece of work. Ideally every stage is busy working all the time, and the hardware is being utilized fully and efficiently as represented in the above image – the vertex shader is constantly processing vertices, the rasterizer is constantly rasterizing pixels, and so on. But consider what happens if one stage takes much longer than the others:

What happens here is that an expensive vertex shader can’t feed the following stages fast enough, and so becomes the bottleneck. If you had a draw call that behaved like this, making the pixel shader faster is not going to make much of a difference to the time it takes for the entire draw call to be rendered. The only way to make things faster is to reduce the time spent in the vertex shader. How we do that depends on what in the vertex shader stage is actually causing the bottleneck.
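The assembly-line argument can be checked with a tiny throughput model. This is a simplified sketch under our own assumptions (in-order stages, unlimited buffering between them, hypothetical per-item costs in arbitrary time units), not a model of real GPU hardware: once the pipeline is full, it finishes one item per step of its slowest stage.

```python
def pipeline_time(stage_costs, items):
    """Time for `items` work units to flow through a simple in-order pipeline.

    The first unit has to pass through every stage; after that, the pipeline
    can only finish one unit per step of its slowest stage.
    Assumes unlimited buffering between stages.
    """
    costs = list(stage_costs.values())
    return sum(costs) + (items - 1) * max(costs)

def bottleneck(stage_costs):
    """The stage with the highest per-item cost limits throughput."""
    return max(stage_costs, key=stage_costs.get)

# Hypothetical per-item costs with an expensive vertex shader.
stages = {"vertex": 4, "raster": 1, "pixel": 2}
print(bottleneck(stages), pipeline_time(stages, 1000))  # vertex 4003
print(pipeline_time({**stages, "pixel": 1}, 1000))      # faster pixel shader: 4002
print(pipeline_time({**stages, "vertex": 2}, 1000))     # faster vertex shader: 2003
```

Halving the pixel shader cost saves a single time unit, while halving the vertex shader cost roughly halves the total, matching the behaviour described above.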

You should keep in mind that there will almost always be a bottleneck of some kind – if you eliminate one, another will just take its place. The trick is knowing when you can do something about it, and when you have to live with it because that’s just what it costs to render what you want to render. When you optimize, you’re really trying to get rid of unnecessary bottlenecks. But how do you identify what the bottleneck is?

Profiling

Profiling tools are absolutely essential for figuring out where all the GPU’s time is being spent, and good ones will point you at exactly what you need to change in order for things to go faster. They do this in a variety of ways – some explicitly show a list of bottlenecks, others let you run ‘experiments’ to see what happens (eg. “how does my draw time change if all the textures are tiny”, which can tell you if you’re bound by memory bandwidth or cache usage).

Unfortunately this is where things get a bit hand-wavy, because some of the best performance tools exist only for the consoles and are therefore under NDA. If you’re developing for Xbox or PlayStation, bug your friendly neighbourhood graphics programmer to show you these tools. We love it when artists get involved in performance, and will be happy to answer questions and even host tutorials on how to use the tools effectively.

Unity's basic built-in GPU profiler

The PC already has some pretty good (albeit hardware-specific) profiling tools which you can get directly from the GPU vendors, such as NVIDIA’s Nsight, AMD’s GPU PerfStudio, and Intel’s GPA. Then there’s RenderDoc, which is currently the best tool for graphics debugging on PC but doesn’t have any advanced profiling features. Microsoft is also starting to release its awesome Xbox profiling tool PIX for Windows, albeit only for D3D12 applications. Assuming they also plan to provide the same bottleneck analysis tools as the Xbox version (tricky with the wide variety of hardware out there), it should be a huge asset to PC developers going forward.

These tools can give you more information about the performance of your art than you will ever need. They can also give you a lot of insight into how a frame is put together in your engine, as well as being awesome debugging tools for when things don’t look how they should.

Being able to use them is important, as artists need to be responsible for the performance of their art. But you shouldn’t be expected to figure it all out on your own – any good engine should provide its own custom tools for analyzing performance, ideally providing metrics and guidelines to help determine if your art assets are within budget. If you want to be more involved with performance but feel you don’t have the necessary tools, talk to your programming team. Chances are they already exist – and if they don’t, they should be created!

Now that you know how GPUs work and what a bottleneck is, we can finally get to the good stuff. Let’s dig into the most common real-world bottlenecks that can show up in the pipeline, how they happen, and what can be done about them.

Shader instructions

Since most of the GPU’s work is done with shaders, they’re often the source of many of the bottlenecks you’ll see. When a bottleneck is identified as shader instructions (sometimes referred to as ALUs, from Arithmetic Logic Units, the hardware that actually does the calculations), it’s simply a way of saying the vertex or pixel shader is doing a lot of work and the rest of the pipeline is waiting for that work to finish.

Often the vertex or pixel shader program itself is just too complex, containing many instructions and taking a long time to execute. Or maybe the vertex shader is reasonable but the mesh you’re rendering has too many vertices which adds up to a lot of time spent executing the vertex shader. Or the draw call covers a large area of the screen touching many pixels, and so spends a lot of time in the pixel shader.

Unsurprisingly, the best way to optimize a shader instruction bottleneck is to execute fewer instructions! For pixel shaders that means choosing a simpler material with fewer features to reduce the number of instructions executed per pixel. For vertex shaders it means simplifying your mesh to reduce the number of vertices that need to be processed, as well as being sure to use LODs (Level Of Detail – simplified versions of your mesh for use when the object is far away and small on the screen).
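One common way to decide when an object is "far away and small on the screen" is to estimate how much of the screen its bounding sphere covers and pick a LOD from that. The sketch below is a hypothetical selection scheme, not any particular engine's: the coverage formula is a small-angle approximation (projected diameter divided by the screen height at that distance), and the thresholds are made-up values that a real project would tune.

```python
import math

def screen_coverage(radius, distance, vertical_fov_deg):
    """Approximate fraction of the screen height covered by a bounding sphere."""
    half_fov = math.radians(vertical_fov_deg) / 2
    return radius / (distance * math.tan(half_fov))

def select_lod(coverage, thresholds=(0.5, 0.2, 0.05)):
    """Return 0 for the full-detail mesh, higher numbers for simpler LODs."""
    for lod, minimum_coverage in enumerate(thresholds):
        if coverage >= minimum_coverage:
            return lod
    return len(thresholds)

# A 1m-radius object seen through a 90-degree vertical field of view.
for distance in (1, 10, 100):
    print(distance, select_lod(screen_coverage(1.0, distance, 90.0)))
# 1 0 / 10 2 / 100 3
```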

Sometimes however, shader instruction bottlenecks are instead just an indication of problems in some other area. Issues such as too much overdraw, a misbehaving LOD system, and many others can cause the GPU to do a lot more work than necessary. These problems can be either on the engine side or the content side; careful profiling, examination, and experience will help you to figure out what’s really going on.

One of the most common of these issues – overdraw – is when the same pixel on the screen needs to be shaded multiple times, because it’s touched by multiple draw calls. Overdraw is a problem because it eats into the GPU’s fixed time budget: if every pixel on the screen has to be shaded twice, the GPU can only spend half as much time on each pixel and still maintain the same framerate.

A frame capture from PIX with the corresponding overdraw visualization mode

Sometimes overdraw is unavoidable, such as when rendering translucent objects like particles or glass-like materials; the background object is visible through the foreground, so both need to be rendered. But for opaque objects, overdraw is completely unnecessary because the pixel shown in the buffer at the end of rendering is the only one that actually needs to be processed. In this case, every overdrawn pixel is just wasted GPU time.

Steps are taken by the GPU to reduce overdraw in opaque objects. The early depth test (which happens before the pixel shader – see the initial pipeline diagram) will skip pixel shading if it determines that the pixel will be hidden by another object. It does that by comparing the pixel’s depth against the depth buffer – a render target where the GPU stores the entire frame’s depth so that objects occlude each other properly. But for the early depth test to be effective, the other object must have already been rendered so it is present in the depth buffer. That means that the rendering order of objects is very important.

Ideally every scene would be rendered front-to-back (ie. objects closest to the camera first), so that only the foreground pixels get shaded and the rest get killed by the early depth test, eliminating overdraw entirely. But in the real world that’s not always possible because you can’t reorder the triangles inside a draw call during rendering. Complex meshes can occlude themselves multiple times, or mesh merging can result in many overlapping objects being rendered in the “wrong” order causing overdraw. There’s no easy answer for avoiding these cases, and in the latter case it’s just another thing to take into consideration when deciding whether or not to merge meshes.

To help early depth testing, some games do a partial depth prepass. This is a preliminary pass where certain large objects that are known to be effective occluders (large buildings, terrain, the main character etc.) are rendered with a simple shader that only outputs to the depth buffer, which is relatively fast as it avoids doing any pixel shader work such as lighting or texturing. This ‘primes’ the depth buffer and increases the amount of pixel shader work that can be skipped during the full rendering pass later in the frame. The drawback is that rendering the occluding objects twice (once in the depth-only pass and once in the main pass) increases the number of draw calls, plus there’s always a chance that the time it takes to render the depth pass itself is more than the time it saves from increased early depth test efficiency. Only profiling in a variety of cases can determine whether or not it’s worth it for any given scene.
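The effect of draw order and of a depth prepass on overdraw can be seen in a one-row simulation. This is a simplified sketch: three hypothetical opaque objects cover spans of a 100-pixel row, the early depth test uses less-or-equal (so an object primed by the prepass still shades its own pixels), and only pixel shader invocations are counted, not the extra cost of drawing the occluder twice.

```python
WIDTH = 100  # a single row of pixels keeps the sketch readable

# Hypothetical opaque objects: (first_pixel, end_pixel, depth); smaller depth = closer.
OBJECTS = {
    "far_hills": (0, 100, 5.0),
    "house": (20, 80, 3.0),
    "near_wall": (0, 100, 1.0),  # a large foreground occluder covering the row
}

def render(order, prepass=()):
    """Count pixel shader invocations when drawing `order` with an early depth test."""
    depth = [float("inf")] * WIDTH
    # Optional depth-only prepass: writes depth but runs no pixel shader work.
    for name in prepass:
        start, end, z = OBJECTS[name]
        for x in range(start, end):
            depth[x] = min(depth[x], z)
    shaded = 0
    for name in order:
        start, end, z = OBJECTS[name]
        for x in range(start, end):
            if z <= depth[x]:  # early depth test passes: run the pixel shader
                depth[x] = z
                shaded += 1
    return shaded

back_to_front = render(["far_hills", "house", "near_wall"])
front_to_back = render(["near_wall", "house", "far_hills"])
with_prepass = render(["far_hills", "house", "near_wall"], prepass=["near_wall"])
print(back_to_front / WIDTH, front_to_back / WIDTH, with_prepass / WIDTH)  # 2.6 1.0 1.0
```

Back-to-front order shades each pixel 2.6 times on average; front-to-back order or priming the depth buffer with the main occluder brings that down to exactly once.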

Particle overdraw visualization of an explosion in Prototype 2

One place where overdraw is a particular concern is particle rendering, given that particles are transparent and often overlap a lot. Artists working on particle effects should always have overdraw in mind when producing effects. A dense cloud effect can be produced by emitting lots of small faint overlapping particles, but that’s going to drive up the rendering cost of the effect; a better-performing alternative would be to emit fewer large particles, and instead rely more on the texture and texture animation to convey the density of the effect. The overall result is often more visually effective anyway because offline software like FumeFX and Houdini can usually produce much more interesting effects through texture animation, compared to real-time simulated behaviour of individual particles.

The engine can also take steps to avoid doing more GPU work than necessary for particles. Every rendered pixel that ends up completely transparent is just wasted time, so a common optimization is to perform particle trimming: instead of rendering the particle with two triangles, a custom-fitted polygon is generated that minimizes the empty areas of the texture that are used.

Particle 'cutout' tool in Unreal Engine 4
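To get a feel for how much particle trimming can save, the sketch below fits a convex polygon around the non-transparent texels of a hypothetical 8x8 particle texture and compares its area to the full quad. It uses a plain convex hull (Andrew's monotone chain) as a simplified stand-in for an engine's cutout tool; real tools typically also cap the number of polygon vertices.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in pairs)) / 2

def trimmed_polygon(alpha_mask):
    """Convex polygon enclosing every texel whose alpha is non-zero ('#')."""
    corners = []
    for y, row in enumerate(alpha_mask):
        for x, texel in enumerate(row):
            if texel == "#":
                corners += [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    return convex_hull(corners)

# A hypothetical 8x8 particle texture: a round puff with transparent corners.
mask = [
    "...##...",
    "..####..",
    ".######.",
    "########",
    "########",
    ".######.",
    "..####..",
    "...##...",
]
hull = trimmed_polygon(mask)
quad_area = len(mask) * len(mask[0])
print(len(hull), polygon_area(hull) / quad_area)  # 8 0.71875
```

The octagon covers about 72% of the quad, so in this example roughly 28% of the pixel shader work on fully transparent texels is skipped.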

The same can be done for other partially transparent objects such as vegetation. In fact for vegetation it’s even more important to use custom geometry to eliminate the large amount of empty texture space, as vegetation often uses alpha testing. This is when the alpha channel of the texture is used to decide whether or not to discard the pixel during the pixel shader stage, effectively making it transparent. This is a problem because alpha testing can also have the side effect of disabling the early depth test completely (because it invalidates certain assumptions that the GPU can make about the pixel), leading to much more unnecessary pixel shader work. Combine this with the fact that vegetation often contains a lot of overdraw anyway – think of all the overlapping leaves on a tree – and it can quickly become very expensive to render if you’re not careful.

A close relative of overdraw is overshading, which is caused by tiny or thin triangles and can really hurt performance by wasting a significant portion of the GPU’s time. Overshading is a consequence of how GPUs process pixels during pixel shading: not one at a time, but instead in ‘quads’ which are blocks of four pixels arranged in a 2x2 pattern. It’s done like this so the hardware can do things like comparing UVs between pixels to calculate appropriate mipmap levels.

This means that if a triangle only touches a single pixel of a quad (because the triangle is tiny or very thin), the GPU still processes the whole quad and just throws away the other three pixels, wasting 75% of the work. That wasted time can really add up, and is particularly painful for forward (ie. not deferred) renderers that do all lighting and shading in a single pass in the pixel shader. This penalty can be reduced by using properly-tuned LODs; besides saving on vertex shader processing, they can also greatly reduce overshading by having triangles cover more of each quad on average.

A 10x8 pixel buffer with 5x4 quads. The two triangles have poor quad utilization -- left is too small, right is too thin. The 10 red quads touched by the triangles need to be completely shaded, even though the 12 green pixels are the only ones that are actually needed. Overall, 70% of the GPU's work is wasted.

(Random trivia: quad overshading is also the reason you’ll sometimes see fullscreen post effects use a single large triangle to cover the screen instead of two back-to-back triangles. With two triangles, quads that straddle the shared edge would be wasting some of their work, so avoiding that saves a minor amount of GPU time.)
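Quad overshading can be counted directly. The sketch below rasterizes triangles by testing pixel centres against the three edges, then counts the 2x2 quads they touch; every touched quad is shaded in full. It is a simplification: it ignores the tie-breaking fill rules real rasterizers apply to pixels exactly on an edge, and the triangles are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); the sign says which side p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered_pixels(tri, width, height):
    """Pixels whose centres lie inside the triangle (no fill rule for edge ties)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    pixels = set()
    if area == 0:
        return pixels  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w = (edge(x1, y1, x2, y2, px, py),
                 edge(x2, y2, x0, y0, px, py),
                 edge(x0, y0, x1, y1, px, py))
            # Inside if the pixel centre is on the same side of all three edges.
            inside = all(e >= 0 for e in w) if area > 0 else all(e <= 0 for e in w)
            if inside:
                pixels.add((x, y))
    return pixels

def quads_touched(pixels):
    """2x2 quads containing at least one covered pixel; each is shaded in full."""
    return {(x // 2, y // 2) for x, y in pixels}

def wasted_fraction(pixels):
    shaded = 4 * len(quads_touched(pixels))
    return 1 - len(pixels) / shaded

tiny = covered_pixels(((0.2, 0.2), (0.9, 0.2), (0.2, 0.9)), 8, 8)
print(len(tiny), wasted_fraction(tiny))  # 1 0.75

# Fullscreen pass on an 8x8 target: two triangles vs one oversized triangle.
two = [covered_pixels(((0, 0), (8, 0), (0, 8)), 8, 8),
       covered_pixels(((8, 0), (8, 8), (0, 8)), 8, 8)]
one = covered_pixels(((0, 0), (16, 0), (0, 16)), 8, 8)
print(sum(len(quads_touched(p)) for p in two), len(quads_touched(one)))  # 20 16
```

The tiny triangle wastes 75% of its quad, and the two-triangle fullscreen pass shades the four quads along the shared diagonal twice.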

Beyond overshading, tiny triangles are also a problem because GPUs can only process and rasterize triangles at a certain rate, which is usually relatively low compared to how many pixels they can process in the same amount of time. With too many small triangles, the rasterizer can’t produce pixels fast enough to keep the shader units busy, resulting in stalls and idle time – the real enemy of GPU performance.

Similarly, long thin triangles are bad for performance for another reason beyond quad usage: GPUs rasterize pixels in square or rectangular blocks, not in long strips. Compared to a more regular-shaped triangle with even sides, a long thin triangle ends up making the GPU do a lot of extra unnecessary work to rasterize it into pixels, potentially causing a bottleneck at the rasterization stage. This is why it’s usually recommended that meshes are tessellated into evenly-shaped triangles, even if it increases the polygon count a bit. As with everything else, experimentation and profiling will show the best balance.

Memory Bandwidth and Textures

As illustrated in the above diagram of the GPU pipeline, meshes and textures are stored in memory that is physically separate from the GPU’s shader processors. That means that whenever the GPU needs to access some piece of data, like a texture being fetched by a pixel shader, it needs to retrieve it from memory before it can actually use it as part of its calculations.

Memory accesses are analogous to downloading files from the internet. File downloads take a certain amount of time due to the internet connection’s bandwidth – the speed at which data can be transferred. That bandwidth is also shared between all downloads – if you can download one file at 6MB/s, two files only download at 3MB/s each.

The same is true of memory accesses; index/vertex buffers and textures being accessed by the GPU take time, and must share memory bandwidth. The speeds are obviously much higher than internet connections – on paper the PS4’s GPU memory bandwidth is 176GB/s – but the idea is the same. A shader that accesses many textures will rely heavily on having enough bandwidth to transfer all the data it needs in the time it needs it.
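To put the PS4 figure in perspective, here is some back-of-the-envelope arithmetic. It assumes a 1920x1080 target at 30fps (our assumption for illustration) and treats the on-paper peak as fully available, which it never is in practice.

```python
PS4_BANDWIDTH = 176e9  # bytes per second, the on-paper figure quoted above

def bytes_per_frame(bandwidth, fps):
    return bandwidth / fps

def bytes_per_pixel(bandwidth, fps, width, height):
    """Peak bandwidth available per screen pixel per frame, if nothing else used it."""
    return bytes_per_frame(bandwidth, fps) / (width * height)

print(round(bytes_per_frame(PS4_BANDWIDTH, 30) / 1e9, 2))   # 5.87 GB per frame
print(int(bytes_per_pixel(PS4_BANDWIDTH, 30, 1920, 1080)))  # 2829 bytes per pixel
```

Around 2.8KB per pixel per frame sounds generous, but it has to cover every texture fetch, vertex fetch and render target write (including overdraw and every extra pass), shared between everything the GPU and possibly the CPU are doing.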

Shader programs are executed by the GPU with these restrictions in mind. A shader that needs to access a texture will try to start the transfer as early as possible, then do other unrelated work (for example lighting calculations) and hope that the texture data has arrived from memory by the time it gets to the part of the program that needs it. If the data hasn’t arrived in time – because the transfer is slowed down by lots of other transfers, or because it runs out of other work to do (especially likely for dependent texture fetches) – execution will stop and it will just sit there and wait. This is a memory bandwidth bottleneck; making the rest of the shader faster will not matter if it still needs to stop and wait for data to arrive from memory. The only way to optimize this is to reduce the amount of bandwidth being used, or the amount of data being transferred, or both.

Memory bandwidth might even have to be shared with the CPU or async compute work that the GPU is doing at the same time. It’s a very precious resource. The majority of memory bandwidth is usually taken up by texture transfers, since textures contain so much data. As a result, there are a few different mechanisms in place to reduce the amount of texture data that needs to be shuffled around.

First and foremost is a cache. This is a small piece of high-speed memory that the GPU has very fast access to, and is used to keep chunks of memory that have been accessed recently in case the GPU needs them again. In the internet connection analogy, the cache is your computer’s hard drive that stores the downloaded files for faster access in the future.

When a piece of memory is accessed, like a single texel in a texture, the surrounding texels are also pulled into the cache in the same memory transfer. The next time the GPU looks for one of those texels, it doesn’t need to go all the way to memory and can instead fetch it from the cache extremely quickly. This is in fact the common case – when a texel is displayed on the screen in one pixel, it’s very likely that the pixel beside it will need to show the same texel, or the texel right beside it in the texture. When that happens, nothing needs to be transferred from memory, no bandwidth is used, and the GPU can access the cached data almost instantly. Caches are therefore vitally important for avoiding memory-related bottlenecks, especially when you take filtering into account: bilinear, trilinear, and anisotropic filtering all require multiple texels to be accessed for each lookup, putting an extra burden on bandwidth usage. High-quality anisotropic filtering is particularly bandwidth-intensive.

Now think about what happens in the cache if you try to display a large texture (eg. 2048x2048) on an object that’s very far away and only takes up a few pixels on the screen. Each pixel will need to fetch from a very different part of the texture, and the cache will be completely ineffective since it only keeps texels that were close to previous accesses. Every texture access will try to find its result in the cache and fail (called a ‘cache miss’) and so the data must be fetched from memory, incurring the dual costs of bandwidth usage and the time it takes for the data to be transferred. A stall may occur, slowing the whole shader down. It will also cause other (potentially useful) data to be ‘evicted’ from the cache in order to make room for the surrounding texels that will never even be used, reducing the overall efficiency of the cache. It’s bad news all around, and that’s not to even mention the visual quality issues – tiny movements of the camera will cause completely different texels to be sampled, causing aliasing and sparkling.
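The difference between the two situations can be simulated with a toy texture cache. Everything here is a simplifying assumption for illustration: the cache is a 16-line LRU, each cache line holds a 4x4 tile of texels, each pixel point-samples a single texel, and texels are addressed directly rather than in the swizzled layouts real GPUs use. Real caches are also much larger.

```python
from collections import OrderedDict

TILE = 4  # hypothetical: one cache line holds a 4x4 block of texels

class TextureCache:
    """Tiny LRU cache of texel tiles that counts hits and misses."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, tx, ty):
        key = (tx // TILE, ty // TILE)
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # must go out to memory
            self.lines[key] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def shade_block(size_px, texels_per_pixel, capacity_lines=16):
    """Point-sample one texel per pixel over a size_px x size_px block of the screen."""
    cache = TextureCache(capacity_lines)
    for py in range(size_px):
        for px in range(size_px):
            cache.fetch(int(px * texels_per_pixel), int(py * texels_per_pixel))
    return cache

near = shade_block(32, 1)         # texture roughly 1:1 with the screen
far = shade_block(32, 2048 / 32)  # whole 2048x2048 texture squeezed into 32x32 pixels
print(near.hits / 1024, far.hits / 1024)  # 0.9375 0.0
```

With neighbouring pixels reading neighbouring texels, 15 out of 16 fetches hit the cache; with the distant object, every single fetch is a miss.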

This is where mipmapping comes to the rescue. When a texture fetch is issued, the GPU can analyze the texture coordinates being used at each pixel, determining when there is a large gap between texture accesses. Rather than incurring the cost of a cache miss for every texel, it accesses a lower mip of the texture that matches the resolution it’s looking for. This greatly increases the effectiveness of the cache, reducing memory bandwidth usage and the potential for a bandwidth-related bottleneck. Lower mips are also smaller and need less data to be transferred from memory, further reducing bandwidth usage. And finally, since mips are pre-filtered, their use also vastly reduces aliasing and sparkling. For all of these reasons, it’s almost always a good idea to use mipmaps – the advantages are definitely worth the extra memory usage.

A texture on two quads, one close to the camera and one much further away

The same texture with a corresponding mipmap chain, each mip being half the size of the previous one
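A sketch of how the mip level can be derived from the gap between texture accesses, using the isotropic scale-factor approach described in graphics API specifications (real hardware is allowed to approximate it). The UV derivatives (how far the UV moves between neighbouring pixels) are supplied by hand here; on the GPU they come from the 2x2 quads discussed earlier. The same code also estimates the "extra memory usage" of a full mip chain.

```python
import math

def mip_count(width, height):
    """Number of levels in a full mip chain, down to 1x1."""
    return int(math.log2(max(width, height))) + 1

def mip_chain_texels(width, height):
    """Total texels in the base level plus all of its mips."""
    return sum(max(width >> level, 1) * max(height >> level, 1)
               for level in range(mip_count(width, height)))

def mip_level(du_dx, dv_dx, du_dy, dv_dy, width, height):
    """Pick a mip from how many texels one pixel step covers (isotropic approximation)."""
    rho_x = math.hypot(du_dx * width, dv_dx * height)
    rho_y = math.hypot(du_dy * width, dv_dy * height)
    lod = math.log2(max(rho_x, rho_y, 1e-12))
    # Clamp: magnification uses the top mip, and there is nothing below 1x1.
    return min(max(lod, 0.0), mip_count(width, height) - 1)

# A 2048x2048 texture stretched over an object 32 pixels across on screen:
print(mip_level(1 / 32, 0, 0, 1 / 32, 2048, 2048))   # 6.0 -> the 32x32 mip
print(mip_chain_texels(2048, 2048) / (2048 * 2048))  # about 1.333: mips add roughly a third
```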

Lastly, texture compression is an important way of reducing bandwidth and cache usage (in addition to the obvious memory savings from storing less texture data). Using BC (Block Compression, previously known as DXT compression), textures can be reduced to a quarter or even a sixth of their original size in exchange for a minor hit in quality. This is a significant reduction in the amount of data that needs to be transferred and processed, and most GPUs even keep the textures compressed in the cache, leaving more room to store other texture data and increasing overall cache efficiency.
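The "quarter or sixth" figure follows from the fixed block sizes of the BC formats: every 4x4 block of texels is stored in 8 bytes (BC1) or 16 bytes (BC3, BC7). Which ratio you get depends on what you compare against, as this small calculation for a single 2048x2048 mip level shows.

```python
# Bytes per 4x4 block for a few common BC formats.
BC_BLOCK_BYTES = {"BC1": 8, "BC3": 16, "BC7": 16}

def bc_size(width, height, fmt):
    """Storage for one mip level: BC formats encode every 4x4 texel block in a fixed size."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BC_BLOCK_BYTES[fmt]

def uncompressed_size(width, height, bytes_per_texel):
    return width * height * bytes_per_texel

MiB = 1024 * 1024
rgba8 = uncompressed_size(2048, 2048, 4)
rgb8 = uncompressed_size(2048, 2048, 3)
print(rgba8 // MiB, bc_size(2048, 2048, "BC7") // MiB)  # 16 4  -> a quarter
print(rgb8 // MiB, bc_size(2048, 2048, "BC1") // MiB)   # 12 2  -> a sixth
```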

All of the above information should lead to some obvious steps for reducing or eliminating bandwidth bottlenecks when it comes to texture optimization on the art side. Make sure the textures have mips and are compressed. Don't use heavy 8x or 16x anisotropic filtering if 2x is enough, or even trilinear or bilinear if possible. Reduce texture resolution, particularly if the top-level mip is often displayed. Don't use material features that cause texture accesses unless the feature is really needed. And make sure all the data being fetched is actually used – don't sample four RGBA textures when you only need the data in the red channel of each; pack those four red channels into the R, G, B and A channels of a single texture and you've removed 75% of the bandwidth usage.
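
The 75% figure follows directly from counting bytes. This sketch assumes uncompressed RGBA8 textures and one point-sampled fetch per texture per pixel; real filtering, caching and compression change the absolute numbers. The function names are hypothetical.

```python
def pack_channels(r, g, b, a):
    """Interleave four single-channel masks into one list of RGBA texels."""
    if not (len(r) == len(g) == len(b) == len(a)):
        raise ValueError("all masks must have the same number of texels")
    return list(zip(r, g, b, a))

def bytes_fetched_per_pixel(texture_count, bytes_per_texel=4):
    # One RGBA8 fetch per texture; ignores filtering footprints and the cache.
    return texture_count * bytes_per_texel

before = bytes_fetched_per_pixel(4)  # four RGBA8 textures, only red used
after = bytes_fetched_per_pixel(1)   # one packed RGBA8 texture
saving = 1 - after / before          # fraction of bandwidth removed
```

Here `before` is 16 bytes and `after` is 4 bytes per pixel, so `saving` is 0.75.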

While textures are the primary users of memory bandwidth, they're by no means the only ones. Mesh data (vertex and index buffers) also needs to be loaded from memory. You'll also notice in the first GPU pipeline diagram that the final render target output is a write to memory. All these transfers usually share the same memory bandwidth.

In normal rendering these costs typically aren’t noticeable as the amount of data is relatively small compared to the texture data, but this isn’t always the case. Compared to regular draw calls, shadow passes behave quite differently and are much more likely to be bandwidth bound.

A frame from GTA V with shadow maps, courtesy of Adrian Courrèges' great frame analysis

This is because shadow maps are simply depth buffers that represent the distance from the light to the closest mesh, so most of the work that needs to be done for shadow rendering consists of transferring data to and from memory: fetch the vertex/index buffers, do some simple calculations to determine position, and then write the depth of the mesh to the shadow map. Most of the time, a pixel shader isn't even executed because all the necessary depth information comes from just the vertex data. This leaves very little work to hide the overhead of all the memory transfers, and the likely bottleneck is that the shader just ends up waiting for memory transfers to complete. As a result, shadow passes are particularly sensitive to both vertex/triangle counts and shadow map resolution, as they directly affect the amount of bandwidth that is needed.
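
A back-of-the-envelope estimate shows why both resolution and vertex counts matter. This is a deliberately rough sketch, assuming a square shadow map with 32-bit depth, each texel written once, and vertex/index data read once; it ignores overdraw, depth compression, clears and caching. The function and its parameters are hypothetical.

```python
def shadow_pass_bytes(resolution, vertex_count, vertex_stride,
                      index_count, index_size=2, depth_bytes=4):
    """Very rough memory traffic for one shadow-map pass, in bytes."""
    # Depth writes: one value per texel of a square shadow map.
    depth_writes = resolution * resolution * depth_bytes
    # Vertex fetches: vertex_stride is bytes per vertex (e.g. 12 for a float3 position).
    vertex_reads = vertex_count * vertex_stride
    # Index fetches: 16-bit indices by default.
    index_reads = index_count * index_size
    return depth_writes + vertex_reads + index_reads
```

Doubling the shadow map from 2048 to 4096 quadruples the depth traffic (from 16 MiB to 64 MiB in this model), while every extra vertex and index adds its own bytes on top.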

The last thing worth mentioning with regards to memory bandwidth is a special case – the Xbox. Both the Xbox 360 and Xbox One have a particular piece of memory embedded close to the GPU, called EDRAM on 360 and ESRAM on XB1. It's a relatively small amount of memory (10MB on 360 and 32MB on XB1), but big enough to store a few render targets and maybe some frequently-used textures, and with a much higher bandwidth than regular system memory (aka DRAM). Just as important as the speed is the fact that this bandwidth uses a dedicated path, so it doesn't have to be shared with DRAM transfers. It adds complexity to the engine, but when used efficiently it can give some extra headroom in bandwidth-limited situations. As an artist you generally won't have control over what goes into EDRAM/ESRAM, but it's worth knowing that it exists when it comes to profiling. The 3D programming team can give you more details on its use in your particular engine.
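
To get a feel for "a few render targets", here is a quick sketch of how many full-screen targets of one format fit in a fixed pool. It assumes the quoted sizes are MiB and uses a 4-byte-per-pixel format; multisampled targets multiply the per-pixel size, so fewer would fit. The names are illustrative.

```python
MIB = 1024 * 1024

def render_targets_that_fit(memory_bytes, width, height, bytes_per_pixel):
    """How many full-screen render targets of one format fit in a memory pool."""
    target_bytes = width * height * bytes_per_pixel
    return memory_bytes // target_bytes

esram_1080p = render_targets_that_fit(32 * MIB, 1920, 1080, 4)  # Xbox One ESRAM
edram_720p = render_targets_that_fit(10 * MIB, 1280, 720, 4)    # Xbox 360 EDRAM
```

Under these assumptions, four 1080p RGBA8 targets fit in 32 MiB and two 720p targets fit in 10 MiB.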

And there's more...

As you’ve probably gathered by now, GPUs are complex pieces of hardware. When fed properly, they are capable of processing an enormous amount of data and performing billions of calculations every second. On the other hand, bad data and poor usage can slow them down to a crawl, having a devastating effect on the game’s framerate.

There are many more things that could be discussed or expanded upon, but what’s above is a good place to start for any technically-minded artist. Having an understanding of how the GPU works can help you produce art that not only looks great but also performs well… and better performance can let you improve your art even more, making the game look better too.

There’s a lot to take in here, but remember that your 3D programming team is always happy to sit down with you and discuss anything that needs more explanation – as am I in the comments section below!

Further Technical Reading

Render Hell – Simon Trümpler
Texture filtering: mipmaps – Shawn Hargreaves
Graphics Gems for Games – Findings from Avalanche Studios – Emil Persson
Triangulation – Emil Persson
How bad are small triangles on GPU and why? – Christophe Riccio
Game Art Tricks – Simon Trümpler
Optimizing the rendering of a particle system – Christer Ericson
Practical Texture Atlases – Ivan-Assen Ivanov
How GPUs Work – David Luebke & Greg Humphreys
Casual Introduction to Low-Level Graphics Programming – Stephanie Hurlburt
Counting Quads – Stephen Hill
Overdraw in Overdrive – Stephen Hill
Life of a triangle – NVIDIA’s logical pipeline – NVIDIA
From Shader Code to a Teraflop: How Shader Cores Work – Kayvon Fatahalian
A Trip Through the Graphics Pipeline (2011) – Fabian Giesen

Note: This article was originally published on fragmentbuffer.com, and is republished here with kind permission from the author Keith O'Conor. You can read more of Keith's writing on Twitter (@keithoconor).

Latency Matters in Online Gaming

March 16, 2017, 7:18 pm

Game streaming services have become hugely popular in recent years. These platforms allow streamers to broadcast themselves playing; viewers tune in to watch the stream, and to interact with the gamers as well as each other. Streamers even form communities of users who regularly watch their channels. Engagement is naturally high on these platforms—unlike many other markets, it’s common for users to spend three hours in a given session.

The biggest draw of game streaming services is this opportunity to interact with other streamers and viewers. All the major platforms feature a large window for viewing gameplay; picture-in-picture video of the streamer playing; and a chat window where all participants can interact. Viewers offer tips and suggestions to the streamer—for example, how to navigate a level or defeat an enemy—and talk to each other about the game, or simply about life.

What’s more, game streaming services are a perfect example of how the power in consumer markets has shifted from the supplier to the customer. While game trailers used to be the primary method by which customers decided if they wanted to buy a game, now they can watch someone else play the entire game from start to finish before making a purchase decision.
In an interview, Stu Grubbs, CEO of Infiniscene, an emerging game streaming service that delivers to Twitch, Beam, Hitbox and other platforms, noted the impact of latency to the viewer and broadcaster experience. He stated:

“The success of video game live streaming depends on authenticity. In most cases, the ability to interact with the audiences, while sharing their gameplay experience not only increases the authenticity of the broadcaster, but the viewer experience. As you increase stream’s latency, the lack of interactivity or ability to engage in near real-time immediately impacts the viewer’s perception of the game or quality of the streamer’s content. That’s why latency is critical for game streamers.”

Since interactivity and real-time video are cornerstones of the user experience for game streaming platforms, low latency is especially important. To see how successfully this is being achieved, we tested the latency of six well-known game streaming services with publicly available platforms: Beam, DailyMotion, Hitbox, Stream.me, Twitch and YouTube Gaming. Here are the results:

Beam.pro: Maximum Interactivity, Lowest Latency

The winner in our latency test is a startup that was recently acquired by Microsoft: Beam.pro. It tested as low as four seconds and as high as 10 seconds in end-to-end latency. Time to first frame (TTFF) clocked in at three to six seconds.
Like the other platforms we tested, Beam offers live-streaming gameplay with a real-time chat window for viewer and gamer interaction. However, Beam takes the interactive element a step further—it allows viewers to actually interact with the streamer’s gameplay in real time. Viewers can control key elements of the game itself, by creating bosses for the player to fight, giving them extra health, changing the surroundings, swapping out weapons and more.
Watching someone else play a game—especially if it’s one you’ve already played—is an inherently passive activity. For that reason, the degree of interactivity offered is a key differentiator for Beam. Along with access to high-quality, uninterrupted gameplay streams, this allows the platform to provide a unique, high-quality user experience.

“Massive communities are formed around games, and live streams provide a central place for those communities to come together and experience the game from another perspective,” says Beam CEO Matt Salsamendi.

Unlike live sports events, Salsamendi explains, most viewers on game streaming platforms are regular players themselves. While your average sports fan wouldn’t tell a star quarterback how to pass, your average gamer has a wealth of tips and suggestions for the streamer—and is eager to participate. That’s what differentiates Beam—it actually gives viewers some control over the action.
This unique element is developer-supported. As Salsamendi describes, Beam offers a software development kit (SDK) that can be used to turn any game into an interactive experience with “less than 25 lines of code.” The company has also developed its proprietary “Faster-than-Light” (FTL) video protocol, which, when used exclusively, can provide even lower latency.
The SDK is used to populate buttons under the gameplay screen, which viewers can push to interact with the streamer’s game—for example, “Give Health” or “Change Weapon.” Added gamification elements include upgrades and enhanced interaction abilities that can be earned by chatting with users and controlling gameplay.

Many users in gaming forums compare Beam with the original game streaming service, Twitch. Both platforms started small, and are now owned by tech behemoths Microsoft (Beam) and Amazon (Twitch). Based on revenue, Twitch has the highest market share of any gaming platform worldwide (43 percent). While Twitch is still seen by many as the industry leader, it may want to start looking over its shoulder. Beam is an innovative disruptor that is winning the latency battle—it even won TechCrunch Disrupt NY in 2016.

Game-Streaming UX Relies on Low Latency

Of course, all of this interactivity wouldn’t be possible without very low latency. If there’s a 10-second delay or longer between the streamer completing an action and the viewers seeing it on the screen, by the time they react, it will already be too late. Without a user experience that’s as close to real time as possible, the ability to control the game is lost.
Low latency is crucial for all game streaming platforms, not just Beam. Any time a gamer is holding a live broadcast, real-time streaming is necessary for a high-quality user experience. If latency is high, streamers will lose viewership—causing revenue losses for both the streamers and the platforms themselves.

Streamers make money (and may even make a living) from paid viewer subscriptions. Viewers can also give monetary tips to their favorite gamers. Some platforms, such as Beam and YouTube Gaming, offer extra incentives to streamers when viewers watch without ad-blocking. Fewer subscriptions mean less money in the streamer’s pocket, and a smaller cut of subscriptions, tips and ad revenue for the platform.

Indeed, delays of mere milliseconds can result in lost customers. Even two seconds of latency when loading a gaming transaction can cause abandonment rates of up to 87 percent. First-person shooter games suffer from just 100 milliseconds of latency, since action happens so quickly. Role-playing or turn-based games can afford slightly longer delays, but interactive capabilities may still be impacted.

And latency is only becoming more important in this market. Studies show the number of streamers and viewers is steadily increasing, as is the average amount of monetary tips from viewers to streamers (not counting streamers who have partnerships with gaming companies). While monetization data has only been gathered for Twitch, the gaming industry as a whole is growing, predicted at a rate of almost 5 percent per year through the year 2020—so these trends are likely to hold across platforms.

As the number of viewers grows, streamers will want increased opportunities to interact with them. The closer to real time these interactions are, the more likely those viewers will support streamers with tips and subscriptions. If Twitch wants to retain its status as a market leader, it must focus on decreasing latency. In game streaming, time is money—and both streamers and viewers are likely to abandon slower platforms for faster, more collaborative entrants such as Beam.

Big Market Share, Higher Latency

It should come as no surprise that the latency leaders and innovators aren't the industry giants. YouTube Gaming and Twitch are the current market leaders in game streaming services. While Twitch has been in the industry longer and has a higher market share by revenue, YouTube Gaming has the largest viewership of any platform: about half of all U.S. game viewers watch on YouTube Gaming, a figure that holds true around the world. To accommodate their massive audiences while maintaining stream quality, these sites have higher latency. YouTube Gaming clocked in at seven to 13 seconds of end-to-end latency; it scored lower in TTFF, at one to five seconds. Twitch scored from 10 to 12 seconds in end-to-end latency, and ranged from five to eight seconds in TTFF.

However, many viewers on these platforms watch pre-recorded streams—so a slightly longer end-to-end delay is more acceptable than it is for interactive streams. For pre-recorded video, TTFF is the more important metric, as viewers may skip over slow-loading videos quickly, and may even abandon the platform altogether if they experience too many loading delays.
Hitbox and DailyMotion bring up the rear in our latency tests. With their emphasis on eSports, it makes sense that latency is higher for these platforms. For a positive user experience when watching broadcast content, it’s more important that the stream is high-quality than that it loads immediately. Since viewers are tuning in to watch a specific event, even 20 seconds or more of latency shouldn’t pose a significant problem.
It should be noted that eSports is a growing subcategory within the game streaming segment. Both YouTube Gaming and Twitch are beginning to focus more heavily on broadcasting these events—in fact, eSports now makes up 21 percent of viewership on Twitch.

The Latency Conclusion

Low latency is critical in the game streaming services market. Users come to these platforms to experience live-streamed and pre-recorded gameplay, and to interact with streamers. Streamers, too, make their money from paid viewer subscriptions and ad revenue. Streams must load quickly, play without buffering and allow viewers to interact as close to real time as possible in order to maintain viewership. Thus, it is vitally important that gaming platforms provide a high-quality, low-latency video experience for their communities.
Beam.pro has the lowest latency of all the platforms we tested. It has also differentiated itself in the market as the only platform that allows viewers to directly interact with the streamer’s gameplay in real time. The question remains: when time is money, how will the other platforms make up the gap and continue to innovate?

Note: This article was originally published on the Wowza Media Systems blog, and is reproduced here with kind permission from the author. You can reach Wowza Media Systems on Twitter @wowzamedia.

Bringing World War and Communist Propaganda Back to Life

March 20, 2017, 5:01 pm

The Second World War was a devastating conflict that saw 21 to 25 million soldiers dead, but regular citizens suffered too. Despite the relative safety of being hundreds of miles away, many lived in constant fear of bombings, food shortages and curfews. The governments had a solution: propaganda!

[This article was originally published on Koobazaur's Blog]

Flyers and posters printed by the thousands encouraged people to stay motivated, conserve resources needed for the war effort, and stay vigilant of traitors. Karaski: What Goes Up... draws from this rich repository of public domain imagery, giving the images a second life.

Lacking artistic skill and budget can lead to some creative solutions. Given the game's fictional 1920s setting of conflict, industrial revolution and social struggles, the propaganda posters of the era fit perfectly.

But Karaski does more than just re-use images; thanks to simple Photoshop manipulation, the graphics were altered to fit the new setting while channeling their evocative and imposing tone. In some cases, multiple graphics were combined and retrofitted into brand new pieces of artwork.

Even old photos of the era were dusted off, serving as clues and newspaper illustrations and giving the world an extra layer of authenticity. The photo in an article about a violent protest was actually taken in the aftermath of "Kristallnacht," an anti-Jewish riot in Nazi Germany. This reference was no accident.

The game takes place on the world's first airship, flying over the Dunabe Commonwealth, a fictional federation of nations loosely inspired by the former USSR, the Austro-Hungarian Empire and even the European Union. Like its real-world counterparts, Dunabe struggles with a lot of inner conflict. Not all nations are happy to be included, and political parties labelled as terrorists are revolting against forced annexation. The pseudo-spiritual Preachers movement has been violently ostracized as well. Meanwhile, the industrial revolution boom has allowed the Eidson company to spread the metallic claws of its railroads and reap monopolistic benefits.

This turmoil gives each group a potential motive for trying to bring the titular airship down. Much like the first Graf Zeppelin bearing the Nazi swastika, Airship Karaski has become a political symbol of the Commonwealth's prowess and iron-fisted control over its nations. A crash would surely send a powerful message. Who is behind the mischievous plot, however, is a mystery for the player to uncover (and influence!) throughout the game.

While Karaski takes place amidst political scheming, it is really a game about the passengers of the ship, each struggling with their own form of oppression and self-doubt. A big focus was put on showing what daily life in the Commonwealth is like, where a simple board game is used to foster feelings of nationalism and a children's book is designed to instill the values of a traditional, early 1900s family household.

Against this rich backdrop, the player will meet the stern Captain, who has doubts about dedicating his life to a political cause, an irritable Doctor torn between his love for science and the call of religion, a lax Aristocrat lady struggling with and leveraging social gender norms, and even the very Architect of the ship, questioning the emotional cost of building the flying behemoth. Numerous secondary characters will give further insight and raise questions.

Sneaking through four realistically designed decks, scavenging for clues in private suites and even enjoying a friendly drink and conversation, the player will learn and subtly change how the underpinning political and social drama affects each of the passengers personally, and what they are really hiding.

The game "Karaski: What Goes Up..." has been released for PC and is available from the Steam digital store.

Public Domain Images courtesy of Archives.gov and WikiMedia Commons

Making of an Arabic Teaching Game: Antura and The Letters

March 20, 2017, 10:43 am

I wanted to share my eleven-week work experience at VGWB (VideoGames Without Borders). During these weeks I developed a whole section of the game “Antura wa al huruf” (Antura and The Letters) under the supervision of Francesco Cavallari (founder of VGWB) and Davide Barbieri (lead programmer at GhostShark Games).

Background:

In September 2016, the Norwegian agency Norad started a contest to create an educational game that would give Syrian children a way to learn while having fun. An estimated 2.8 million Syrian children are out of school due to the war. Since most of these children have access to a smartphone, a mobile app was a viable way to reach that goal.

The project timeline was roughly as follows:

The contest initially received 78 different project proposals (Phase 0).
5 projects were selected as semi-finalists (Phase 1).
3 projects were chosen as finalists (Phase 2).
Feedback was provided and improvements were requested before the winner was selected (Phase 3).
I joined the project during Phase 2; at the time of writing we are in Phase 3, waiting to hear which project has won.

My role:

I joined the project after the beginning and started working on a new section. My assignment was the development of the Assessments section:

Antura is structured as a series of worlds/maps, each with a travel line: while travelling, you play minigames to learn Arabic, and after a few minigame sessions the player is required to do an Assessment. It is not an exam at all: its purpose is to reinforce learning through a series of exercises that use the concepts learned in previous minigames. It is possible to answer by trial and error, and the goal is not to stress the children too much.

How Antura looked after the first development rush.

As a newcomer

I spent the first few days looking over and learning from the documentation and the existing code; I did a quick draft of code to get an idea of the amount of work required and potential problems. The most important part (as always) is to understand the requirements, so I read the documentation several times, and printed my own copy and annotated it. Even if dependencies on preexisting code are minimal, understanding a pre-existing codebase helps a lot.

Requirements:

There were 9 assessment types in total (later expanded to 13). Assessments had to look and behave similarly, but at the implementation level their logic differs. The graphics were deliberately kept simple so as not to distract children from the assessment.
The first playable prototype was ready in one week, and I used it to get feedback from my supervisor. The key factor in development on the project was continuous feedback: this was a truly Agile experience, with a development cycle consisting of four days of rush programming and one day of testing/revision/bug-fixing. Requirements changed every two to three days (due to feedback from testers or from field testing).

Journey map

Things I got right.

Wrap the content provided by the Question Provider. The IQuestionProvider interface is a bit controversial: it was first enforced as a “unique” interface shared by all games, but it was not sufficient to fulfill all the different game logic; the workaround (used everywhere in the game) is to “decode” the content of the IQuestionBuilder. That piece was designed with one goal in mind, but testing and improvements later required different behavior. At that point in development, it was no longer possible to change it without breaking stuff, so we postponed the issue to a future refactoring. In the Assessments, the decoding of IQuestionProvider is done by IQuestionGenerator, so IQuestionProvider is free to change without breaking Assessments. In fact, even though there are a couple of different implementations of IQuestionGenerator, the interface is used in the same way by all Assessments (no decoding).
The Assessments’ game loop is a Coroutine. Since the game is divided into well-defined phases, putting them inside an IEnumerator proved very useful, especially because it lets us put animations where needed and wait for them to end without resorting to state machines or complex flag systems. There were two major changes in the game flow, and thanks to Coroutines I was able to implement them seamlessly without breaking stuff. Certain behaviors are just much simpler to implement using Coroutines.

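    private IEnumerator RoundsCoroutine(KoreCallback gameEndedCallback)
    {
        for (int round = 0; round < Configuration.NumberOfRounds; round++)
        {
            InitRound();

            // Nested: the loop waits here until each phase coroutine has finished.
            yield return Koroutine.Nested(RoundBegin());
            yield return Koroutine.Nested(PlaceAnswers());

            // Run without yield: the description audio is started on the first
            // round and the loop moves straight on to GamePlay() without waiting.
            if (round == 0)
                Koroutine.Run(DescriptionAudio());

            yield return Koroutine.Nested(GamePlay());
            yield return Koroutine.Nested(ClearRound());
        }

        // All rounds done: notify the caller that the assessment has ended.
        gameEndedCallback();
    }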

Use my own sound manager. All audio is loaded into a global audio manager (implemented with DeAudio by other developers). While it was a real pleasure to call just one method to play a sound file (and have every single sound effect tweaked by an audio engineer), I had specific issues to address for assessments (some clips were allowed to overlap, while others were not). The easiest way to ensure correct playback was simply to add another audio manager on top of the old one. That let me keep the correct behavior for each sound without adding extra complexity (I still call one method to play each sound). And certain sounds now have a meaningful name:

    public void PlayPlaceSlot()
    {
        audioManager.PlaySound( Sfx.StarFlower);
    }
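
Python generators behave much like C# IEnumerator coroutines, so the structure of RoundsCoroutine above can be sketched with them. This illustrates the pattern only and is not Antura's code: the Scheduler class, the phase names and the frame counts are hypothetical. `yield from` plays the role of Koroutine.Nested (wait for the phase to finish), and `Scheduler.run` that of Koroutine.Run (start something without waiting).

```python
class Scheduler:
    """Minimal frame-based runner: each tick() advances every coroutine one step."""

    def __init__(self):
        self.tasks = []

    def run(self, coroutine):
        # Start a coroutine without waiting for it (fire-and-forget).
        self.tasks.append(coroutine)

    def tick(self):
        current, self.tasks = self.tasks, []
        for task in current:
            try:
                next(task)
            except StopIteration:
                continue  # finished: drop it
            self.tasks.append(task)

    def run_until_idle(self, max_ticks=1000):
        ticks = 0
        while self.tasks and ticks < max_ticks:
            self.tick()
            ticks += 1
        return ticks

log = []

def wait_frames(n):
    # Stand-in for waiting on an animation: suspend once per frame.
    for _ in range(n):
        yield

def phase(name, frames):
    log.append(name)
    yield from wait_frames(frames)

def description_audio():
    log.append("audio")
    yield from wait_frames(3)

def rounds(scheduler, number_of_rounds, on_game_ended):
    for round_index in range(number_of_rounds):
        yield from phase("begin", 2)      # waits, like Koroutine.Nested
        yield from phase("place", 1)
        if round_index == 0:
            scheduler.run(description_audio())  # runs alongside gameplay
        yield from phase("gameplay", 2)
        yield from phase("clear", 1)
    on_game_ended()

scheduler = Scheduler()
scheduler.run(rounds(scheduler, 2, lambda: log.append("ended")))
ticks = scheduler.run_until_idle()
```

The phases run strictly in order, while the audio coroutine starts during the first round and finishes on its own; adding or reordering a phase is a one-line change, which is the flexibility described above.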

Things I got wrong

Use of a time manager. Since all games had to implement at least two IState interfaces (an initial game state and a final game state), each with an “Update(float delta)” method, I assumed that the time scale was custom and that I had to use that “delta” for Tweens and Coroutines too. I was wrong! The game actually set Time.timeScale to 0 to pause, while the UI was animated with an unscaled delta time. It would have taken only a minute to check that myself. In theory the time manager idea is nice, because it decouples the time flow from Unity’s time scale, but in practice it was not used in other minigames, so there was no point keeping it around. I later removed it during the polishing phase of the game.
A bit overengineered overall. Given the short deadline this is understandable, but there are a few things I would have liked to write slightly differently (the code works, so there is no reason to change it now). Looking back, I think the Assessments section was a good candidate for an ECS pattern (which I learned recently and am still experimenting with), but since I was not too confident with ECS I decided not to take risky roads. During development, I thought more than once: “if I had used the ECS pattern, that would have solved this problem for me”. Because of the extra complexity, I was able to cut a lot of code during the polishing phase (for example, several Answer-related classes that solved a communication problem were later merged into a single Answer class). I’m still very happy with the rhythm I was able to keep and with the result, and I think I did pretty well given that I joined a project already in progress and did not have as much time as I wished.
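
The time-manager mistake comes down to two clocks running side by side. In Unity these correspond to Time.deltaTime (scaled by Time.timeScale) and Time.unscaledDeltaTime. The class below is a hypothetical Python model of that behaviour, not Unity's API.

```python
class GameClock:
    """Hypothetical model of scaled vs. unscaled time."""

    def __init__(self):
        self.time_scale = 1.0
        self.time = 0.0           # scaled: gameplay, tweens, game coroutines
        self.unscaled_time = 0.0  # unscaled: pause-menu UI keeps animating

    def advance(self, real_delta):
        self.unscaled_time += real_delta
        self.time += real_delta * self.time_scale

clock = GameClock()
clock.advance(0.5)       # normal play: both clocks move
clock.time_scale = 0.0   # "pause" the way the game did
clock.advance(0.25)      # only the unscaled clock moves
```

After these calls `clock.time` is still 0.5 while `clock.unscaled_time` is 0.75: gameplay is frozen but anything driven by unscaled time keeps moving, with no custom time manager needed.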

Things I liked:

I worked with talented people: I had the chance to work with people I had been following, and I continued working with some people I already knew. I learned a lot, and this was only my second team work experience (this time I also had to collaborate closely with other programmers). It is very surprising when you need something and find it was already implemented by someone else. I also really liked participating in some design decisions and helping refine some logic. It was a short but intense experience.

Things I didn’t like:

It was a relatively short experience, and I had a lot of fun, so it is a pity that it has already ended. The game is already published on the Play Store and App Store, there are no major things to fix, and at least in my section (Assessments) there’s no more work to be done.

I’d like to work on this project a bit more.

Antura current look and feel of the assessments section.

How did I get the job?

I have to thank the guys at GhostShark Games for that. I had already worked with them for one year developing a couple of games, and I can’t wait to see what the next project will be. In the meantime, we worked together on Antura. I use the free time between projects to develop personal stuff and refine my skills.

Does the game work?

Yeah, it works! The game is actually aimed at teaching Arabic to children who already speak and understand it, yet I was able to learn most of the Arabic alphabet and dozens of words (even though I didn’t understand a single word of Arabic when I joined the project). Field testing also showed that children genuinely learned new things. As a personal note: I’m surprised by how elegant Arabic writing is!

This article has been reblogged from my new developer's blog.
Editorial note: some formatting and grammar changes have been made with permission of the original author to aid in reading of the article.

How to Implement Scoreboards in Godot with the GameJolt API

March 29, 2017, 10:45 am

GameJolt is not the largest gaming platform, nor is Godot the most popular engine. Despite this, a kind user known as "ackens" has created a plugin for Godot that allows super easy integration with the GameJolt API.

The plugin can be downloaded at this link: https://github.com/ackens/-godot-gj-api

Since the Internet failed me for quite a while on how to install this plugin, I will enlighten the kind readers. Inside your project folder (res://), create a folder called "addons", and place the "gamejolt_api" folder inside it (so you end up with res://addons/gamejolt_api/).

If you have done this correctly, you can return to the editor, click "Scene" in the top-left, go down to "Project Settings" and select the "Plugins" tab. The API should show up as an entry called "Game Jolt API." Set its status on the right-hand side from "Inactive" to "Active," and you're good to go.

From here, there are a number of things to tackle. I'm going to be primarily explaining how to submit guest scores to a scoreboard since this is what I used the API for in my game, Super Displacement.

The assumptions that I will be making from here on are that:

You're using a scene that isn't your main gameplay loop to do this (though if you are, I'm sure this can be adjusted)
You can make a better UI for this than I can.
You have a given score to submit.

If all three apply, we can get started.

Before you can do anything, you must add a GameJoltAPI node to your scene. Each API command you make will be a method called on this node, i.e. it will be of the form:

get_node("../GameJoltAPI").method(argument)

Before making any of these calls, it's important to set two properties of the node: your game's Private Key and its ID. These tell the GameJolt servers which game the requests are for.

Both of these variables can be found by going to your game's page on GameJolt, clicking "Manage Game", "Game API" and selecting "API Settings" on the left hand side.
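Why does the node need both values? As far as I understand GameJolt's Game API documentation, every request URL carries the game's ID, and the request is authenticated with a signature: a hash (MD5 here) of the complete URL with the private key appended. The key itself is never sent, so keep it out of anything you publish. The plugin does all of this for you; the Python sketch below (GDScript reads very similarly) only illustrates the idea. BASE_URL, the endpoint name and the function name are assumptions, so check them against GameJolt's docs before relying on them.

```python
import hashlib
from urllib.parse import urlencode

# Assumed API root; verify the current version in GameJolt's Game API docs.
BASE_URL = "https://api.gamejolt.com/api/game/v1_2/"


def signed_url(endpoint: str, params: dict, game_id: str, private_key: str) -> str:
    """Build a Game API request URL and append its signature."""
    query = urlencode({"game_id": game_id, **params})   # the game ID travels in the URL
    url = f"{BASE_URL}{endpoint}?{query}"
    # The private key is only appended for hashing; it never appears in the URL.
    signature = hashlib.md5((url + private_key).encode("utf-8")).hexdigest()
    return f"{url}&signature={signature}"
```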

Once you have entered these values, you're ready to start making some API calls.

For Super Displacement, players never needed to log into GameJolt, so I didn't need to use .auth_user("token", "username"). Fortunately for me, the GameJolt API has a function called "add_score_for_guest". As the name suggests, this allows a score to be submitted as a guest: the user never has to log in or enter anything other than a display name. This makes things very easy.

I used a LineEdit object for the user to enter their desired display name. When they press either the Enter key (using the "text_entered" signal on the LineEdit) or the "Submit Score" button, the text in the LineEdit (get_node("../LineEdit").get_text()) is passed to a script, which then submits the request.

However, that's not quite the whole of my implementation.

One. For some reason, either GameJolt or the Godot plugin freaks out if there are spaces in the display name. This is a super simple fix: short of rejecting any name that contains spaces, the easiest way around it is to remove the spaces from the name, using:

guest_name = guest_name.replace(" ", "")  # replace() returns a new string

This call goes through the string and replaces every instance of the space character with an empty string, returning the result as a new string rather than changing guest_name in place. In effect, this removes the spaces: "The Best" becomes "TheBest", and so on.
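Why would spaces cause trouble? One plausible explanation, and it is only an assumption since I haven't traced the plugin's code, is that the display name is placed in the request URL without being percent-encoded, so the space breaks the query string. The Python sketch below contrasts the workaround used here (stripping spaces, like GDScript's replace()) with percent-encoding, which keeps the spaces intact. Percent-encoding only makes sense if you change the code that builds the URL; encoding the name before handing it to a plugin that already encodes it would double-encode it.

```python
from urllib.parse import quote


def strip_spaces(name: str) -> str:
    """The workaround used above: drop every space character."""
    return name.replace(" ", "")   # returns a new string; name is unchanged


def percent_encode(name: str) -> str:
    """Alternative (assumption): keep spaces but escape them for a URL."""
    return quote(name, safe="")    # " " -> "%20", "&" -> "%26", etc.
```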

Two. What if the user doesn't enter a name? While this doesn't stop the request from happening (as far as I know), it may be helpful to put a stock username in its place. For this, I did a simple check:

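# Fall back to a stock name if the LineEdit was left empty
# (the .. placeholders are the score and sort value arguments, explained below)
if(guest_name == ""):
    get_node("../../GameJoltAPI").add_score_for_guest(.., .., "Guest" + str(randi() % 10000000))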

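Though it makes my inner PEP8 fanatic weep, it does the job. If the user has not entered a name into the LineEdit, it generates a random number of up to seven digits and appends it to the word "Guest". Note that Godot's random number generator starts from the same seed on every launch unless you call randomize() once (for example in _ready()), so call it first or players may end up with the same "random" suffix.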
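Putting both fixes together, the display-name logic can be sketched in Python as below. It strips the spaces first and then checks for an empty name, so a name made only of spaces also gets the fallback; that ordering is my choice rather than necessarily the original's. Python's random.randint(0, 9_999_999) covers the same range as randi() % 10000000, and passing in a seeded random.Random plays the role of Godot's randomize(). The function name display_name is hypothetical.

```python
import random


def display_name(raw: str, rng: random.Random) -> str:
    """Sanitize a typed name, falling back to "Guest" plus a random number."""
    name = raw.replace(" ", "")            # same as guest_name.replace(" ", "")
    if name == "":
        # 0..9,999,999, like randi() % 10000000: up to seven digits
        name = "Guest" + str(rng.randint(0, 9_999_999))
    return name
```

For example, display_name("The Best", random.Random()) returns "TheBest", while an empty or all-space name returns something like "Guest" followed by up to seven digits.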

At this point (and probably a bit earlier) I should explain how this method works.

The first argument (called the "score" in the plugin's documentation on GitHub) is a string which is displayed on the "Scores" section of your game's page. This might be "23 Castles Destroyed", "381 Enemies Slain", or whatever quantifier you want to add. In my case, I simply set this to "str(latest_score)", since there isn't much of a quantifier beyond "Points" to add to an arcade score.

The second argument (called the "sort value") is an integer which tells GameJolt how to order the scores in the table. Assuming your score table is set up so that big means good, a higher sort value will mean a higher placement on the scoreboard. In my case, this is "int(latest_score)" (since latest_score is originally a float value).

After that, that's really all there is to it. If you wanted to add scores for a logged in user, you would have to .auth_user("token", "username") and then .add_score_for_user(visible_score, sort_value).
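For reference, here is roughly what a score submission boils down to. The parameter names (score, sort, guest, username, user_token) are my reading of the scores/add call in GameJolt's Game API documentation, not something taken from the plugin, and the function name is hypothetical; the plugin's add_score_for_guest() and add_score_for_user() build the request for you. The sketch uses str(int(...)) for the display string because Python's str(381.0) is "381.0"; int() drops the fractional part, just as int(latest_score) does in GDScript.

```python
from typing import Optional


def add_score_params(latest_score: float,
                     guest: Optional[str] = None,
                     username: Optional[str] = None,
                     user_token: Optional[str] = None) -> dict:
    """Build the query parameters for a GameJolt scores/add request."""
    params = {
        "score": str(int(latest_score)),   # text shown on the game's Scores page
        "sort": int(latest_score),         # number the table is ordered by
    }
    if guest is not None:
        params["guest"] = guest            # guest submission: no login needed
    elif username is not None and user_token is not None:
        params["username"] = username      # logged-in user (after auth_user)
        params["user_token"] = user_token
    else:
        raise ValueError("need a guest name, or a username and user token")
    return params
```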

Displaying scores is also very simple, though it requires some playing with JSON files first.

[Screenshot: the finished high-score table]

Again, assuming you have a GameJolt API node in your scene, you're ready to make some API calls.

For my HighScores.tscn scene, I put a standalone call for 30 scores into the _ready() function:

get_node("../../GameJoltAPI").fetch_scores("30")

Your immediate reaction might be confusion as to why the result isn't being printed, assigned to a variable or anything else. That's because the plugin responds with a signal that carries the scores in a string. This signal is called "api_score_fetched(var scores)".

You might also be confused as to why the "30" is a string and not an integer. To be quite honest, I have no idea, but for some reason it has to be a string.

Connect this signal to a node of your choice and try to print(scores): you'll get something that looks an awful lot like JSON. That's because it is JSON, encoded in a string, and we have to parse it into a dictionary.

Do something like this:

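var scores_dictionary = {}
# Godot 2.x: parse_json() fills the dictionary in place and returns an error code
if scores_dictionary.parse_json(scores) != OK:
    print("Could not parse the score data")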

This creates an empty dictionary and parses "scores" into it, so the data can be indexed. The scores themselves are a list of dictionaries, one per entry, stored at scores_dictionary["response"]["scores"].
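The same parsing step can be sketched in Python with the standard json module. The check on "success" is an assumption about the response format: it accepts both the string "true" and a JSON true, since I'm not certain which one a given API version returns. Compare against what print(scores) shows in your own project. The function name parse_scores is hypothetical.

```python
import json


def parse_scores(raw: str) -> list:
    """Turn the JSON string carried by the signal into a list of score dicts."""
    data = json.loads(raw)                       # like parse_json() in GDScript
    response = data["response"]
    # Assumption: success may arrive as "true" (string) or true (boolean).
    if str(response.get("success")).lower() != "true":
        return []
    return response.get("scores", [])            # one dict per score entry
```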

I implemented the "table" in the screenshot above with six separate Labels, arranged as two sets of three. The first Label in each set holds the rank numbers, which I added manually. This could very easily be automated, but that is an exercise left to the reader.

The second Label holds the names of the users who have submitted scores. You can build this text by creating a variable named "names", iterating through the list of scores and concatenating each guest name to "names", followed by a \n for the line break.

The code I used for this part is as follows:

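var visible_scores = scores_dictionary["response"]["scores"]  # list of score dictionaries
var names = ""   # entries 1-15
var names2 = ""  # entries 16-30
for i in range(visible_scores.size()):
    if(i<15):
        names += visible_scores[i]["guest"] + "\n"
    elif(i<30):
        names2 += visible_scores[i]["guest"] + "\n"
# Update the Labels once, after the loop has built both strings
get_node("../names").set_text(names)
get_node("../names2").set_text(names2)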

Assuming the line spacing and Y coordinate are the same, this will line up with the numbers.

The variable and node called "names2" hold the second column of names (entries 16 to 30), as shown in the screenshot above.

The exact same process works for the scores: all you have to do is reference [i]["score"] instead of [i]["guest"].
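If you'd like to avoid repeating the loop for every column, the column-building step can be generalized. Here is a Python sketch under the same layout assumptions as the GDScript loop (15 entries per Label, two columns, anything past 30 ignored). The function name and parameters are hypothetical; the field argument is the key to read, such as "guest" or "score".

```python
def build_columns(scores: list, field: str,
                  per_column: int = 15, columns: int = 2) -> list:
    """Return one newline-separated string per Label column."""
    texts = []
    for c in range(columns):
        chunk = scores[c * per_column:(c + 1) * per_column]   # entries for column c
        texts.append("".join(str(entry[field]) + "\n" for entry in chunk))
    return texts
```

With a list of score dictionaries, names, names2 = build_columns(visible_scores, "guest") produces the two name columns, and build_columns(visible_scores, "score") gives the matching score columns.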

If you have implemented these correctly, you should get a nice, basic scoreboard for further extension and development. Also, I'm sure there's something better than Labels to use for this kind of thing, but this technique can be adapted suitably.

If you have any further queries, you can leave a comment below and I am very likely to answer it. In any case, thanks for reading, and good luck!

If you want to download my game "Super Displacement", you can do so at the link below:

https://gamejolt.com/dashboard/games/244666

Is VR Really the Future of Gaming - or Just a Fad?

March 29, 2017, 8:33 am

You only need to watch the movies to know that predicting the future is probably a waste of time. In 2017, we don’t have hoverboards (Back to the Future II), time travel isn’t a thing (Timecop), and we haven’t colonized the moon yet (2001: A Space Odyssey). But still, VR is lauded as “the next big thing” in just about every industry there is – from healthcare, to marketing, and of course, gaming.

Admittedly, 2016 was a year of really significant change for the games industry – we saw the launch of Amazon’s Lumberyard, CryEngine became 100% free, and the Sonic the Hedgehog franchise celebrated its 25th anniversary. Most notable, of course, was the fact that AR and VR really caught the attention of the mainstream thanks to the launch of Pokémon Go, Oculus Rift, and PlayStation VR.

But is VR all it’s cracked up to be? And should games developers be investing in it? Packt asked three expert developers – Alan Thorn, Maciej Szczesnik, and John P. Doran – for their thoughts on whether Virtual Reality will ever become a reality.

Definitely

John P. Doran, a Lecturer at DigiPen Institute of Technology and author of Game Development Patterns & Best Practices, is certain that the future of gaming lies in VR and is even working on a couple of VR projects at the moment.

“The introduction of virtual reality and augmented reality have been quite exciting to me over the past year or so,” he says. “I’ve already been working with both VR and AR applications and am very excited to see how we will go about building projects with them in the future.”

“I’m still not quite sure what version of virtual reality will be the “standard”, or even if it exists already, but we have already seen quite an impact with casual users.”

“Right now, the costs are prohibitive for most people to start playing with VR, but assuming that one of the headsets gets a “killer” app that everyone will want to play, prices will come down over time, and we will see more and more people developing for it.”

So why does Doran think VR will be such a success? The answer lies in VR’s counterpart in altering our world - Augmented Reality.

“Augmented reality (AR) games have been quite interesting to examine in the industry. In the case of Pokémon Go, I am fairly certain that it was the IP and not necessarily the gameplay that got so many people playing it, but given its success I am sure we will see games borrowing concepts from it in the future.”

The Undecided

Alan Thorn is the founder of Wax Lyrical Games, a Visiting Lecturer at the National Film and Television School and London South Bank University, and author of Mastering Unity 5.x. Like Doran, Thorn recognizes that the surge of interest in VR is exciting and that the technology holds serious potential.

“VR, photogrammetry, and the quest for photorealism are unquestionably changing the landscape”, he says. Perhaps surprisingly though, Thorn isn’t as quick as Doran to say absolutely whether VR will be the future of the games industry, even though he is currently working on VR projects himself. “I’ve already worked on VR projects, and I do think the future for VR is bright,” he adds. “However, it’s important to recognize that VR is but one medium, alongside other existing ones, which can tell great stories and support interesting mechanics.”

“Right now there is an intense focus on VR, both in the Unity and Unreal world, but whether this will remain the case for the next two years is an open question.”

Probably Not

Maciej Szczesnik, freelance developer, Lecturer of Game Design at Warsaw Film School, and author of Unity 5.x Animation Cookbook, is arguably more skeptical about VR’s place in gaming. While the growing popularity of VR games suggests that the technology could be about to become more affordable and accessible to more casual gamers, he believes there is one significant challenge to mass adoption of VR in gaming – the fact that it’s simply not practical or comfortable.

“Yes, we’re having a huge VR boom, and I do think that VR is the biggest change in game dev in recent years, but I also honestly think that people won’t use VR to relax after work or school,” he says.

“I make VR apps, but these are mostly business, marketing, and medical applications, not games.”

“In my opinion, VR will most probably end the same way as all those motion sensors or 3D TVs – it’s cool to use it once in a while, but can you play a VR game and drink your favorite beverage at the same time? Or would you like to put a small LED screen 10 cm in front of your eyes after a full day of work? Maybe I’m old-fashioned, but I still prefer my couch and console.”

How to Make an iMessage Game (and why)

April 19, 2017, 5:16 am

What’s the deal with iMessage games?

With iOS 10, developers are now creating games that are directly integrated with iMessage. This seamless, inherently social play style is viral by nature with built-in engagement and retention mechanics. If you’re not already building for the iMessage App Store, it’s a channel to watch.

In this post we’re going to start by outlining the technical setup for such a game. We’ll then explore how you can make the most of this relatively new kid on the block, with comments from some thought-leaders to help answer the question: what makes a great game for iMessage?

How to create an iMessage game

iMessage Apps allow users to engage with your content without having to navigate outside of Messages. They can easily perform a range of actions, such as playing games, sending payments, and collaborating with friends within a custom interface that you design.

Download Xcode 8 and iOS 10

To get started building your iMessage application, you’ll need to download and install the latest versions of iOS and Xcode. Once you’ve completed this step, you’ll have all of the tools you need to build, test and submit your game to the App Store for iMessage.

The Messages Framework

Before starting your first Xcode project, you should research the Messages framework. This is a new framework introduced by Apple with iOS 10. It provides an interface between the Messages app and your iMessage application, which allows Messages to find your app, launch it natively and provide your application with data.

The Messages framework is essentially the structure which gives Messages the context your application needs to lay out its UI so that your players can interact with it. The framework is built upon existing App Extension technology, so iMessage apps are a form of app extension.

It’s worth noting that even if you do not provide a containing application and your app runs exclusively within Messages, you still need to provide an icon for the application. This icon is used throughout iOS, for example in the system settings to show your application’s size. We cover preparing Apple-conformant icons below, in the "Prepare your iMessage game for submission" section.

Learn the iMessages API library

The new Messages framework API includes a selection of classes that you will need to use to build your game or app for iMessage. We’ve outlined the main control classes below. For a comprehensive reference guide, we recommend reviewing Apple’s Messages framework documentation in detail.

MSMessagesAppViewController

The MSMessagesAppViewController class acts as the principal view controller for Messages extensions. Your extension’s main view controller subclasses it in order to manage the extension.

MSConversation

The MSConversation class represents a conversation in the Messages app. Use conversation objects to access information about the currently selected message or the conversation participants, or to send text, stickers, attachments, or message objects.

MSMessage

Use the MSMessage class to create interactive message objects. To create a message that can be updated by the conversation’s participants, instantiate a message with a session using the init(session:) method. Otherwise, instantiate the message using the init() method.

MSSession

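Despite the close relationship, MSSession is not a subclass of MSMessage. A session is the object you pass to init(session:); messages that share a session update the existing message in the conversation instead of each appearing as a new one.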

Prepare your iMessage game for submission

Optimize your design and images

When creating an interactive experience for iMessage, it’s important that your app design fits well within the context of a Messages conversation. Apple outline how to create the best possible user interface for your app. Some of the take-away points are:

Make it intuitive and easy to navigate
Focus each iMessage extension on one purpose
Encourage back-and-forth participation
Design for the different Messages views

Apple also provide extensive resources to help you get started with your graphics. You can download design templates for app icons, home screen icons and typefaces on the Apple developer site.

Submit your iMessage game for review

Before submitting your game for review, you should thoroughly test it and make sure it adheres to the App Review Guidelines. Here are some of the key points that you should keep in mind to increase your chance of an accepted app submission:

Make sure your game content is family friendly
Only quality apps are accepted – don’t rush your submission
Don’t try to manipulate the review system
Don’t replicate another developer’s work

For more information about getting your game ready for submission, check out our article on App Store Optimization (ASO).

How to make a great iMessage game

Now that we’ve outlined the technical details, on to the more pressing matter: how to make a successful iMessage game. To begin, we’ve outlined what we consider to be the 4 essentials of a game built exclusively for the iMessage App Store.

1. Asynchronous play

In order to work, the play style of iMessage games must be asynchronous (or “turn-based”). Players cannot perform actions simultaneously. This is a fundamental concept of iMessage games. Read this article by Gamasutra for a detailed explanation of asynchronicity in game design.

With this consideration in mind, you should create a game that’s able to withstand long pauses without the player losing interest. To improve flow between pauses, consider including a replay feature so that players can easily pick up where they left off and make sense of their opponent’s actions.

2. Social

A game built directly into iMessage should have a strong social focus. Sure, this is somewhat of a given for an app built right into iMessage, but there are many ways that you can creatively leverage this environment to increase social interactions. And, ultimately, increased social interaction is the key to long term player retention.

The game should encourage conversation between moves, which augments the gameplay and entices players back into your game. The best way to maximize this ‘social stickiness’ is to create a game which is either socially competitive or socially collaborative in nature.

3. Socially Competitive

If you take a look on the iMessage App Store, you’ll see that socially competitive games are currently the most common genre available. If you’ve just started out creating a messenger game, a competitive game is probably the way to go at first. With socially competitive games you can create a simple PVP experience that fits well into the format and asynchronous constraints of a Messenger App, helping you to build a better understanding of this new channel before creating more complex, collaborative games.

Competitive games should always revolve around a balance of luck and skill. Skill usually carries more weight in competitive games because it creates drama between the two players, as well as an ongoing ranking table. It’s very important that these games have a method of measuring rank. In essence, the competition will boil down to a leaderboard: showing people moving up or down it based on their in-game achievements is what drives competition. To enhance the competitive elements further, first place should be rewarded and losses penalized. This drives engagement in all aspects of the game to higher levels.

4. Socially Collaborative

Currently, there isn’t a huge amount of choice for social collaborative games on the iMessage App Store. We expect this to change pretty rapidly. Collaborative games have the greatest potential to engage groups of players, but they’re also the most complex games to build – often much more intricate than a competitive, turn-based PVP.

In a collaborative environment it’s very important NOT to use a leaderboard or any form of score measurement. A better way to reward people is to let every action earn points that unlock more fun and features. When collaborating, the aim is to encourage people to explore how working together can achieve even more interesting results. The sheer act of taking part in the game is the fun part, so continual rewards in every section of the game are key. A clear map of what actions or XP are required to unlock elements in the game will drive collaborative games to higher engagement levels.

3 Games exclusive to iMessage

GamePigeon

With its range of excellent PVP games, GamePigeon brings new life to your iMessage. It includes well-known games like 8-Ball Pool, Connect4 and Battleships.

Cobi Hoops

In Cobi Hoops, you’re given 30 seconds to score as many baskets as you can. You can then send your result to a friend for them to rise to your challenge.

Cobi Hoops is developed by Cobra Mobile. We reached out to their CEO, Mark Ettle, with some questions about developing for iMessage:

Q: What makes a good messenger game?
Something quick, challenging and easy to pick up and play/learn in 5 seconds.

Q: What are the difficulties?
Apart from discoverability, the difficult part is making something that has the core game loop just right – but that applies to all games!

Q: How are you planning to innovate?
We are going to create more focused titles that really use the power of messages.

Q: Why do you think it’s an exciting channel?
It’s exciting because it’s the small “single trick” games that will work really, really well. It’s the perfect PvP environment. We all like to challenge our friends, don’t we? Messages allow you to do that perfectly.

Q: Any other thoughts?
I think messenger games are going to be huge. iOS have got it so right, and I’m interested to see how others will follow.

Let’s Hang

Play Hangman directly in iMessage with Let’s Hang. Just start a game within your conversation, write a word and start playing.

Why care about messenger games?

It’s a proven format

Hybrid messaging applications are tried and tested in the Asian markets, with Line Corp. (makers of LINE) and Tencent (makers of WeChat) leading in Japan and China respectively. If you’re wondering how successful the hybrid messaging model could be – they’re both multi-billion dollar companies. As of the third quarter of 2016 LINE reported more than 220 million MAU, whilst WeChat broke the 700 million mark in Q1.

If you’re interested in player behaviour in Asia, check out one of our previous posts: Examining the in-game behaviour of players from China

An emerging games ecosystem

Clearly, tech giants in the West have taken note. Facebook has recently announced its Instant Games offering, which brings third party content to players directly within Facebook Messenger. Apple also seems to have followed suit and softened its stance on third party content with iOS 10. Prior to iOS 10, the predominant method of distribution for third party content across the Apple ecosystem was through the rigid constraints of the App Store. No longer.

Mojiworks

Messenger gaming now appears to be gaining significant traction in the West. Most recently, UK-based mobile app startup Mojiworks has entered the space as the world’s first developer dedicated to producing games for Apple’s iMessage.

Mojiworks announced its formation at the 2016 Slush conference in Helsinki, stating that their mission is to turn iMessage into a social gaming platform. We reached out to Mojiworks CEO, Matthew Wiggins, to get his insight into the growing arena of iMessage games.

Q: What makes a good messenger game?
It’s early days so there is a lot to learn, but we are looking at mechanics that are inherently social and benefit the closer relationships that message groups tend to have. We also see support for short session times as important.

Q: What are the difficulties?
With any new channel there is much to discover and learn (both for the platform holders and us – the publishers) to make sure that players are attracted.

Q: How are you planning to innovate?
On the creative side, we have some particular social gameplay models in mind that will bring new depth to playing with friends on mobile. On the business side, we see tons of potential in viral and word-of-mouth growth mechanics.

Q: Why do you think it’s an exciting channel?
iMessage and Facebook Instant Games are young app platforms that are rapidly evolving and have no incumbent leaders. That makes them a great place for startups to step in and create new experiences for players to love that can result in significant business scale.

Q: Any other thoughts?
We’re building MojiWorks to be the leading developer and publisher of iMessage and chat platform games as we believe it’s as big an opportunity as the original App Store or FB app platforms were. It’s both a fascinating creative space to work in, and the market is open and welcoming of newcomers.

Final note

Clearly, iMessage games are not to be overlooked. The element of social interaction makes messenger games naturally ‘sticky’ for increased engagement and generally higher DAU. If you’re not already building an iMessage App, you should be.

Note: This article was originally published on the GameAnalytics blog, and is kindly republished here with the author's permission.

How to Find the Right Tools for Your Game

April 26, 2017, 3:33 pm

GDC 2017 at the end of February confirmed what has become obvious in game development: things change quickly. We saw the release of Unity 5.6 at the end of March, and VR is set to redefine the way we create and experience games. With so many creatively-named headsets slated for release over the coming months, there will be plenty to keep tech reviewers happy.

So what does the landscape of Games Dev in 2017 look like? And what do new developers need to know? What tools should they be learning? And how do they know when to try new ones?

Packt asked three expert developers – Alan Thorn, founder of Wax Lyrical Games, Visiting Lecturer at the National Film and Television School and London South Bank University, and author of Mastering Unity 5.x, Maciej Szczesnik, freelance developer, Lecturer of Game Design at Warsaw Film School, and author of Unity 5.x Animation Cookbook, and John P. Doran, a Lecturer at DigiPen Institute of Technology and author of Game Development Patterns & Best Practices – for their thoughts on 3 things young developers need to know.

What are the most important tools for budding games developers to learn?

According to Szczesnik and Thorn, Unity and Blender in particular are the need-to-know tools for any game developer.

As an independent developer, Szczesnik is always looking to optimize his workflow because, as he says, “faster iterations mean that you can basically do more”. Szczesnik mainly uses Unity, Blender and Substance Painter for his work as the three combined give him a “good, and relatively inexpensive, base” for game development. What budding games developers should master, however, depends entirely on what they want to specialize in.

He says, “Unity is my game engine of choice – it’s super friendly for the developer and gives a lot of freedom. If you want to be an indie game developer, you should choose your favorite engine and start by learning that. 3D tools are also essential if you’re planning to create 3D games.”

“I think Blender is a great tool for indie game developers - it’s free and quite powerful in modeling, sculpting and animation. There’s lots of other 3D packages and sculpting tools but I prefer Blender because you can stay in one package while performing all those 3D tasks. Substance Painter is also the fastest way to texture your model if you want to use PBR materials.”

“My three most-frequently used tools right now are Unity, Blender and Photoshop”, says Thorn. “I love each for their unique power and versatility. Unity makes game development highly accessible to teams and individuals, and Blender has such a vast array of features that it does nearly everything. Photoshop is there to support critical image edits, which are always needed. But, most powerful of all, is how these tools work together in a practically seamless way.”

“Newcomers face many different tools, all offering the promise of making development simpler. The mistake is to try learning them all at once. Focus instead on just one or two related tools (like Unity and Blender) and become masterful with them. Achieving this makes the translation to other tools smoother.”

Doran uses a whole host of tools when he’s creating games. He says, “The game industry is a very fast moving industry. In order to stay competitive you need to explore new things being introduced. I have already been working with both VR and AR applications in my work before and I am very excited in seeing how to build projects with them in the future. After I finish my current projects, of course.”

For creating assets, Adobe Creative Cloud is his go-to, while Visual Studio 2015 and Sublime Text 2 are his main tools for writing code. But the skill developers really need to learn in his opinion? Microsoft Office.

“When I’m making games, I tend to change what tools I use based on the role I’m undertaking,” says Doran. “As a designer, being able to work on documentation is a must. I use Microsoft Office, specifically Word and Excel, a lot, and they tend to take up a lot more time than people would think when creating design documents.”

Games Dev is a constantly changing industry. How do you choose the right platforms and tools for developing your first games?

The launch of Amazon Lumberyard and the changes made to CryEngine and Unreal suggest that Unity is arguably the biggest platform for game development at the moment – but that doesn’t necessarily mean it’s right for every developer. If you’re just finding your feet in terms of what tools work for you, it can be pretty tricky to know where to begin. So how should you go about choosing the right tools for your project?

“Unity is particularly suitable for indie games and small studios because of its huge community and the Asset Store,” says Szczesnik. “I think that the true choice is between Unity and Unreal, though. CryEngine is cool, but not a lot of people use it, and I personally don’t know a single person using Lumberyard. This may change in near future, but it will require much more than making an engine free. Most people don’t want to learn new tools unless those tools allow them to create even better games and experiences.”

“Developers now have many options available for a game engine, and most of them are free,” adds Thorn. “The share and balance between engines could easily change over the next few years, fluctuating from one engine to another, but what matters most is that your chosen engine is the right choice for your project. When you’re working, ask yourself: can this software do what I need effectively, efficiently and easily?”

“Having said that, every application has limitations,” he continues. “Programs like Unity or Unreal, for example, can only run on specific operating systems and versions, on specific hardware, and different versions of the software support specific features and third party add-ons. These limitations can affect how and when you can use the tools. Nevertheless, becoming aware of those limitations is the first step to empowerment, because you can devise clever strategies for working within them to achieve what you need.”

Szczesnik echoes this sentiment, adding “I always try to optimize my workflow, so if there’s a tool that I can use to speed up a process, I most probably will. Sometimes that means writing one myself. Even if the tool I create is buggy, and used for one sole purpose, it pays off in most cases.”

“Choosing the right platform/tools for your project is something that's very important, as you'll be spending a lot of time using them,” adds Doran. “That's why it's always a good idea to keep informed on the newest trends and things being used. But if you're doing something new, keep in mind how long it'll delay your project's development. If your main goal is just learning that's one thing, but there's only a certain number of hours you'll be able to work on your project. It's up to you to decide how it's best used.”

“In terms of what platforms you could consider, while I certainly think Unity has kind of taken the lead in recent years when it comes to indie development in small teams, I would say Unreal Engine is still quite a great engine to work in,” says Doran. “It has a soft spot in my heart because my first job in the industry was working with it. Unity and Unreal have an advantage in that there are so many other people using them. Because of this there are a lot of resources out there to learn from.”

“With CryEngine and Lumberyard, however, the reference materials for small or single-person teams are somewhat lacking in comparison. That's not too big of a problem if you’re a seasoned developer, but it would not be a good place for a beginner to start.”

Physical, Casual Gaming for Mixed and Virtual Reality

March 20, 2017, 6:17 am

Since the rise of mobile gaming, ‘casual games’ have claimed a large chunk of attention and funding from marketing and game development circles. It is generally agreed that short play sessions are a key component of casual games: players can quickly play a complete session while commuting to work, waiting for an appointment or simply killing time. Because of this transitory nature, casual games usually have a gentle learning curve that lets players quickly figure out how to play.

Historically, casual gaming has its foundations in simple PC games such as Minesweeper and Solitaire, but it became a true market force with the rise of mobile gaming. New gaming technologies, principally virtual and mixed reality, are now poised to usher in a new dimension of casual gaming.

Physically Dynamic Technology

By and large, video gaming has been a sedentary experience. However, the popularity of games such as Dance Dance Revolution in the early noughties and the more recent success of the Wii have shown that there is significant demand for more physically dynamic forms of video gaming. Here, “physically interactive” means games in which the player must physically move in order to complete the game; this definition runs the gamut from high-paced runners to more thoughtful spatial puzzles.

HTC Vive

The three major virtual reality devices on the market now are HTC Vive, Oculus Rift, and PlayStation VR. Though each has its own distinct advantages, HTC Vive is so far leading the pack when it comes to movement. This is because it ships with two “Lighthouses”, motion-tracking devices that allow for room-scale virtual reality. As a result, if the player moves in real life, their avatar in the game moves with them. This, in turn, allows programmers to develop games that incorporate the player’s movements as key components of the game.

Runners: Origami Race

HTC Vive has proven to be fertile ground for runners, one of the staples of the casual genre. Simply put, runners are games in which the player moves forward on a track and attempts to avoid obstacles and barriers. On traditional mobile platforms, the user either swipes or tilts the device in order to move horizontally. In games like Origami Race from Game-Ace, a custom game development company, the avatar is instead controlled by the player’s physical movements. So if the plane needs to dodge a barrier, the player must actually leap to the left. If the player wants to shoot down enemy craft, he/she must dodge and aim with the HTC controller.

Puzzles: Fantastic Contraption

Whereas runners offer dynamic, almost athletic immersion, games like Fantastic Contraption provide a slower-paced but equally physical casual gaming experience. The game, which operates like an elaborate Rube Goldberg machine, is about getting a pink ball into a specific area. To do so, the player must walk around the room-scale virtual space and engineer a solution. This means crawling under the contraption, hammering on details and simply getting your hands virtually dirty.

For many users, virtual reality is a relatively new commodity with plenty of “wow” factor, so if someone owns a headset, all their friends want to try it. Both of these games offer the sort of “pass it around play” that is ideally suited to casual games, in which players can quickly pick up the game and navigate it intuitively.

HoloLens

Microsoft HoloLens is currently the stand-out mixed reality device on the radar. Whereas virtual reality gives the user 360° of purely virtual/photographic content, mixed reality meshes virtual and physical reality. It does this by overlaying the physical world with virtual elements that realistically interact with the physical components. In effect, if there were a virtual ball and the user placed it on a sloped table, then the virtual ball would roll off the table and then bounce on the floor until eventually resting in place.

In the sphere of casual gaming, the HoloLens is already proving to be a fun and intuitive platform for users that are curious about emerging technologies.

Shooters: RoboRaid

RoboRaid showcases some of the HoloLens’s main functions while also offering simple and surprisingly fun gameplay. The device first scans the room, making note of the surfaces, and then launches a robot invasion in which the robots burst through the walls. The player must simultaneously dodge their fire, by physically evading the shots, while returning fire as they whirl around the room. In truth, the game feels like an introduction to the HoloLens’s potential, but it offers an exciting sketch of what the platform could provide for casual gamers.

Platformers: Young Conker

Platformers, like Super Mario Bros., traditionally revolve around an avatar traversing a horizontal environment by jumping between ‘platforms’. HoloLens allows for a re-conception of this formula by turning your living room, or anywhere you happen to be, into a game level. Using the spatial mapping function, Young Conker perceives the edges of tables and the contours of furniture to create endlessly varied levels out of the world around you. And thanks to the very gentle learning curve of platformers, people unfamiliar with the genre or with HoloLens can easily jump in and play, a hallmark of casual games.

What the Future Holds

Trends in the industry suggest that sales, usage and investment in virtual reality and mixed reality will keep growing unabated for the next several years. As mobile gaming has proved over the last several years, casual games are as economically viable as their hardcore, AAA relatives, if not more so. Combining this with the physical interactivity offered by the emerging platforms is an exciting prospect for game developers and gamers alike.

Critical errors in CryEngine V code

April 4, 2017, 2:25 am

Introduction

CryEngine is a game engine created by the German company Crytek in 2002 and originally used in the first-person shooter Far Cry. Many great games have been made with different versions of CryEngine by the many studios that have licensed the engine: Far Cry, Crysis, Entropia Universe, Blue Mars, Warface, Homefront: The Revolution, Sniper: Ghost Warrior, Armored Warfare, Evolve and many others. In March 2016, Crytek announced the release of the new CryEngine V and soon afterwards posted the source code on GitHub.

To analyze the source code, we used PVS-Studio for Linux. It is now even more convenient for developers of cross-platform projects to track the quality of their code with a single static analysis tool. The Linux version can be downloaded as an archive or as a package for a package manager, and for most distributions you can set up installation and updates from our repository.

This article only covers the general analysis warnings, and only the "High" certainty level (there are also "Medium" and "Low"). To be honest, I didn't even examine all of the "High" level warnings, because even a quick look provided enough material for an article. I started working on the article several times over a period of a few months, so I can say with certainty that the bugs described here have been living in the code for months already. Some of the bugs found during the previous check of the project also haven't been fixed.

It was very easy to download and check the source code in Linux. Here is a list of all necessary commands:

mkdir ~/projects && cd ~/projects
git clone https://github.com/CRYTEK/CRYENGINE.git
cd CRYENGINE/
git checkout main
chmod +x ./download_sdks.py
./download_sdks.py
pvs-studio-analyzer trace -- \
  sh ./cry_waf.sh build_linux_x64_clang_profile -p gamesdk
pvs-studio-analyzer analyze \
  -l /path/to/PVS-Studio.lic \
  -o ~/projects/CRYENGINE/cryengine.log \
  -r ~/projects/CRYENGINE/ \
  -C clang++-3.8 -C clang-3.8 \
  -e ~/projects/CRYENGINE/Code/SDKs \
  -j4

plog-converter -a GA:1,2 -t tasklist \
  -o ~/projects/CRYENGINE/cryengine_ga.tasks \
  ~/projects/CRYENGINE/cryengine.log

The report file cryengine_ga.tasks can be opened and viewed in QtCreator. What did we manage to find in the source code of CryEngine V?

A strange Active() function

V501 There are identical sub-expressions to the left and to the right of the '==' operator: bActive == bActive LightEntity.h 124

void SetActive(bool bActive)
{
  if (bActive == bActive)
    return;

  m_bActive = bActive;
  OnResetState();
}

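The function does nothing because of a typo: the parameter bActive is compared with itself instead of with the member m_bActive, so the condition is always true and the function always returns immediately. It seems to me that if there were a "Super Typo" contest, this code fragment would definitely take first place. I think this error has every chance of making it into the "C/C++ bugs of the month" section.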
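For comparison, here is a minimal sketch of what the check was presumably meant to look like. LightEntitySketch and its call counter are hypothetical stand-ins, not CryEngine code; only the early-out logic of SetActive() is reproduced.

```cpp
// Hypothetical stand-in for the CryEngine light entity; only the
// early-out logic of SetActive() is reproduced here.
class LightEntitySketch
{
public:
  void SetActive(bool bActive)
  {
    // Compare the stored state with the requested one. The original
    // compared the parameter with itself, so it always returned early.
    if (m_bActive == bActive)
      return;

    m_bActive = bActive;
    OnResetState();
  }

  bool IsActive() const { return m_bActive; }
  int ResetCount() const { return m_resetCount; }

private:
  // Stand-in for the real OnResetState(); it only counts calls.
  void OnResetState() { ++m_resetCount; }

  bool m_bActive = false;
  int m_resetCount = 0;
};
```

With the fix, repeated calls with the same value are cheap no-ops, and only real state changes trigger a reset.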

But that's not all. Here is a function from another class:

V501 There are identical sub-expressions 'm_staticObjects' to the left and to the right of the '||' operator. FeatureCollision.h 66

class CFeatureCollision : public CParticleFeature
{
public:
  CRY_PFX2_DECLARE_FEATURE

public:
  CFeatureCollision();
  ....

  bool  IsActive() const  { return m_terrain ||
                                   m_staticObjects ||
                                   m_staticObjects; }
  ....
  bool m_terrain;
  bool m_staticObjects;
  bool m_dynamicObjects;
};

The variable m_staticObjects is used twice in IsActive(), while the variable m_dynamicObjects is not used there at all. Perhaps it was m_dynamicObjects that was meant to be used.
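If m_dynamicObjects was indeed the intended operand, the corrected check could look like the sketch below. FeatureCollisionSketch is a hypothetical, stripped-down stand-in for CFeatureCollision that keeps only the three flags.

```cpp
// Hypothetical, trimmed-down version of CFeatureCollision that keeps
// only the three flags relevant to IsActive().
struct FeatureCollisionSketch
{
  bool m_terrain = false;
  bool m_staticObjects = false;
  bool m_dynamicObjects = false;

  // Presumed intent: the feature is active if collisions with any of
  // the three object categories are enabled.
  bool IsActive() const
  {
    return m_terrain || m_staticObjects || m_dynamicObjects;
  }
};
```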

Code above has no bugs

V547 Expression 'outArrIndices[i] < 0' is always false. Unsigned type value is never < 0. CGFLoader.cpp 881

static bool CompactBoneVertices(....,
  DynArray<uint16>& outArrIndices, ....)           // <= uint16
{
  ....
  outArrIndices.resize(3 * inFaceCount, -1);

  int outVertexCount = 0;
  for (int i = 0; i < verts.size(); ++i)
  {
    ....
    outArrIndices[....] = outVertexCount - 1;
  }

  // Making sure that the code above has no bugs   // <= LOL
  for (int i = 0; i < outArrIndices.size(); ++i)
  {
    if (outArrIndices[i] < 0)                      // <= LOL
    {
      return false;
    }
  }

  return true;
}

This error deserves a section of its own. In general, the CryEngine code contains a lot of fragments where unsigned variables are pointlessly compared with zero. There are hundreds of such places, but this fragment deserves special attention, because the check was written deliberately.

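So, there is an array of unsigned numbers, outArrIndices, whose elements have the uint16 type. It is first filled with -1 (which, converted to uint16, becomes 65535) and then filled according to some algorithm. After that, we see a brilliant check of every array element to make sure none of them is negative. Since the elements are unsigned, the condition outArrIndices[i] < 0 can never be true, and the function always returns true.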
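A check that can actually fail has to look for the value that -1 turns into, not for negative numbers. The sketch below assumes that slots still holding this sentinel indicate a bug. std::vector<uint16_t> stands in for DynArray<uint16>, and MakeIndexBuffer() and AllIndicesFilled() are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// The value -1 turns into once it is stored in a uint16_t (65535).
constexpr uint16_t kUnfilled = std::numeric_limits<uint16_t>::max();

// Mimics outArrIndices.resize(3 * inFaceCount, -1): the -1 is converted
// to uint16_t and wraps around to 65535.
std::vector<uint16_t> MakeIndexBuffer(std::size_t faceCount)
{
  std::vector<uint16_t> indices;
  indices.resize(3 * faceCount, static_cast<uint16_t>(-1));
  return indices;
}

// A "code above has no bugs" check that can actually fail: every slot
// must have been overwritten with a real vertex index.
bool AllIndicesFilled(const std::vector<uint16_t>& indices)
{
  for (uint16_t idx : indices)
  {
    if (idx == kUnfilled)
      return false;
  }
  return true;
}
```

Note that with this approach 65535 can no longer serve as a real vertex index.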

Memory handling errors

V512 A call of the 'memcpy' function will lead to underflow of the buffer 'hashableData'. GeomCacheRenderNode.cpp 285

void CGeomCacheRenderNode::Render(....)
{
  ....
  CREGeomCache* pCREGeomCache = iter->second.m_pRenderElement;
  ....
  uint8 hashableData[] =
  {
    0, 0, 0, 0, 0, 0, 0, 0,
    (uint8)std::distance(pCREGeomCache->....->begin(), &meshData),
    (uint8)std::distance(meshData....->....begin(), &chunk),
    (uint8)std::distance(meshData.m_instances.begin(), &instance)
  };

  memcpy(hashableData, pCREGeomCache, sizeof(pCREGeomCache));
  ....
}

Pay attention to the arguments of the memcpy() function. The programmer intends to copy the object pCREGeomCache into the array hashableData, but the sizeof operator is applied to the pointer rather than to the object. Because of this error, the object is not copied completely: only 4 or 8 bytes, the size of a pointer, are copied.

V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'this' class object. ClipVolumeManager.cpp 145

void
CClipVolumeManager::GetMemoryUsage(class ICrySizer* pSizer) const
{
  pSizer->AddObject(this, sizeof(this));
  for (size_t i = 0; i < m_ClipVolumes.size(); ++i)
    pSizer->AddObject(m_ClipVolumes[i].m_pVolume);
}

A similar mistake was made here: the programmer evaluated the size of the this pointer instead of the size of the object. The correct variant is sizeof(*this).
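The difference is easy to demonstrate. In this sketch, ClipVolumeManagerSketch is a hypothetical class padded with a 256-byte member so that the object is clearly larger than a pointer.

```cpp
#include <cstddef>

// Hypothetical class standing in for CClipVolumeManager; the array
// member makes the object noticeably larger than a pointer.
class ClipVolumeManagerSketch
{
public:
  // What the original code reports: the size of the pointer 'this'
  // (typically 4 or 8 bytes), regardless of the class contents.
  std::size_t WrongSelfSize() const { return sizeof(this); }

  // What was intended: the size of the object 'this' points to.
  std::size_t SelfSize() const { return sizeof(*this); }

private:
  char m_payload[256] = {};
};
```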

V530 The return value of function 'release' is required to be utilized. ClipVolumes.cpp 492

vector<unique_ptr<CFullscreenPass>> m_jitteredDepthPassArray;

void CClipVolumesStage::PrepareVolumetricFog()
{
  ....
  for (int32 i = 0; i < m_jitteredDepthPassArray.size(); ++i)
  {
    m_jitteredDepthPassArray[i].release();
  }

  m_jitteredDepthPassArray.resize(depth);

  for (int32 i = 0; i < depth; ++i)
  {
    m_jitteredDepthPassArray[i] = CryMakeUnique<....>();
    m_jitteredDepthPassArray[i]->SetViewport(viewport);
    m_jitteredDepthPassArray[i]->SetFlags(....);
  }
  ....
}

According to the documentation for std::unique_ptr, the release() function is meant to be used as follows:

std::unique_ptr<Foo> up(new Foo());
Foo* fp = up.release();
delete fp;

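Most likely, the reset() function was intended here instead of release(). release() only relinquishes ownership and returns the raw pointer without deleting the object. Since the returned pointer is discarded, every CFullscreenPass object in the array leaks.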
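The following sketch shows the difference, using a hypothetical PassSketch type that counts live instances in place of CFullscreenPass. To keep the example itself free of leaks, the release() variant returns the orphaned pointers that the original code simply drops.

```cpp
#include <memory>
#include <vector>

// Hypothetical pass type that counts live instances, standing in for
// CFullscreenPass, so that leaks become visible.
struct PassSketch
{
  static int alive;
  PassSketch() { ++alive; }
  ~PassSketch() { --alive; }
};
int PassSketch::alive = 0;

// Mirrors the original loop: release() empties each unique_ptr but does
// not delete the object. The original code discards these pointers (a
// leak); here they are returned so the caller can inspect and free them.
std::vector<PassSketch*> ClearWithRelease(
    std::vector<std::unique_ptr<PassSketch>>& passes)
{
  std::vector<PassSketch*> orphaned;
  for (auto& p : passes)
    orphaned.push_back(p.release());
  return orphaned;
}

// The presumed intent: reset() deletes the owned object and leaves the
// unique_ptr empty.
void ClearWithReset(std::vector<std::unique_ptr<PassSketch>>& passes)
{
  for (auto& p : passes)
    p.reset();
}
```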

V549 The first argument of 'memcpy' function is equal to the second argument. ObjectsTree_Serialize.cpp 1135

void COctreeNode::LoadSingleObject(....)
{
  ....
  float* pAuxDataDst = pObj->GetAuxSerializationDataPtr(....);
  const float* pAuxDataSrc = StepData<float>(....);
  memcpy(pAuxDataDst, pAuxDataDst, min(....) * sizeof(float));
  ....
}

The programmer forgot to pass pAuxDataSrc to the memcpy() function. Instead, the same variable pAuxDataDst is used as both source and destination, so the auxiliary data is never actually copied. No one is immune to errors.

By the way, those who are interested can test their programming skills and attentiveness with a quiz on detecting similar bugs: q.viva64.com.

Strange code

V501 There are identical sub-expressions to the left and to the right of the '||' operator: val == 0 || val == - 0 XMLCPB_AttrWriter.cpp 363

void CAttrWriter::PackFloatInSemiConstType(float val, ....)
{
  uint32 type = PFSC_VAL;

  if (val == 0 || val == -0)  // <=
    type = PFSC_0;
  else if (val == 1)
    type = PFSC_1;
  else if (val == -1)
    type = PFSC_N1;

  ....
}

The developers intended to compare the floating-point variable val with both positive and negative zero, but did it incorrectly: 0 and -0 are integer constants, and as integers they are the same value, so the second comparison simply repeats the first.

Most likely, the code should be corrected in the following way, using floating-point constants:

if (val == 0.0f || val == -0.0f)
    type = PFSC_0;

On the other hand, even this corrected expression is redundant: under IEEE 754 rules, -0.0f == 0.0f is true, so comparing the variable with an ordinary zero is enough. This is why the code actually behaves the way the programmer expected.

If negative zero really needs to be identified, it is more correct to do so with the std::signbit function.
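Here is a short sketch of both points. It assumes IEEE 754 floating-point arithmetic with no compiler options such as -ffast-math that relax signed-zero semantics. IsZero() and IsNegativeZero() are hypothetical helper names.

```cpp
#include <cmath>

// Under IEEE 754 rules positive and negative zero compare equal, so a
// single comparison with 0.0f already covers both.
bool IsZero(float val)
{
  return val == 0.0f;
}

// If negative zero really must be told apart, std::signbit() inspects
// the sign bit directly instead of relying on ==.
bool IsNegativeZero(float val)
{
  return val == 0.0f && std::signbit(val);
}
```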

V501 There are identical sub-expressions 'm_joints[i].limits[1][j]' to the left and to the right of the '-' operator. articulatedentity.cpp 1326

int CArticulatedEntity::Step(float time_interval)
{
  ....
  for (j=0;j<3;j++) if (!(m_joints[i].flags & angle0_locked<<j)&&
    isneg(m_joints[i].limits[0][j]-m_joints[i].qext[j]) +
    isneg(m_joints[i].qext[j]-m_joints[i].limits[1][j]) +
    isneg(m_joints[i].limits[1][j]-m_joints[i].limits[1][j]) < 2)
  {
    ....
}

In the last part of the conditional expression, the variable m_joints[i].limits[1][j] is subtracted from itself, so that difference is always zero. The code looks suspicious. There are many indexes in the expression, and one of them is probably wrong.
One more similar fragment:

V501 There are identical sub-expressions 'm_joints[op[1]].limits[1][i]' to the left and to the right of the '-' operator. articulatedentity.cpp 513

V590 Consider inspecting this expression. The expression is excessive or contains a misprint. GoalOp_Crysis2.cpp 3779

void COPCrysis2FlightFireWeapons::ParseParam(....)
{
  ....
  bool paused;
  value.GetValue(paused);

  if (paused && (m_State != eFP_PAUSED) &&
      (m_State != eFP_PAUSED_OVERRIDE))
  {
    m_NextState = m_State;
    m_State = eFP_PAUSED;
    m_PausedTime = 0.0f;
    m_PauseOverrideTime = 0.0f;
  }
  else if (!paused && (m_State == eFP_PAUSED) &&        // <=
           (m_State != eFP_PAUSED_OVERRIDE))            // <=
  {
    m_State = m_NextState;
    m_NextState = eFP_STOP;

    m_PausedTime = 0.0f;
    m_PauseOverrideTime = 0.0f;
  }
  ....
}

The conditional expression is written in such a way that its result does not depend on the subexpression m_State != eFP_PAUSED_OVERRIDE: if m_State equals eFP_PAUSED, it cannot equal eFP_PAUSED_OVERRIDE at the same time. But is it even worth discussing here, given that this code fragment still has not been fixed since the first article?
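The redundancy can be checked exhaustively. In the sketch below, EFireState is a hypothetical reduction of the real state enum (eFP_RUN is made up), and the two functions express the original condition and the shortened one, which the test compares for every combination of inputs. The sketch only demonstrates the redundancy. It cannot tell whether one of the comparisons was meant to be different.

```cpp
// Hypothetical subset of the flight-fire states; eFP_RUN is invented
// just to have one more value to test with.
enum EFireState { eFP_STOP, eFP_PAUSED, eFP_PAUSED_OVERRIDE, eFP_RUN };

// The condition as written in the original else-if branch.
bool ResumeConditionOriginal(bool paused, EFireState state)
{
  return !paused && (state == eFP_PAUSED) &&
         (state != eFP_PAUSED_OVERRIDE);
}

// Equivalent form: if state == eFP_PAUSED, it cannot also equal
// eFP_PAUSED_OVERRIDE, so the last sub-expression adds nothing.
bool ResumeConditionSimplified(bool paused, EFireState state)
{
  return !paused && (state == eFP_PAUSED);
}
```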

In case it is interesting, I have already described the same kind of errors in the article "Logical Expressions in C/C++. Mistakes Made by Professionals".

V529 Odd semicolon ';' after 'for' operator. boolean3d.cpp 1077

int CTriMesh::Slice(...)
{
  ....
  pmd->pMesh[0]=pmd->pMesh[1] = this;  AddRef();AddRef();
  for(pmd0=m_pMeshUpdate; pmd0->next; pmd0=pmd0->next); // <=
    pmd0->next = pmd;
  ....
}

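Here is another code fragment that has remained uncorrected since the last check of the project. Because of the semicolon, the for loop has an empty body: it walks to the last element of the list, and only then is pmd0->next = pmd executed, once, even though the indentation suggests it belongs to the loop. It is still unclear whether this is a formatting error or a logic mistake.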
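If appending to the tail of the list is indeed what was intended, an explicitly empty loop body and an un-indented assignment make that obvious. Here is a minimal sketch with a hypothetical MeshUpdateSketch node type. It assumes the list head is never null, as the original code does.

```cpp
// Hypothetical singly linked list node standing in for the mesh-update
// records chained through 'next'.
struct MeshUpdateSketch
{
  int id = 0;
  MeshUpdateSketch* next = nullptr;
};

// One reading of the original fragment: walk to the last node, then
// append. The explicit {} shows the empty body is deliberate, and the
// assignment is clearly outside the loop.
void AppendToTail(MeshUpdateSketch* head, MeshUpdateSketch* node)
{
  MeshUpdateSketch* last = head;
  for (; last->next; last = last->next) {}
  last->next = node;
}
```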

About pointers

V522 Dereferencing of the null pointer 'pCEntity' might take place. BreakableManager.cpp 2396

int CBreakableManager::HandlePhysics_UpdateMeshEvent(....)
{
  CEntity* pCEntity = 0;
  ....
  if (pmu && pSrcStatObj && GetSurfaceType(pSrcStatObj))
  {
    ....
    if (pEffect)
    {
      ....
      if (normal.len2() > 0)
        pEffect->Spawn(true, pCEntity->GetSlotWorldTM(...)); // <=
    }
  }

  ....

  if (iForeignData == PHYS_FOREIGN_ID_ENTITY)
  {
    pCEntity = (CEntity*)pForeignData;
    if (!pCEntity || !pCEntity->GetPhysicalProxy())
      return 1;
  }
  ....
}

The analyzer detected a null pointer dereference. The function is written (or was refactored) in such a way that there is now a branch of code in which pCEntity is dereferenced while it still holds its initial value of zero. The pointer is only assigned later, in the PHYS_FOREIGN_ID_ENTITY branch.
Now let's look at a case of a potential null pointer dereference.

V595 The 'pTrack' pointer was utilized before it was verified against nullptr. Check lines: 60, 61. AudioNode.cpp 60

void CAudioNode::Animate(SAnimContext& animContext)
{
  ....
  const bool bMuted = gEnv->IsEditor() && (pTrack->GetFlags() &
    IAnimTrack::eAnimTrackFlags_Muted);
  if (!pTrack || pTrack->GetNumKeys() == 0 ||
       pTrack->GetFlags() & IAnimTrack::eAnimTrackFlags_Disabled)
  {
    continue;
  }
  ....
}

The author of this code dereferences the pTrack pointer first and only checks it for null on the next line. Most likely, this is not how the program is supposed to work.
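Here is a minimal sketch of the safer ordering, with the null check performed before any dereference. TrackSketch, the flag constants and both helper functions are hypothetical stand-ins for the animation track interface.

```cpp
// Hypothetical track type exposing only the calls used in the fragment.
struct TrackSketch
{
  int flags = 0;
  int numKeys = 0;
  int GetFlags() const { return flags; }
  int GetNumKeys() const { return numKeys; }
};

constexpr int kFlagMuted = 1 << 0;     // stand-in for eAnimTrackFlags_Muted
constexpr int kFlagDisabled = 1 << 1;  // stand-in for eAnimTrackFlags_Disabled

// Returns true if the track should be skipped. The null check comes
// first, so pTrack is only dereferenced once it is known to be valid.
bool ShouldSkipTrack(const TrackSketch* pTrack)
{
  return !pTrack || pTrack->GetNumKeys() == 0 ||
         (pTrack->GetFlags() & kFlagDisabled);
}

// Meant to be called only after ShouldSkipTrack() returned false.
bool IsMuted(bool isEditor, const TrackSketch& track)
{
  return isEditor && (track.GetFlags() & kFlagMuted);
}
```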

There were far too many V595 warnings to fit into the article. Very often such code is a real error that only works correctly thanks to luck.

V571 Recurring check. The 'if (rLightInfo.m_pDynTexture)' condition was already verified in line 69. ObjMan.cpp 70

// Safe memory helpers
#define SAFE_RELEASE(p){ if (p) { (p)->Release(); (p) = NULL; } }

void CObjManager::UnloadVegetationModels(bool bDeleteAll)
{
  ....
  SVegetationSpriteLightInfo& rLightInfo = ....;
  if (rLightInfo.m_pDynTexture)
    SAFE_RELEASE(rLightInfo.m_pDynTexture);
  ....
}

There is no serious error in this fragment, but the extra check is unnecessary: the SAFE_RELEASE macro already checks the pointer for null.

One more fragment with redundant code:

V571 Recurring check. The 'if (m_pSectorGroups)' condition was already verified in line 48. PartitionGrid.cpp 50

V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. SystemInit.cpp 4045

class CLvlRes_finalstep : public CLvlRes_base
{
  ....
  for (;; )
  {
    if (*p == '/' || *p == '\\' || *p == 0)
    {
      char cOldChar = *p;
      *p = 0; // create zero termination
      _finddata_t fd;

      bool bOk = FindFile(szFilePath, szFile, fd);

      if (bOk)
        assert(strlen(szFile) == strlen(fd.name));

      *p = cOldChar; // get back the old separator

      if (!bOk)
        return;

      memcpy((void*)szFile, fd.name, strlen(fd.name)); // <=

      if (*p == 0)
        break;

      ++p;
      szFile = p;
    }
    else ++p;
  }
  ....
}

There might be an error in this code: the terminal null is not copied along with the last string. In this case, it is necessary either to copy strlen() + 1 characters or to use functions designed for copying strings, such as strcpy or strcpy_s.

Problems with a comma

V521 Such expressions using the ',' operator are dangerous. Make sure the expression '!sWords[iWord].empty(), iWord ++' is correct. TacticalPointSystem.cpp 3243

bool CTacticalPointSystem::Parse(....) const
{
  string sInput(sSpec);
  const int MAXWORDS = 8;
  string sWords[MAXWORDS];

  int iC = 0, iWord = 0;
  for (; iWord < MAXWORDS; !sWords[iWord].empty(), iWord++) // <=
  {
    sWords[iWord] = sInput.Tokenize("_", iC);
  }
  ....
}

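Note the increment section of the for loop. What is a logical expression doing there? Its value is simply discarded. Most likely, the check was meant to stop the loop at the first empty word. Moving it into the loop condition as-is would not work, though: sWords[iWord] is still empty when the condition is first evaluated, so the loop would never run. The condition has to tokenize the word before testing it:

for (; iWord < MAXWORDS && !(sWords[iWord] = sInput.Tokenize("_", iC)).empty(); iWord++) {}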
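Below is a self-contained sketch of a loop that stops at the first empty word and tests each word right after it is tokenized. NextToken() is a hypothetical stand-in for CryString's Tokenize(). It is assumed to skip leading separators and to return an empty string at the end of the input. SplitWords() is also hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for CryStringT::Tokenize("_", iC): returns the
// next '_'-separated token starting at pos and advances pos, or returns
// an empty string once the input is exhausted.
std::string NextToken(const std::string& input, std::size_t& pos)
{
  while (pos < input.size() && input[pos] == '_')
    ++pos;  // skip separators
  std::size_t start = pos;
  while (pos < input.size() && input[pos] != '_')
    ++pos;
  return input.substr(start, pos - start);
}

// Tokenize first, then test: parsing stops at the first empty word and
// the returned value is the number of words found (at most MAXWORDS).
int SplitWords(const std::string& input, std::string (&words)[8])
{
  const int MAXWORDS = 8;
  std::size_t iC = 0;
  int iWord = 0;
  for (; iWord < MAXWORDS; iWord++)
  {
    words[iWord] = NextToken(input, iC);
    if (words[iWord].empty())
      break;
  }
  return iWord;
}
```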

V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. HommingSwarmProjectile.cpp 187

void CHommingSwarmProjectile::HandleEvent(....)
{
  ....
  explodeDesc.normal = -pCollision->n,pCollision->vloc[0];
  ....
}

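Here is one more strange code fragment with the ',' operator. Since the comma operator has the lowest precedence, this statement assigns -pCollision->n to explodeDesc.normal, then evaluates pCollision->vloc[0] and discards the result.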

Suspicious conditions

V571 Recurring check. The 'if (pos == npos)' condition was already verified in line 1530. CryString.h 1539

//! Find last single character.
// \return -1 if not found, distance from beginning otherwise.
template<class T>
inline typename CryStringT<T>::....::rfind(....) const
{
  const_str str;
  if (pos == npos)
  {
    // find last single character
    str = _strrchr(m_str, ch);
    // return -1 if not found, distance from beginning otherwise
    return (str == NULL) ?
      (size_type) - 1 : (size_type)(str - m_str);
  }
  else
  {
    if (pos == npos)
    {
      pos = length();
    }
    if (pos > length())
    {
      return npos;
    }

    value_type tmp = m_str[pos + 1];
    m_str[pos + 1] = 0;
    str = _strrchr(m_str, ch);
    m_str[pos + 1] = tmp;
  }
  return (str == NULL) ?
   (size_type) - 1 : (size_type)(str - m_str);
}

The analyzer detected a repeated check of the pos variable. Inside the else branch, pos can never be equal to npos, so the inner if block will never be executed. The function also contains duplicated code, so it is worth rewriting.

This code was successfully duplicated in another place:

V571 Recurring check. The 'if (pos == npos)' condition was already verified in line 1262. CryFixedString.h 1271

V523 The 'then' statement is equivalent to the 'else' statement. ScriptTable.cpp 789

bool CScriptTable::AddFunction(const SUserFunctionDesc& fd)
{
  ....
  char sFuncSignature[256];
  if (fd.sGlobalName[0] != 0)
    cry_sprintf(sFuncSignature, "%s.%s(%s)", fd.sGlobalName,
      fd.sFunctionName, fd.sFunctionParams);
  else
    cry_sprintf(sFuncSignature, "%s.%s(%s)", fd.sGlobalName,
      fd.sFunctionName, fd.sFunctionParams);
  ....
}

Both branches format the signature in exactly the same way, so the string is printed the same way regardless of whether fd.sGlobalName is empty. There are many such fragments in the code. Here are some of them:

V523 The 'then' statement is equivalent to the 'else' statement. BudgetingSystem.cpp 718
V523 The 'then' statement is equivalent to the 'else' statement. D3DShadows.cpp 627
V523 The 'then' statement is equivalent to the 'else' statement. livingentity.cpp 967

Undefined behavior

V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. physicalplaceholder.h 25

class CPhysicalEntity;
const int NO_GRID_REG = -1<<14;
const int GRID_REG_PENDING = NO_GRID_REG+1;
const int GRID_REG_LAST = NO_GRID_REG+2;

The analyzer can find several types of errors that lead to undefined behavior. According to the latest standard of the language, shifting a negative number to the left results in undefined behavior.
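One way to get the same constant without shifting a negative value is to shift a positive one and negate the result. Left-shifting a negative value is undefined in C++11 through C++17 (C++20 finally defines it), whereas 1 << 14 is well-defined everywhere. The sketch reuses the constant names from the fragment above.

```cpp
// Shift a non-negative value, then negate: no undefined behavior, and
// the result equals what -1 << 14 produces on two's complement targets.
constexpr int NO_GRID_REG = -(1 << 14);
constexpr int GRID_REG_PENDING = NO_GRID_REG + 1;
constexpr int GRID_REG_LAST = NO_GRID_REG + 2;

static_assert(NO_GRID_REG == -16384, "expected -2^14");
```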

Here are some more dubious places:

V610 Undefined behavior. Check the shift operator '<<'. The left operand '~(TFragSeqStorage(0))' is negative. UDPDatagramSocket.cpp 757
V610 Undefined behavior. Check the shift operator '<<'. The right operand ('cpu' = [0..1023]) is greater than or equal to the length in bits of the promoted left operand. CryThreadUtil_posix.h 115
V610 Undefined behavior. Check the shift operator '>>'. The right operand is negative ('comp' = [-1..3]). ShaderComponents.cpp 399
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. trimesh.cpp 4126
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. trimesh.cpp 4559
V610 Unspecified behavior. Check the shift operator '>>'. The left operand '-NRAYS' is negative. trimesh.cpp 4618
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. tetrlattice.cpp 324
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. tetrlattice.cpp 350
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. tetrlattice.cpp 617
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. tetrlattice.cpp 622

Another type of undefined behavior is related to modifying a variable more than once between two sequence points:

V567 Undefined behavior. The 'm_current' variable is modified while being used twice between sequence points. OperatorQueue.cpp 101

bool COperatorQueue::Prepare(....)
{
  ++m_current &= 1;
  m_ops[m_current].clear();
  return true;
}

Unfortunately, this fragment is not the only one.

V567 Undefined behavior. The 'm_commandBufferIndex' variable is modified while being used twice between sequence points. XConsole.cpp 180
V567 Undefined behavior. The 'itail' variable is modified while being used twice between sequence points. trimesh.cpp 3119
V567 Undefined behavior. The 'ihead' variable is modified while being used twice between sequence points. trimesh.cpp 3126
V567 Undefined behavior. The 'ivtx' variable is modified while being used twice between sequence points. boolean3d.cpp 957
V567 Undefined behavior. The 'ivtx' variable is modified while being used twice between sequence points. boolean3d.cpp 965
V567 Undefined behavior. The 'ivtx' variable is modified while being used twice between sequence points. boolean3d.cpp 983
V567 Undefined behavior. The 'm_iNextAnimIndex' variable is modified while being used twice between sequence points. HitDeathReactionsDefs.cpp 192
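For the OperatorQueue.cpp fragment, whatever the exact sequencing rules of a given language version, a form that writes m_current only once is unambiguous and just as short. OperatorQueueSketch is a hypothetical reduction of COperatorQueue, with two operator lists used as a double buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduction of COperatorQueue: two operator lists used as
// a double buffer, with m_current toggling between 0 and 1.
class OperatorQueueSketch
{
public:
  bool Prepare()
  {
    // Same effect as "++m_current &= 1", but m_current is written only
    // once in the expression, so no sequencing questions arise.
    m_current = (m_current + 1) & 1;
    m_ops[m_current].clear();
    return true;
  }

  int Current() const { return m_current; }
  void Push(int op) { m_ops[m_current].push_back(op); }
  std::size_t Size() const { return m_ops[m_current].size(); }

private:
  int m_current = 0;
  std::vector<int> m_ops[2];
};
```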

Questions for the developers

In the CryEngine V code, I noticed a rather amusing way the developers communicate with each other: through comments.
Here is the most hilarious comment, which I found thanks to this warning:

V763 Parameter 'enable' is always rewritten in function body before being used.

void CNetContext::EnableBackgroundPassthrough(bool enable)
{
  SCOPED_GLOBAL_LOCK;
  // THIS IS A TEMPORARY HACK TO MAKE THE GAME PLAY NICELY,
  // ASK peter@crytek WHY IT'S STILL HERE
  enable = false;
  ....
}

I then searched for similar comments and noted down a few of them:

....
// please ask me when you want to change [tetsuji]
....
// please ask me when you want to change [dejan]
....
//if there are problems with this function, ask Ivo
uint32 numAnims =
  pCharacter->GetISkeletonAnim()->GetNumAnimsInFIFO(layer);
if (numAnims)
  return pH->EndFunction(true);
....
//ask Ivo for details
//if (pCharacter->GetCurAnimation() &&
//    pCharacter->GetCurAnimation()[0] != '\0')
//  return pH->EndFunction(pCharacter->GetCurAnimation());
....
/////////////////////////////////////////////////////////////////
// Strange, !do not remove... ask Timur for the meaning of this.
/////////////////////////////////////////////////////////////////
if (m_nStrangeRatio > 32767)
{
  gEnv->pScriptSystem->SetGCFrequency(-1); // lets get nasty.
}
/////////////////////////////////////////////////////////////////
// Strange, !do not remove... ask Timur for the meaning of this.
/////////////////////////////////////////////////////////////////
if (m_nStrangeRatio > 1000)
{
  if (m_pProcess && (m_pProcess->GetFlags() & PROC_3DENGINE))
    m_nStrangeRatio += cry_random(1, 11);
}
/////////////////////////////////////////////////////////////////
....
// tank specific:
// avoid steering input around 0.5 (ask Anton)
....
CryWarning(VALIDATOR_MODULE_EDITOR, VALIDATOR_WARNING,
  "....: Wrong edited item. Ask AlexL to fix this.");
....
// If this renders black ask McJohn what's wrong.
glGenerateMipmap(GL_TEXTURE_2D);
....

The most important question to the developers: why don't they use specialized tools for the improvement of their code? Of course, I mean PVS-Studio. :)

I should note once again that this article covers only some of the errors we found. I didn't even get to the end of the High level warnings, so the project is still waiting for someone to check it more thoroughly. Unfortunately, I cannot spend that much time on it, because dozens of other projects are waiting for me.

Conclusion

Having worked on the development of an analyzer, I have come to the conclusion that it is simply impossible to avoid errors, whether a team grows or shrinks. I am really not against code review, but it's easy to estimate how much time a team lead would have to spend reviewing the code of ten people. And what about the next day? What if there are more than ten developers? In that case, code review would only be feasible for changes to key components of the product, and this approach becomes extremely ineffective as a team accumulates more code and more people. Automated checking with static analyzers helps greatly here. It is not a substitute for existing tests, but a completely different approach to code quality (by the way, static analyzers find errors in the tests too). Fixing bugs at the earliest stages of development costs almost nothing, unlike fixing those found during the testing phase, and errors in a released product can have an enormous cost.

You can download and try PVS-Studio from its website.
If you want to discuss licensing options, prices, and discounts, contact our support team.

Don't make the unicorn sad by writing bad code...

How I created music for a videogame

May 19, 2017, 5:03 pm

Hello, everyone! My name is Dima, and I’m a musician who wrote music for the game Reflection of Mine under the pseudonym "Expecte Amour". It was a mind-blowing experience, and I'd like to take the time to share it with you.

I’ve been writing music for four years, but frankly speaking, from February 2013 to mid-2014 it was more of a mess of sounds. Only with experience did I acquire a certain skill and create a project called Tears of Eve. It lived a bright life for two years, and I even managed to play a couple of live sets. The main idea was to present the dark side of music. I experimented a lot with mixing different genres into one, but Tears of Eve turned out to be an echo of my previous project, which existed within the Witch House genre. Although Tears of Eve had none of the noisy, squeaky sounds characteristic of that genre, the past cast its shadow on it: my music continued to be attributed to Witch House. By the way, the track Southside was written just before Tears of Eve came along.

In March 2015 an unknown developer wrote to me:

Of course, I was surprised that someone was interested in my music and even intended to use it in a video game. Tears of Eve wasn’t the most popular project among Witch House fans, but the developer found me anyway, in one of the music groups on the social network VK. The first version of the game was a free HTML5 browser game made for a contest. Of course, the game was still rough at that point, but I was in no position to judge. What hooked me was the great atmosphere: I loved all the glitches and Unicode, which brought something new and fresh to gaming. I hadn't come across anything like it in video games before (except when I tried to run a modern game on my old PC).

Some of my finished tracks suited the developer's ideas and made it into the game: "Alesta", "I Feel It", and "Inversion of Me".

Nevertheless, music for a game and music for listening are different things, and it would have been wrong to put the tracks into the game as they were. To maintain the atmosphere in some stages of the game, I had to remove the drums, because a monotonous rhythm can annoy the player during calm moments. Another difference between a finished track and an in-game track is the variety of sounds. A track meant purely for listening has much stricter requirements: it must consist of a larger number of instrumental parts. A game track, by contrast, should keep the same mood and tempo from beginning to end. For example, I had to break "Inversion of Me" into three parts and make a loop from each of them. The parts differed in "saturation", and the "thickest" one went into the most difficult levels of the game.

The second difficulty was that I had to go beyond familiar solutions and create things that would never have occurred to me in any other situation. Here, for example, is the most insane sound request I received:

The requested music didn't always fit a genre I was comfortable working in. The most difficult request was to create a composition called "Death Jazz".

First, I had never worked with jazz, and I could not write it as fast as was needed. There was only one solution: to use a sample (although I don’t respect any kind of sampling). Second, when I was searching for a track to sample, certain nuances came up. Not every jazz recording fit, so I had to use jazz from the '60s and '70s, but its rhythm had such a floating tempo that cutting out a usable piece drove me crazy. All in all, the idea still worked, and "Death Jazz" remains one of the works that is hard to fit into the general format of my music.

I created all of this in Fruity Loops; more precisely, back then it was version 10. Later I tried versions 11 and 12, and for now I've settled on 12. I was drawn in by its colorful interface, which seemed much simpler than those of other DAWs. Nevertheless, I’m thinking of switching to Ableton or Logic Pro. FL already seems too simple, and I want to try something new.

The most interesting thing is that anyone can learn how to make music. There aren't even any real technical barriers. Back when I was performing as Tears of Eve, I had no idea what it was like to have a powerful computer. All the tracks above were written on a laptop with 2GB of RAM, a 1.6GHz processor, and a 120GB hard drive. I was the proud owner of this toaster instead of a normal PC, and it severely limited my options. But the music was still good enough that someone was interested in it! So never say that your hardware is holding you back.

I constantly had to find ways to reduce the load on the processor and look for "lighter" alternatives to some plug-ins I wanted to use. To avoid glitches and freezes, I almost always used reFX Nexus 2: it had a huge sound library and didn’t require much RAM. For mastering, I used the built-in FL plug-ins and iZotope Ozone 7. For this composition, I used a real guitar and recorded it in my own room:

Now, as for how much time I spent on a single track: sometimes it took one evening to create a demo and then a week to turn it into a full track. Other tracks took months. At the moment, I’ve got 3-4 partially finished tracks that I haven't been able to finish for two months.

Although I still have a lot to learn, I'll venture some advice for beginning musicians:

80% of a track's sound depends on the sounds you select at the start, so be ready to spend time choosing really good sounds or samples.
Equalizers and compressors are among the first tools you'll reach for, but never overdo them.
If something doesn't work out, don't grieve (and don't delete everything in frustration like I do). Get distracted by another project or something else entirely, and come back to your track later.

All in all, creating music for a video game was a very good experience, and I hope you will hear my music in other video games in the future! If you'd like to order some tracks, feel free to e-mail me at cryevecry@yandex.ru
